How to Read Medical Studies: Design Hierarchy, Effect Size

A 5-minute version of the research-literacy piece: three concepts let you audit most health claims yourself. Study-design hierarchy, effect size, pre-specification. Everything else is detail.

BiologicalX Editorial Apr 24, 2026 3m read

You can audit most health claims yourself with three concepts: study-design hierarchy, effect size, and pre-specification. Everything else is detail. Most health journalism gets effect sizes wrong, conflates observational with interventional evidence, and treats unreplicated single trials as settled science. A basic level of research literacy is enough to dismiss roughly 80% of viral health takes without reading the underlying paper.

Study-design hierarchy first. Meta-analyses of multiple high-quality randomized controlled trials (RCTs) sit at the top, followed by individual RCTs with pre-registered endpoints. Below that: prospective cohorts, case-control studies, cross-sectional surveys. At the bottom: animal data, mechanistic plausibility, expert opinion. The hierarchy matters because confounders eat observational findings alive. Early coffee-and-mortality studies found that coffee drinkers died at higher rates, largely because coffee drinkers were also more likely to smoke. Once smoking was controlled for, the coffee signal flipped.
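The way a confounder can flip an observational signal is easier to see with numbers. The sketch below uses invented counts (they are not taken from any real coffee study) in which coffee drinkers are mostly smokers. The crude comparison makes coffee look harmful, while the comparison within each smoking stratum points the other way.

```python
# Hypothetical cohort, invented for illustration only.
# Keys: (smoking status, coffee status); values: (people, deaths).
cohort = {
    ("smoker", "coffee"): (800, 152),
    ("smoker", "no coffee"): (200, 40),
    ("non-smoker", "coffee"): (200, 18),
    ("non-smoker", "no coffee"): (800, 80),
}


def death_rate(groups):
    """Pooled death rate across a list of (people, deaths) pairs."""
    people = sum(n for n, _ in groups)
    deaths = sum(d for _, d in groups)
    return deaths / people


def crude_rate(coffee):
    """Death rate by coffee status, ignoring smoking entirely."""
    return death_rate([v for (_, c), v in cohort.items() if c == coffee])


def stratum_rate(smoking, coffee):
    """Death rate by coffee status within one smoking stratum."""
    return death_rate([cohort[(smoking, coffee)]])


print(f"crude: coffee {crude_rate('coffee'):.1%} "
      f"vs no coffee {crude_rate('no coffee'):.1%}")
for smoking in ("smoker", "non-smoker"):
    print(f"{smoking}: coffee {stratum_rate(smoking, 'coffee'):.1%} "
          f"vs no coffee {stratum_rate(smoking, 'no coffee'):.1%}")
```

With these made-up counts the crude rates are 17.0% versus 12.0%, but within smokers they are 19.0% versus 20.0% and within non-smokers 9.0% versus 10.0%. Stratifying is only one simple way to adjust for a confounder; real studies typically use regression models.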

Effect size next. A statistically significant finding can be clinically trivial. Cohen's d, hazard ratios, risk differences, and number needed to treat (NNT) are how the literature talks about magnitude. A drug that lowers risk by 30% in relative terms sounds enormous; if the absolute risk goes from 2% to 1.4% over 10 years, that is a reduction of 0.6 percentage points, and the number needed to treat is 1 / 0.006, or around 170. Useful, but not the headline most coverage frames it as. Always look for the absolute number, not just the relative.
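The arithmetic behind that example is short enough to check yourself. A minimal sketch, where the function name `effect_sizes` is just an illustrative helper and the risks are the 2% and 1.4% figures from the paragraph above:

```python
def effect_sizes(control_risk, treated_risk):
    """Return (absolute risk reduction, relative risk reduction, NNT)."""
    arr = control_risk - treated_risk   # absolute risk reduction
    rrr = arr / control_risk            # relative risk reduction
    nnt = 1 / arr                       # number needed to treat
    return arr, rrr, nnt


arr, rrr, nnt = effect_sizes(control_risk=0.02, treated_risk=0.014)
print(f"relative risk reduction: {rrr:.0%}")                        # 30%
print(f"absolute risk reduction: {arr * 100:.1f} percentage points")  # 0.6
print(f"number needed to treat:  {nnt:.0f}")                        # 167
```

The same 30% relative reduction applied to a 20% baseline risk would give an NNT of about 17, which is why the baseline matters as much as the relative figure.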

Pre-specification is the part most readers skip. Before a trial starts, the primary endpoint and the key analyses should be locked in writing. Trials that fail their primary endpoint and then report a positive secondary or subgroup finding are doing post-hoc analysis. Post-hoc findings are hypotheses, not conclusions. If you see a press release headlining a benefit that was not the primary endpoint, treat the result as preliminary regardless of how good the p-value looks.
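One standard reason post-hoc findings deserve this caution is multiple comparisons: the more endpoints and subgroups a team examines, the more likely at least one crosses p < 0.05 by chance alone. The sketch below assumes every test is independent and that no true effect exists, which is a simplification (real endpoints are usually correlated), and the function name is illustrative.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent tests
    is 'significant' at level alpha when no real effect exists."""
    return 1 - (1 - alpha) ** num_tests


for num_tests in (1, 5, 10, 20):
    print(f"{num_tests:>2} tests: "
          f"{chance_of_false_positive(num_tests):.1%} chance of a spurious hit")
```

Under these assumptions, a single pre-specified endpoint carries a 5% false-positive chance, while 20 unplanned looks at the data carry roughly 64%.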

The p-value itself is the most overweighted single number in coverage. A p-value <0.05 says only that a result at least this extreme would be unlikely if the null hypothesis were true. It does not say the effect is large, real, replicable, or clinically meaningful. Two trials with identical p-values can have vastly different effect sizes and credibility. Read effect sizes before reading p-values.
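To make that concrete, here is a sketch comparing two hypothetical trials with invented event counts: a small trial with a large effect and a very large trial with a tiny one. It uses a plain two-proportion z-test with a normal approximation, which is one common choice and not necessarily what any given paper uses.

```python
import math


def two_proportion_test(events_control, events_treated, n_per_arm):
    """Return (absolute risk reduction, two-sided p-value) for a
    two-arm trial with equal arm sizes, using a pooled z-test."""
    p_control = events_control / n_per_arm
    p_treated = events_treated / n_per_arm
    pooled = (events_control + events_treated) / (2 * n_per_arm)
    std_err = math.sqrt(pooled * (1 - pooled) * 2 / n_per_arm)
    z = (p_control - p_treated) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return p_control - p_treated, p_value


# Hypothetical trials, counts invented for illustration.
small_arr, small_p = two_proportion_test(40, 26, n_per_arm=100)
large_arr, large_p = two_proportion_test(2000, 1875, n_per_arm=20000)

for name, arr, p in (("small trial", small_arr, small_p),
                     ("large trial", large_arr, large_p)):
    print(f"{name}: p = {p:.3f}, absolute risk reduction = {arr:.2%}, "
          f"NNT = {1 / arr:.0f}")
```

Both trials land at a p-value of roughly 0.035, yet the small trial's absolute risk reduction is 14 percentage points (NNT about 7) and the large trial's is under 1 percentage point (NNT of 160). The p-value alone cannot tell them apart.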

Study-design hierarchy: meta-analysis of RCTs > RCT > prospective cohort > case-control > cross-sectional > mechanistic plausibility.
Effect size: look for absolute risk reduction, hazard ratio, or Cohen's d. Relative risk reductions without absolutes are misleading.
Pre-specification: primary endpoints lock the story. Secondary and subgroup findings are hypothesis-generating, not confirmatory.
P-values: binary significance is the wrong question. Magnitude and replication are the right ones.
Replication: a single positive trial is a hypothesis. Two converging RCTs are the start of a finding.

What to actually do

For any new health claim, ask three questions before reading the body. What is the study design? What is the absolute effect size? Was the primary endpoint pre-specified and met?
Read the registered protocol when you can. ClinicalTrials.gov entries record primary endpoints in advance, and later changes to them show up in the record's history. If a press release talks about a different endpoint than the registered one, that is the story.
Default to skepticism on single-trial findings. A first-of-its-kind study is interesting. It is not a recommendation. Wait for replication before changing behavior on anything other than low-cost reversible levers.

Most health journalism gets effect sizes wrong because effect sizes are the part that does not fit into a headline. Building research literacy is the highest-leverage investment in this entire space, because it permanently inoculates you against the next decade of viral wellness claims. For the full breakdown of GRADE, the hierarchy of evidence, and worked examples, see the full article.