
P-Hacking and HARKing: How Statistics Get Tortured

P-hacking means trying multiple analyses until p < 0.05 emerges. HARKing (Hypothesizing After Results are Known) involves retroactively labeling findings as hypothesized. Pre-registration prevents both; the "garden of forking paths" names the analytical flexibility that makes them possible.

How this entry is structured
Definitions first, then mechanisms, then “so what?”. If you are in a hurry, skim the headings and callouts.
Not medical advice
Educational content only. If symptoms are severe, persistent, or worrying, see a clinician.

Flexibility as Deception

Statisticians Andrew Gelman and Eric Loken termed it the "garden of forking paths." Every study involves analytical choices: which confounders to adjust for, which outliers to exclude, how to define subgroups, how to transform outcomes (raw values? log-transformed?), which model specification to use. With enough forks, researchers can justify nearly any conclusion, not through conscious fraud but through unconscious bias navigating that flexibility.

Simmons, Nelson, and Simonsohn demonstrated p-hacking's power elegantly in 2011. Simulating studies with no true effect, they showed that modest, defensible flexibility (choosing among correlated outcome measures, deciding after a peek whether to collect more data, including covariates, dropping conditions) produced false positives far more often than the nominal 5%; combining several of these freedoms pushed the false-positive rate above 60%.
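A minimal sketch of that mechanism (sample sizes and analysis variants invented for illustration, not Simmons et al.'s actual procedure): every dataset below is pure noise, yet trying a few "reasonable" analyses and keeping the best p-value inflates the false-positive rate well past 5%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n = 2000, 40
hits = 0
for _ in range(n_sims):
    group = rng.integers(0, 2, n)   # arbitrary treatment/control labels
    y = rng.normal(size=n)          # outcome: no true effect anywhere
    covar = rng.normal(size=n)      # an irrelevant covariate
    ps = []
    # Fork 1: plain two-sample t-test
    ps.append(stats.ttest_ind(y[group == 0], y[group == 1]).pvalue)
    # Fork 2: exclude "outliers" beyond 2 SD, then re-test
    keep = np.abs(y) < 2
    ps.append(stats.ttest_ind(y[keep & (group == 0)],
                              y[keep & (group == 1)]).pvalue)
    # Fork 3: "adjust" for the covariate via residuals, then re-test
    resid = y - np.polyval(np.polyfit(covar, y, 1), covar)
    ps.append(stats.ttest_ind(resid[group == 0], resid[group == 1]).pvalue)
    # Fork 4: log-transform a shifted copy of the outcome, then re-test
    ps.append(stats.ttest_ind(np.log(y[group == 0] + 10),
                              np.log(y[group == 1] + 10)).pvalue)
    if min(ps) < 0.05:              # report whichever fork "worked"
        hits += 1

print(f"False-positive rate across forks: {hits / n_sims:.1%}")
# Prints well above 5%, the nominal rate for one pre-specified test.
```

Each fork is individually defensible; the inflation comes from silently taking the best of several.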

P-hacking encompasses multiple practices: optional stopping (continuing to collect data until p < 0.05 emerges), selective analysis (testing multiple hypotheses, reporting only the significant ones), outcome switching (measuring 10 outcomes, reporting the 3 that reached significance), and covariate fishing (adding or removing confounders until an effect appears).
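Optional stopping alone is enough to break the nominal error rate. A small sketch, with illustrative sample sizes and an arbitrary peeking schedule:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, start_n, max_n, batch = 2000, 10, 100, 5
false_positives = 0
for _ in range(n_sims):
    a = list(rng.normal(size=start_n))     # two groups, both pure noise
    b = list(rng.normal(size=start_n))
    while len(a) <= max_n:
        if stats.ttest_ind(a, b).pvalue < 0.05:
            false_positives += 1           # "significant": stop and report
            break
        a.extend(rng.normal(size=batch))   # otherwise recruit a few more
        b.extend(rng.normal(size=batch))

print(f"False-positive rate with peeking: {false_positives / n_sims:.1%}")
# Typically far above the 5% a single fixed-sample test would give.
```

Each peek is another chance for noise to cross the threshold; the more often you look, the higher the cumulative error rate.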

Microbiome studies offer fertile p-hacking ground. Researchers measure hundreds of taxa; testing each taxon against each outcome quickly creates thousands of comparisons. With 1,000 comparisons, a Bonferroni correction (dividing alpha by the number of comparisons) sets the significance threshold at p < 0.00005, which only strong, consistent signals survive. Many researchers ignore the correction and report taxa significant at p < 0.05, even though chance alone guarantees dozens of false positives at that threshold.
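A toy version of that scenario (all numbers illustrative): 1,000 taxa, none truly associated with the outcome.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_taxa, n_subjects = 1000, 60
outcome = rng.normal(size=n_subjects)
taxa = rng.normal(size=(n_taxa, n_subjects))   # null "abundances"

# One correlation test per taxon; index [1] is the p-value
pvals = np.array([stats.pearsonr(t, outcome)[1] for t in taxa])

print("Taxa at p < 0.05:          ", int((pvals < 0.05).sum()))
print("Taxa surviving Bonferroni: ", int((pvals < 0.05 / n_taxa).sum()))
```

Roughly 5% of the null taxa (about 50 here) clear p < 0.05 by chance; the Bonferroni threshold of 0.05 / 1,000 = 0.00005 screens almost all of them out.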

HARKing (Hypothesizing After Results are Known) represents a related sin. Researchers analyze data, discover unexpected associations, then claim these were hypothesized from the start. Norbert Kerr named the practice in 1998, showing how easily post-hoc exploration can be disguised as hypothesis-testing.

Example: a microbiome study finds, post hoc, that patients with high Prevotella abundance show reduced symptoms. The researchers frame this in the introduction as if it had been predicted, despite no prior mechanistic rationale. Without access to a pre-registration, readers cannot distinguish true predictions from post-hoc storytelling.

Consequences of p-hacking and HARKing: (1) false-positive findings pollute the literature; (2) replication attempts fail, damaging credibility; (3) research resources are wasted pursuing non-existent effects; (4) patients are potentially harmed by ineffective treatments adopted on flimsy evidence.

Pre-registration is the solution. Before collecting data, researchers post detailed analysis plans: primary outcomes, secondary outcomes, planned statistical adjustments, subgroup analyses, decision rules for excluding outliers. Once data collection begins, deviations from the plan are flagged as exploratory. This simple transparency reveals the difference between hypothesis-testing (confirmatory) and hypothesis-generating (exploratory) analyses.

The Open Science Framework (osf.io) hosts pre-registrations publicly and at no cost. ClinicalTrials.gov accepts trial registrations. Journals increasingly require pre-registration or offer Registered Reports, where the analysis plan is peer-reviewed before data collection. Pre-registered studies report positive results markedly less often than non-registered studies, consistent with fewer false positives.

Exploration isn't bad; it's necessary for discovery. But exploration should be labeled as such. Confirmatory studies with pre-registered hypotheses can then test what exploration turned up. This two-stage approach (explore, then confirm) is scientifically sound.

When reading research, check: is it pre-registered? Do the reported analyses match the pre-registration? Are deviations explained? Pre-registered studies offer far greater confidence that findings reflect reality rather than p-hacking. Non-registered studies, especially those with many measured outcomes, warrant skepticism.

The garden of forking paths describes the legitimate flexibility inherent in research. Pre-registration doesn't eliminate that flexibility; it documents it, so readers can see where exploration occurred.


Sources & references

  1. Andrade C (2021). HARKing, Cherry-Picking, P-Hacking, Fishing Expeditions, and Data Dredging and Mining as Questionable Research Practices. The Journal of Clinical Psychiatry. PMID: 33999541
  2. Head ML et al. (2015). The extent and consequences of p-hacking in science. PLoS Biology. PMID: 25768323