Why does correlation not imply causation?

Two variables can move together because one causes the other, because the relationship runs the other way, because a third confounding variable drives both, or by coincidence. A correlation alone cannot tell these apart, so it cannot establish cause.

What is a confounding variable?

A confounding variable is an outside factor that influences both variables you are studying, creating an apparent relationship between them that is not causal. Hot weather raising both ice cream sales and drowning rates is the classic example.

How do you prove causation?

Causation usually requires a plausible mechanism, correct time order, ideally a dose-response pattern, and either a randomised controlled design or strong statistical control of confounders. A significant p-value by itself is not enough.

How should I word a correlation in my dissertation?

Unless you ran an experiment that controls confounders, describe variables as associated, related, or predictive of one another rather than causal. Matching your verbs to your design avoids overclaiming.

Correlation vs Causation: What Researchers Must Know

Correlation means two variables move together; causation means one actually produces the change in the other. A correlation can exist without any causal link at all, which is why "correlation does not imply causation" is one of the most repeated warnings in research. Confusing the two is among the fastest ways to have a conclusion challenged in your viva.

Why a strong correlation can still be meaningless

A correlation coefficient close to 1 or -1 tells you the relationship is strong and consistent, not that it is causal. Ice cream sales and drowning deaths rise together every summer, but neither causes the other; hot weather drives both. That hidden third factor is a confounding variable, and it is the reason a tight statistical relationship can be entirely misleading. Recognising potential confounders in your own data is part of designing an honest analysis, which we cover in our guide to independent versus dependent variables.

The classic confounder: hot weather drives both ice cream sales and drowning rates, so the two correlate without either causing the other.
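To see how a confounder manufactures a correlation, here is a minimal simulation sketch. Everything in it is invented for illustration: the temperature range, the coefficients, the noise levels, and the helper names pearson_r and partial_r are assumptions, not real data or a prescribed method. It generates ice cream sales and drownings that each depend only on temperature, then compares the raw correlation with the partial correlation once temperature is held constant.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def partial_r(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / ((1 - rxz ** 2) * (1 - ryz ** 2)) ** 0.5


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical data: temperature is the only thing driving either outcome.
temperature = [rng.uniform(5, 35) for _ in range(2000)]
ice_cream = [3.0 * t + rng.gauss(0, 10) for t in temperature]
drownings = [0.5 * t + rng.gauss(0, 2) for t in temperature]

raw = pearson_r(ice_cream, drownings)
adjusted = partial_r(ice_cream, drownings, temperature)

print(f"raw correlation:            {raw:.2f}")
print(f"controlling for temperature: {adjusted:.2f}")
```

With these made-up settings the raw correlation is strong while the adjusted one sits close to zero, even though ice cream and drownings never influence each other in the simulation. Real data are rarely this tidy, because confounders are often unmeasured or only partly captured.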

The four explanations behind any correlation

When two variables, call them A and B, are correlated, there are always several possibilities to rule out before claiming cause:

A causes B: the relationship you hope to show.
B causes A: reverse causation, often overlooked.
A third variable causes both: confounding.
Coincidence: a spurious correlation, common in large datasets with many comparisons.

Your discussion section is stronger when you name these alternatives and explain why your design makes them unlikely, rather than ignoring them.

What it takes to argue causation

Establishing cause and effect requires more than a significant p-value. Researchers generally look for a plausible mechanism, the right time order (the cause precedes the effect), a dose-response pattern, and, ideally, a randomised controlled design or strong statistical control of confounders. A well-built regression with control variables strengthens a causal argument, but it does not prove it on its own.

Spurious correlations and the multiple-testing trap

The more variable pairs you test, the more spurious correlations you will find by chance alone. If you run twenty independent correlations at the .05 threshold, you should expect about one to come back significant even when nothing is truly related. This is the multiple-testing problem, and it is why a single significant correlation pulled from a screen of dozens is unreliable on its own. The result might be real, but it might just as well be the one false positive the procedure was always going to produce.
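The arithmetic behind that warning can be checked in a few lines. This is a small sketch under the same assumption the paragraph makes, namely that the tests are independent and that no real relationship exists; the function name false_positive_summary is hypothetical.

```python
def false_positive_summary(alpha, tests):
    """Expected false positives and chance of at least one, for independent null tests."""
    expected = alpha * tests
    at_least_one = 1 - (1 - alpha) ** tests
    return expected, at_least_one


expected, at_least_one = false_positive_summary(alpha=0.05, tests=20)
print(f"expected false positives: {expected:.1f}")
print(f"chance of at least one:   {at_least_one:.0%}")
```

For twenty tests at .05 the expected count is one, and the chance of seeing at least one false positive is roughly 64%. Correlated tests behave differently, so treat these figures as a guide rather than an exact rate for your own dataset.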

The defence is to pre-specify your hypotheses. Deciding in advance which relationships you will test, and why, keeps you from rebranding a chance finding as a discovery after the fact. When you do test many pairs for exploratory reasons, say so plainly and apply a correction such as Bonferroni, then treat anything that survives as a lead to confirm rather than a conclusion. A finding that was hypothesised, not harvested, carries far more weight in your discussion.
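As an illustration of the Bonferroni correction mentioned above, the sketch below divides the significance threshold by the number of tests and checks each p-value against it. The variable pairs and p-values are invented for the example, and the function name bonferroni is an assumption rather than a library call.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the corrected threshold and which tests survive it."""
    threshold = alpha / len(p_values)  # stricter cut-off for each test
    survivors = {name: p <= threshold for name, p in p_values.items()}
    return threshold, survivors


# Hypothetical exploratory screen of four correlations.
screen = {
    "sleep vs grades": 0.001,
    "coffee vs grades": 0.030,
    "screen time vs grades": 0.020,
    "commute vs grades": 0.200,
}

threshold, survivors = bonferroni(screen)
print(f"corrected threshold: {threshold:.4f}")
for name, passed in survivors.items():
    print(f"{name}: {'survives' if passed else 'does not survive'}")
```

Two of these invented results would have counted as significant at .05, but only one clears the corrected threshold of .0125. Bonferroni is deliberately conservative, so a result that survives is still a lead to confirm rather than proof.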

A checklist before you write a causal sentence

Before any sentence in your draft claims that one thing causes another, run it against four questions:

Is the time order right? The presumed cause must come before the effect, not alongside or after it.
Is there a plausible mechanism? You should be able to explain how one variable would produce the change in the other.
Have you controlled the obvious confounders? The most likely third-variable explanations should be measured and adjusted for, not waved away.
Did the design allow a causal test at all? A cross-sectional survey cannot deliver the causal claim that a randomised experiment can, however large the sample.

How this should change your wording

The safest fix is in your language. Unless you ran an experiment that controls confounders, write that two variables are associated, related, or predict one another, not that one causes the other. Examiners read causal verbs closely, and an overclaim in the abstract is an easy point to lose. Phrase the finding to match the evidence your design actually produced.

Reporting correlation results correctly

When you do report a correlation, give the coefficient, the sample size, and the p-value, and interpret the strength rather than just declaring significance. You can compute the coefficient and its p-value with the correlation coefficient calculator before writing the sentence. The exact format is in the APA rules for a correlation result. If the relationships in your study feed a larger predictive model and you are at doctoral level, the modelling falls under doctoral-level statistical support; for the full analysis of a collected dataset, see help with dissertation data analysis.
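For the reporting sentence itself, a small formatting helper can keep the numbers consistent. The sketch below assumes the commonly used APA pattern of degrees of freedom (n minus 2) in parentheses, no leading zero on r or p, and "p < .001" for very small values; check it against the APA rules your institution follows. The function names and the input values are hypothetical.

```python
def format_r(r):
    """Format a correlation to two decimals without a leading zero."""
    rounded = round(r, 2)
    text = f"{abs(rounded):.2f}"
    if text.startswith("0"):
        text = text[1:]  # "0.45" becomes ".45"
    return ("-" if rounded < 0 else "") + text


def format_p(p):
    """Format a p-value to three decimals, or as 'p < .001' when smaller."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")


def apa_correlation(r, n, p):
    """Build a report string from the coefficient, sample size and p-value."""
    return f"r({n - 2}) = {format_r(r)}, {format_p(p)}"


print(apa_correlation(0.45, 50, 0.001))
print(apa_correlation(-0.312, 120, 0.0004))
```

The string only reports the numbers. The surrounding sentence still needs to interpret the strength and direction of the relationship, using associative rather than causal verbs unless your design supports more.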
