Correlations, *r*’s, are simple to interpret. If you have a correlation of 0.50 between an independent variable (IV) and a dependent variable (DV), then changing the IV by one standard deviation will result in a change of 0.50 standard deviations in the DV.
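To make that reading concrete, here is a minimal simulation sketch. It assumes a linear relationship with Gaussian noise, and the helper name `simulate_pair`, the sample size, and the seed are all illustrative choices of mine. Once both variables are z-scored, the fitted slope (standard deviations of DV per standard deviation of IV) should come out equal to the correlation.

```python
import numpy as np


def simulate_pair(r, n=200_000, seed=0):
    """Draw (x, y) with unit variance and population correlation r.

    Assumes a linear-Gaussian setup: y = r*x + sqrt(1 - r^2)*noise.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    noise = rng.standard_normal(n)
    y = r * x + np.sqrt(1 - r**2) * noise
    return x, y


x, y = simulate_pair(0.50)

# Sample correlation between IV and DV.
r_hat = np.corrcoef(x, y)[0, 1]

# z-score both variables, then fit a straight line: the slope is the
# change in DV (in SDs) associated with a one-SD change in IV.
zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()
slope = np.polyfit(zx, zy, 1)[0]

print(f"correlation:        {r_hat:.3f}")
print(f"standardized slope: {slope:.3f}")
```

The two printed numbers should agree with each other and land close to 0.50, up to sampling noise.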

I believe a failure to understand this is at the root of the nearly universal misinterpretation of the coefficient *r²*, otherwise known as “variance explained”.

Find a correlation between any two variables, such as income and job satisfaction. Theorize about the relationship, perform some additional tests, find relevant things income isn’t correlated with, build a theory, and so on. Eventually, someone who disagrees with your ideas will come along and make an argument along these lines:

While interesting, your theoretically essential correlation of 0.30 only produces an r² of 0.09. It only explains a piddling 9% of the variance! Your work barely explains anything, so there’s nothing significant about your theory.

This argument is based on a misunderstanding of *r²*. To demonstrate this, consider the case where I have this causal diagram:

If X and Y are perfectly related, the *r²* is one. As such, if you change X by one standard deviation, you change Y by the same amount. To better understand why this matters, let’s take a look at another causal diagram:

You now have ten independent variables (X₁ through X₁₀), and each has the power to move the dependent variable Y by one tenth of a standard deviation. If you shift each X by one standard deviation, the cumulative effect on Y will be a shift of 0.10 * 10 = 1: a one standard deviation shift in Y. However, if you add up the *r²* values for each X, you get a very different picture: each X has an *r²* of only 0.01, so combined, they explain only 10% of the variance in Y.
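The ten-variable scenario can be checked with a quick simulation. This is a sketch under my own assumptions: the ten X variables are mutually uncorrelated and standard normal, Y is scaled to unit variance, and the seed and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, beta = 200_000, 10, 0.10

# Ten uncorrelated standard-normal causes, each moving Y by 0.10 SD per SD.
X = rng.standard_normal((n, k))

# The noise term is scaled so that Y has unit variance overall.
y = beta * X.sum(axis=1) + np.sqrt(1 - k * beta**2) * rng.standard_normal(n)

# Correlation of each X with Y (each should be near 0.10).
r_each = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(k)])

# Summed "variance explained" versus the cumulative shift in Y (in SDs)
# when every X is moved up by one SD.
sum_r2 = float(np.sum(r_each**2))
total_shift = float(np.sum(r_each))

print(f"sum of r^2 values:         {sum_r2:.3f}")
print(f"shift in Y from +1 SD all: {total_shift:.3f}")
```

The summed *r²* should land near 0.10 while the cumulative shift lands near one full standard deviation, which is the contrast the paragraph above describes.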

The issue here is not with *r*: it’s consistently interpreted and shows that these scenarios are equivalent. The problem lies with *r²*, since it does not have a consistent meaning and can make effectively identical scenarios seem vastly different. In reality, the interpretation of *r²* depends on *r* rather than the other way around.

People who attack low values of *r²* frequently also attack variance explained when it comes up in politically controversial areas. Take the high share of variance in personality explained by genetics, or in arrest rates explained by homicide perpetration rates, as examples. Judging by their attacks on low values of *r²*, these people clearly consider a high *r²* to be extremely important, but they don’t seem to think the same way when the variance explained by genetics is 80%, which by their own logic must make genetics… *four times as important as the environment*.

But that is another misinterpretation: if 80% of the variance in something is explained by one factor, that factor is only *twice* as important as whatever explains the residual 20%. This is another case where the variance explained acts to obscure rather than enlighten because, to determine how much an explanatory factor matters, you need to know how much it can shift what it explains. To find this, you need to take the square root of the variance it explains.

For the values 80% and 20%, their square roots are 0.8944 and 0.4472. The ratio between these two is two (2). The thing that statistically explains 80% of something is only *twice* as important as the thing that explains 20%.
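The same arithmetic as a small sketch, for anyone who wants to try other splits. The function name `relative_effect` is my own label for the ratio of square roots, not a standard statistic.

```python
import math


def relative_effect(share_a, share_b):
    """Ratio of implied effect sizes for two variance-explained shares."""
    return math.sqrt(share_a) / math.sqrt(share_b)


share_a, share_b = 0.80, 0.20

ratio_of_shares = share_a / share_b                    # the naive comparison
ratio_of_effects = relative_effect(share_a, share_b)   # the square-root comparison

print(f"sqrt of shares:   {math.sqrt(share_a):.4f}, {math.sqrt(share_b):.4f}")
print(f"ratio of shares:  {ratio_of_shares:.2f}")
print(f"ratio of effects: {ratio_of_effects:.2f}")
```

Because the ratio of square roots is the square root of the ratio, any split compresses the same way: a 90/10 split, for instance, gives a ratio of effects of 3 rather than 9.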

Interpreting *r²* properly makes it much less usable as a bludgeon to hand-wave away phenomena, but on the other side, it also properly contextualizes variance decompositions.

For visualizations that can help with better understanding the concept of variance explained, check out Joel Schneider’s blogpost, here.

One notion of variance explained I like:

Suppose X is a cause of Y that explains r^2 of the variance in Y. If you have someone who is z standard deviations above average in Y, then in expectation r^2 z of that is due to being above average in X. That is, the variance explained tells you how much of the Y is explained by the X.
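A rough numerical check of this claim, assuming the linear-Gaussian case with both variables standardized. The values r = 0.5 and z = 2, the window width, and the seed are all arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r, z = 1_000_000, 0.5, 2.0

# Linear-Gaussian model: X contributes r*x to Y, the rest is noise.
x = rng.standard_normal(n)
y = r * x + np.sqrt(1 - r**2) * rng.standard_normal(n)

# Select people whose Y is approximately z SDs above average.
near_z = np.abs(y - z) < 0.05

# Average amount of their Y that comes from X, i.e. the mean of r*x.
contribution = float(np.mean(r * x[near_z]))
expected = r**2 * z

print(f"mean contribution of X among Y ~ {z}: {contribution:.3f}")
print(f"r^2 * z:                             {expected:.3f}")
```

The two numbers should agree up to sampling noise, so here about a quarter of the two-SD elevation in Y is attributable to X on average.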

The trouble is that people are mixing up two different questions: explanation and intervention. If you want to intervene on Y with some variable X, then it doesn't matter how well X explains the preexisting variance in Y. It just matters how big the effect of X is. But conversely, if someone wants to explain Y, then it doesn't matter how big the interventions are that it supports, it only matters how much variance it explains. It just so happens that there is a simple quadratic relationship between the effect size and the explanation validity (in the linear-Gaussian case).

Maybe this is a bit simplistic, but the way I think about it is that you use r^2 when comparing two independent variables to each other and you use r when comparing one independent variable to the dependent variable.