What is the difference between Pearson R and Simple Linear Regression?

In “simple linear regression” (ordinary least-squares regression with 1 variable), you fit a line

ŷ = a + b * x

in the attempt to predict the target variable y using the predictor x.

Let’s consider a simple example to illustrate how this is related to the linear correlation coefficient, a measure of how two variables are linearly related (or vary together).

x = [1.0, 1.8, 3.0, 3.7]
y = [0.5, 2.0, 3.0, 3.9]

Linear correlation between variables

The Pearson correlation coefficient is computed as:

As we can see, the correlation coefficient is just the covariance (cov) between 2 features x and y “standardized” by their standard deviations (σ), where

the standard deviation is computed as

Similarly, the covariance is computed as

In our simple example above, we get

cov(x, y) ≈ 1.3012
σ_x ≈ 1.0449
σ_y ≈ 1.2620
r = 0.9868

Simple Linear Regression

Now, for simple linear regression, we compute the slope as follows:

To show how the correlation coefficient r factors in, let’s rewrite it as

where the first term is equal to r, which we defined earlier; we can now see that we could use the “linear correlation coefficient” to compute the slope of the line as

Continuing with the example from above, we get b ≈ 0.8171.

So, essentially, the linear correlation coefficient (Pearson’s r) is just the standardized slope of a simple linear regression line (fit).
To continue with the example, we can now compute the y-axis intercept as

a ≈ 0.4298

Now, our linear regression fit would be

ŷ = 0.4298 + 0.8171 * x

Standardizing Variables

In practice, we often standardize our input variables:

After standardization, our variables have the properties of a standard normal distribution with mean=0, and standard deviation 1.

Or in other words, we center our variables at 0 so that we don’t need to compute the y-axis intercept.

ŷ = a * x = r * x

This is also useful if we use optimization algorithms for multiple linear regression, such as gradient descent, instead of the closed-form solution (handy for working with large datasets). Here, we want to standardize the variables so that the gradient descent learning algorithms learns the model coefficients “equally” in multiple linear regression. Another advantage of this approach is that the slope is then exactly the same as the correlation coefficient, which saves another computational step.

ŷ = a * x = r * x = a * 0.9868

If you like this content and you are looking for similar, more polished Q & A’s, check out my new book Machine Learning Q and AI.