Dustin K MacDonald

Correlation (Calculating Pearson’s r)

Posted on January 25, 2015 (updated December 20, 2018) by Dustin

Correlation refers to the idea that two variables (x and y) are related to each other. For instance, the grades in a statistics class may be related to, or correlated with, the amount of time those students study. As study time goes up, grades go up. This would be a positive correlation. On the other hand, as time spent partying goes up, grades go down. This is called a negative correlation.

A positive correlation doesn’t strictly refer to good things, though. As the percent of poverty in a community goes up, the amount of crime may also go up. This is a positive correlation, but certainly not a good thing!

Correlations are expressed on a scale from -1 (which is perfectly negative) to +1 (which is perfectly positive). The size of the number (ignoring the sign) shows the strength, and the sign (positive or negative) shows the direction. Therefore, -0.75 is a stronger correlation (or connection) than 0.25.
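To make the difference between strength and direction concrete, here is a minimal Python sketch. The helper name `describe` is made up for this illustration and is not part of any library.

```python
def describe(r):
    """Split a correlation coefficient into its direction and its strength."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation must lie between -1 and +1")
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    strength = abs(r)  # strength ignores the sign
    return direction, strength

print(describe(-0.75))  # ('negative', 0.75)
print(describe(0.25))   # ('positive', 0.25)
# -0.75 is the stronger of the two because 0.75 > 0.25
print(describe(-0.75)[1] > describe(0.25)[1])  # True
```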

One common expression is “Correlation is not causation”; this refers to the idea that two things can be correlated without one causing the other. For instance, there is a close connection between the rate of ice-cream consumption and the drowning rate, since both tend to rise in the summer, even though one really doesn’t affect the other.

How to Calculate Correlation
Pearson’s r (also known as the correlation coefficient) is a simple correlation tool to work with. (Technically, r is used for samples and the Greek letter ρ (rho) is used for populations, but we’ll be working with samples, which are a limited part of the total, so we will simply refer to it as Pearson’s r or r.)

The formula is here:

r = ∑[(X-MX)(Y-MY)] / √[∑(X-MX)² × ∑(Y-MY)²]

This formula may look complicated, but let’s go through it step by step.

In words: for each pair of values, subtract the mean of X from X and the mean of Y from Y, multiply the two differences together, and sum these products. Then divide that sum by the square root of the sum of the squared X differences multiplied by the sum of the squared Y differences.
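For readers who prefer code to words, the description above can be sketched as a short Python function. This is only one possible way to write it; the function name `pearson_r` and the tiny test lists are assumptions made for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r for two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Top of the formula: sum of the products of the paired differences
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Bottom of the formula: square root of (sum of squared X differences
    # times sum of squared Y differences)
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return numerator / math.sqrt(ss_x * ss_y)

print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0  (perfectly positive)
print(pearson_r([1, 2, 3], [6, 4, 2]))  # -1.0 (perfectly negative)
```

Note that this sketch will fail with a division by zero if every value in one of the lists is the same, because the bottom of the formula is then zero.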

Let’s look at the following set of data of student absences and their final grades:

Student #   Absences   Exam Grade
1           4          82
2           2          98
3           2          76
4           3          68
5           1          84
6           0          99
7           4          67
8           8          58
9           7          50
10          3          78

The first step is to create a scatterplot of the data to see if any patterns stick out:

[Figure: scatterplot of absences and exam grades]

This shows a negative correlation: as absences go up, grades go down.

Moving to the equation, let’s look at the top part first:

∑[(X-MX)(Y-MY)]

We have to calculate the mean of X and the mean of Y:

Sum of X: 4 + 2 + 2 + 3 + 1 + 0 + 4 + 8 + 7 + 3 = 34, and 34 / 10 = 3.4, so the mean of X (Mx) is 3.4.
Sum of Y: 82 + 98 + 76 + 68 + 84 + 99 + 67 + 58 + 50 + 78 = 760, and 760 / 10 = 76, so the mean of Y (My) is 76.

Next, we calculate X-Mx and Y-My for each student.

X    X – Mx    Y     Y – My
4    0.6       82    6
2    -1.4      98    22
2    -1.4      76    0
3    -0.4      68    -8
1    -2.4      84    8
0    -3.4      99    23
4    0.6       67    -9
8    4.6       58    -18
7    3.6       50    -26
3    -0.4      78    2
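If you would rather not do this by hand, the means and the two difference columns can be reproduced with a few lines of Python. This is just a sketch; the variable names (`absences`, `grades`, `dev_x`, `dev_y`) are chosen for this example.

```python
absences = [4, 2, 2, 3, 1, 0, 4, 8, 7, 3]
grades = [82, 98, 76, 68, 84, 99, 67, 58, 50, 78]

mean_x = sum(absences) / len(absences)  # 34 / 10
mean_y = sum(grades) / len(grades)      # 760 / 10

# Round to one decimal place to hide floating-point noise
# (for example, 4 - 3.4 is stored as 0.6000000000000001)
dev_x = [round(x - mean_x, 1) for x in absences]
dev_y = [round(y - mean_y, 1) for y in grades]

print(mean_x, mean_y)  # 3.4 76.0
for x, dx, y, dy in zip(absences, dev_x, grades, dev_y):
    print(x, dx, y, dy)
```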

Next, we must multiply the values of each of these together:

X – Mx    Y – My    (X-Mx) * (Y-My)
0.6       6         3.6
-1.4      22        -30.8
-1.4      0         0
-0.4      -8        3.2
-2.4      8         -19.2
-3.4      23        -78.2
0.6       -9        -5.4
4.6       -18       -82.8
3.6       -26       -93.6
-0.4      2         -0.8

And the sum of these (3.6 + -30.8 + 0 + 3.2 and so on) is -304. Here’s our equation so far:

r = -304 / √[∑(X-MX)² × ∑(Y-MY)²]
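The top part of the equation can be double-checked with a short Python sketch. The variable names here (`products`, `numerator`) are assumptions for this example, not anything standard.

```python
absences = [4, 2, 2, 3, 1, 0, 4, 8, 7, 3]
grades = [82, 98, 76, 68, 84, 99, 67, 58, 50, 78]

mean_x = sum(absences) / len(absences)
mean_y = sum(grades) / len(grades)

# One product per student: (X - Mx) * (Y - My)
products = [(x - mean_x) * (y - mean_y) for x, y in zip(absences, grades)]
numerator = sum(products)

# Rounding hides tiny floating-point errors in the sum
print(round(numerator, 1))  # -304.0
```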

Next, let’s look at the bottom part of the equation:

√[∑(X-MX)² × ∑(Y-MY)²]

X    X – Mx    (X-Mx)²    Y     Y – My    (Y-My)²
4    0.6       0.36       82    6         36
2    -1.4      1.96       98    22        484
2    -1.4      1.96       76    0         0
3    -0.4      0.16       68    -8        64
1    -2.4      5.76       84    8         64
0    -3.4      11.56      99    23        529
4    0.6       0.36       67    -9        81
8    4.6       21.16      58    -18       324
7    3.6       12.96      50    -26       676
3    -0.4      0.16       78    2         4
∑              56.4                       2262

We square each X – Mx value and sum the squares, which gives 56.4. We do the same for the Y – My values, which gives 2262.

This results in: -304 / Sqrt(56.4*2262)

Next, we multiply the two sums in the bottom part together. 56.4 x 2262 = 127,576.8.

Taking the square root yields 357.179.

Our final calculation is -304 / 357.179, which equals -0.85 when rounded to two decimal places.
-0.85 is our final correlation, which we can confirm using Excel’s CORREL function.
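If you don’t have Excel handy, the whole calculation can also be checked in Python. The sketch below repeats the bottom part of the equation and the final division; the variable names are assumptions for this example. The optional cross-check uses `statistics.correlation` from Python’s standard library, which is only available in Python 3.10 and later, so it is skipped on older versions.

```python
import math
import statistics

absences = [4, 2, 2, 3, 1, 0, 4, 8, 7, 3]
grades = [82, 98, 76, 68, 84, 99, 67, 58, 50, 78]

mean_x = sum(absences) / len(absences)
mean_y = sum(grades) / len(grades)

numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(absences, grades))
ss_x = sum((x - mean_x) ** 2 for x in absences)  # sum of (X-Mx) squared
ss_y = sum((y - mean_y) ** 2 for y in grades)    # sum of (Y-My) squared
denominator = math.sqrt(ss_x * ss_y)
r = numerator / denominator

print(round(ss_x, 2), round(ss_y, 2))  # 56.4 2262.0
print(round(denominator, 3))           # 357.179
print(round(r, 2))                     # -0.85

# Optional cross-check against the standard library (Python 3.10+ only)
if hasattr(statistics, "correlation"):
    print(round(statistics.correlation(absences, grades), 2))  # -0.85
```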

Cite this article as: MacDonald, D.K., (2015), "Correlation (Calculating Pearson’s r)," retrieved on March 29, 2023 from http://dustinkmacdonald.com/correlation-calculating-pearsons-r/.
