Z-Scores

Z-scores were a concept I had trouble with in university. They’re actually not as difficult as they’re made out to be. I’ll spare you the complicated introduction (as I’m sure you got one from both your textbook and your professor), but remember that a z-score shows you the distance between your score and the mean, in standard deviations.

So a z-score of 1 is one standard deviation above the mean. About 34% of scores fall between the mean and that score, so it sits above approximately 84% of the entire population (50% + 34%). Typically you’ll be asked to do a few things:

  • What percent of scores fall above a particular z-score
  • What percent of scores fall below a particular z-score
  • What percent of scores fall between two z-scores
  • How do you convert a raw score into a z-score
  • How do you convert a z-score into a raw score

So, let’s get to it. Remember that you’ll need a z-table (usually provided by your professor or available in your textbook) for these exercises.

Raw Score into Z-Score

The formula for converting a raw score into a z-score is Z = (X – M) / SD, or Z Score = (Value – Mean) / Standard Deviation.

So, if you have a score of 80, and the mean is 75, with a Standard Deviation of 5, your equation will be:

(80 – 75) / 5 = 1. Therefore your Z score is 1.0

If you instead scored 73, it would be (73 – 75) / 5 = -0.4.
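The same conversion can be sketched in a few lines of Python. The function name z_score is just an illustrative choice, not something from a textbook or library:

```python
def z_score(value, mean, sd):
    """Return how many standard deviations `value` lies from `mean`."""
    return (value - mean) / sd


print(z_score(80, 75, 5))  # 1.0
print(z_score(73, 75, 5))  # -0.4
```

A positive result means the score is above the mean, and a negative result means it is below.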

Percent Below a Score

You’ll be given a z-score like 0.66, and you’ll need to find out what percent of scores are below it. Simply go to your z-table and find 0.66. Some tables list all the values sequentially (0.50, 0.51, 0.52), while others use a grid like Wikipedia’s.

If your table includes both “% mean to z” and “% in tail” (like my textbook), just look at the “% mean to z.” If your table uses decimals (like 0.7454), multiply them by 100 to get a percentage.

When I look up 0.66 in my textbook’s z-table, I see 24.54%. When I look up the same value in Wikipedia’s table, I see 74.54%. What gives? The % mean to z is only half of the picture: it leaves out the 50% of scores that fall below the mean, which Wikipedia’s table already includes. To get the correct percent below, you take your 24.54% and add 50 to it, because it’s a positive z-score, giving 74.54%.

If you have a negative z-score, like -0.85, you look up 0.85 (ignoring the negative sign), take the % mean to z you find (30.23%) and subtract it from 50, which gives you 19.77% below.
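If you don’t have a z-table handy, Python’s standard library can do the lookup for you. This is a minimal sketch that assumes your scores follow a normal distribution; the helper name percent_below is only illustrative:

```python
from statistics import NormalDist  # standard library, Python 3.8+


def percent_below(z):
    """Percent of scores falling below z on a standard normal curve."""
    return NormalDist().cdf(z) * 100


print(round(percent_below(0.66), 2))   # 74.54
print(round(percent_below(-0.85), 2))  # 19.77
```

Because the function works from the full curve, you don’t need to add or subtract 50 yourself; the sign of the z-score takes care of that.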

Percent Above a Score

To find the percent above a score, you perform the same calculation as for percent below, then subtract the result from 100. For instance, 19.77% of scores are below a z-score of -0.85, so when you subtract that value from 100, you get 80.23% above.
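Here is the same idea as a short Python sketch, again assuming a normal distribution. The name percent_above is hypothetical:

```python
from statistics import NormalDist  # standard library, Python 3.8+


def percent_above(z):
    """Percent of scores above z: 100 minus the percent below."""
    percent_below = NormalDist().cdf(z) * 100
    return 100 - percent_below


print(round(percent_above(-0.85), 2))  # 80.23
```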

Percent Between Scores

To calculate the percent between two scores, you find the percent below each score and subtract the smaller from the larger. For instance, if you want to know the percent between z-scores of 0.6 and 0.7:

The % mean to z of 0.6 is 22.57 and of 0.7 is 25.80.

Because both numbers are above the mean, we add 50 to each, giving us percentages of 72.57 and 75.80.

Subtracting 72.57 from 75.80 gives a grand total of 3.23% between the two values.
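A quick way to check this kind of answer is the sketch below, which assumes a normal distribution and uses a made-up helper name, percent_between:

```python
from statistics import NormalDist  # standard library, Python 3.8+


def percent_between(z_low, z_high):
    """Percent of scores between two z-scores on a standard normal curve."""
    curve = NormalDist()
    return (curve.cdf(z_high) - curve.cdf(z_low)) * 100


print(round(percent_between(0.6, 0.7), 2))  # 3.23
```

The same subtraction works when one z-score is negative and the other is positive, because each percent below is measured from the far left of the curve.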

Z-Score Back to Raw Score

The formula for converting a z-score back to a raw score is R = Z*SD + M. So if your z score is 0.8, the mean is 75 and the Standard Deviation is 5, your equation looks like:

Raw Score = 0.8*5 + 75

Raw Score = 4 + 75

Raw Score = 79
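In code, the reverse conversion might look like this (raw_score is just an illustrative name):

```python
def raw_score(z, mean, sd):
    """Convert a z-score back into a raw score."""
    return z * sd + mean


print(raw_score(0.8, 75, 5))  # 79.0
```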



Cite this article as: MacDonald, D.K., (2015), "Z-Scores," retrieved on January 23, 2018 from http://dustinkmacdonald.com/z-scores/.


Dispersion and Variability (Standard Deviation)

The topics of dispersion and variability (or variance) describe the “spread” of data in a distribution. This article explains how to compute the variance and the standard deviation.

The first measure of dispersion to look at is the variance. Let’s look at the data set below:

X Values
4
5
2
7

 

Steps to Calculate Variance:

  1. Calculate mean
  2. Subtract the mean from each value in the set
  3. Square each number from step 2
  4. Sum the values from step 3
  5. Divide by the number of values in the set

Let’s work through these steps. First, let’s calculate the mean:

M = ∑X / n (the sum of X divided by n)
M = (4 + 5 + 2 + 7) / 4
M = 18 / 4
M = 4.5

Second, we subtract the mean from each value in the set.

X Values X – M
4 -0.5
5 0.5
2 -2.5
7 2.5

 

Third, we square each value.

X Values X – M (X – M)^2
4 -0.5 0.25
5 0.5 0.25
2 -2.5 6.25
7 2.5 6.25

 

Fourth, we sum the values from the third step.

X Values X – M (X – M)^2
4 -0.5 0.25
5 0.5 0.25
2 -2.5 6.25
7 2.5 6.25
Sum 13

Finally, we divide by the number of values in the set:

Variance is 13 / 4 = 3.25

To calculate the standard deviation, you simply take the square root of the variance.

Sqrt(3.25) = 1.80

So, the standard deviation is 1.80. You can confirm this by going into Excel and using the STDEV.P function.
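If you would rather check the arithmetic in code, here is a minimal Python sketch that follows the five steps above one at a time. The variable names are my own choices for illustration:

```python
from math import sqrt

values = [4, 5, 2, 7]

mean = sum(values) / len(values)          # step 1: calculate the mean
deviations = [x - mean for x in values]   # step 2: subtract the mean from each value
squared = [d ** 2 for d in deviations]    # step 3: square each deviation
sum_of_squares = sum(squared)             # step 4: sum the squares
variance = sum_of_squares / len(values)   # step 5: divide by the number of values
standard_deviation = sqrt(variance)

print(mean)                          # 4.5
print(deviations)                    # [-0.5, 0.5, -2.5, 2.5]
print(sum_of_squares)                # 13.0
print(variance)                      # 3.25
print(round(standard_deviation, 2))  # 1.8
```

Python’s built-in statistics.pvariance and statistics.pstdev perform the same population calculation, so they are a handy cross-check.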



Cite this article as: MacDonald, D.K., (2015), "Dispersion and Variability (Standard Deviation)," retrieved on January 23, 2018 from http://dustinkmacdonald.com/dispersion-and-variability-standard-deviation/.


Measures of Central Tendency

The measures of central tendency are processes for determining what the central value in a dataset is. The most common is the arithmetic average, or mean – so this value has come to be known as simply the average.

The three measures of central tendency are mean, median and mode.

Mean

To calculate the mean (also known as the arithmetic mean or arithmetic average), you take all of the scores, add up their values and divide the total by the number of scores you have. Let’s look at the following student scores out of 10:

4 4 4 5 5 5 6 6
6 6 6 6 6 7 7 7
7 7 7 7 7 7 7 7
7 8 8 8 8 8 8 8
8 8 9 9 9 9 9 10

 

There are 40 values here. If we add them all up, we get a total of 280. Dividing by the number of values, we get an average of:

  • 280 / 40 = 7
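To double-check the total and the mean, you could type the scores into Python. This is only a sketch, and the variable names are illustrative:

```python
# The 40 student scores from the table above
scores = [
    4, 4, 4, 5, 5, 5, 6, 6,
    6, 6, 6, 6, 6, 7, 7, 7,
    7, 7, 7, 7, 7, 7, 7, 7,
    7, 8, 8, 8, 8, 8, 8, 8,
    8, 8, 9, 9, 9, 9, 9, 10,
]

total = sum(scores)
count = len(scores)
mean = total / count

print(total, count, mean)  # 280 40 7.0
```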

Median

The mean is a very common measure but can be affected by extreme scores. If any values are very high or very low compared to the majority, the mean gets pulled toward them. In situations like this, we use the median. The median is the middle value in the ordered set of scores.

For instance, let’s look at a limited set of numbers from the above data set:

4 4 4 5 5 5 6

There are 7 values here, so the middle value, 5, becomes the median. In a situation like our full data set above, where we have 40 values, we instead have two middle values (the 20th and 21st).

Position 17 18 19 20 21 22 23 24
Value 7 7 7 7 7 7 7 7

 

Adding the 20th and 21st values (7 + 7) and dividing by 2 gives us the median, 7.
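Both cases (an odd and an even number of scores) can be handled by one small function. This is a sketch with a hand-written median helper so you can see the logic:

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


# The 40 student scores from the Mean section
scores = [
    4, 4, 4, 5, 5, 5, 6, 6,
    6, 6, 6, 6, 6, 7, 7, 7,
    7, 7, 7, 7, 7, 7, 7, 7,
    7, 8, 8, 8, 8, 8, 8, 8,
    8, 8, 9, 9, 9, 9, 9, 10,
]

print(median([4, 4, 4, 5, 5, 5, 6]))  # 5
print(median(scores))                 # 7.0
```

Python’s statistics.median does the same job if you don’t want to write it yourself.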

Mode

Finally, the mode is simply the most common score occurring in a distribution. In the full data set above, we have the following values and frequencies:

Value Frequency
4 3
5 3
6 7
7 12
8 9
9 5
10 1

 

In this case, 7 appears twelve times, more often than any other value, so it becomes our mode.
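Counting by hand gets tedious, so here is a small sketch that lets Python’s Counter tally the raw scores and pick out the most common one. The variable names are just for illustration:

```python
from collections import Counter

# The 40 student scores from the Mean section
scores = [
    4, 4, 4, 5, 5, 5, 6, 6,
    6, 6, 6, 6, 6, 7, 7, 7,
    7, 7, 7, 7, 7, 7, 7, 7,
    7, 8, 8, 8, 8, 8, 8, 8,
    8, 8, 9, 9, 9, 9, 9, 10,
]

frequencies = Counter(scores)                        # value -> number of occurrences
mode, mode_count = frequencies.most_common(1)[0]     # the single most frequent value

print(mode, mode_count)  # 7 12
```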

Choosing a Measure of Central Tendency

The mean is most commonly used – it is the best for symmetric distributions (distributions without major outliers.) The median is best for a skewed distribution or one with outlier(s), while the mode is used in 3 cases:

  • One particular score dominates a distribution
  • Distribution is bimodal or multimodal
  • Data are nominal

Weighted Mean

One special case of the mean is the “weighted mean”, where some values are “weighted” or contribute more to the total value than others. The data set from above is presented here:

Value Frequency
4 3
5 3
6 7
7 12
8 9
9 5
10 1

 

To calculate the weighted mean, we multiply each value by its frequency, add up the results, and then divide by the total of the frequencies. This is similar to the mean, as you’ll see:

  • Multiply each value by its frequency and add them up:

4×3 + 5×3 + 6×7 + 7×12 + 8×9 + 9×5 + 10×1
= 12 + 15 + 42 + 84 + 72 + 45 + 10
= 280

  • Add up the frequencies:

3 + 3 + 7 + 12 + 9 + 5 + 1
= 40

  • And now we’ll divide the top by the bottom: 280 / 40 = 7, the same value as the mean we calculated earlier.
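The weighted mean can also be sketched in Python. To avoid typing the frequency table by hand, this version tallies the frequencies from the 40 raw scores in the Mean section; the variable names are my own. Because the weights here are simply frequencies, the result matches the plain mean of the raw scores.

```python
from collections import Counter

# The 40 student scores from the Mean section
scores = [
    4, 4, 4, 5, 5, 5, 6, 6,
    6, 6, 6, 6, 6, 7, 7, 7,
    7, 7, 7, 7, 7, 7, 7, 7,
    7, 8, 8, 8, 8, 8, 8, 8,
    8, 8, 9, 9, 9, 9, 9, 10,
]

frequencies = Counter(scores)  # value -> frequency (the weights)

# Top of the fraction: each value multiplied by its frequency, then summed
weighted_total = sum(value * freq for value, freq in frequencies.items())
# Bottom of the fraction: the sum of the frequencies
total_frequency = sum(frequencies.values())

weighted_mean = weighted_total / total_frequency
print(weighted_total, total_frequency, weighted_mean)
```

The same pattern works for any weights (for example, assignment grades weighted by how much each assignment is worth): multiply, sum, then divide by the sum of the weights.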



Cite this article as: MacDonald, D.K., (2015), "Measures of Central Tendency," retrieved on January 23, 2018 from http://dustinkmacdonald.com/measures-of-central-tendency/.


Frequency Distributions

Frequency distributions are a simple way of organizing data based on how many times each value occurs. This can be used for individual values or for ranges of values (such as age ranges).

The steps for making a frequency distribution are pretty simple. Let’s take a set of 20 students and their ages:

18 17 17 18 15
15 17 14 15 16
16 16 17 14 17
14 15 14 16 17

 

Creating a Frequency Distribution:

  1. Determine the highest and lowest scores
  2. Create two columns, label the first with the variable name, label the second frequency
  3. List the full range of values that encompass all the scores in the data set from highest to lowest. Include all values in the range, even those for which the frequency is 0
  4. Count the number of scores at each value and write those numbers in the frequency column.

Let’s work through these steps one by one.

  1. The first step is to establish the highest and lowest scores. In this example, the highest score is 18 and the lowest score is 14.
  2. Creating two columns, we’ll call the first column “Student Age” and the second column “Frequency”.
  3. Next, we’ll add each of the ages, 14 to 18, to our chart.
  4. Finally, we’ll count how many times each age occurs and write that number in the frequency column:

Student Age Frequency
14 4
15 4
16 4
17 6
18 2

 

As you can see, each of the values occurs four times, except for the age 17, which occurs six times, and the age 18, which occurs only twice. From this data we can begin drawing rudimentary conclusions about the individuals in the sample. For instance, this group is relatively evenly distributed, except that 17-year-olds are over-represented and 18-year-olds are under-represented.
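The four steps above can be sketched in Python using the 20 ages from the start of this section. The variable names are illustrative only:

```python
from collections import Counter

ages = [
    18, 17, 17, 18, 15,
    15, 17, 14, 15, 16,
    16, 16, 17, 14, 17,
    14, 15, 14, 16, 17,
]

# Step 1: determine the highest and lowest scores
lowest, highest = min(ages), max(ages)

# Step 4 (done up front): count how often each age occurs
counts = Counter(ages)

# Steps 2 and 3: two labelled columns, one row per value in the full range.
# Counter returns 0 for a value that never occurs, so gaps still get a row.
table = [(age, counts[age]) for age in range(lowest, highest + 1)]

print("Student Age", "Frequency")
for age, frequency in table:
    print(age, frequency)
```

The frequencies should always add up to the number of scores you started with (20 here), which is a quick way to catch a miscount.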

Grouped Frequency Table

If your data exists in a range, you can also create a grouped frequency table. This is similar to a regular frequency table but is often used for data where there can be many specific values (for instance, recording the speed at which a person performs a task can result in values that go into the millisecond.) In cases like these, grouped frequency tables are helpful.

It is slightly more complicated to put together. Before we start, let’s go over some definitions.

  • Lower Class Limit – the smallest number that can actually belong to a class
  • Upper Class Limit – the largest number that can actually belong to a class
  • Class Boundaries – the numbers used to separate classes, without the gaps created by the class limits

To demonstrate these terms, let’s use the following dataset, which records the amount of time it took each of 16 people to perform a task:

Lower Class Limit Upper Class Limit Frequency
0.0 Under 1.5 4
1.5 Under 3.0 5
3.0 Under 4.5 3
4.5 Under 6.0 3
6.0 Under 7.5 0
7.5 Under 9.0 1

 

What “under 1.5” means is that any value from 0 up to, but not including, 1.5 would qualify (1.499, for example). It is worded this way to simplify things.

How did we decide on the intervals here (e.g. 0 to 1.5, 1.5 to 3 and so on)? We used something called the 2^k guideline.

The 2^k guideline says to choose the smallest number of intervals, k, for which 2 raised to the power of k is greater than the number of items in the dataset. For instance, in our example we have 16 people, so let’s work through the powers of 2:

  • 2^2 = 4
  • 2^3 = 8
  • 2^4 = 16
  • 2^5 = 32

So, in this example 2^4 = 16 is not greater than 16 but 2^5 = 32 is, which means the guideline suggests 5 intervals. The authors of the set (which came from a statistics textbook) chose to use 6 to make the data easier to see. Ideally you want the number to be between 5 and 10.

To determine the best distance between the intervals (in this case they’ve chosen 1.5), you can take the range (which is the highest value minus the lowest value) and divide it by the number of classes.

Let’s assume the highest value was 8 and the lowest value was 0.1. Our “distance calculation” would thus be: (8 – 0.1) / 6 = 7.9 / 6 ≈ 1.32. Again, the authors chose to use a simpler value (1.5) to make interpreting the data easier.
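Both calculations are easy to sketch in Python. Note the assumption: this version reads the guideline strictly, so 2^k must be greater than (not equal to) the number of items. The function names are hypothetical:

```python
def suggested_classes(n_items):
    """Smallest k for which 2**k is strictly greater than n_items."""
    k = 1
    while 2 ** k <= n_items:
        k += 1
    return k


def class_width(highest, lowest, classes):
    """Range of the data divided by the number of classes."""
    return (highest - lowest) / classes


print(suggested_classes(16))
print(round(class_width(8, 0.1, 6), 2))
```

In practice you would round the width up to a convenient number, as the textbook authors did with 1.5, so that the classes still cover the whole range.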

Histogram

Once you have your frequency distribution, you can turn it into a visual display with a histogram, which looks almost like a bar chart and enables us to see the data at a glance. An example of a histogram is below:

[Figure: example histogram]
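If you just want a rough look at the shape of your data, you can sketch a text histogram without any plotting library. This example uses the grouped frequency table from the previous section; the function name text_histogram and the label wording are my own:

```python
# Class label and frequency for each row of the grouped frequency table
grouped = [
    ("0.0 to under 1.5", 4),
    ("1.5 to under 3.0", 5),
    ("3.0 to under 4.5", 3),
    ("4.5 to under 6.0", 3),
    ("6.0 to under 7.5", 0),
    ("7.5 to under 9.0", 1),
]


def text_histogram(rows):
    """Return one line per class, with one '#' per observation."""
    return [f"{label} | {'#' * frequency}" for label, frequency in rows]


for line in text_histogram(grouped):
    print(line)
```

Each bar’s length matches the class frequency, so the empty 6.0 to 7.5 class shows up as a visible gap, which is exactly the kind of thing a histogram lets you see at a glance.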



Cite this article as: MacDonald, D.K., (2015), "Frequency Distributions," retrieved on January 23, 2018 from http://dustinkmacdonald.com/frequency-distributions/.
