Frequency distributions are a simple way of organizing data based on how many times each item occurs. They can be used for individual values or for ranges of values, such as age ranges.
The steps for making a frequency distribution are pretty simple. Let’s take a set of 20 students and their ages:
Creating a Frequency Distribution:
- Determine the highest and lowest scores.
- Create two columns. Label the first with the variable name and the second “Frequency”.
- List the full range of values that encompasses all the scores in the data set, from highest to lowest. Include every value in the range, even those for which the frequency is 0.
- Count the number of scores at each value and write those numbers in the frequency column.
Let’s work through these steps one by one.
- The first step is to establish the highest and lowest scores. In this example, the highest score is 18 and the lowest score is 14.
- Creating two columns, we’ll call the first column “Student Age” and the second column “Frequency”.
- Next, we’ll add each of the ages, 14 through 18, to our chart and count how many times each one occurs.
As you can see, each of the values occurs four times, except for the age 17, which occurs five times, and the age 18, which occurs only twice. From this data we can begin drawing rudimentary conclusions about the individuals in the sample. For instance, this group is relatively evenly distributed, except for 18-year-olds, who are under-represented.
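The same steps can be sketched in a few lines of Python. This is only an illustration: the `frequency_table` helper and the `ages` list are hypothetical (the list is made-up sample data, not the 20 students above), and it was picked so that one age in the range never occurs, which shows why values with a frequency of 0 still get a row.

```python
from collections import Counter

def frequency_table(scores):
    """Return (value, frequency) rows, listed from highest to lowest value."""
    counts = Counter(scores)
    lowest, highest = min(scores), max(scores)
    # List every whole-number value in the range, even those with frequency 0.
    return [(value, counts[value]) for value in range(highest, lowest - 1, -1)]

# Hypothetical ages for illustration only; note that 17 never occurs.
ages = [14, 15, 15, 16, 18, 18, 18]

print(f"{'Student Age':<12}Frequency")
for value, freq in frequency_table(ages):
    print(f"{value:<12}{freq}")
```

Because a `Counter` returns 0 for a value it has never seen, the missing age still appears in the table with a frequency of 0.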
Grouped Frequency Table
If your data falls along a continuous range, you can also create a grouped frequency table. This is similar to a regular frequency table but is often used for data where there can be many distinct values (for instance, recording the speed at which a person performs a task can result in values measured down to the millisecond). In cases like these, grouped frequency tables are helpful.
A grouped frequency table is slightly more complicated to put together. Before we start, let’s go over some definitions.
- Lower Class Limits – the smallest numbers that can actually belong to the different classes
- Upper Class Limits – the largest numbers that can actually belong to the different classes
- Class Boundaries – the numbers used to separate classes, but without the gaps created by the class limits
To demonstrate these terms, let’s use the following dataset, which records the amount of time it took each of 16 people to perform a task:
|Lower Class Limit||Upper Class Limit||Frequency|
What “under 1.5” means is that any value from 0 up to, but not including, 1.5 would qualify (1.499, for example). It is worded this way to simplify things.
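To make the “under” wording concrete, here is a minimal sketch of how a grouped frequency table could be counted. The `grouped_frequency_table` helper and the `times` list are hypothetical; the 16 values are invented for illustration and are not the original dataset. Each class includes its lower limit and excludes its upper one.

```python
def grouped_frequency_table(values, start, width, num_classes):
    """Count values in classes [lower, upper), each `width` wide."""
    rows = []
    for i in range(num_classes):
        lower = start + i * width
        upper = lower + width
        # A value equal to `upper` belongs to the next class ("under upper").
        freq = sum(1 for v in values if lower <= v < upper)
        rows.append((lower, upper, freq))
    return rows

# Hypothetical task times (16 values), made up for illustration only.
times = [0.4, 1.2, 1.5, 1.9, 2.2, 2.8, 3.0, 3.3,
         3.7, 4.1, 4.4, 4.6, 5.2, 6.1, 7.4, 8.0]

for lower, upper, freq in grouped_frequency_table(times, 0.0, 1.5, 6):
    print(f"{lower:.1f} to under {upper:.1f}: {freq}")
```

Notice that the value 1.5 is counted in the second class, not the first, which is exactly what “under 1.5” implies.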
How did we decide on the intervals here (e.g. 0 to 1.5, 1.5 to 3, and so on)? We used something called the 2^k guideline.
The 2^k guideline says that 2 raised to the power of the number of intervals (k) should be at least as large as the number of items in the dataset. For instance, in our example we have 16 people, so let’s work through the powers of 2:
- 2^2 = 4
- 2^3 = 8
- 2^4 = 16
- 2^5 = 32
So, in this example we could use only 4 intervals if we needed to, but the authors of the dataset (which came from a statistics textbook) chose to use 6 to make the data easier to see. Ideally you want the number of intervals to be between 5 and 10.
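The search through the powers of 2 can be written as a short loop. This is a sketch under one assumption: it reads the guideline as “2^k is at least the number of items”, which matches the conclusion that 4 intervals would do for 16 people. Some textbooks require 2^k to be strictly greater, so the hypothetical `min_classes` helper below takes a `strict` flag to show both readings.

```python
def min_classes(n, strict=False):
    """Smallest k such that 2**k >= n (or 2**k > n when strict=True)."""
    k = 1
    # Keep raising k until 2**k is large enough for n items.
    while (2 ** k <= n) if strict else (2 ** k < n):
        k += 1
    return k

print(min_classes(16))               # 4, since 2**4 = 16
print(min_classes(16, strict=True))  # 5, since 2**5 = 32 > 16
```

Either way, the result is a minimum. You are free to use more intervals if that makes the table easier to read.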
To determine the best width for each interval (in this case they’ve chosen 1.5), you can take the range (which is the highest value minus the lowest value) and divide it by the number of classes.
Let’s assume the highest value was 8 and the lowest value was 0.1. Our “distance calculation” would thus be: (8 – 0.1) / 6 ≈ 1.32. Again, the authors chose to use a simpler value to make interpreting the data easier.
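Here is the same calculation as a small Python sketch. The `class_width` and `round_up_to_step` helpers are hypothetical names, and rounding up to the next multiple of 0.5 is only one assumed way of arriving at a “simpler value”; the source does not say how the authors picked 1.5.

```python
import math

def class_width(highest, lowest, num_classes):
    """Range (highest minus lowest) divided by the number of classes."""
    return (highest - lowest) / num_classes

def round_up_to_step(width, step=0.5):
    """Round a width up to the next multiple of `step` (assumed convention)."""
    return math.ceil(width / step) * step

raw = class_width(8, 0.1, 6)
print(round(raw, 2))          # 1.32
print(round_up_to_step(raw))  # 1.5
```

Rounding up rather than down keeps the chosen number of classes wide enough to cover the whole range.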
Once you have your frequency distribution, you can turn it into a visual display with a histogram, which looks almost like a bar chart and enables us to see the data at a glance. An example of a histogram is below: