Statistical Coding and Classification

Introduction to Classification

Oftentimes when performing research or intelligence analysis, the first step is to classify the available data. Classification provides a number of benefits that make later analysis easier. For one, it allows you to infer an item's other qualities from its class, since all items in a class share similar properties.

For instance, knowing that mammals have fur and that nearly all mammals give birth to live young (as opposed to laying eggs), if you see a creature identified as a mammal, you can reasonably predict that it has these properties.
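A minimal Python sketch of class-based inference, assuming a hypothetical lookup table `CLASS_PROPERTIES` that records the properties typical of each class. Once an item has been assigned to a class, its other properties are predicted from the table rather than observed directly.

```python
# Hypothetical table of properties typical of each class.
# These are predictions, not guarantees: the platypus, for example, is a mammal that lays eggs.
CLASS_PROPERTIES = {
    "mammal": {"has fur", "gives birth to live young"},
    "bird": {"has feathers", "lays eggs"},
}


def infer_properties(class_label: str) -> set[str]:
    """Return the properties we can predict for any item assigned to class_label."""
    return CLASS_PROPERTIES.get(class_label, set())


observation = {"name": "creature spotted at the river", "class": "mammal"}
print(sorted(infer_properties(observation["class"])))
# ['gives birth to live young', 'has fur']
```

An unknown class returns an empty set, reflecting that nothing can be inferred about an item until it has been classified.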

Another benefit of classification is that it can reveal relationships among classes that you may not have been aware of before. The periodic table is a classic example: the elements in its right-hand column (the so-called noble gases) all share similar properties, and the elements within each of the other columns likewise resemble one another. Nor were the elements simply arranged this way after they were found to match: “holes” in the table indicated where elements must exist that had not yet been discovered.

This brings us to the next benefit of classification: the ability to uncover missing information. This is a very useful technique for discovering what you don’t know, and it is well enough understood that military and diplomatic circles sometimes exploit it for deception. For instance, SEAL Team Six was reportedly given its number when only two other SEAL teams existed, in order to mislead adversaries about how many SEAL teams there were.
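A minimal sketch of how gaps in a numbered sequence point to missing items, assuming the observed items carry integer numbers (the `find_gaps` function and the unit numbers are hypothetical). As the SEAL Team Six case shows, a gap may be deliberate, so the output is best treated as a list of hypotheses to check.

```python
def find_gaps(observed: list[int]) -> list[int]:
    """Return the integers missing between the smallest and largest observed values."""
    if not observed:
        return []
    present = set(observed)
    return [n for n in range(min(present), max(present) + 1) if n not in present]


# Hypothetical unit numbers seen in a set of documents.
seen_units = [1, 2, 4, 7]
print(find_gaps(seen_units))  # [3, 5, 6]
```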

Finally, classification allows you to focus on the properties of groups rather than of individual items, which can make analyzing large amounts of information much easier. When we are overwhelmed by information, these preliminary steps help us narrow our focus. Coding, described below, serves the same purpose.

Statistical Coding

Statistical coding is perhaps the form of classification most familiar to researchers. Coding is the task of taking data and assigning it to categories, which lets us turn qualitative data into quantitative (numerical) data. For example, assigning the gender category Male a value of “0” and Female a value of “1” is a form of coding that allows you to perform statistical analysis.
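A short Python sketch of the 0/1 gender coding, using made-up responses and a hypothetical mapping named `GENDER_CODES`. It shows one statistic the coding makes available: with 0/1 codes, the mean equals the proportion of respondents coded 1.

```python
# Hypothetical coding scheme for one variable: category label -> numeric code.
GENDER_CODES = {"Male": 0, "Female": 1}

responses = ["Female", "Male", "Female", "Female"]  # made-up example data
coded = [GENDER_CODES[r] for r in responses]
print(coded)  # [1, 0, 1, 1]

# With 0/1 coding, the mean is the proportion of respondents coded 1 ("Female").
proportion_female = sum(coded) / len(coded)
print(proportion_female)  # 0.75
```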

Coding is often used to group responses together. If you ask people what they first felt after a sudden loss, you may have to translate disparate responses such as “I was overwhelmed”, “I didn’t know what to do”, and “I felt numb” into simple categories (“Overwhelmed”, “Confused/Shocked”, “Numb”) and later into numerical values (1, 2, 3).
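The two-step translation from free text to categories to numbers can be sketched as two lookup tables. This assumes the first table was filled in by a human coder (the table names and `code_response` are hypothetical, and straight apostrophes are used in the strings). An answer the coder has not yet categorized raises a `KeyError`, which flags it for a coding decision rather than guessing.

```python
# Step 1 (decided by a human coder): map each free-text answer to a category.
# These assignments are illustrative; real coding relies on the coder's judgment.
RESPONSE_TO_CATEGORY = {
    "I was overwhelmed": "Overwhelmed",
    "I didn't know what to do": "Confused/Shocked",
    "I felt numb": "Numb",
}

# Step 2: map each category to a numeric value.
CATEGORY_TO_CODE = {"Overwhelmed": 1, "Confused/Shocked": 2, "Numb": 3}


def code_response(text: str) -> int:
    """Translate one free-text answer into its category, then into its numeric code."""
    category = RESPONSE_TO_CATEGORY[text]
    return CATEGORY_TO_CODE[category]


answers = ["I felt numb", "I was overwhelmed", "I didn't know what to do"]
print([code_response(a) for a in answers])  # [3, 1, 2]
```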

Make sure to record your coding scheme in a “codebook” so that you can later look up which numerical value was assigned to each category of each variable.
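One simple way to keep a codebook is as a JSON file saved next to the coded data. The sketch below assumes a hypothetical file name, `codebook.json`, and variable names chosen for illustration. It writes the codebook, reloads it, and inverts one entry so codes can be translated back into labels.

```python
import json

# Hypothetical codebook: for each variable, the numeric code assigned to each category.
codebook = {
    "gender": {"Male": 0, "Female": 1},
    "first_emotion": {"Overwhelmed": 1, "Confused/Shocked": 2, "Numb": 3},
}

# Save it alongside the coded data so the coding can be interpreted later.
with open("codebook.json", "w", encoding="utf-8") as f:
    json.dump(codebook, f, indent=2)

# Later: reload the codebook and invert it to translate codes back into labels.
with open("codebook.json", encoding="utf-8") as f:
    loaded = json.load(f)

decode_emotion = {code: label for label, code in loaded["first_emotion"].items()}
print(decode_emotion[2])  # Confused/Shocked
```

JSON keeps the numeric codes as integers here because they are stored as values; only the category labels are used as keys.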

Statistical coding has a few advantages. For one, it allows you to perform statistical analyses that are not possible on qualitative data. It also allows “blind” analyses, in which the analyst does not know which category each numerical value stands for.
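A sketch of a blind analysis, assuming the analyst receives only the numeric codes (made-up data) while someone else holds the codebook (here a hypothetical `CODEBOOK` dict). The frequency count runs on codes alone, and the labels are revealed only after the analysis is finished.

```python
from collections import Counter

# The analyst receives only numeric codes (made-up data); the codebook is held by someone else.
coded_responses = [3, 1, 3, 2, 3, 1]

# Analysis on codes alone: how often each code occurs.
counts = Counter(coded_responses)
print(counts.most_common(1))  # [(3, 3)]

# Only after the analysis is finished is the codebook used to reveal what the codes mean.
CODEBOOK = {1: "Overwhelmed", 2: "Confused/Shocked", 3: "Numb"}
most_common_code, n = counts.most_common(1)[0]
print(CODEBOOK[most_common_code], n)  # Numb 3
```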

Cite this article as: MacDonald, D.K., (2016), "Statistical Coding and Classification," retrieved on December 8, 2022 from
