by Jason Marshall
We’ve now talked about three methods for calculating average values: the mean, median, and mode (affectionately known as the three Ms). So, are we done? Is that everything we could ever want to know about interpreting average values? No, there’s a bit more to it. Today, we’re going to talk about two values that complement averages and help us interpret what they mean: range and standard deviation.
But first, the podcast edition of this tip was sponsored by Go To Meeting. Save time and money by hosting your meetings online. Visit GoToMeeting.com/podcast and sign up for a free 45 day trial of their web conferencing solution.
Do Average Values Tell Us Everything We Need to Know?
Why don’t average values alone tell us everything we need to know about a set of data? Let me answer this question with an example. Imagine you’ve been given the job of comparing the performances of students in high school math classes at two competing schools. Much to the disappointment of the administration at both schools (they’re always looking for good news to boast about), you find that the average math scores are identical for both schools.
Naïvely, it’s tempting to conclude from this that the students at the two schools are performing similarly. But you, being a clever and mathematically informed individual, decide to look a little further into the scores of the individual students. You find that the five math students at the first school (yes, it’s a very small school) received scores of 40, 55, 70, 85, and 100; while the five math students at the second school received scores of 68, 69, 70, 71, and 72. Both of these schools, therefore, have mean and median math scores of 70 (see the articles on how to calculate mean values and how to calculate median values for more information on these two averages), but the individual scores are very different. It’s clear that any fair and complete comparison of the students at these two schools will require more than a simple calculation of average values.
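If you'd like to check these averages yourself, here's a minimal Python sketch using the standard library's `statistics` module. The variable names `school_1` and `school_2` are just labels chosen for this illustration; the scores are the ones listed above.

```python
from statistics import mean, median

# Scores of the five math students at each school (from the example above)
school_1 = [40, 55, 70, 85, 100]
school_2 = [68, 69, 70, 71, 72]

for name, scores in [("School 1", school_1), ("School 2", school_2)]:
    # Both averages come out to 70 for both schools
    print(name, "mean:", mean(scores), "median:", median(scores))
```

Both schools print a mean and a median of 70, even though the lists themselves look nothing alike.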
What is the Range?
Let’s start digging deeper by looking at the most obvious difference between the scores of the students from the two schools. Namely, the scores from the first school range between 40 and 100, whereas those from the second school range between 68 and 72. My use of the word “range” here is no coincidence—it’s a word we use in English, but it also has a mathematical definition: the range is simply the difference between the highest value and the lowest. So, for the first school, the range is 100 – 40 = 60; whereas for the second school, the range is 72 – 68 = 4. The range is the crudest way to estimate the spread of the data, also known as the dispersion. In other words, it tells us how much variation there is in the scores. Student scores at the first school are therefore extremely varied—two students failed (with scores of 40 and 55), and one student got a perfect score of 100. In contrast to this, the five students at the second school—with scores ranging between 68 and 72—all performed very similarly.
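The range calculation is short enough to sketch in a couple of lines of Python. The helper name `score_range` is made up for this illustration; it simply subtracts the lowest value from the highest.

```python
def score_range(scores):
    """Return the range: the highest value minus the lowest value."""
    return max(scores) - min(scores)

school_1 = [40, 55, 70, 85, 100]
school_2 = [68, 69, 70, 71, 72]

print(score_range(school_1))  # 100 - 40 = 60
print(score_range(school_2))  # 72 - 68 = 4
```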
Okay, so are we done now? Does the combination of the average value and range tell us everything we need to know to fairly compare the two sets of data? For our particular example here, it probably does. But this example is a little idealized and oversimplified, since most classes have more than five students in them. Imagine instead that the class at the first school had 30 students in it instead of 5, and that 28 of these 30 students had scores ranging between 66 and 85. This leaves us with two other students: one scored a rather miserable 15 and the other scored a perfect 100. The mean value of the scores that ranged between 15 and 100 could still be 70, but the range would now be 85! That's in spite of the fact that the range of scores of 28 of the 30 students (that’s about 93% of them) is only 19. So, in this more realistic scenario, the range of 85 does not provide a good measure of the typical spread in scores.
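To see how two extreme scores can dominate the range, here's a rough Python sketch. The 30 individual scores below are invented purely for illustration (the example above only gives the lowest and highest values, the 66-to-85 band, and the mean of 70), so treat `hypothetical_class` as one made-up class that happens to fit that description.

```python
# Hypothetical scores: 28 students between 66 and 85, plus a 15 and a 100.
# These individual values are invented so that the mean works out to 70.
hypothetical_class = [15, 66] + [70] * 14 + [71] * 10 + [72] * 2 + [85, 100]

def score_range(scores):
    """Return the range: the highest value minus the lowest value."""
    return max(scores) - min(scores)

# Drop the single lowest and single highest score to get the other 28 students
typical_students = sorted(hypothetical_class)[1:-1]

print(len(hypothetical_class))                         # 30 students
print(sum(hypothetical_class) / len(hypothetical_class))  # mean of 70.0
print(score_range(hypothetical_class))                 # 100 - 15 = 85
print(score_range(typical_students))                   # 85 - 66 = 19
```

Two students out of 30 are enough to stretch the range from 19 to 85, which is why the range on its own can be misleading.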
What is the Standard Deviation and How is it Calculated?
And that’s exactly where the standard deviation comes in. In a nutshell, the standard deviation provides an approximate measure of the mean distance between each data point and the mean of all the data points. I know this can be a mouthful to digest at first, so let’s take a minute to walk through the calculation slowly.
Let me first say that this isn't going to be the exact method for calculating the true statistical standard deviation. The real calculation is a little too complex to cover in the time we have right now, but if you'd like to see a demonstration of the real method, check out this week's Math Dude "Video Extra!" episode on YouTube. In the meantime, here's an approximate calculation that gives you the idea of what the standard deviation tells you (we'll call it the quasi-standard deviation).
For this example, let’s look at the scores of the five students from the second school: 68, 69, 70, 71, and 72. Here’s the quick and dirty tip for calculating the approximate quasi-standard deviation of these five scores. First, the mean value of the five scores is 70, and the quasi-standard deviation is the mean distance between each data point and this mean value of 70. Now, the distance from both 68 and 72 to 70 is 2, the distance from both 69 and 71 to 70 is 1, and the distance between 70 and 70 is, of course, 0 (ignore positive and negative values here; we're interested in the distance, not the direction). Now, all we have to do is find the mean of these five distances. Their sum is 2 + 1 + 0 + 1 + 2 = 6, so the approximate quasi-standard deviation of the data set (in this case, test scores) is 6 / 5 = 1.2.
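The steps above can be sketched in a few lines of Python. The function name `quasi_std_dev` is made up for this illustration. What it computes is the mean of the distances to the mean, which statisticians usually call the mean absolute deviation; it is not the true standard deviation.

```python
def quasi_std_dev(scores):
    """Mean distance between each score and the mean of all the scores."""
    mean_score = sum(scores) / len(scores)
    # abs() ignores direction: we only care how far each score is from the mean
    distances = [abs(score - mean_score) for score in scores]
    return sum(distances) / len(distances)

school_2 = [68, 69, 70, 71, 72]

# Distances to the mean of 70 are 2, 1, 0, 1, 2; their mean is 6 / 5
print(quasi_std_dev(school_2))  # 1.2
```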
What Does the Approximate Quasi-Standard Deviation Mean?
But what does the approximate quasi-standard deviation mean? Well, for comparison with the value we just calculated, let’s quickly calculate the approximate quasi-standard deviation of the scores in the first class—the one with the widely varying scores of 40, 55, 70, 85, and 100. The mean is again 70, and the distance from each data point to this mean is 30, 15, 0, 15, and 30. The approximate quasi-standard deviation is then just the mean of these five numbers, which is 18. So now, what do these two approximate quasi-standard deviations tell us? Well, as with the range, the approximate quasi-standard deviation measures how much variation there is in the data. But whereas the range tells us the total amount of variation, the approximate quasi-standard deviation tells us the average variation; and, therefore, the value of the approximate quasi-standard deviation won’t be skewed nearly as much by a few extreme data points. In the end, the approximate quasi-standard deviations for the two schools confirm in a well-defined way that the scores from the first school vary a lot more on average than those from the second.
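Here's a small Python sketch that puts the two schools side by side. As before, `quasi_std_dev` is a name invented for this illustration. For reference only, the sketch also prints `statistics.pstdev`, the standard library's population standard deviation; whether the "real method" mentioned earlier is the population or the sample version isn't stated here, so take that choice as an assumption.

```python
from statistics import pstdev

def quasi_std_dev(scores):
    """Mean distance between each score and the mean of all the scores."""
    mean_score = sum(scores) / len(scores)
    return sum(abs(score - mean_score) for score in scores) / len(scores)

school_1 = [40, 55, 70, 85, 100]
school_2 = [68, 69, 70, 71, 72]

for name, scores in [("School 1", school_1), ("School 2", school_2)]:
    print(name)
    print("  range:                   ", max(scores) - min(scores))
    print("  quasi-standard deviation:", quasi_std_dev(scores))
    # Assumption: population form of the true standard deviation, for comparison
    print("  statistics.pstdev:       ", round(pstdev(scores), 2))
```

The quasi-standard deviations come out to 18 and 1.2, matching the hand calculations. The true standard deviations are somewhat larger (roughly 21.2 and 1.41), but they tell the same story: scores at the first school vary far more than those at the second.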
Wrap Up
Okay, that’s all the math we have time for today. Needless to say, there’s a lot more to talk about. Next time we’ll make a start on it by looking at some ways that the standard deviation is used in science and politics, applications that are sure to have an impact on your life.
Please email your math questions and comments to ... You can get updates about the Math Dude podcast, the “Video Extra!” episodes on YouTube, and all my other musings about math, science, and life in general by following me on Twitter. And don’t forget to join our great community of social networking math fans by becoming a fan of the Math Dude on Facebook.
Finally, if you like what you’ve read and have a few minutes to spare, I’d greatly appreciate your review on iTunes. And while you’re there, please subscribe to the podcast to ensure you’ll never miss a new episode.
Until next time, this is Jason Marshall with The Math Dude’s Quick and Dirty Tips to Make Math Easier. Thanks for reading, math fans!