Understanding Variation for Wise Comparisons - Measuring variance
Median:
- The median is the middle value of a dataset when it’s arranged in ascending or descending order.
- If there’s an odd number of data points, the median is the exact middle value.
- If there’s an even number of data points, the median is the average of the two middle values.
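The odd and even cases can be sketched in a few lines of Python. This is only an illustration: the function name `median` and the sample numbers are assumptions made for the example, not part of the lesson.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the exact middle value.
        return ordered[mid]
    # Even count: the average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical data, chosen only to show both cases.
odd_result = median([9, 7, 8])        # middle of 7, 8, 9 is 8
even_result = median([7, 8, 9, 12])   # average of 8 and 9 is 8.5
print(odd_result, even_result)
```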
Mean (Average):
- The mean is the sum of all data points divided by the total number of data points.
- It’s a measure of central tendency and represents the “typical” value in the dataset.
Percentiles:
- Percentiles divide a ranked dataset into 100 equal parts.
- For example, the 25th percentile (Q1) is the value below which 25% of the data falls.
- The median (50th percentile) is the value below which 50% of the data falls.
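As a rough sketch, Python's standard `statistics.quantiles` function (available in Python 3.8 and later) returns the quartile cut points when called with `n=4`. The data below is hypothetical. Different tools use slightly different interpolation rules, so other software may report slightly different values on other datasets.

```python
import statistics

# Hypothetical dataset: the integers 1 through 11.
data = list(range(1, 12))

# n=4 asks for the three cut points that split the ranked data into quarters.
q1, q2, q3 = statistics.quantiles(data, n=4)

print(q1)  # 25th percentile (Q1): 3.0
print(q2)  # 50th percentile: 6.0, the same as the median
print(q3)  # 75th percentile (Q3): 9.0
```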
Skewness:
- Skewness describes the asymmetry of a distribution.
- A positive skew means the tail of the distribution extends more to the right (typically mean > median).
- A negative skew means the tail extends more to the left (typically mean < median).
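A quick way to see this relationship is to compare the mean and the median of a small dataset with one extreme value. Both datasets below are made up for illustration, and the mean-versus-median comparison is a rule of thumb rather than a formal test of skewness.

```python
import statistics

# Hypothetical data with a long right tail (one unusually large value).
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
# Hypothetical data with a long left tail (one unusually small value).
left_skewed = [1, 17, 18, 18, 19, 19, 19, 20]

right_mean = statistics.mean(right_skewed)      # 4.75
right_median = statistics.median(right_skewed)  # 3.0
left_mean = statistics.mean(left_skewed)        # 16.375
left_median = statistics.median(left_skewed)    # 18.5

print(right_mean > right_median)  # True: the tail pulls the mean to the right
print(left_mean < left_median)    # True: the tail pulls the mean to the left
```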
Symmetrical Distribution:
- In a symmetrical distribution, the mean and median are approximately equal; if the distribution has a single peak, the mode is close to them as well.
- Examples include the normal distribution (bell curve) and the uniform distribution.
The Understanding Distributions module introduces the shape (symmetrical
or skewed) and the center (mean or median) of the data. Now we will look at
the variance, or spread, of the data.
Imagine that you have score results from two groups of
students who took quizzes. Both groups saw mean quiz scores of 70%. However,
group A's quiz scores range from 50% to 90%, while group B's quiz scores range
from 40% to 100%. The scores for group B are more spread out than those for
group A.
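To make the comparison concrete, here is a small sketch with made-up scores. The lesson gives only the means and the ranges, so the individual scores below are assumptions chosen to match them.

```python
import statistics

# Hypothetical scores: both groups average 70, but B covers a wider range.
group_a = [50, 60, 70, 80, 90]
group_b = [40, 55, 70, 85, 100]

mean_a = statistics.mean(group_a)
mean_b = statistics.mean(group_b)

range_a = max(group_a) - min(group_a)   # 90 - 50 = 40
range_b = max(group_b) - min(group_b)   # 100 - 40 = 60

# Population standard deviation, a measure of spread explained in this lesson.
spread_a = statistics.pstdev(group_a)
spread_b = statistics.pstdev(group_b)

print(mean_a, mean_b)       # same center
print(range_a, range_b)     # different ranges
print(spread_b > spread_a)  # True: group B is more spread out
```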
We want to better understand the spread of the data. To do
this, we measure the variance and standard deviation.
Sample variance
What should you do if you don't have data for the whole
population?
There is a difference in the calculation of variance for a
population and for a sample, or subset, of a population. For both,
you calculate the mean, find each value's difference from the mean, square
the differences, and then sum the squared differences.
When calculating population variance, as in the cat example later in
this lesson, divide the sum of squared deviations from the mean by the number
of items in the population. In a full population of 20, for example, we divide
by 20.
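Written as formulas, the two calculations look like this. The symbols are conventional notation assumed here, not taken from the lesson: N is the population size, μ is the population mean, n is the sample size, and x̄ is the sample mean.

```latex
% Population variance: divide by the number of items in the population
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2

% Sample variance: divide by the number of items in the sample minus one
s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2
```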
In the sample variance formula, the lowercase n represents the number of
observations in the sample. Subtracting 1 from n compensates for sample
bias.
When calculating sample variance, divide the sum
of squared deviations from the mean by the number of items in the sample minus
one. In this case, if you had 20 items in a sample (or subset) of the
population, divide by 19. The purpose of this difference is to get a less
biased estimate of the population's variance. In other words, dividing by the
sample size minus one compensates for working with a sample rather than with
the whole population.
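The difference between the two divisors can be sketched as follows. The five values are hypothetical, and the variable names are chosen only for this example. Python's standard `statistics` module provides `pvariance` for the population calculation and `variance` for the sample calculation, which makes a convenient cross-check.

```python
import statistics

# Hypothetical measurements, used only to compare the two divisors.
values = [4, 8, 6, 5, 7]

mean = sum(values) / len(values)                       # 30 / 5 = 6.0
sum_of_squares = sum((x - mean) ** 2 for x in values)  # 4 + 4 + 0 + 1 + 1 = 10.0

population_variance = sum_of_squares / len(values)     # divide by n:     2.0
sample_variance = sum_of_squares / (len(values) - 1)   # divide by n - 1: 2.5

# The standard library gives the same results.
print(population_variance, statistics.pvariance(values))
print(sample_variance, statistics.variance(values))
```

Dividing by the smaller number makes the sample variance slightly larger than the population-style calculation on the same values.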
Example: Calculate the variance and standard deviation
Now, follow along to determine the variance and the standard
deviation using an example with fewer numbers.
Imagine that you have five cats in your household.
To keep things straightforward, let's consider the cats in
your home a complete population rather than a sample. You weigh each of the
cats, and record the results as represented in the following table.
| Cat's name | Weight in pounds |
| --- | --- |
| Cinnamon | 7 |
| Danielle | 8 |
| Lilypad | 9 |
| Steve | 12 |
| The Amazing Fluffy | 14 |
First, calculate the mean (or average) weight for the five cats.
1. Add all the weights together:
7 + 8 + 9 + 12 + 14 = 50
2. Then divide that total by the number of cats in the data:
50/5 = 10
10 pounds is the mean (or average) weight for
this group of cats.
Now, begin to calculate the variance.
3. First, calculate each cat's difference from the mean weight:
| Cat's name | Weight (in pounds) | Difference from mean |
| --- | --- | --- |
| Cinnamon | 7 | 7 - 10 = (-3) |
| Danielle | 8 | 8 - 10 = (-2) |
| Lilypad | 9 | 9 - 10 = (-1) |
| Steve | 12 | 12 - 10 = 2 |
| The Amazing Fluffy | 14 | 14 - 10 = 4 |
4. Now, square each difference from the mean.
| Cat's name | Weight (in pounds) | Difference from mean | Squared value of difference from mean |
| --- | --- | --- | --- |
| Cinnamon | 7 | (-3) | (-3) * (-3) = 9 |
| Danielle | 8 | (-2) | (-2) * (-2) = 4 |
| Lilypad | 9 | (-1) | (-1) * (-1) = 1 |
| Steve | 12 | 2 | 2 * 2 = 4 |
| The Amazing Fluffy | 14 | 4 | 4 * 4 = 16 |
5. Next, add all the squared values of the differences from the mean together:
9 + 4 + 1 + 4 + 16 = 34
6. Then, divide the result by the number of data points (or cats):
34/5 = 6.8
6.8 is the variance for the cats.
7. Now that you have calculated the variance, calculate the
standard deviation by finding the square root of the variance. (You can use a
calculator to do this.)
The square root of 6.8 is approximately 2.6. So, 2.6 is
the standard deviation (rounded to one decimal place).
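The seven steps above can be retraced in a short Python sketch. The weights come from the table in this example; the variable names are chosen only for illustration.

```python
import math

# Weights in pounds, from the table in this example.
weights = {
    "Cinnamon": 7,
    "Danielle": 8,
    "Lilypad": 9,
    "Steve": 12,
    "The Amazing Fluffy": 14,
}

# Steps 1-2: add the weights and divide by the number of cats.
mean_weight = sum(weights.values()) / len(weights)            # 50 / 5 = 10.0

# Step 3: each cat's difference from the mean.
differences = {name: w - mean_weight for name, w in weights.items()}

# Step 4: square each difference.
squared = {name: d ** 2 for name, d in differences.items()}

# Step 5: add the squared differences.
sum_of_squares = sum(squared.values())                        # 34.0

# Step 6: divide by the number of cats (population variance).
variance = sum_of_squares / len(weights)                      # 6.8

# Step 7: the standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)                                 # about 2.61

print(mean_weight, sum_of_squares, variance, round(std_dev, 1))
```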
You can now see which cats' weights are within one standard
deviation (2.6 pounds) of the mean (10 pounds):
| Cat's name | Weight (in pounds) | Difference from mean | Within one standard deviation (2.6 pounds)? |
| --- | --- | --- | --- |
| Cinnamon | 7 | (-3) | No |
| Danielle | 8 | (-2) | Yes |
| Lilypad | 9 | (-1) | Yes |
| Steve | 12 | 2 | Yes |
| The Amazing Fluffy | 14 | 4 | No |
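The same check can be sketched in code: a weight is within one standard deviation when the absolute value of its difference from the mean is no larger than the standard deviation. This sketch uses `statistics.pstdev` (the population standard deviation) from Python's standard library; the variable names are illustrative.

```python
import statistics

# Weights in pounds, from the table in this example.
weights = {
    "Cinnamon": 7,
    "Danielle": 8,
    "Lilypad": 9,
    "Steve": 12,
    "The Amazing Fluffy": 14,
}

values = list(weights.values())
mean_weight = statistics.mean(values)    # 10
std_dev = statistics.pstdev(values)      # about 2.61

# Keep the cats whose distance from the mean does not exceed one standard deviation.
within_one_sd = [
    name for name, w in weights.items()
    if abs(w - mean_weight) <= std_dev
]

print(within_one_sd)  # ['Danielle', 'Lilypad', 'Steve']
```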
Knowledge check
Imagine you have asked all the coworkers in your department
how many caffeinated beverages they drink weekly, and you have compiled the
following table with their answers.
To keep things straightforward, let's consider your work
department a complete population rather than a sample.
| Name | Number of caffeinated beverages | Difference from mean | Squared value of difference from mean |
| --- | --- | --- | --- |
| Kaye | 4 | (-2) | 4 |
| Lanai | 5 | (-1) | 1 |
| Treasure | 6 | 0 | 0 |
| Sander | 9 | 3 | 9 |
To find the variance, add all the squared values of the
differences from the mean together. Then, divide the result by the number of
data points (or coworkers):
- 4 + 1 + 0 + 9 = 14
- 14/4 = 3.5
The standard deviation is the square root of the variance.
The square root of 3.5 is approximately 1.87.
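If you want to double-check these figures, one option is Python's standard `statistics` module, as in the sketch below. The list holds the beverage counts from the table, and `pvariance` and `pstdev` treat the data as a complete population, as this exercise does.

```python
import statistics

# Weekly caffeinated beverages for Kaye, Lanai, Treasure, and Sander.
beverages = [4, 5, 6, 9]

mean_beverages = statistics.mean(beverages)   # 24 / 4 = 6
variance = statistics.pvariance(beverages)    # 14 / 4 = 3.5
std_dev = statistics.pstdev(beverages)        # about 1.87

print(mean_beverages, variance, round(std_dev, 2))
```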
Look again at the table about caffeinated beverages. How
many people are within one standard deviation of the mean for
the number of caffeinated beverages consumed weekly?
- 1
- 2 (correct answer)
- 3
- 4
Summary
You've been introduced to the concepts of variance and
standard deviation. In the next lesson, you'll take a deeper look at the
concept of continuous distributions.
