Understanding Variation for Wise Comparisons - Measuring variance

 

  1. Median:

    • The median is the middle value of a dataset when it’s arranged in ascending or descending order.
    • If there’s an odd number of data points, the median is the exact middle value.
    • If there’s an even number of data points, the median is the average of the two middle values.
  2. Mean (Average):

    • The mean is the sum of all data points divided by the total number of data points.
    • It’s a measure of central tendency and represents the “typical” value in the dataset.
  3. Percentiles:

    • Percentiles divide a ranked dataset into 100 equal parts.
    • For example, the 25th percentile (Q1) is the value below which 25% of the data falls.
    • The median (50th percentile) is the value below which 50% of the data falls.
  4. Skewness:

    • Skewness describes the asymmetry of a distribution.
    • Positive skew means the tail of the distribution extends more to the right (typically, mean > median).
    • Negative skew means the tail extends more to the left (typically, mean < median).
  5. Symmetrical Distribution:

    • In a symmetrical distribution, the mean and median are approximately equal. If the distribution has a single peak, the mode is approximately equal to them as well.
    • Examples include the normal distribution (bell curve) and the uniform distribution.
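
As a quick illustration of the first four ideas, here is a minimal sketch using Python's standard `statistics` module. The `scores` list is made-up data, chosen to have a long right tail; it does not come from this lesson.

```python
import statistics

# Hypothetical data with one large value, giving a long right tail.
scores = [2, 3, 3, 4, 5, 6, 19]

mean = statistics.mean(scores)      # sum of the values / number of values
median = statistics.median(scores)  # middle value of the sorted data

# quantiles(n=4) returns three cut points: Q1, Q2 (the median), and Q3.
# The exact Q1 and Q3 values depend on the interpolation method used.
q1, q2, q3 = statistics.quantiles(scores, n=4)

print("mean:", mean, "median:", median)
print("mean > median:", mean > median)  # consistent with positive skew
```

The single large value pulls the mean above the median, which is the pattern described for positive skew.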


 

The Understanding Distributions module introduced the shape (symmetrical or skewed) and the center (mean or median) of the data. Now we will look at the variance, or spread, of the data.

Imagine that you have score results from two groups of students who took quizzes. Both groups had a mean quiz score of 70%. However, group A's quiz scores range from 50% to 90%, while group B's quiz scores range from 40% to 100%. The scores for group B are more spread out than those for group A.
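
To make the comparison concrete, here is a small sketch. The individual scores are invented for illustration; only the mean of 70% and the two ranges come from the description above.

```python
import statistics

# Hypothetical scores (in percent) that match the description:
# both groups average 70, but group B covers a wider range.
group_a = [50, 60, 70, 80, 90]
group_b = [40, 55, 70, 85, 100]

mean_a = statistics.mean(group_a)
mean_b = statistics.mean(group_b)
range_a = max(group_a) - min(group_a)
range_b = max(group_b) - min(group_b)

print(f"Group A: mean={mean_a}, range={range_a}")
print(f"Group B: mean={mean_b}, range={range_b}")
```

The means are identical, so the mean alone cannot tell the two groups apart. A measure of spread is needed.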

We want to better understand the spread of the data. To do this, we measure the variance and the standard deviation.

Sample variance

What should you do if you don't have data for the whole population?

There is a difference in the calculation of variance for a population and for a sample, or subset, of a population. For both, you calculate the mean, then the differences from the mean, square all the differences, and then sum the squared differences.

When calculating population variance, divide the sum of squared deviations from the mean by the number of items in the population. In a full population of 20, for example, we divide by 20.

For a sample, the lowercase n represents the number of observations in the sample. When calculating sample variance, we divide by n - 1 instead of n to compensate for sample bias.
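
Written out, the two calculations look roughly like this. The symbols are the conventional ones and are an assumption here, not notation taken from this lesson: N is the population size, μ the population mean, n the sample size, and x̄ the sample mean.

```latex
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2
\qquad\text{and}\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
```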

When calculating sample variance, divide the sum of squared deviations from the mean by the number of items in the sample minus one. In this case, if you had 20 items in a sample (or subset) of the population, divide by 19. The purpose of this difference is to get a less biased estimate of the population's variance. In other words, dividing by the sample size minus one compensates for working with a sample rather than with the whole population.
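
The difference between the two divisors can be sketched in a few lines of Python. The data below is made up for illustration. The standard library's `statistics.pvariance` and `statistics.variance` are used only as a cross-check on the manual calculation.

```python
import math
import statistics

# Made-up data for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)
sum_of_squares = sum((x - mean) ** 2 for x in data)

population_variance = sum_of_squares / len(data)    # divide by n
sample_variance = sum_of_squares / (len(data) - 1)  # divide by n - 1

# Cross-check against the standard library's two conventions.
matches_stdlib = (
    math.isclose(population_variance, statistics.pvariance(data))
    and math.isclose(sample_variance, statistics.variance(data))
)

print(population_variance, sample_variance, matches_stdlib)
```

Because the sample version divides by a smaller number, it always gives a slightly larger result than the population version for the same data.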


Example: Calculate the variance and standard deviation

 

Now, follow along to determine the variance and the standard deviation using an example with fewer numbers.

Imagine that you have five cats in your household. 

To keep things straightforward, let's consider the cats in your home a complete population rather than a sample. You weigh each of the cats, and record the results as represented in the following table.

| Cat's name | Weight in pounds |
|---|---|
| Cinnamon | 7 |
| Danielle | 8 |
| Lilypad | 9 |
| Steve | 12 |
| The Amazing Fluffy | 14 |

First, calculate the mean (or average) weight for the five cats.

1. Add all the weights together:

    7 + 8 + 9 + 12 + 14 = 50

2. Then divide that total by the number of cats in the data:

    50/5 = 10

10 pounds is the mean (or average) weight for this group of cats.
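
Steps 1 and 2 can be sketched in Python as follows. The dictionary is just one convenient way to hold the table; the variable names are illustrative.

```python
# Weights in pounds, taken from the table above.
weights = {
    "Cinnamon": 7,
    "Danielle": 8,
    "Lilypad": 9,
    "Steve": 12,
    "The Amazing Fluffy": 14,
}

total = sum(weights.values())       # step 1: add all the weights
mean_weight = total / len(weights)  # step 2: divide by the number of cats

print(total, mean_weight)
```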

Now, begin to calculate the variance. 

3. First, calculate each cat's difference from the mean weight:

| Cat's name | Weight (in pounds) | Difference from mean (10 pounds) |
|---|---|---|
| Cinnamon | 7 | 7 - 10 = (-3) |
| Danielle | 8 | 8 - 10 = (-2) |
| Lilypad | 9 | 9 - 10 = (-1) |
| Steve | 12 | 12 - 10 = 2 |
| The Amazing Fluffy | 14 | 14 - 10 = 4 |

4. Now, square each difference from the mean.

| Cat's name | Weight (in pounds) | Difference from mean (10 pounds) | Squared value of difference from mean |
|---|---|---|---|
| Cinnamon | 7 | (-3) | (-3) * (-3) = 9 |
| Danielle | 8 | (-2) | (-2) * (-2) = 4 |
| Lilypad | 9 | (-1) | (-1) * (-1) = 1 |
| Steve | 12 | 2 | 2 * 2 = 4 |
| The Amazing Fluffy | 14 | 4 | 4 * 4 = 16 |

5. Next, add all the squared values of the differences from the mean together:

    9 + 4 + 1 + 4 + 16 = 34

6. Then, divide the result by the number of data points (or cats):

    34/5 = 6.8

6.8 is the variance for the cats.
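
Steps 3 through 6 translate almost line for line into code. This is a minimal sketch that treats the five cats as a full population, as the example does, so it divides by the number of cats rather than by one fewer.

```python
weights = [7, 8, 9, 12, 14]  # pounds, in table order
mean_weight = sum(weights) / len(weights)

# Step 3: difference of each weight from the mean.
differences = [w - mean_weight for w in weights]

# Step 4: square each difference.
squared = [d ** 2 for d in differences]

# Step 5: add the squared differences together.
sum_of_squares = sum(squared)

# Step 6: divide by the number of data points (population variance).
variance = sum_of_squares / len(weights)

print(differences, squared, sum_of_squares, variance)
```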

7. Now that you have calculated the variance, calculate the standard deviation by finding the square root of the variance. (You can use a calculator to do this.)

The square root of 6.8 is approximately 2.6. So, 2.6 is the standard deviation.
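
If you would rather not reach for a calculator, the same step can be sketched in Python. The call to `statistics.pstdev` is included only as a cross-check; it computes the population standard deviation directly from the weights.

```python
import math
import statistics

variance = 6.8  # population variance of the cat weights, from step 6
std_dev = math.sqrt(variance)
print(round(std_dev, 1))

# Cross-check against the standard library's population standard deviation.
weights = [7, 8, 9, 12, 14]
agrees = math.isclose(std_dev, statistics.pstdev(weights))
print(agrees)
```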

You can now see which cats' weights are within one standard deviation (2.6 pounds) of the mean (10 pounds):

| Cat's name | Weight (in pounds) | Difference from mean (10 pounds) | Within one standard deviation (2.6 pounds)? |
|---|---|---|---|
| Cinnamon | 7 | (-3) | No |
| Danielle | 8 | (-2) | Yes |
| Lilypad | 9 | (-1) | Yes |
| Steve | 12 | 2 | Yes |
| The Amazing Fluffy | 14 | 4 | No |
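
The check in the last column can be sketched as a comparison of each cat's absolute difference from the mean against the standard deviation. This sketch uses the unrounded standard deviation, which gives the same answers as the rounded 2.6 for these weights.

```python
import statistics

weights = {
    "Cinnamon": 7,
    "Danielle": 8,
    "Lilypad": 9,
    "Steve": 12,
    "The Amazing Fluffy": 14,
}

values = list(weights.values())
mean_weight = statistics.mean(values)
std_dev = statistics.pstdev(values)  # population standard deviation

# A cat is "within one standard deviation" when the size of its
# difference from the mean is no larger than the standard deviation.
within = {name: abs(w - mean_weight) <= std_dev for name, w in weights.items()}

for name, is_within in within.items():
    print(name, "Yes" if is_within else "No")
```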


Knowledge check

Imagine you have asked all the coworkers in your department how many caffeinated beverages they drink weekly, and you have compiled the following table with their answers. 

To keep things straightforward, let's consider your work department a complete population rather than a sample.

| Name | Number of caffeinated beverages | Difference from mean (6 beverages) | Squared value of difference from mean |
|---|---|---|---|
| Kaye | 4 | (-2) | 4 |
| Lanai | 5 | (-1) | 1 |
| Treasure | 6 | 0 | 0 |
| Sander | 9 | 3 | 9 |

To find the variance, add all the squared values of the differences from the mean together. Then, divide the result by the number of data points (or coworkers):

  • 4 + 1 + 0 + 9 = 14
  • 14/4 = 3.5

The standard deviation is the square root of the variance. The square root of 3.5 is approximately 1.87.
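
The same arithmetic can be reproduced with a short sketch. As in the table, the department is treated as a full population, so the sum of squares is divided by the number of coworkers.

```python
import math

# Weekly caffeinated beverages per coworker, from the table above.
beverages = {"Kaye": 4, "Lanai": 5, "Treasure": 6, "Sander": 9}

values = list(beverages.values())
mean = sum(values) / len(values)

# Population variance: mean of the squared differences from the mean.
variance = sum((v - mean) ** 2 for v in values) / len(values)
std_dev = math.sqrt(variance)

print(mean, variance, round(std_dev, 2))
```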

Look again at the table about caffeinated beverages. How many people are within one standard deviation of the mean for the number of caffeinated beverages consumed weekly?

  • 1
  • 2 (correct answer)
  • 3
  • 4


Summary

You've been introduced to the concepts of variance and standard deviation. In the next lesson, you'll take a deeper look at the concept of continuous distributions.
