Hypothesis testing and p-values
Previously, you became familiar with concepts around using variation and
the normal distribution to explore, understand, and communicate with data. You
also looked at confidence intervals as an example of inference.
In this lesson, you'll continue to learn about inference.
Inference is the process of drawing conclusions about a population based on a
sample of data from that population. Inference is needed because, in most
instances, it is not practical to obtain every measurement in a given population.
In other words, if we have data for all of the members of a
population, we don't need to make any inferences about the difference between
groups within that population.
When it isn't possible to gather data for every individual
member of a population, we collect data from samples, and then we make
inferences.
In his book Avoiding Data Pitfalls, author Ben
Jones, founder and CEO of Data Literacy, LLC, and a member of the Tableau
Community, points out that the census in the United States happens only once a
decade due to how expensive and complicated it is to try to count "every
single person in every single residential structure in the entire country and
such an undertaking is not without its sources of bias and error."
Because most organizations do not have financial or human resources equal to
those of the U.S. federal government, they base decisions on inferences drawn
from data samples.
Hypothesis testing
Many types of organizations use hypothesis testing. Some
businesses, for example, use hypothesis testing for quality control to see if a
certain product meets a standard, or to compare new and old sales methods.
Medical research also often bases inferences on data
samples. Imagine, for example, that a biotech company has manufactured a new
drug to alleviate a disease. To determine whether the medication works, a
controlled experiment needs to be conducted. Because it will not be possible to
experiment on every single person who has the disease, a random sample of people
with the disease is selected for testing.
Within this sample, one group (the experimental
group) receives the treatment, and the other group (the control
group) receives a placebo, or sugar pill, instead of the medication. The
groups are randomly assigned so that any difference in health outcomes can be
attributed to the treatment itself rather than to preexisting differences between the groups.
Tests are set up for both groups, and measurements are
taken. Before comparing the two groups, researchers decide how far apart the
results must be for the health outcomes of the experimental group and the
control group to count as significantly different.
Researchers then collect data from the sample groups and run
appropriate statistical tests. Finally, the researchers use these test results to
decide whether there is a significant difference between the groups.
Once the data has been obtained, the researchers will need
to make inferences about the population at large (every single person who has
the disease).
This is called hypothesis testing.
Hypothesis testing begins with the creation of null and alternative hypothesis
statements.
Null hypothesis
The null hypothesis states that the medication will have no
impact on health outcomes. It proposes that those who receive the treatment
will not have different outcomes from those who do not.
Alternative hypothesis
The alternative hypothesis states that there will be a
difference. It proposes that those receiving the medication will show better
health outcomes than those who do not.
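Written symbolically, the two statements might look like this. Here μ_T and μ_C are hypothetical labels (not from the lesson) for the average health outcome in the treated and untreated populations, assuming higher values mean better health:

```latex
\begin{aligned}
H_0 &: \mu_T = \mu_C && \text{(the medication has no effect on outcomes)} \\
H_1 &: \mu_T > \mu_C && \text{(the medication improves outcomes)}
\end{aligned}
```

The ">" makes this a one-sided alternative, matching the statement above; a two-sided alternative (μ_T ≠ μ_C) would look for a difference in either direction.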
Hypothesis tests begin by assuming that the null hypothesis
is true. Then, the tests aim to discern how likely it would be to observe a
difference at least as large as the one found in the experiment, if the null
hypothesis were true.
In other words, if there is only a small probability of seeing results this
large when the null hypothesis is true, then there is evidence to support the
alternative hypothesis. If the probability is large, there is not enough
evidence to support the alternative hypothesis, and the researchers fail to
reject the null hypothesis.
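The logic of "assume the null, then ask how surprising the data would be" can be sketched as a permutation test, one of several ways to compute a p-value (the lesson does not name a specific test, so this choice is an assumption). If the medication has no effect, the labels "treatment" and "control" are interchangeable, so shuffling them shows what differences chance alone produces. The scores below are invented for illustration, and `permutation_p_value` is a hypothetical helper name.

```python
import random


def permutation_p_value(treatment, control, n_shuffles=10_000, seed=0):
    """Estimate a one-sided p-value for 'treatment mean > control mean'.

    Under the null hypothesis the group labels carry no information, so we
    shuffle them many times and count how often the shuffled difference in
    means is at least as large as the observed one.
    """
    rng = random.Random(seed)
    n_t = len(treatment)
    observed = sum(treatment) / n_t - sum(control) / len(control)

    pooled = list(treatment) + list(control)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        shuffled_t, shuffled_c = pooled[:n_t], pooled[n_t:]
        diff = sum(shuffled_t) / len(shuffled_t) - sum(shuffled_c) / len(shuffled_c)
        # Small tolerance so floating-point rounding can't hide a tie.
        if diff >= observed - 1e-12:
            at_least_as_large += 1
    return at_least_as_large / n_shuffles


# Hypothetical improvement scores, invented for illustration only.
treatment = [7, 9, 6, 8, 10, 7, 9, 8]
control = [5, 6, 7, 4, 6, 5, 7, 6]
print(permutation_p_value(treatment, control))
```

A small result means a difference this large rarely appears when the labels are assigned at random, which counts as evidence against the null hypothesis.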
Hypothesis tests take into account the sample size (the number of people or
observations in each group), the size of the difference measured, and the
amount of variation observed in each group.
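One way these three ingredients combine can be sketched as a standardized difference: the observed difference in means divided by its standard error. This is a minimal sketch assuming two equal-sized groups with the same standard deviation (real tests, such as Welch's t-test, relax that assumption); `standardized_difference` is a hypothetical helper, and the scenarios are invented.

```python
import math


def standardized_difference(diff, sd, n):
    """Observed difference divided by its standard error.

    Assumes two groups of n people each, both with standard deviation sd.
    """
    standard_error = sd * math.sqrt(2 / n)
    return diff / standard_error


# Hypothetical scenarios: change one ingredient at a time.
print(f"baseline:          {standardized_difference(diff=1.0, sd=2.0, n=20):.2f}")
print(f"more people:       {standardized_difference(diff=1.0, sd=2.0, n=80):.2f}")
print(f"larger difference: {standardized_difference(diff=2.0, sd=2.0, n=20):.2f}")
print(f"more variation:    {standardized_difference(diff=1.0, sd=4.0, n=20):.2f}")
```

Larger values of this statistic lead to smaller p-values: it rises with more people per group or a larger difference, and falls as variation grows.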
The numeric result of a hypothesis test is called the p-value: the
probability of observing a result at least as extreme as the one measured,
assuming the null hypothesis is true. (It is not the probability that the null
hypothesis is true.) A p-value is used to help determine whether to reject the
null hypothesis. In this case, rejecting the null hypothesis means concluding
that the treatment likely has an effect in the larger population. A small
p-value indicates that there is enough evidence to reject the null hypothesis
and to support the alternative hypothesis.
It's important to note, however, that the p-value doesn't
prove or disprove anything. A high p-value doesn't prove that the null
hypothesis is true, and a low p-value doesn't prove that it's false. That's
why p-values need to be considered with care.
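To make the definition concrete, this sketch converts a standardized test statistic (a z-score) into a one-sided p-value using the normal distribution, then applies the conventional 0.05 cutoff. It assumes the statistic follows a standard normal distribution when the null hypothesis is true; `one_sided_p_value` and `decide` are hypothetical helper names.

```python
import math


def one_sided_p_value(z):
    """P(Z >= z) for a standard normal Z: the chance, if the null hypothesis
    were true, of a result at least this far above zero."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def decide(p_value, alpha=0.05):
    # "Reject" means the data would be unusual under the null; nothing is proven.
    return "reject the null" if p_value <= alpha else "fail to reject the null"


for z in (0.5, 1.7, 3.0):
    p = one_sided_p_value(z)
    print(f"z = {z}: p = {p:.4f} -> {decide(p)}")
```

The wording "fail to reject" (rather than "accept") reflects the point above: a high p-value does not show that the null hypothesis is true.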
Taking care with p-values
At one time, researchers were trained to use a p-value of
0.05 as a cutoff. In other words, a p-value of 0.05 or lower was believed to
be sufficient to reject the null hypothesis. The 0.05 cutoff corresponds to the
tails of the normal distribution. Remember, a 95% confidence interval matches
the area of the normal distribution that falls within roughly 2 standard
deviations of the mean (more precisely, between -1.96 and +1.96). The 0.05
(or 5%) cutoff corresponds to the area that falls outside that range, in the
two tails.
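The link between the 0.05 cutoff and the normal distribution can be checked numerically. This sketch uses Python's `math.erfc` (the complementary error function) to compute the share of a normal distribution lying more than k standard deviations from the mean; `two_sided_tail_area` is a hypothetical helper name.

```python
import math


def two_sided_tail_area(k):
    """Share of a normal distribution lying more than k standard deviations
    from the mean, counting both tails."""
    return math.erfc(k / math.sqrt(2))


print(f"beyond +/-2 SD:    {two_sided_tail_area(2.0):.4f}")   # about 0.0455
print(f"beyond +/-1.96 SD: {two_sided_tail_area(1.96):.4f}")  # about 0.0500
```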
Over the past several years, that thinking has been revised.
In the medication experiment, for example, if a lower cutoff were used
(effectively raising the confidence level above 95%), it would be harder to
reject the null hypothesis. Alternatively, imagine that, even with a lower
cutoff, the p-value remains low enough to reject the null hypothesis, but the
actual difference in outcomes is too small to matter in practice. A
statistically significant result is not necessarily an important one.
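To see how a negligible difference can still clear a cutoff, this sketch computes a one-sided p-value for the same hypothetical difference (0.05 units on an outcome with a standard deviation of 1, assumed here to be clinically trivial) at several sample sizes. It uses a normal approximation with two equal-sized groups; the numbers and the helper name `one_sided_p` are assumptions for illustration.

```python
import math


def one_sided_p(diff, sd, n):
    """Normal-approximation p-value for a mean difference between two groups
    of n people each, both with standard deviation sd."""
    standard_error = sd * math.sqrt(2 / n)
    z = diff / standard_error
    return 0.5 * math.erfc(z / math.sqrt(2))


# The difference stays tiny; only the number of people per group changes.
for n in (50, 500, 50_000):
    p = one_sided_p(diff=0.05, sd=1.0, n=n)
    print(f"n per group = {n:>6}: p = {p:.2g}")
```

The difference never changes, but with enough people per group the p-value becomes tiny, which is why a p-value should be read alongside the size of the effect.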
For these reasons, and many others, the American Statistical
Association issued a statement in 2016 in which it warned, "By itself,
a p-value does not provide a good measure of evidence regarding a model or
hypothesis." The full statement is listed in the references (Wasserstein and
Lazar, 2016).
P-values can also be manipulated through choices about which data are brought
into the analysis.
To see an example of how p-values can be manipulated, take a
look at the interactive "p-hacking" exercise "Hack Your Way to Scientific
Glory" on FiveThirtyEight, a polling aggregation website that also analyzes
opinion polls, politics, economics, and sports.
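One common way p-values get manipulated is by running many comparisons and reporting only the ones that clear the cutoff. This sketch, which is not the FiveThirtyEight exercise itself, simulates 20 hypothetical outcome measures in which both groups are drawn from the same distribution, so the null hypothesis is true by construction. It uses a two-sided normal approximation, and the seed, group sizes, and helper name `approx_p_value` are arbitrary assumptions.

```python
import math
import random
from statistics import mean, stdev


def approx_p_value(a, b):
    """Two-sided p-value for a difference in means, using a normal approximation."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    z = (mean(a) - mean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


rng = random.Random(42)
n_outcomes = 20
p_values = []
for _ in range(n_outcomes):
    # Both groups come from the same distribution: any "effect" is pure chance.
    group_a = [rng.gauss(0, 1) for _ in range(100)]
    group_b = [rng.gauss(0, 1) for _ in range(100)]
    p_values.append(approx_p_value(group_a, group_b))

hits = sum(p < 0.05 for p in p_values)
print(f"{hits} of {n_outcomes} comparisons had p < 0.05")

# With independent tests, the chance that at least one of 20 true-null
# comparisons falls below 0.05 is 1 - 0.95**20.
print(f"chance of at least one false positive: {1 - 0.95 ** n_outcomes:.2f}")  # 0.64
```

Reporting only the "significant" comparisons, without mentioning the others, makes chance findings look like real effects.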
Knowledge check
Which of the following statements is the most accurate about p-values?
- P-values are the only reason to perform hypothesis testing.
- P-values should be regarded with absolute certainty.
- P-values below 0.05 conclusively prove that the null hypothesis is false.
- P-values can be influenced by other factors and manipulated.
Summary
You've now been introduced to inference, hypothesis testing,
and p-values. Understanding these concepts can help you make wise comparisons.
References used in this module
- Cairo, Alberto. The Truthful Art: Data, Charts, and Maps for Communication. Indianapolis, IN: New Riders, 2016.
- Cairo, Alberto. "Explaining visualizations in The New York Times, NPR, and the BBC." The Functional Art (blog), 2019.
- Cairo, Alberto. "Those Hurricane Maps Don't Mean What You Think They Mean." The New York Times, 2019.
- "Hack Your Way to Scientific Glory." FiveThirtyEight. ABC News Internet Ventures.
- Jones, Ben. Avoiding Data Pitfalls: How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations. Hoboken, NJ: John Wiley & Sons, 2019.
- Lane, David M. Introduction to Statistics. Online Statistics Education: An Interactive Multimedia Course of Study, 2020.
- Wasserstein, Ronald, and Nicole Lazar. "The ASA Statement on P-Values: Context, Process, and Purpose." The American Statistician, 2016.