Hypothesis testing and p-values

 

Previously, you became familiar with concepts around using variation and the normal distribution to explore, understand, and communicate with data. You also looked at confidence intervals as an example of inference.

In this lesson, you'll continue to learn about inference. Inference is the process of drawing conclusions about a population based on a sample of data from that population. Inference is needed because, in most instances, it is not practical to obtain measurements from every member of a given population.

 



In other words, if we have data for all of the members of a population, we don't need to make any inferences about the difference between groups within that population. 

When it isn't possible to gather data for every individual member of a population, we collect data from samples, and then we make inferences.

 

In his book Avoiding Data Pitfalls, author Ben Jones, founder and CEO of Data Literacy, LLC, and a member of the Tableau Community, points out that the census in the United States happens only once a decade because of how expensive and complicated it is to try to count "every single person in every single residential structure in the entire country and such an undertaking is not without its sources of bias and error." Most organizations do not have financial or human resources that equal the U.S. federal government's, so they base decisions on inferences made from data samples.

Hypothesis testing

 

Many types of organizations use hypothesis testing. Some businesses, for example, use hypothesis testing for quality control to see if a certain product meets a standard, or to compare new and old sales methods.

Medical research also often bases inferences on data samples. Imagine, for example, that a biotech company has manufactured a new drug to alleviate a disease. To determine whether the medication works, a controlled experiment needs to be conducted. Because it is not possible to experiment on every single person who has the disease, a subset of people with the disease is randomly sampled for testing.

 



 

Within this sample, one group (the experimental group) receives the treatment, and the other group (the control group) receives a placebo, such as a sugar pill, instead of the medication. People are randomly assigned to the groups so that any difference in health outcomes can be attributed to the research intervention.

Tests are set up for both groups, and measurements are taken. When testing differences between the two groups, researchers decide how far apart the results must be in order to conclude that the health outcomes for the experimental group and the control group are significantly different.

 



 

Researchers collect data from the sample groups and run appropriate statistical tests. Then, the researchers use these test results to decide whether there is a significant difference between the groups.

Once the data has been obtained, the researchers will need to make inferences about the population at large (every single person who has the disease). 

This is called hypothesis testing.

 

Hypothesis testing begins with the creation of null and alternative hypothesis statements. 

 

Null hypothesis

The null hypothesis states that the medication will have no impact on health outcomes. It proposes that those who receive the treatment will not have different outcomes from those who do not.

 

Alternative hypothesis

The alternative hypothesis states that there will be a difference. It proposes that those receiving the medication will show better health outcomes than those who do not.

 

Hypothesis tests begin by assuming that the null hypothesis is true. Then, the tests estimate how likely it would be to observe outcomes at least as extreme as those seen in the experiment if the null hypothesis really were true.

In other words, if results this extreme would be very unlikely under the null hypothesis, then there is evidence to support the alternative hypothesis. If results this extreme would be fairly likely under the null hypothesis, then there is not enough evidence to support the alternative hypothesis. In the drug example, that would suggest a need to try again with a new formula.

Hypothesis tests take the sample size, the size of the difference measured, and the amount of variation observed in each group into account.
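As a rough illustration of how those three ingredients combine, the sketch below computes Welch's t statistic, one common test statistic for comparing two group means. The choice of Welch's test and the improvement scores are assumptions made for illustration; they are not taken from the drug example in this lesson.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Return Welch's t statistic for two independent samples."""
    n_a, n_b = len(group_a), len(group_b)
    # Size of the difference between the two group means.
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    # The standard error combines each group's variation with its sample size.
    std_err = math.sqrt(
        statistics.variance(group_a) / n_a + statistics.variance(group_b) / n_b
    )
    return mean_diff / std_err


# Hypothetical improvement scores, invented for illustration.
treatment = [12, 15, 11, 14, 16, 13, 15, 12]
placebo = [10, 11, 9, 12, 10, 11, 13, 10]

print(welch_t(treatment, placebo))
```

A bigger difference in means, less variation within the groups, or more observations each pushes the statistic further from zero, which is what makes a difference easier to detect.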

 



 

The numeric result of a hypothesis test is called the p-value. It is the probability of observing results at least as extreme as those measured, assuming the null hypothesis is true. It is not the probability that the null hypothesis itself is true. A p-value is used to help determine whether to reject the null hypothesis. In this case, rejecting the null hypothesis means concluding that there is evidence the treatment would work in the larger population. A small p-value indicates that there is enough evidence to reject the null hypothesis and to support the alternative hypothesis.

It's important to note, however, that the p-value doesn't prove or disprove anything. A high p-value doesn't prove that the null hypothesis is valid, and a low p-value doesn't prove that it's invalid. That's why p-values need to be considered with care.
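One way to see what a p-value measures is a permutation test. If the treatment made no difference, the group labels would be arbitrary, so every way of relabeling the participants would be equally likely. The sketch below counts how many relabelings give the treatment group a result at least as high as the one observed. It assumes a one-sided alternative (treatment improves scores) and uses invented scores; a permutation test is only one of several possible tests, not necessarily the one researchers would choose.

```python
from itertools import combinations

# Hypothetical improvement scores, invented for illustration.
treatment = [12, 15, 11, 14, 16, 13, 15, 12]
placebo = [10, 11, 9, 12, 10, 11, 13, 10]


def permutation_p_value(group_a, group_b):
    """One-sided exact permutation p-value for 'group_a scores higher'."""
    pooled = group_a + group_b
    observed_sum = sum(group_a)
    # The pooled total is fixed, so a larger sum for group A always means a
    # larger difference in means. Comparing sums avoids rounding issues.
    at_least_as_extreme = 0
    total = 0
    for relabeled_a in combinations(pooled, len(group_a)):
        total += 1
        if sum(relabeled_a) >= observed_sum:
            at_least_as_extreme += 1
    return at_least_as_extreme / total


p_value = permutation_p_value(treatment, placebo)
print(f"p-value: {p_value:.4f}")
```

The result is the share of relabelings that look at least as favorable to the treatment as the real data. It describes how surprising the data would be under the null hypothesis, not how likely the null hypothesis is.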

 

Taking care with p-values

 



 

 


At one time, researchers were trained to use a p-value of 0.05 as a cutoff. In other words, a p-value of 0.05 or lower was believed to be sufficient to reject the null hypothesis. The 0.05 cutoff corresponds to the tails of the normal distribution. Remember, 95% confidence intervals matched the area of the normal distribution that falls within about 2 standard deviations (more precisely, 1.96) on either side of the mean. The 0.05 (or 5%) cutoff corresponds to the area that falls outside of that range.

Over the past several years, that thinking has been revised. In the medication experiment, for example, if a lower cutoff were used (effectively raising the confidence level above 95%), it would be harder to reject the null hypothesis. Alternatively, imagine that, even with a lower cutoff, the p-value remains low enough to reject the null hypothesis, but the actual difference in outcomes isn't very great. A statistically significant result is not necessarily a practically important one.
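The link between the cutoff and the tails of the normal distribution can be checked numerically. This minimal sketch uses Python's standard-library NormalDist; the helper name two_tailed_area is made up for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def two_tailed_area(z):
    """Area of the standard normal curve further than z SDs from the mean."""
    return 2 * (1 - standard_normal.cdf(abs(z)))


print(f"{two_tailed_area(2):.4f}")     # 0.0455
print(f"{two_tailed_area(1.96):.4f}")  # 0.0500
```

Two standard deviations is a convenient rounding; the value that leaves exactly 5% in the tails is closer to 1.96.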
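To see how a very small difference can still produce a very low p-value, consider the sketch below. It assumes a simple two-sided z-test with a known standard deviation and equal group sizes, which is a simplification, and all of the numbers are invented for illustration.

```python
import math
from statistics import NormalDist


def two_sided_p(mean_diff, sd, n_per_group):
    """Two-sided p-value from a z-test on two equal-sized groups."""
    # The standard error shrinks as the groups get larger.
    std_err = sd * math.sqrt(2 / n_per_group)
    z = mean_diff / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical: a difference of 0.1 on a scale whose standard deviation is 10.
tiny_diff, sd = 0.1, 10.0
for n in (100, 10_000, 1_000_000):
    print(n, two_sided_p(tiny_diff, sd, n))
```

The difference between the groups is the same in every row. Only the sample size changes, yet the p-value drops from far above 0.05 to far below it, which is one reason a p-value says little about how large or important an effect is.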

For these reasons, and many others, the American Statistical Association issued a statement in 2016 in which they stated, "By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis." The full statement, by Ronald Wasserstein and Nicole Lazar, appears in The American Statistician.

 

 



 

P-values can also be manipulated by the kind of data brought into the analysis. 

To see an example of how p-values can be manipulated, try the interactive "p-hacking" exercise "Hack Your Way to Scientific Glory" on FiveThirtyEight, a website that aggregates and analyzes opinion polls and covers politics, economics, and sports.
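One common form of manipulation is to test many outcomes and report only the ones that fall below the cutoff. The sketch below estimates how often at least one of several tests would look "significant" even when no real effect exists. It assumes the tests are independent and relies on the fact that p-values are uniformly distributed between 0 and 1 when the null hypothesis is true; the function names and the choice of 20 tests are made up for illustration.

```python
import random


def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n independent null tests is below alpha."""
    return 1 - (1 - alpha) ** n_tests


def simulate(n_tests, alpha=0.05, n_studies=10_000, seed=0):
    """Estimate the same probability by simulating many studies with no real effect."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Under a true null hypothesis, p-values are uniform on [0, 1].
        p_values = [rng.random() for _ in range(n_tests)]
        if min(p_values) < alpha:
            hits += 1
    return hits / n_studies


print(f"{chance_of_false_positive(20):.3f}")  # 0.642
print(simulate(20))
```

With 20 independent looks at data that contain no effect, roughly two studies in three would still find something to report at the 0.05 level.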


Knowledge check

Which of the following statements about p-values is the most accurate?

  • P-values are the only reason to perform hypothesis testing.
  • P-values should be regarded with absolute certainty.
  • P-values below 0.05 conclusively prove that the null hypothesis is false.
  • P-values can be influenced by other factors and manipulated.


Summary

You've now been introduced to inference, hypothesis testing, and p-values. Understanding these concepts can help you make wise comparisons.

References used in this module

  • Cairo, Alberto. The Truthful Art: Data, Charts, and Maps for Communication. Indianapolis, IN: New Riders, 2016.
  • Cairo, Alberto. "Explaining visualizations in The New York Times, NPR, and the BBC." The Functional Art (blog), 2019.
  • Cairo, Alberto. "Those Hurricane Maps Don't Mean What You Think They Mean." The New York Times, 2019.
  • "Hack Your Way to Scientific Glory." FiveThirtyEight. ABC News Internet Ventures.
  • Jones, Ben. Avoiding Data Pitfalls: How to Steer Clear of Common Blunders when Working with Data and Presenting Analysis and Visualizations. Hoboken, NJ: John Wiley & Sons, 2019.
  • Lane, David M. Introduction to Statistics. Online Statistics Education: An Interactive Multimedia Course of Study, 2020.
  • Wasserstein, Ronald and Nicole Lazar. "The ASA Statement On P-Values: Context, Process, And Purpose." The American Statistician, 2016.
