Posts

Hypothesis testing and p-values

So far, you have become familiar with using variation and the normal distribution to explore, understand, and communicate with data. You also looked at confidence intervals as an example of inference. In this lesson, you'll continue to learn about inference. Inference is the process of drawing conclusions about a population based on a sample of its data. It is needed because, in most instances, it is not practical to obtain every measurement in a given population. In other words, if we have data for all of the members of a population, we don't need to make any inferences about the difference between groups within that population. When it isn't possible to gather data for every individual member of a population, we collect data from samples, and then we make inferences. In his book Avoiding Data Pitfalls, author Ben Jones, founder and CEO of Data Literacy, LLC, and a member of the Tableau Community, points out that the census in the Uni...

Variation, the normal distribution, and uncertainty

Continuous distributions describe the probability distributions of continuous random variables. Unlike discrete distributions (which deal with distinct, countable outcomes), continuous distributions involve a range of possible values. Here are a few key points:

1. Probability Density Function (PDF):
   - The PDF is a function that describes the relative likelihood of different outcomes of a continuous random variable.
   - Its height at a point is a density, not a probability: the probability of any single exact value is zero.
   - For example, the normal distribution (or Gaussian distribution) is a common continuous distribution with a bell-shaped PDF.
2. Area Under the Curve:
   - In continuous distributions, probabilities are represented as areas under the curve.
   - The total area under the curve is always 1 (since the variable must take on some value).
   - To find the probability of a specific range of values...
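The idea that probabilities are areas under the curve can be checked with a short sketch. It uses only Python's standard library, and it assumes a standard normal distribution (mean 0, standard deviation 1) purely for illustration. The helper names `prob_between` and `area_under_pdf` are hypothetical and do not come from the post.

```python
from statistics import NormalDist

# Assumption: a standard normal distribution, chosen only for illustration.
dist = NormalDist(mu=0.0, sigma=1.0)


def prob_between(d, low, high):
    """Probability of a value in [low, high], i.e. the area under the PDF there."""
    return d.cdf(high) - d.cdf(low)


def area_under_pdf(d, low, high, steps=10_000):
    """Approximate the area under the PDF on [low, high] with the trapezoidal rule."""
    width = (high - low) / steps
    total = 0.5 * (d.pdf(low) + d.pdf(high))
    for i in range(1, steps):
        total += d.pdf(low + i * width)
    return total * width


# Probability of landing within one standard deviation of the mean.
p_one_sigma = prob_between(dist, -1.0, 1.0)

# The area over a very wide range should be very close to 1.
total_area = area_under_pdf(dist, -8.0, 8.0)

print(round(p_one_sigma, 4))
print(round(total_area, 4))
```

For the normal distribution, the first value comes out at roughly 0.68, which is the familiar "about 68% of values fall within one standard deviation" rule.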

Understanding Variation for Wise Comparisons - Measuring variance

Median: The median is the middle value of a dataset when it’s arranged in ascending or descending order. If there’s an odd number of data points, the median is the exact middle value. If there’s an even number of data points, the median is the average of the two middle values.

Mean (Average): The mean is the sum of all data points divided by the total number of data points. It’s a measure of central tendency and represents the “typical” value in the dataset.

Percentiles: Percentiles divide a ranked dataset into 100 equal parts. For example, the 25th percentile (Q1) is the value below which 25% of the data falls. The median (50th percentile) is the value below which 50% of the data falls.

Skewness: Skewness describes the asymmetry of a distribution. A positive skew means the tail of the distribution extends more to the right (typically mean > median). A negative skew means the tail extends more to the left (typically mean < median).

Symmetrical Distribution: In a...
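As a rough illustration of the median, mean, percentile, and skewness definitions, the sketch below computes each measure with Python's standard `statistics` module. The sample `data` is invented for this example, and `skew_direction` is a hypothetical helper that applies the mean-versus-median rule of thumb rather than a formal skewness statistic.

```python
import statistics

# Hypothetical sample with one large value, invented for illustration.
data = [2, 3, 3, 4, 5, 6, 30]

mean = statistics.mean(data)      # sum of values divided by the count
median = statistics.median(data)  # middle value of the sorted data

# With an even number of points, the median averages the two middle values.
even_median = statistics.median([1, 2, 3, 4])

# Quartiles: the 25th, 50th, and 75th percentiles.
# method="inclusive" treats the data as the whole population of interest.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")


def skew_direction(values):
    """Rule-of-thumb skew check: compare the mean with the median."""
    m, med = statistics.mean(values), statistics.median(values)
    if m > med:
        return "positive"
    if m < med:
        return "negative"
    return "roughly symmetric"


print(mean, median, even_median)
print(q1, q2, q3)
print(skew_direction(data))
```

The single large value pulls the mean above the median, which is the pattern the rule of thumb associates with a positive (right) skew.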