```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

np.random.seed(42)
full_data = pd.read_csv('../data/coffee_dataset.csv')
sample_data = full_data.sample(200)
```

```python
# Bootstrap the difference in mean height between coffee and non-coffee drinkers
diffs = []
for _ in range(10000):
    bootsamp = sample_data.sample(200, replace=True)
    coff_mean = bootsamp[bootsamp['drinks_coffee'] == True]['height'].mean()
    nocoff_mean = bootsamp[bootsamp['drinks_coffee'] == False]['height'].mean()
    diffs.append(coff_mean - nocoff_mean)

# 99% confidence interval
np.percentile(diffs, 0.5), np.percentile(diffs, 99.5)
# statistical evidence coffee drinkers are on average taller
```

```python
# Bootstrap the difference in mean height between over-21s and under-21s
diffs_age = []
for _ in range(10000):
    bootsamp = sample_data.sample(200, replace=True)
    under21_mean = bootsamp[bootsamp['age'] == '<21']['height'].mean()
    over21_mean = bootsamp[bootsamp['age'] != '<21']['height'].mean()
    diffs_age.append(over21_mean - under21_mean)

# 99% confidence interval
np.percentile(diffs_age, 0.5), np.percentile(diffs_age, 99.5)
# statistical evidence that over-21s are on average taller
```

```python
# Within the under-21 group, bootstrap the coffee vs. non-coffee difference
diffs_coff_under21 = []
for _ in range(10000):
    bootsamp = sample_data.sample(200, replace=True)
    under21_coff_mean = bootsamp.query("age == '<21' and drinks_coffee == True")['height'].mean()
    under21_nocoff_mean = bootsamp.query("age == '<21' and drinks_coffee == False")['height'].mean()
    diffs_coff_under21.append(under21_nocoff_mean - under21_coff_mean)

# 95% confidence interval
np.percentile(diffs_coff_under21, 2.5), np.percentile(diffs_coff_under21, 97.5)
# For the under-21 group, we have evidence that the non-coffee drinkers are on average taller
```

```python
# Within the over-21 group, bootstrap the coffee vs. non-coffee difference
diffs_coff_over21 = []
for _ in range(10000):
    bootsamp = sample_data.sample(200, replace=True)
    over21_coff_mean = bootsamp.query("age != '<21' and drinks_coffee == True")['height'].mean()
    over21_nocoff_mean = bootsamp.query("age != '<21' and drinks_coffee == False")['height'].mean()
    diffs_coff_over21.append(over21_nocoff_mean - over21_coff_mean)

# 95% confidence interval
np.percentile(diffs_coff_over21, 2.5), np.percentile(diffs_coff_over21, 97.5)
# For the over-21 group, we have evidence that on average the non-coffee drinkers are taller
```

Within both the under-21 and over-21 groups, we saw that non-coffee drinkers were taller on average. But when the groups were combined, coffee drinkers appeared taller on average. This is again **Simpson's paradox**: the dataset contains more adults, and those adults were disproportionately coffee drinkers, which made coffee drinkers seem taller on average. The combined result is misleading.

The broader concept at play here is that of confounding variables. You will learn even more about these in the regression section of the course.

Though you were comparing the average heights of coffee drinkers to non-coffee drinkers, there are a number of other applications that use a comparison for the means of two groups.

A/B testing is one of the most important to businesses around the world. In this technique, you are changing something about your web layout to understand how it impacts users. You ideally want to provide a page that leads to more clicks, higher revenue, and/or higher customer satisfaction.
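The same bootstrap machinery carries over directly to A/B testing. The sketch below uses made-up click data for two hypothetical page layouts (the 10% and 13% click-through rates are assumptions for illustration) and bootstraps the difference in click-through proportions:

```python
import numpy as np

np.random.seed(42)

# Hypothetical click data: 1 = clicked, 0 = did not click
old_page = np.random.binomial(1, 0.10, size=1000)  # ~10% click-through
new_page = np.random.binomial(1, 0.13, size=1000)  # ~13% click-through

# Bootstrap the difference in click-through proportions
diffs = []
for _ in range(10000):
    old_boot = np.random.choice(old_page, size=old_page.size, replace=True)
    new_boot = np.random.choice(new_page, size=new_page.size, replace=True)
    diffs.append(new_boot.mean() - old_boot.mean())

# 95% confidence interval for the difference in proportions
lower, upper = np.percentile(diffs, 2.5), np.percentile(diffs, 97.5)
print(lower, upper)
```

If the resulting interval excludes zero, you have statistical evidence that the new layout changes the click-through rate; whether the change is worth shipping is a separate, practical question.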

Here, you learned about **practical** and **statistical** significance.

Using confidence intervals and hypothesis testing, you are able to provide **statistical significance** in making decisions.

However, it is also important to take **practical significance** into consideration when making decisions. Practical significance accounts for factors of your situation that are not captured directly in the results of your hypothesis test or confidence interval. Constraints like **space**, **time**, or **money** are important in business decisions, but they might not be accounted for directly in a statistical test.

**One educated, but potentially biased, opinion on the traditional methods** is that they are no longer necessary given what is possible with modern computing, and they will become even less important as computing advances. Therefore, memorizing these formulas to throw at particular situations will be a glossed-over component of this class. However, there are resources below should you want to dive into a few of the hundreds, if not thousands, of hypothesis tests that are possible with traditional techniques.

To learn more about the traditional methods, see the documentation here on the corresponding hypothesis tests.

In the left margin of that documentation, you will see a drop-down of the available hypothesis tests.

Each of these hypothesis tests is linked to a corresponding confidence interval, but again, the bootstrapping approach can be used in place of any of them: simply identify what you would like to estimate, then simulate the sampling distribution for the statistic that best estimates that value.
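For instance, nothing restricts you to means. The sketch below (using made-up, skewed data) builds a bootstrap confidence interval for a population median, a parameter with no simple traditional formula:

```python
import numpy as np

np.random.seed(42)
sample = np.random.exponential(scale=10, size=200)  # made-up skewed sample

# Bootstrap the sampling distribution of the median
medians = []
for _ in range(10000):
    boot = np.random.choice(sample, size=sample.size, replace=True)
    medians.append(np.median(boot))

# 95% bootstrap confidence interval for the population median
lower, upper = np.percentile(medians, 2.5), np.percentile(medians, 97.5)
print(lower, upper)
```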

**T-Test, Two Sample T-Test, Paired T-Test, Z-Test, Chi-Squared Test, F-Test**
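As a taste of one of these traditional tests, the snippet below runs a two-sample t-test with `scipy.stats.ttest_ind` on simulated heights (both groups and their parameters are made up for this sketch):

```python
import numpy as np
from scipy import stats

np.random.seed(42)
group_a = np.random.normal(68, 3, size=100)  # simulated heights (inches)
group_b = np.random.normal(66, 3, size=100)

# Two-sample t-test for a difference in means
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(t_stat, p_value)
```

A small p-value here suggests the two group means differ; the bootstrap approach above would reach the same conclusion by simulating the difference in means directly.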

Understanding Sampling Distributions and Bootstrapping means that you can simulate the results of any confidence interval you want to build.

In this video, you saw a comparison of the traditional method for calculating a difference of means, using a Python built-in function, to the bootstrapping method you have been using throughout this lesson.

With large sample sizes, the two approaches produce very similar results. With smaller sample sizes, the traditional methods rely on assumptions that may not hold for your interval. However, small sample sizes are not ideal for bootstrapping either: a small sample may not represent your population well, which can lead to misleading results.
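To see the large-sample agreement concretely, this sketch (with simulated data) compares a traditional normal-approximation 95% interval for a mean against the bootstrap percentile interval:

```python
import numpy as np

np.random.seed(42)
sample = np.random.normal(68, 3, size=500)  # made-up large sample

# Traditional interval: mean plus/minus 1.96 standard errors
se = sample.std(ddof=1) / np.sqrt(sample.size)
trad = (sample.mean() - 1.96 * se, sample.mean() + 1.96 * se)

# Bootstrap percentile interval
means = [np.random.choice(sample, sample.size, replace=True).mean()
         for _ in range(10000)]
boot = (np.percentile(means, 2.5), np.percentile(means, 97.5))

print(trad)
print(boot)
# With n = 500 the two intervals nearly coincide
```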

It is important to understand the way that your **sample size** and **confidence level** relate to the confidence interval you achieve at the end of your analysis.

Assuming you control all other items of your analysis:

- Increasing your sample size will decrease the width of your confidence interval.
- Increasing your confidence level (say 95% to 99%) will increase the width of your confidence interval.
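A quick simulation illustrates the first point: quadrupling the sample size roughly halves the width of a bootstrap confidence interval. The population below is made up for the sketch.

```python
import numpy as np

np.random.seed(42)
population = np.random.normal(68, 3, size=100000)  # made-up population

def boot_ci_width(n, reps=5000):
    """Width of a 95% bootstrap CI for the mean, from a sample of size n."""
    sample = np.random.choice(population, n, replace=False)
    means = [np.random.choice(sample, n, replace=True).mean()
             for _ in range(reps)]
    lower, upper = np.percentile(means, 2.5), np.percentile(means, 97.5)
    return upper - lower

print(boot_ci_width(100))   # wider interval
print(boot_ci_width(400))   # roughly half as wide
```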

You saw that you can compute:

- The confidence interval **width** as the difference between the upper and lower bounds of your confidence interval.
- The **margin of error** as half the confidence interval width; it is the value you add to and subtract from your sample estimate to obtain the final confidence interval.
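Both quantities can be computed directly from the bootstrap percentile bounds, as in this sketch (the sample is simulated):

```python
import numpy as np

np.random.seed(42)
sample = np.random.normal(68, 3, size=200)  # made-up sample

# Bootstrap the sampling distribution of the mean
means = [np.random.choice(sample, sample.size, replace=True).mean()
         for _ in range(10000)]
lower, upper = np.percentile(means, 2.5), np.percentile(means, 97.5)

width = upper - lower            # confidence interval width
margin_of_error = width / 2      # half the width
print(width, margin_of_error)
```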

### Recap

In this lesson, you learned:

- How to use your knowledge of bootstrapping and sampling distributions to create a confidence interval for any population parameter.
- You learned how to build confidence intervals for the population mean and difference in means, but really the same process can be done for any parameter you are interested in.
- You also learned about how to use python built-in functions to build confidence intervals, but that these rely on assumptions like the Central Limit Theorem.
- You learned about the difference between **statistical significance** and **practical significance**.
- Finally, you learned other language associated with confidence intervals, like **margin of error** and **confidence interval width**, and how to correctly interpret your confidence intervals. Remember, confidence intervals are about **parameters** in a population, not about individual observations.

### What’s Next

Confidence intervals and hypothesis testing essentially accomplish the same thing, but which one you encounter depends on who you talk to or what source you are reading, so it is important to understand both.

In the next lesson, you will be learning about hypothesis testing!