## Data Analyst Nanodegree

I’ve earned it! https://confirm.udacity.com/TLVUZQTR

## T-Tests

    # Import packages
    import pandas as pd
    import scipy.stats as stats
    %matplotlib inline

    # Read in the data
    data = pd.read_csv('Customer Support Time Study.csv')

    # Set columns to lists to use in ttest function
    joe = data['Joey'].values.tolist()
    nat = data['Nathaly'].values.tolist()

    # Plot the means (optional)
    data.mean().plot(kind='bar')

    # Perform ttest
    stats.ttest_ind(nat, joe, equal_var=True)
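To see what an equal-variance t-test computes, here is a minimal sketch that works out the pooled t statistic by hand with only the standard library. The function name `pooled_t_statistic` and the sample numbers are assumptions made for illustration; they are not the values from the study file.

```python
import math
import statistics


def pooled_t_statistic(a, b):
    """Return (t statistic, degrees of freedom) for a two-sample
    t-test that assumes both groups share the same variance."""
    n_a, n_b = len(a), len(b)
    var_a = statistics.variance(a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(b)
    dof = n_a + n_b - 2
    # Pooled variance weights each sample variance by its degrees of freedom
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / dof
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_stat = (statistics.mean(a) - statistics.mean(b)) / std_err
    return t_stat, dof


# Hypothetical handling times in minutes, not the real study data
nat = [12.0, 15.0, 11.0, 14.0, 13.0]
joe = [10.0, 9.0, 12.0, 11.0, 8.0]

t_stat, dof = pooled_t_statistic(nat, joe)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```

The statistic is the same quantity an equal-variance two-sample t-test reports; turning it into a p-value additionally needs the t distribution, which is where a statistics library helps.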

## git

Further Research: Git Internals - Plumbing and Porcelain (advanced; bookmark this and check it out later), Customizing Git - Git Hooks.

Git Init Recap: Use the `git init` command to create a new, empty repository in the current directory.

    $ git init

Running this command creates a hidden `.git` directory. This `.git` directory is the brain/storage center for the repository. It […]

## Multiple Linear Regression

In this lesson, you will be extending your knowledge of simple linear regression, where you were predicting a quantitative response variable using a quantitative explanatory variable. That is, you were using an equation that looked like this: $\hat{y} = b_0 + b_1x_1$. In this lesson, you will learn about multiple linear regression. In these cases, […]
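As a sketch of where this leads, the standard multiple linear regression equation adds one coefficient for each additional explanatory variable. The symbols $x_2, \dots, x_k$ and $b_2, \dots, b_k$ extend the notation of the simple equation and are assumptions here, since the excerpt is cut off before it introduces them.

$$\hat{y} = b_0 + b_1x_1 + b_2x_2 + \cdots + b_kx_k$$

With $k = 1$ this reduces to the simple linear regression equation.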

## Logistic Regression

Fitting Logistic Regression

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    df = pd.read_csv('./fraud_dataset.csv')
    df.head()

1. As you can see, there are two columns that need to be changed to dummy variables. Replace each of the current columns with its dummy version. Use 1 for weekday and True, and 0 otherwise. Use the first […]
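The dummy-variable step can be sketched without any libraries. The rows, the column names `day` and `fraud`, and the helper `to_dummy` below are hypothetical and only illustrate the 1/0 encoding; the real dataset's columns may differ.

```python
# Hypothetical rows; the column names and values are illustrative only
rows = [
    {"day": "weekday", "fraud": True},
    {"day": "weekend", "fraud": False},
    {"day": "weekday", "fraud": False},
]


def to_dummy(value, positive):
    """Return 1 when value equals the positive category, else 0."""
    return 1 if value == positive else 0


# Encode each categorical column as a 0/1 dummy column
encoded = [
    {
        "weekday": to_dummy(row["day"], "weekday"),
        "fraud": to_dummy(row["fraud"], True),
    }
    for row in rows
]
print(encoded)
```

A logistic regression needs numeric inputs, which is why the categorical columns are mapped to 0/1 before fitting.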

## Simple Linear Regression

In this lesson, you will: identify regression applications, learn how regression works, and apply regression to problems using Python. Machine Learning is frequently split into supervised and unsupervised learning. Regression, which you will be learning about in this lesson (and its extensions in later lessons), is an example of supervised machine learning. In supervised machine learning, you are interested in predicting […]
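As a rough illustration of how regression works, the sketch below fits a straight line by ordinary least squares using only the standard library. The function name `fit_line` and the data points are made up for the example.

```python
import statistics


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    # Slope is the co-variation of x and y divided by the variation of x
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data that lies exactly on y = 1 + 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

b0, b1 = fit_line(xs, ys)
print(f"y_hat = {b0:.2f} + {b1:.2f} * x")
```

This is supervised learning in miniature: the known `ys` act as labels, and the fitted coefficients are then used to predict the response for new `x` values.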

## Case Study: A/B Tests

A/B tests are used to test changes on a web page by running an experiment where a control group sees the old version, while the experiment group sees the new version. A metric is then chosen to measure the level of engagement from users in each group. These results are then used to judge whether one version is more effective than […]
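One common way to judge such an experiment is a two-proportion z-test on a rate metric such as click-through rate. The sketch below is one possible approach, not necessarily the one used in the case study; the function name and the visitor counts are hypothetical.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """One-sided z-test of H1: rate in group B > rate in group A.

    Returns (z statistic, p-value)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Under the null hypothesis both groups share one pooled rate
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Upper-tail probability of the standard normal distribution
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Hypothetical counts: clicks out of visitors in each group
z, p_value = two_proportion_z_test(200, 1000, 250, 1000)
print(f"z = {z:.3f}, one-sided p-value = {p_value:.4f}")
```

A small p-value suggests the experiment group's rate is higher than could plausibly arise by chance if the two versions performed the same.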

## Hypothesis Testing

Rules for setting up null and alternative hypotheses: $H_0$ is assumed to be true before you collect any data. $H_0$ usually states that there is no effect or that two groups are equal. $H_0$ and $H_1$ are competing, non-overlapping hypotheses. $H_1$ is what we would like to prove to be true. $H_0$ contains an equal sign of some kind: either $=$, $\leq$, or $\geq$. $H_1$ contains the opposition […]
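To make the setup concrete, here is a small sketch of an exact two-sided permutation test, where the null hypothesis says two groups have equal means and the alternative says they differ. The technique, the function name, and the data are chosen for illustration and are not taken from the lesson.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test for a difference in means.

    H0: the group labels are interchangeable (equal means).
    H1: the means differ."""
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))
    extreme = 0
    total = 0
    # Enumerate every way to relabel the pooled values into two groups
    for chosen in combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance guards against floating-point ties
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical measurements for two small groups
group_a = [1.0, 2.0, 3.0, 4.0]
group_b = [5.0, 6.0, 7.0, 8.0]

p_value = exact_permutation_p_value(group_a, group_b)
print(f"p-value = {p_value:.4f}")
```

The p-value is the share of relabelings that produce a difference at least as large as the observed one, computed while treating the null hypothesis as true. Full enumeration is only practical for small samples.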

## Confidence Intervals – Udacity

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    %matplotlib inline

    np.random.seed(42)

    full_data = pd.read_csv('../data/coffee_dataset.csv')
    sample_data = full_data.sample(200)

    diffs = []
    for _ in range(10000):
        bootsamp = sample_data.sample(200, replace=True)
        coff_mean = bootsamp[bootsamp['drinks_coffee'] == True]['height'].mean()
        nocoff_mean = bootsamp[bootsamp['drinks_coffee'] == False]['height'].mean()
        diffs.append(coff_mean - nocoff_mean)

    np.percentile(diffs, 0.5), np.percentile(diffs, 99.5)
    # statistical evidence […]
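The same bootstrap idea can be sketched with only the standard library, which makes each step explicit. The heights below are randomly generated stand-ins rather than the coffee dataset, and the helper `percentile` is an assumed implementation that uses linear interpolation between ranks.

```python
import random
import statistics


def percentile(values, q):
    """Percentile q (0 to 100) using linear interpolation between ranks."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower = int(position)  # floor, since position is never negative
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


rng = random.Random(42)

# Hypothetical heights (inches) paired with a drinks-coffee flag
sample = [(rng.gauss(68, 3), True) for _ in range(100)]
sample += [(rng.gauss(66, 3), False) for _ in range(100)]

diffs = []
for _ in range(2000):
    # Resample rows with replacement, keeping the original sample size
    boot = rng.choices(sample, k=len(sample))
    coff = [h for h, drinks in boot if drinks]
    nocoff = [h for h, drinks in boot if not drinks]
    if coff and nocoff:  # skip the (very unlikely) case of an empty group
        diffs.append(statistics.mean(coff) - statistics.mean(nocoff))

# The middle 99% of the bootstrap differences gives a 99% interval
lower, upper = percentile(diffs, 0.5), percentile(diffs, 99.5)
print(f"99% bootstrap interval: ({lower:.2f}, {upper:.2f})")
```

If the interval excludes zero, the resampled data suggests a real difference in mean height between the two groups.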

## Statistics – Udacity

Descriptive Statistics

Descriptive statistics is about describing our collected data using the measures discussed throughout this lesson: measures of center, measures of spread, shape of our distribution, and outliers. We can also use plots of our data to gain a better understanding.

Inferential Statistics

Inferential statistics is about using our collected data to draw conclusions about a larger […]
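The descriptive measures named above can be computed with Python's built-in `statistics` module. This is a minimal sketch on made-up numbers; the 1.5 × IQR rule used to flag outliers is one common convention and an assumption here, not something the excerpt specifies.

```python
import statistics

# Hypothetical data with one unusually large value
data = [2, 4, 4, 5, 7, 9, 30]

# Measures of center
center = {
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    "mode": statistics.mode(data),
}

# Measures of spread
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles
iqr = q3 - q1
spread = {
    "range": max(data) - min(data),
    "iqr": iqr,
    "stdev": statistics.stdev(data),  # sample standard deviation
}

# Flag values more than 1.5 * IQR beyond the quartiles as outliers
outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print(center)
print(spread)
print("outliers:", outliers)
```

Here the mean sits well above the median, which hints at a right-skewed shape caused by the large value.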