## Data Analyst Nanodegree

I’ve earned it! https://confirm.udacity.com/TLVUZQTR

## Simple Linear Regression

In this lesson, you will:

- Identify regression applications
- Learn how regression works
- Apply regression to problems using Python

Machine learning is frequently split into supervised and unsupervised learning. Regression, which you will be learning about in this lesson (and its extensions in later lessons), is an example of supervised machine learning. In supervised machine learning, you are interested in predicting […]
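As a minimal sketch of what applying regression in Python can look like, the following fits a straight line to synthetic data with NumPy's `polyfit`. The data, the variable names, and the "true" slope and intercept are invented for illustration and do not come from the lesson.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y depends linearly on x plus noise.
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 50)
y = 3.0 * x + 2.0 + rng.normal(scale=1.0, size=x.size)

# Fit a degree-1 polynomial, i.e. a least-squares line y = slope * x + intercept.
slope, intercept = np.polyfit(x, y, deg=1)

# Predict the response for a new value of the explanatory variable.
x_new = 12.0
y_pred = slope * x_new + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction at x={x_new}: {y_pred:.2f}")
```

Because the labels `y` are known for every `x`, this is a supervised setting: the fitted line is learned from labeled examples and then used to predict an unseen case.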

## Case Study: A/B Tests

A/B tests are used to test changes on a web page by running an experiment where a control group sees the old version, while the experiment group sees the new version. A metric is then chosen to measure the level of engagement from users in each group. These results are then used to judge whether one version is more effective than […]
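The comparison described above can be sketched in Python. This is a minimal illustration, assuming a click-through rate as the engagement metric and simulated click data. The group sizes, click probabilities, and variable names are hypothetical and not taken from the case study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical click data (invented for illustration): 1 = clicked, 0 = did not click.
control = rng.binomial(1, 0.10, size=2000)     # users shown the old version
experiment = rng.binomial(1, 0.12, size=2000)  # users shown the new version

# Metric: click-through rate in each group, and the observed difference.
obs_diff = experiment.mean() - control.mean()

# Bootstrap each group to approximate the sampling distribution of the difference.
diffs = []
for _ in range(2000):
    boot_control = rng.choice(control, size=control.size, replace=True)
    boot_experiment = rng.choice(experiment, size=experiment.size, replace=True)
    diffs.append(boot_experiment.mean() - boot_control.mean())

# 95% bootstrap interval for the difference in click-through rates.
lower, upper = np.percentile(diffs, [2.5, 97.5])
print(f"observed difference: {obs_diff:.4f}, 95% interval: ({lower:.4f}, {upper:.4f})")
```

If the interval excludes zero, that suggests the two versions differ on this metric. An interval that includes zero gives no such evidence.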

## Hypothesis Testing

Rules for setting up null and alternative hypotheses:

- $H_0$ is assumed to be true before you collect any data.
- $H_0$ usually states there is no effect or that two groups are equal.
- $H_0$ and $H_1$ are competing, non-overlapping hypotheses.
- $H_1$ is what we would like to prove to be true.
- $H_0$ contains an equal sign of some kind: either $=$, $\leq$, or $\geq$.
- $H_1$ contains the opposition […]
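As a worked example of these rules, suppose we want to show that a new page has a higher mean conversion rate than the old one. The scenario and the symbols $\mu_{new}$ and $\mu_{old}$ are hypothetical and chosen only for illustration. One way to write the pair of hypotheses is:

$$
H_0: \mu_{new} - \mu_{old} \leq 0 \qquad H_1: \mu_{new} - \mu_{old} > 0
$$

Here $H_0$ carries the equal sign (inside $\leq$), $H_1$ states what we would like to prove, and the two statements cannot both be true.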

## Confidence Intervals – Udacity

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    %matplotlib inline

    np.random.seed(42)

    full_data = pd.read_csv('../data/coffee_dataset.csv')
    sample_data = full_data.sample(200)

    # Bootstrap the difference in mean height between coffee drinkers and non-drinkers
    diffs = []
    for _ in range(10000):
        bootsamp = sample_data.sample(200, replace=True)
        coff_mean = bootsamp[bootsamp['drinks_coffee'] == True]['height'].mean()
        nocoff_mean = bootsamp[bootsamp['drinks_coffee'] == False]['height'].mean()
        diffs.append(coff_mean - nocoff_mean)

    # 99% confidence interval for the difference in means
    np.percentile(diffs, 0.5), np.percentile(diffs, 99.5)
    # statistical evidence […]

## Statistics – Udacity

Descriptive statistics is about describing our collected data using the measures discussed throughout this lesson: measures of center, measures of spread, shape of our distribution, and outliers. We can also use plots of our data to gain a better understanding.

Inferential statistics is about using our collected data to draw conclusions to a larger […]
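To make the descriptive measures concrete, here is a small sketch using Python's standard `statistics` module. The sample values are invented for illustration and include one deliberate outlier.

```python
import statistics

# Hypothetical sample (invented for illustration), with one outlier at 30.
data = [2, 3, 3, 4, 5, 5, 5, 6, 7, 30]

# Measures of center
mean = statistics.mean(data)      # pulled upward by the outlier
median = statistics.median(data)  # robust to the outlier
mode = statistics.mode(data)      # most frequent value

# Measures of spread
data_range = max(data) - min(data)
stdev = statistics.stdev(data)    # sample standard deviation

print(mean, median, mode, data_range, round(stdev, 2))
```

The mean sits above the median here, which is one sign that the distribution is right-skewed by the outlier.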

## Probability – Udacity

Here you learned some fundamental rules of probability. Using notation, we could say that the outcome of a coin flip could either be $T$ or $H$ for the event that the coin flips tails or heads, respectively. Then the following rules are true:

- $P(H) = 0.5$
- $1 - P(H) = P(\text{not } H) = 0.5$

where $\text{not } H$ is the event […]
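These coin-flip rules can be checked empirically. The following is a minimal simulation sketch, assuming a fair coin. The number of flips and the variable names are arbitrary choices for illustration.

```python
import random

random.seed(42)

# Simulate fair coin flips: each flip is 'H' (heads) or 'T' (tails).
n_flips = 100_000
flips = [random.choice("HT") for _ in range(n_flips)]

# Estimate P(H) and P(not H) as observed proportions.
p_heads = flips.count("H") / n_flips
p_not_heads = flips.count("T") / n_flips

print(f"P(H) ~ {p_heads:.3f}, P(not H) ~ {p_not_heads:.3f}")
```

The two proportions sum to 1 by construction, mirroring the complement rule, and each approaches 0.5 as the number of flips grows.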

## Data Analysis Process – Case Study 1 – Udacity

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    %matplotlib inline

## Plotting with Pandas – Udacity

    import pandas as pd
    %matplotlib inline

    df_census = pd.read_csv('census_income_data.csv')
    df_census.info()

    # Histograms for every numeric column, then for a single column
    df_census.hist(figsize=(8, 8));
    df_census['age'].hist()
    df_census['age'].plot(kind='hist');

    # value_counts() aggregates counts for each unique value in a column
    df_census['education'].value_counts()
    df_census['education'].value_counts().plot(kind='bar')
    df_census['education'].value_counts().plot(kind='pie', figsize=(8, 8));

    df_cancer = pd.read_csv('cancer_data_edited.csv')
    pd.plotting.scatter_matrix(df_cancer, figsize=(15, 15));
    df_cancer.plot(x='compactness', y='concavity', kind='scatter');
    df_cancer['concave_points'].plot(kind='box');

    import pandas as pd

    df = pd.read_csv('cancer_data_edited.csv')
    df.head()
    # Select only the rows with a malignant ('M') diagnosis
    df_m = df[df['diagnosis'] == 'M']
    df_m.head()
    # […]