## Simple Linear Regression

In this lesson, you will:

- Identify regression applications
- Learn how regression works
- Apply regression to problems using Python

Machine learning is frequently split into supervised and unsupervised learning. Regression, which you will be learning about in this lesson (and its extensions in later lessons), is an example of supervised machine learning. In supervised machine learning, you are interested in predicting […]
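As a rough illustration of what fitting a simple linear regression involves, the sketch below computes the least-squares slope and intercept by hand with NumPy. The data values and the `predict` helper are hypothetical and chosen only so the arithmetic is easy to follow; they are not taken from the lesson.

```python
import numpy as np

# Hypothetical toy data: x is the explanatory variable, y is the response.
# The points lie exactly on the line y = 2x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

# Closed-form least-squares estimates for a single explanatory variable.
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean


def predict(new_x):
    """Predict the response for new_x using the fitted line (hypothetical helper)."""
    return intercept + slope * new_x


print(slope, intercept, predict(6.0))
```

In practice a library such as statsmodels or scikit-learn would usually do the fitting; the manual formula is shown here only to make the "supervised" idea concrete: known pairs of inputs and outputs are used to predict the output for a new input.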

## Case Study: A/B Tests

A/B tests are used to test changes on a web page by running an experiment where a control group sees the old version, while the experiment group sees the new version. A metric is then chosen to measure the level of engagement from users in each group. These results are then used to judge whether one version is more effective than […]
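One possible way to compare the two groups is sketched below, under the assumption that the chosen metric is a conversion rate. The data are simulated, and the group sizes and rates are hypothetical. The sketch computes the observed difference between the experiment and control groups, then bootstraps a 95% interval for that difference.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: 1 means the user converted, 0 means they did not.
control = rng.binomial(1, 0.10, size=2000)      # users who saw the old page
experiment = rng.binomial(1, 0.12, size=2000)   # users who saw the new page

# Observed difference in conversion rate (experiment minus control).
obs_diff = experiment.mean() - control.mean()

# Bootstrap the sampling distribution of the difference.
diffs = np.empty(2000)
for i in range(diffs.size):
    boot_control = rng.choice(control, size=control.size, replace=True)
    boot_experiment = rng.choice(experiment, size=experiment.size, replace=True)
    diffs[i] = boot_experiment.mean() - boot_control.mean()

# 95% percentile interval for the difference.
lower, upper = np.percentile(diffs, [2.5, 97.5])
print(obs_diff, lower, upper)
```

If the interval excludes zero, that is some evidence that the two versions perform differently on this metric; whether the difference matters in practice is a separate question.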

## Hypothesis Testing

Rules for setting up null and alternative hypotheses:

- $H_0$ is assumed to be true before you collect any data.
- $H_0$ usually states there is no effect or that two groups are equal.
- $H_0$ and $H_1$ are competing, non-overlapping hypotheses.
- $H_1$ is what we would like to prove to be true.
- $H_0$ contains an equal sign of some kind: either $=$, $\leq$, or $\geq$.
- $H_1$ contains the opposition […]
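To make the setup concrete, here is a minimal sketch that tests $H_0: \mu \leq 70$ against $H_1: \mu > 70$ by simulating from the null. The sample, the null value of 70, and the simulation sizes are all hypothetical; this is one common simulation-based approach, not necessarily the one the lesson goes on to use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample of 150 measurements.
sample = rng.normal(loc=71.0, scale=3.0, size=150)
null_mean = 70.0  # closest value to H_1 that is still consistent with H_0

# Bootstrap the sample mean to estimate the spread of its sampling distribution.
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(5000)
])

# Simulate what the sample mean would look like if H_0 were true.
null_vals = rng.normal(loc=null_mean, scale=boot_means.std(), size=5000)

# One-sided p-value: share of null draws at least as extreme as the observed
# mean, in the direction of H_1 (greater than).
p_value = (null_vals > sample.mean()).mean()
print(p_value)
```

A small p-value suggests the observed mean would be unusual if $H_0$ were true, which is the basis for rejecting $H_0$ in favour of $H_1$.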

## Confidence Intervals – Udacity

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    %matplotlib inline

    np.random.seed(42)

    full_data = pd.read_csv('../data/coffee_dataset.csv')
    sample_data = full_data.sample(200)

    diffs = []
    for _ in range(10000):
        bootsamp = sample_data.sample(200, replace=True)
        coff_mean = bootsamp[bootsamp['drinks_coffee'] == True]['height'].mean()
        nocoff_mean = bootsamp[bootsamp['drinks_coffee'] == False]['height'].mean()
        diffs.append(coff_mean - nocoff_mean)

    np.percentile(diffs, 0.5), np.percentile(diffs, 99.5)
    # statistical evidence […]

## Statistics – Udacity

Descriptive statistics is about describing our collected data using the measures discussed throughout this lesson: measures of center, measures of spread, shape of our distribution, and outliers. We can also use plots of our data to gain a better understanding.

Inferential statistics is about using our collected data to draw conclusions about a larger […]
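The descriptive measures mentioned above can be computed in a few lines. The sketch below uses a small hypothetical data set and flags outliers with the 1.5 × IQR rule, which is an assumption made here for illustration; the lesson may define outliers differently.

```python
import numpy as np

# Hypothetical data; 30 is deliberately far from the rest.
data = np.array([2, 4, 4, 5, 7, 9, 30])

# Measures of center.
mean = data.mean()
median = np.median(data)

# Measures of spread.
value_range = data.max() - data.min()
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
std = data.std(ddof=1)  # sample standard deviation

# Outliers by the 1.5 * IQR rule (an assumed convention, not from the lesson).
outliers = data[(data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)]

print(mean, median, value_range, iqr, std, outliers)
```

The gap between the mean and the median here hints at the shape of the distribution: a single large value pulls the mean to the right, which is typical of right-skewed data.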

## Data Analysis Process – Case Study 1 – Udacity

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    %matplotlib inline

## Plotting with Pandas – Udacity

    import pandas as pd
    %matplotlib inline

    df_census = pd.read_csv('census_income_data.csv')
    df_census.info()

    df_census.hist(figsize=(8, 8));
    df_census['age'].hist()
    df_census['age'].plot(kind='hist');

    # aggregates counts for each unique value in a column
    df_census['education'].value_counts()
    df_census['education'].value_counts().plot(kind='bar')
    df_census['education'].value_counts().plot(kind='pie', figsize=(8, 8));

    df_cancer = pd.read_csv('cancer_data_edited.csv')
    pd.plotting.scatter_matrix(df_cancer, figsize=(15, 15));
    df_cancer.plot(x='compactness', y='concavity', kind='scatter');
    df_cancer['concave_points'].plot(kind='box');

    import pandas as pd
    df = pd.read_csv('cancer_data_edited.csv')
    df.head()

    df_m = df[df['diagnosis'] == 'M']
    df_m.head()
    […]

## Introduction to the Python Standard Library – Udacity

### Our favourite modules

The Python Standard Library has a lot of modules! To help you get familiar with what’s available, here is a selection of our favourite Python Standard Library modules and why we use them:

- csv: very convenient for reading and writing CSV files
- collections: useful extensions of the usual data types, including OrderedDict, defaultdict and namedtuple
- random: […]
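As a small taste of the `csv` and `collections` modules mentioned above, the sketch below parses some CSV text and groups the rows with `namedtuple` and `defaultdict`. The data, the `Player` type, and the column names are hypothetical; `io.StringIO` stands in for a real file so the example runs on its own.

```python
import csv
import io
from collections import defaultdict, namedtuple

# Hypothetical CSV content; a real program would open a file instead.
raw = "name,team\nAda,blue\nGrace,red\nLinus,blue\n"

# namedtuple gives each row readable field names.
Player = namedtuple("Player", ["name", "team"])

reader = csv.DictReader(io.StringIO(raw))
players = [Player(row["name"], row["team"]) for row in reader]

# defaultdict(list) creates an empty list the first time a team is seen.
teams = defaultdict(list)
for player in players:
    teams[player.team].append(player.name)

print(dict(teams))
```

Without `defaultdict`, the loop would need an explicit check for whether each team key already exists before appending.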

## Python 3 Syntax – Udacity

    def readable_timedelta(days):
        """Return the number of weeks and days in a number of days."""
        # To get the number of weeks we use integer division
        weeks = days // 7
        # To get the number of days that remain we use %, the modulus operator
        remainder = days % 7
        return "{} week(s) and {} day(s)".format(weeks, remainder)