library(bayesrules)
library(tidyverse)
library(scales)
set.seed(84735)

Bayesian
P(Hypotheses | Data): hypotheses have probabilities in light of the observed data.
Interval estimates: credible intervals
Frequentist
P(Data | Hypotheses): data have probabilities given the conditions of the hypotheses.
Interval estimates: confidence intervals
| Description | Sample Statistic | Population Parameter |
|---|---|---|
| Mean | x̄ | μ |
| Standard Deviation | s | σ |
| Variance | s² | σ² |
| Proportion | p | π |
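
As a minimal sketch of the "Sample Statistic" column, the base R functions below compute each statistic from a small sample. The `heights` and `owns_pet` vectors are hypothetical and invented only for illustration. The population parameters (μ, σ, σ², π) stay unknown; the code only produces their sample counterparts.

```r
# Hypothetical sample data, invented for illustration only
heights  <- c(64, 70, 68, 66, 72)              # a numeric variable
owns_pet <- c(TRUE, FALSE, TRUE, TRUE, FALSE)  # a binary variable

x_bar <- mean(heights)   # sample mean, estimates mu
s     <- sd(heights)     # sample standard deviation, estimates sigma
s2    <- var(heights)    # sample variance, estimates sigma^2
p     <- mean(owns_pet)  # sample proportion, estimates pi (TRUE counts as 1)
```

Taking the mean of a logical vector gives the proportion of `TRUE` values, which is why `mean()` serves for both the sample mean and the sample proportion.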
Are there any pink cows in the world?
Null hypothesis: There are no pink cows in the world.
Alternative hypothesis: There is a pink cow in the world.
We go looking for evidence against the null.
If we find any evidence against the null (a single pink cow), then we can conclude the null is false. We say we reject the null hypothesis.
If we do not find any evidence against the null (a single pink cow), then we fail to reject the null. We can keep searching for more evidence against the null (i.e., continue looking for a pink cow). We will never be able to say the null is true, so we never accept the null. All we can do is keep looking for a pink cow.
Are there any black cows in the world?
Null hypothesis: There are no black cows in the world.
Alternative hypothesis: There is a black cow in the world.
When we see a black cow, we reject the null hypothesis and conclude that there is a black cow in the world.
Is there a foreign object in the cat’s body?
Null hypothesis: There is no foreign object in the cat’s body.
Alternative hypothesis: There is a foreign object in the cat’s body.
X-ray
X-ray does not show any foreign object.
Null hypothesis: There is no problem with my cell phone.
Alternative hypothesis: There is a problem with my cell phone.
No problems were detected.
If there were no variance, there would be no need for statistics.
We want to understand the average number of hours of sleep Irvine residents get. What if everyone in Irvine slept 8 hours every night? (sleep = {8, 8,…, 8})
We want to predict who will graduate college. What if everyone graduated college? (graduate = {TRUE, TRUE,…, TRUE})
We want to understand if Android users spend more time on their phones than iOS users. What if everyone spent 3 hours per day on their phones? (time = {3, 3,…, 3}, os = {Android, Android,…, iOS})
We want to understand if birth height and weight are positively associated in babies. What if every baby weighed 7.5 lbs? (weight = {7.5, 7.5,…, 7.5}, height = {20, 22,…, 18})
In all these made-up scenarios, there would be no variance in sleep, graduate, time, or weight. These would all be constants and thus would not even be variables.
Things vary. We use statistics in research studies to understand how variables vary, and often we want to know how they covary with other variables.
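
The point about constants can be checked directly in R. This is only a sketch: the sample size of 10 is an arbitrary assumption, and `sleep_varied` reuses the four sleep values listed later in these notes.

```r
# Constant "variables" from the made-up scenarios (n = 10 is an arbitrary choice)
sleep    <- rep(8, 10)      # everyone sleeps 8 hours
graduate <- rep(TRUE, 10)   # everyone graduates

var(sleep)                  # 0: nothing varies
var(as.numeric(graduate))   # 0: TRUE is coded as 1 for every person

# Once the values differ, the sample variance is positive
sleep_varied <- c(8, 7, 9, 7.5)
var(sleep_varied)
```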
To make the connection between research questions of studies and statistics, we will take small steps and begin with writing hypotheses using notation.
Research Question: Do UCI students sleep, on average, 8 hours on a typical night?
Variable: sleep (8, 7, 9, 7.5, …)
Research Question Using Notation: \(\mu \stackrel{?}{=} 8\)
Hypotheses
\(H_0 : \mu = 8\)
\(H_A : \mu \neq 8\)
\(H_0 : \mu - 8 = 0\)
\(H_A : \mu - 8 \neq 0\)
The parameter we want to infer about is a single mean.
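
To connect the notation to data, the sketch below computes the sample mean and compares it with the value of μ under the null. It uses only the four sleep values listed above; the full data set is not shown, so the numbers are purely illustrative. The names `mu_0` and `x_bar` are chosen here for illustration.

```r
# The four sleep values listed above; the remaining observations are not shown
sleep <- c(8, 7, 9, 7.5)

mu_0  <- 8             # value of mu stated in H_0
x_bar <- mean(sleep)   # sample mean, the point estimate of mu

x_bar - mu_0           # observed difference from the null value
```

The hypotheses are statements about the population mean μ, not about `x_bar`. The sample mean is only the evidence we bring to them.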
Research Question: Do the majority of Americans approve of allowing DACA immigrants to become citizens?
Variable: approve (yes, yes, yes, no, yes, no, no)
Research Question Using Notation: \(\pi \stackrel{?}{>} 0.5\)
Hypotheses
\(H_0: \pi \leq 0.5\)
\(H_A: \pi > 0.5\)
The parameter we want to infer about is a single proportion.
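
A rough sketch of the corresponding sample proportion, using the seven `approve` values listed above as if they were the whole sample (an assumption made only for illustration):

```r
# The seven responses listed for the approve variable
approve <- c("yes", "yes", "yes", "no", "yes", "no", "no")

pi_0 <- 0.5                    # boundary value of pi in H_0
p    <- mean(approve == "yes") # sample proportion of "yes" responses (4 out of 7)

p > pi_0                       # is the sample proportion above the null value?
```

A sample proportion above 0.5 does not by itself settle the question. It is only the point estimate that the hypotheses about π are evaluated against.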
Research Question: Is California's March 2020 unemployment rate different from the US March 2020 unemployment rate of 4.4%?
Variable: unemployed_CA (no, no, yes, no, yes, no, no, …)
Research Question Using Notation: \(\pi \stackrel{?}{=} 0.044\)
Hypotheses
\[H_0:\pi= 0.044\] \[H_A: \pi \neq 0.044\]
The parameter we want to infer about is a single proportion.
Research Question: Are there more STEM majors at UCI than non-STEM majors?
Variable: STEM (TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, …)
Research Question Using Notation: \(\pi_{STEM} \stackrel{?}{>} 0.5\)
Hypotheses
\[H_0: \pi_{STEM} \leq 0.5\] \[H_A: \pi_{STEM} > 0.5\]
The parameter we want to infer about is a single proportion.
Research Question: Do STEM (s) majors have a different (higher or lower) income after graduation compared to non-STEM (n) majors?
Variables: explanatory: STEM (TRUE, FALSE, FALSE, TRUE, …)
response: income (40000, 20000, 65490, 115000, …)
Research Question Using Notation: \(\mu_{s} \stackrel{?}{=} \mu_{n}\) or \(\mu_{s} - \mu_{n} \stackrel{?}{=} 0\)
Hypotheses
\[H_0:\mu_{s} = \mu_{n}\] \[H_A:\mu_{s} \neq \mu_{n}\]
\[H_0:\mu_{s} - \mu_{n} = 0\] \[H_A:\mu_{s} - \mu_{n} \neq 0\]
The parameter we want to infer about is a difference of two means.
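
As a sketch of the sample counterpart of \(\mu_{s} - \mu_{n}\), the code below uses only the four STEM and income values listed above; the rest of the data are not shown, so the result is illustrative. The names `x_bar_s` and `x_bar_n` are chosen here for illustration.

```r
# The four listed observations; the remaining data are not shown
stem   <- c(TRUE, FALSE, FALSE, TRUE)          # explanatory variable
income <- c(40000, 20000, 65490, 115000)       # response variable

x_bar_s <- mean(income[stem])    # sample mean income, STEM majors
x_bar_n <- mean(income[!stem])   # sample mean income, non-STEM majors

x_bar_s - x_bar_n                # point estimate of mu_s - mu_n
```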
Research Question: Do Democrats and Republicans approve of legal abortion at the same rates?
Variables: explanatory: party (D, D, R, R, …)
response: approve (TRUE, FALSE, FALSE, TRUE, …)
Research Question Using Notation: \(\pi_{d} \stackrel{?}{=} \pi_{r}\) or \(\pi_{d} - \pi_{r} \stackrel{?}{=} 0\)
Hypotheses
\(H_0:\pi_{d} = \pi_{r}\)
\(H_A:\pi_{d} \neq \pi_{r}\)
The parameter we want to infer about is a difference of two proportions.
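
The sample counterpart of \(\pi_{d} - \pi_{r}\) can be sketched the same way, again using only the four listed observations as a stand-in for the full data (an assumption for illustration). The names `p_d` and `p_r` are chosen here for illustration.

```r
# The four listed observations; the remaining data are not shown
party   <- c("D", "D", "R", "R")               # explanatory variable
approve <- c(TRUE, FALSE, FALSE, TRUE)         # response variable

p_d <- mean(approve[party == "D"])   # sample proportion approving, Democrats
p_r <- mean(approve[party == "R"])   # sample proportion approving, Republicans

p_d - p_r                            # point estimate of pi_d - pi_r
```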
| Description | Parameter of Interest | Response | Explanatory |
|---|---|---|---|
| Single Mean | \(\mu\) | Numeric | |
| Difference of Two Means | \(\mu_1 - \mu_2\) | Numeric | Binary |
| Single Proportion | \(\pi\) | Binary | |
| Difference of Two Proportions | \(\pi_1 - \pi_2\) | Binary | Binary |
Later on we will also learn about:
| Parameter of Interest | Response | Explanatory |
|---|---|---|
| \(\beta_1\) | Numeric | Categorical and/or Numeric |