Hypothesis Testing

Chapter 11

In groups

  • Discuss Homework: Chapter 11, #1, 2

Remember the Penguins!

ggplot(penguins, mapping = aes(x = body_mass_g)) +
  geom_histogram(binwidth=200, col="white", fill = "seagreen")

Density plot (new)

ggplot(penguins, mapping = aes(x = body_mass_g)) +
  geom_density(fill = "seagreen")

Box plot

ggplot(penguins, mapping = aes(x = body_mass_g)) +
  geom_boxplot(fill = "seagreen")

Body mass faceted by species

ggplot(penguins, mapping = aes(x = body_mass_g)) +
  geom_histogram(binwidth=200, col="white", fill = "seagreen") +
  facet_wrap(~species)

Mulitple boxplots

ggplot(penguins, mapping = aes(x = body_mass_g, fill = species)) +
  geom_boxplot()

Overlay density plots (new!)

ggplot(penguins, mapping = aes(x = body_mass_g, fill = species)) +
  geom_density(alpha = 0.3)

Ridge plot (new!)

ggplot(penguins, aes(x = body_mass_g, y = species, fill = species)) +
  geom_density_ridges(alpha = 0.3) 

Summary Statistics

penguins |>
  group_by(species) |>
  summarize( mean = mean(body_mass_g, na.rm = TRUE),
             sd = sd(body_mass_g, na.rm = TRUE)
  )
# A tibble: 3 × 3
  species    mean    sd
  <fct>     <dbl> <dbl>
1 Adelie    3701.  459.
2 Chinstrap 3733.  384.
3 Gentoo    5076.  504.


Question: Are the mean body weights of Adelie and Chinstraps really different? Or is the difference just due to random variation?

Sample vs Population

  • Chapter 5: sample mean is (hopefully) an approxmiation of the population mean

  • Clearly the sample means are different (3701 vs. 3733)

  • But are the population means different?

What do you think?

Hypothesis Testing

  • \(H_0\) – Null Hypothesis (no difference)

Mean body weight of Adelie and Chinstrap penguins are the same.

  • \(H_1\) – Alternative Hypothesis (there is a difference)

Mean body weight of Adelie and Chinstrap penguins are different.

Coin Flipping

If you flip a fair coin 100 times, how many times will it come up heads?

Do a simulation

Gender Discrimination Study

Textbook Section 11.1