Introduction to Bayesian Inference:

The Beta-Binomial Model

Dr. Mine Dogucu

Packages

library(bayesrules)
library(tidyverse)

Examples from this lecture are mainly taken from the Bayes Rules! book and the new functions are from the bayesrules package.

Bayesian Inference

Bechdel Test

Alison Bechdel’s 1985 comic Dykes to Watch Out For has a strip called The Rule where a person states that they only go to a movie if it satisfies the following three rules:

  • the movie has to have at least two women in it;

  • these two women talk to each other; and

  • they talk about something besides a man.

This test is used to assess how women are represented in movies. Even though there are three criteria, the outcome is binary: a movie passes the Bechdel test only if it meets all three, and fails otherwise.

Unknown

Let \(\pi\) be the proportion of movies that pass the Bechdel test.

Because \(\pi\) is a proportion between 0 and 1, and the Beta distribution is defined on that same interval, the Beta distribution is a good fit for modeling our prior understanding of \(\pi\).

We will use functions from the bayesrules package to examine different people’s prior understanding of \(\pi\) and to build our own.
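For reference, a Beta(α, β) model with α, β > 0 has the density and summary formulas below (standard results, restated here rather than taken from the slides). The mean, mode, and variance that summarize_beta() prints for each person agree with these formulas; for example, Beta(14, 1) has mean 14/15 ≈ 0.933.

```latex
\begin{aligned}
f(\pi) &= \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,
          \pi^{\alpha - 1}(1 - \pi)^{\beta - 1}, \qquad 0 < \pi < 1 \\
E(\pi) &= \frac{\alpha}{\alpha + \beta} \\
\text{Mode}(\pi) &= \frac{\alpha - 1}{\alpha + \beta - 2}
          \qquad (\alpha, \beta \ge 1,\ \text{not both equal to } 1) \\
\text{Var}(\pi) &= \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}
\end{aligned}
```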

The Optimist

summarize_beta(14, 1)
       mean mode         var         sd
1 0.9333333    1 0.003888889 0.06236096
plot_beta(14, 1) 
Curve showing how plausible different values of π are, with the highest point around 1. Values close to 0 are much less plausible. The curve starts rising from near 0.75, peaks around 1.
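As a check, the Optimist's numbers can be reproduced in base R from the standard Beta mean, mode, and variance formulas. beta_moments() is a hypothetical helper written only for this illustration; it is not part of bayesrules, but its output should match the summarize_beta(14, 1) row above.

```r
# Hypothetical helper (not part of bayesrules): summaries of Beta(alpha, beta)
beta_moments <- function(alpha, beta) {
  total <- alpha + beta
  variance <- alpha * beta / (total^2 * (total + 1))
  data.frame(
    mean = alpha / total,
    # (alpha - 1) / (alpha + beta - 2); undefined (NaN) for Beta(1, 1)
    mode = (alpha - 1) / (total - 2),
    var  = variance,
    sd   = sqrt(variance)
  )
}

# The Optimist's prior
beta_moments(14, 1)
```

Calling beta_moments(1, 1) or beta_moments(5, 11) gives the Clueless and Feminist summaries in the same way.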

The Clueless

summarize_beta(1, 1)
  mean mode        var        sd
1  0.5  NaN 0.08333333 0.2886751
plot_beta(1, 1) 
A flat line at f(π) = 1 for every value of π between 0 and 1: all values of π are equally plausible.
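Base R's dbeta() (shape1 = α, shape2 = β) confirms that the Clueless Beta(1, 1) prior is the Uniform(0, 1) distribution: its density equals 1 at every π in [0, 1]. This also explains the NaN mode in the summary above: with every value equally plausible there is no single most plausible value, and the mode formula (α - 1)/(α + β - 2) becomes 0/0.

```r
# A few values of pi across [0, 1]
pi_values <- c(0, 0.1, 0.25, 0.5, 0.75, 0.9, 1)

# Beta(1, 1) density: 1 at every pi
dbeta(pi_values, shape1 = 1, shape2 = 1)

# Same density as the Uniform(0, 1) distribution
all.equal(dbeta(pi_values, 1, 1), dunif(pi_values, min = 0, max = 1))
```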

The Feminist

summarize_beta(5, 11)
    mean      mode        var        sd
1 0.3125 0.2857143 0.01263787 0.1124183
plot_beta(5, 11) 
Curve showing how plausible different values of π are, with the highest point around 0.3. Values near 0.3 are most likely, while values close to 0 or 1 are much less plausible. The curve rises from near 0, peaks around 0.3, and then gradually decreases toward 1.

Vocabulary

Informative prior: An informative prior reflects specific information about the unknown variable with high certainty (i.e., low variability).

Vague (diffuse) prior: A vague or diffuse prior reflects little specific information about the unknown variable. A flat prior, which assigns equal prior plausibility to all possible values of the variable, is a special case.

Quiz question

Which of these people is the most certain (i.e., has the most informative prior)?

  • The optimist
  • The clueless
  • The feminist

Plotting Beta Prior

A 3×3 grid of line plots showing probability density functions for different Beta distributions on the interval 0 to 1 (x-axis labeled π, y-axis labeled f(π)). Each panel is titled with its parameter values: Beta(1,5), Beta(1,2), Beta(3,7), Beta(1,1), Beta(5,5), Beta(20,20), Beta(7,3), Beta(2,1), and Beta(5,1). The shapes vary across panels: Beta(1,1) is flat (uniform); Beta(1,5) and Beta(1,2) decrease from left to right (mass near 0); Beta(2,1) and Beta(5,1) increase toward 1 (mass near 1); Beta(3,7) peaks left of center; Beta(7,3) peaks right of center; Beta(5,5) is symmetric and moderately peaked at 0.5; and Beta(20,20) is sharply peaked around 0.5.
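A grid like this can be approximated with ggplot2 (loaded with tidyverse) and base R's dbeta(). This is a sketch, not the code used to draw the original figure; the parameter table simply lists the nine panels in the order described above.

```r
library(tidyverse)

# The nine (alpha, beta) pairs, in panel order (left to right, top to bottom)
beta_params <- tibble(
  alpha = c(1, 1, 3, 1, 5, 20, 7, 2, 5),
  beta  = c(5, 2, 7, 1, 5, 20, 3, 1, 1)
)
panel_labels <- paste0("Beta(", beta_params$alpha, ",", beta_params$beta, ")")

# A factor keeps the panels in the listed order instead of alphabetical order
beta_params <- beta_params %>%
  mutate(label = factor(panel_labels, levels = panel_labels))

# Evaluate each Beta density on a fine grid of pi values
beta_curves <- beta_params %>%
  crossing(pi = seq(0, 1, length.out = 201)) %>%
  mutate(density = dbeta(pi, alpha, beta))

ggplot(beta_curves, aes(x = pi, y = density)) +
  geom_line() +
  facet_wrap(~ label, nrow = 3) +
  labs(x = expression(pi), y = expression(f(pi)))
```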

Your prior

What is your prior model of \(\pi\)?

Use the summarize_beta() and plot_beta() functions to describe your own prior model of \(\pi\). Note it down; we will refer back to it often.
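One possible way to turn a rough belief into Beta parameters, offered as a sketch rather than as the lecture's method: pick a best guess for the mean of π and a "prior size" saying how many movies' worth of evidence that guess is worth. Since the Beta mean is α/(α + β), setting α = guess × size and β = (1 - guess) × size gives that mean, and a larger size gives a more informative prior. The helper prior_from_guess() is hypothetical; pass its α and β to summarize_beta() and plot_beta() to check that the shape matches your belief.

```r
# Hypothetical helper: choose Beta(alpha, beta) from a guessed mean of pi and a
# "prior size" alpha + beta (larger size = more certain = more informative)
prior_from_guess <- function(guess_mean, prior_size) {
  stopifnot(guess_mean > 0, guess_mean < 1, prior_size > 0)
  c(alpha = guess_mean * prior_size,
    beta  = (1 - guess_mean) * prior_size)
}

# Same guessed mean, two levels of certainty
prior_from_guess(0.4, 10)   # alpha = 4,  beta = 6
prior_from_guess(0.4, 50)   # alpha = 20, beta = 30
```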

Data

set.seed(84735)
bechdel_sample <- sample_n(bechdel, 20)

We take a random sample of 20 movies from the bechdel data frame (included in the bayesrules package) using the sample_n() function.

Calling set.seed() first ensures that we get the same 20 movies every time the code runs, and anyone in the class who runs the same code gets the same sample, so we can all reproduce each other’s analyses. The number 84735 has no significance other than that it closely resembles the word BAYES.
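The reproducibility claim can be checked directly with base R's sample(): resetting the same seed before each draw gives identical results. This is a generic illustration on the numbers 1 to 1000 rather than on the bechdel data; the exact movies returned by sample_n() may also depend on your R and dplyr versions.

```r
# Draw 5 "movie IDs" out of 1:1000, resetting the seed before each draw
set.seed(84735)
first_draw <- sample(1:1000, size = 5)

set.seed(84735)
second_draw <- sample(1:1000, size = 5)

identical(first_draw, second_draw)   # TRUE: same seed, same sample

# Without resetting the seed, the next draw continues the random stream
third_draw <- sample(1:1000, size = 5)
```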

glimpse(bechdel_sample)
Rows: 20
Columns: 3
$ year   <dbl> 2005, 1983, 2013, 2001, 2010, 1997, 2010, 2009, 1998, 2007, 201…
$ title  <chr> "King Kong", "Flashdance", "The Purge", "American Outlaws", "Se…
$ binary <chr> "FAIL", "PASS", "FAIL", "FAIL", "PASS", "FAIL", "FAIL", "PASS",…
count(bechdel_sample, binary)
# A tibble: 2 × 2
  binary     n
  <chr>  <int>
1 FAIL      11
2 PASS       9

The Optimist

summarize_beta_binomial(14, 1, y = 9, n = 20)
      model alpha beta      mean      mode         var         sd
1     prior    14    1 0.9333333 1.0000000 0.003888889 0.06236096
2 posterior    23   12 0.6571429 0.6666667 0.006258503 0.07911070
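With a Beta(α, β) prior and y successes in n binomial trials, the posterior is Beta(α + y, β + n - y), the standard Beta-Binomial conjugate result. The hypothetical helper beta_binomial_update() below (not a bayesrules function) applies that rule to the Optimist's prior and our sample of 9 passes out of 20; its numbers should agree with the posterior row printed by summarize_beta_binomial().

```r
# Hypothetical helper (not part of bayesrules): Beta-Binomial conjugate update
beta_binomial_update <- function(alpha, beta, y, n) {
  post_alpha <- alpha + y        # prior alpha plus the number of passes
  post_beta  <- beta + (n - y)   # prior beta plus the number of fails
  total <- post_alpha + post_beta
  data.frame(
    alpha = post_alpha,
    beta  = post_beta,
    mean  = post_alpha / total,
    mode  = (post_alpha - 1) / (total - 2),
    var   = post_alpha * post_beta / (total^2 * (total + 1))
  )
}

# Optimist: Beta(14, 1) prior, 9 of the 20 sampled movies pass
beta_binomial_update(14, 1, y = 9, n = 20)
```

The same call with (1, 1) or (5, 11) gives the Clueless posterior Beta(10, 12) and the Feminist posterior Beta(14, 22).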

plot_beta_binomial(14, 1, y = 9, n = 20)
A single plot showing three overlaid curves on the interval 0 to 1 (x-axis labeled π, y-axis labeled density). A yellow curve labeled “prior” is heavily skewed toward 1, rising sharply near π = 1. A blue curve labeled “(scaled) likelihood” peaks at π = 0.45 (9 of 20) with a moderate spread. A green curve labeled “posterior” lies to the right of the likelihood, peaking around π ≈ 0.67, and is narrower than the likelihood. The posterior sits between the prior and the likelihood: the prior pulls it toward higher values of π than the data alone suggest.

The Clueless

summarize_beta_binomial(1, 1, y = 9, n = 20)
      model alpha beta      mean mode        var        sd
1     prior     1    1 0.5000000  NaN 0.08333333 0.2886751
2 posterior    10   12 0.4545455 0.45 0.01077973 0.1038255

plot_beta_binomial(1, 1, y = 9, n = 20)
A single plot with a flat line and two curves on the interval 0 to 1 (x-axis labeled π, y-axis labeled density). The yellow “prior” is flat across the entire range, indicating a uniform distribution. The blue “(scaled) likelihood” and green “posterior” curves lie on top of each other: both are bell-shaped, roughly symmetric, and peak at π = 0.45. Because the flat prior treats every value of π equally, it has no influence on the shape of the posterior.
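The Clueless curves coincide because a Beta(1, 1) prior is constant, so the posterior is proportional to the likelihood. Concretely, the binomial likelihood of y = 9 passes in n = 20 movies, dbinom(9, 20, π) viewed as a function of π, integrates to 1/(n + 1) over [0, 1], so multiplying it by n + 1 = 21 gives exactly the Beta(10, 12) density. The check below is a sketch of that identity; it is not necessarily how plot_beta_binomial() scales its likelihood curve.

```r
# Grid of pi values strictly inside (0, 1)
pi_grid <- seq(0.01, 0.99, by = 0.01)

# Binomial likelihood of 9 passes in 20 movies, as a function of pi
likelihood <- dbinom(9, size = 20, prob = pi_grid)

# The likelihood integrates to 1 / (n + 1), so rescale by n + 1 = 21
scaled_likelihood <- 21 * likelihood

# Clueless posterior: Beta(1 + 9, 1 + 20 - 9) = Beta(10, 12)
posterior <- dbeta(pi_grid, 10, 12)

all.equal(scaled_likelihood, posterior)   # TRUE up to floating-point error

# Numerical check of the normalizing constant (about 1 / 21)
integrate(function(p) dbinom(9, 20, p), lower = 0, upper = 1)$value
```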

The Feminist

summarize_beta_binomial(5, 11, y = 9, n = 20)
      model alpha beta      mean      mode        var         sd
1     prior     5   11 0.3125000 0.2857143 0.01263787 0.11241827
2 posterior    14   22 0.3888889 0.3823529 0.00642309 0.08014418

plot_beta_binomial(5, 11, y = 9, n = 20)
A single plot of three curves. The prior peaks near π ≈ 0.29, the likelihood peaks near π = 0.45, and the posterior lies between them (peaking near π ≈ 0.38) and is more peaked than both.

Comparison

Figure: the Optimist, Clueless, and Feminist prior, likelihood, and posterior plots above, shown side by side.

Your Posterior

Use the summarize_beta_binomial() and plot_beta_binomial() functions, with your own prior and the sample data (y = 9, n = 20), to examine your own posterior model.

Balancing Act of Bayesian Analysis

Three weighing scales, each holding the prior on one side and data on the other. The left scale is tipped toward the prior, the middle is balanced, and the right is tipped toward the data.

In Bayesian methodology, the prior model and the data both contribute to our posterior model.
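One way to see this balance: with a Beta(α, β) prior and y passes out of n movies, the posterior mean is a weighted average of the prior mean α/(α + β) and the sample proportion y/n, and the weight on the data grows with n. The second line plugs in the Optimist's prior and our sample (y = 9, n = 20).

```latex
\begin{aligned}
E(\pi \mid Y = y) &= \frac{\alpha + y}{\alpha + \beta + n}
  = \frac{\alpha + \beta}{\alpha + \beta + n} \cdot \frac{\alpha}{\alpha + \beta}
  + \frac{n}{\alpha + \beta + n} \cdot \frac{y}{n} \\[4pt]
\text{Optimist:} \quad
\frac{23}{35} &= \frac{15}{35} \cdot \frac{14}{15} + \frac{20}{35} \cdot \frac{9}{20}
  = 0.4 + 0.2571 \approx 0.6571
\end{aligned}
```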

Different Data, Different Posteriors

Morteza, Nadide, and Ursula all share the optimistic Beta(14,1) prior for \(\pi\), but each has access to different data: Morteza reviews movies from 1991, Nadide reviews movies from 2000, and Ursula reviews movies from 2013. How will their posterior distributions differ?

Morteza’s analysis

bechdel_1991 <- filter(bechdel, year == 1991)
count(bechdel_1991, binary)
# A tibble: 2 × 2
  binary     n
  <chr>  <int>
1 FAIL       7
2 PASS       6
6/13
[1] 0.4615385
summarize_beta_binomial(14, 1, y = 6, n = 13)
      model alpha beta      mean      mode         var         sd
1     prior    14    1 0.9333333 1.0000000 0.003888889 0.06236096
2 posterior    20    8 0.7142857 0.7307692 0.007037298 0.08388860

plot_beta_binomial(14, 1, y = 6, n = 13)
The prior places most of its plausibility on high values of π, peaking at π = 1. The likelihood is wide, peaking near π ≈ 0.46. The posterior falls in between, peaking around π ≈ 0.73.

Nadide’s analysis

bechdel_2000 <- filter(bechdel, year == 2000)
count(bechdel_2000, binary)
# A tibble: 2 × 2
  binary     n
  <chr>  <int>
1 FAIL      34
2 PASS      29
29/(34+29)
[1] 0.4603175
summarize_beta_binomial(14, 1, y = 29, n = 63)
      model alpha beta      mean      mode         var         sd
1     prior    14    1 0.9333333 1.0000000 0.003888889 0.06236096
2 posterior    43   35 0.5512821 0.5526316 0.003131268 0.05595773

plot_beta_binomial(14, 1, y = 29, n = 63)
The prior places most of its plausibility on high values of π, peaking at π = 1. The likelihood is narrower than in Morteza's plot and peaks near π ≈ 0.46. The posterior again falls in between, but this time closer to the likelihood, peaking around π ≈ 0.55.

Ursula’s analysis

bechdel_2013 <- filter(bechdel, year == 2013)
count(bechdel_2013, binary)
# A tibble: 2 × 2
  binary     n
  <chr>  <int>
1 FAIL      53
2 PASS      46
46/(53+46)
[1] 0.4646465
summarize_beta_binomial(14, 1, y = 46, n = 99)
      model alpha beta      mean      mode         var         sd
1     prior    14    1 0.9333333 1.0000000 0.003888889 0.06236096
2 posterior    60   54 0.5263158 0.5267857 0.002167891 0.04656062

plot_beta_binomial(14, 1, y = 46, n = 99)
The prior places most of its plausibility on high values of π, peaking at π = 1. The likelihood is even narrower than in Nadide's plot and peaks near π ≈ 0.46. The posterior again falls in between, now much closer to the likelihood, peaking around π ≈ 0.53.

Summary

A 3×3 grid of plots. Columns correspond to the three data scenarios shown above (Y = 6 of n = 13, Y = 29 of n = 63, and Y = 46 of n = 99); rows correspond to three priors, from top to bottom Beta(14,1), Beta(5,11), and Beta(1,1). As the number of observations increases, the posterior moves closer to the likelihood. The more informative (extreme) the prior, the more it pulls the posterior toward itself. With the flat Beta(1,1) prior, the posterior has the same shape as the likelihood.
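The same 3×3 comparison can be summarized numerically with the conjugate update Beta(α + y, β + n - y). The sketch below crosses the three priors with the three data scenarios using tidyr's crossing() and reports each posterior's parameters, mean, and standard deviation; the table and column names are chosen for this illustration only.

```r
library(tidyverse)

priors <- tibble(
  prior = c("Beta(14,1)", "Beta(5,11)", "Beta(1,1)"),
  alpha = c(14, 5, 1),
  beta  = c(1, 11, 1)
)

data_sets <- tibble(
  scenario = c("Y=6, n=13", "Y=29, n=63", "Y=46, n=99"),
  y        = c(6, 29, 46),
  n        = c(13, 63, 99)
)

# Every prior paired with every data scenario: 3 x 3 = 9 rows
posteriors <- crossing(priors, data_sets) %>%
  mutate(
    post_alpha = alpha + y,       # conjugate update: add passes to alpha
    post_beta  = beta + n - y,    # add fails to beta
    post_mean  = post_alpha / (post_alpha + post_beta),
    post_sd    = sqrt(post_alpha * post_beta /
                      ((post_alpha + post_beta)^2 * (post_alpha + post_beta + 1)))
  )

posteriors %>%
  select(prior, scenario, post_alpha, post_beta, post_mean, post_sd)
```

For each prior, the posterior standard deviation shrinks as n grows; and in every scenario, the flat prior's posterior mean stays closest to the sample proportion y/n.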
