Statistical Rethinking - Chapter 04 (Extra Material)

Small Worlds in Large Worlds

Introduction

These are answers and solutions to additional exercises from previous versions of the end of chapter 4 in Satistical Rethinking 2 by Richard McElreath. I have created these notes as a part of my ongoing involvement in the AU Bayes Study Group. Much of my inspiration for these solutions, where necessary, has been obtained from Gregor Mathes.

Practice 1

Question: Refit model m4.3 from the chapter but omit the mean weight xbar. Compare the new model’s posterior to that of the original model. In particular, look at the covariance among the parameters. What is difference?

Answer: Let’s firstly refit the model m4.3 using the code on pages 100 & 101:

library(rethinking)
data(Howell1)
d <- Howell1
d2 <- d[d$age >= 18, ]
# define the average weight, x-bar
xbar <- mean(d2$weight)
# fit original model
m4.3 <- quap(alist(
  height ~ dnorm(mu, sigma),
  mu <- a + b * (weight - xbar),
  a ~ dnorm(178, 20),
  b ~ dlnorm(0, 1),
  sigma ~ dunif(0, 50)
), data = d2)
# fit reduced model
m4.3_reduced <- quap(alist(
  height ~ dnorm(mu, sigma),
  mu <- a + b * weight,
  a ~ dnorm(178, 20),
  b ~ dlnorm(0, 1),
  sigma ~ dunif(0, 50)
), data = d2)

How do we compare these models and their posteriors? Here, I want to look at three things:

  1. Covariances between parameters estimates
round(vcov(m4.3), digits = 3)
##           a     b sigma
## a     0.073 0.000 0.000
## b     0.000 0.002 0.000
## sigma 0.000 0.000 0.037
round(vcov(m4.3_reduced), digits = 3)
##            a      b sigma
## a      3.602 -0.078 0.009
## b     -0.078  0.002 0.000
## sigma  0.009  0.000 0.037

As we can see, the covariances increase quite a bit when omitting xbar and this not centring.

  1. Summaries of each parameter in the posterior
summary(extract.samples(m4.3))
##        a               b              sigma      
##  Min.   :153.5   Min.   :0.7481   Min.   :4.322  
##  1st Qu.:154.4   1st Qu.:0.8756   1st Qu.:4.945  
##  Median :154.6   Median :0.9036   Median :5.076  
##  Mean   :154.6   Mean   :0.9038   Mean   :5.073  
##  3rd Qu.:154.8   3rd Qu.:0.9323   3rd Qu.:5.204  
##  Max.   :155.6   Max.   :1.0647   Max.   :5.754
summary(extract.samples(m4.3_reduced))
##        a               b              sigma      
##  Min.   :106.5   Min.   :0.7154   Min.   :4.379  
##  1st Qu.:113.3   1st Qu.:0.8620   1st Qu.:4.943  
##  Median :114.6   Median :0.8904   Median :5.073  
##  Mean   :114.5   Mean   :0.8907   Mean   :5.073  
##  3rd Qu.:115.8   3rd Qu.:0.9187   3rd Qu.:5.204  
##  Max.   :122.4   Max.   :1.0714   Max.   :5.911

Between the two models, neither \(\beta\) (b) nor \(\sigma\) (sigma) differ greatly. However, our posterior estimate of \(\alpha\) (a) is quite a bit lower in the reduced model than it is in the original model. This is down to the interpretation of the \(\alpha\) parameter itself. In the original model, \(\alpha\) denotes the average height of a person at the mean weight in the data set. Since we removed the xbar component in the reduced model, \(\alpha\) now identifies the average height of a person of weight \(0kg\) - a nonsense metric.

  1. Predictions and Intervals
    Here, I have written a function that takes a model object, data, and some additional arguments to automate plot generation:
plot.predictions <- function(X, Y, data, model, main) {
  XOrig <- X
  X <- data[, colnames(data) == X]
  Y <- data[, colnames(data) == Y]
  plot(Y ~ X, col = col.alpha(rangi2, 0.8), main = main)
  # Estimate and plot the quap regression line and 97% HPDI for the mean
  weight.seq <- seq(from = min(X), to = max(X), length.out = 1000)
  predict_df <- data.frame(XOrig = weight.seq)
  colnames(predict_df) <- XOrig
  mu <- link(model, data = predict_df)
  mu.mean <- apply(mu, 2, mean)
  mu.HPDI <- apply(mu, 2, HPDI, prob = 0.97)
  lines(weight.seq, mu.mean)
  shade(mu.HPDI, weight.seq)
  # Estimate and plot the 97% HPDI for the predicted heights
  predict_ls <- list(weight = weight.seq)
  names(predict_ls) <- XOrig
  sim.height <- sim(model, data = predict_ls)
  height.HPDI <- apply(sim.height, 2, HPDI, prob = 0.97)
  shade(height.HPDI, weight.seq)
}

plot.predictions(X = "weight", Y = "height", data = d2, model = m4.3, main = "Original Model")

plot.predictions(X = "weight", Y = "height", data = d2, model = m4.3_reduced, main = "Reduced Model")

So. Does centring or not change the predictions of our model? No, it does not. At least in this case.

Practice 2

Question: In the chapter, we used 15 knots with the cherry blossom spline. Increase the number of knots and observe what happens to the resulting spline. Then adjust also the width of the prior on the weights - change the standard deviation of the prior and watch what happens. What do you think the combination of knot number and the prior on the weights controls?

Answer: Again, I start with code from the book - pages 118, 120 & 122 to be precise - and implement it into a function for easy changing of model specifications:

library(rethinking)
library(splines)
data(cherry_blossoms)
d <- cherry_blossoms
d2 <- d[complete.cases(d$temp), ] # complete cases on temp

cherry_spline <- function(n_Knots, StdV) {
  # knot list
  knot_list <- quantile(d2$year, probs = seq(0, 1, length.out = n_Knots))[-c(1, n_Knots)]
  # basis function
  B <- bs(d2$year,
    knots = knot_list,
    degree = 3, intercept = TRUE
  )
  # Run quap model
  m4.7 <- quap(alist(
    T ~ dnorm(mu, sigma),
    mu <- a + B %*% w,
    a ~ dnorm(6, 10),
    w ~ dnorm(0, StdV),
    sigma ~ dexp(1)
  ),
  data = list(T = d2$temp, B = B, StdV = StdV),
  start = list(w = rep(0, ncol(B)))
  )
  # get 97% posterior interval for mean and plot
  mu <- link(m4.7)
  mu_PI <- apply(mu, 2, PI, 0.97)
  plot(d2$year, d2$temp,
    col = col.alpha(rangi2, 0.3), pch = 16,
    main = paste("Knots:", n_Knots, "-", "Prior Weight:", StdV)
  )
  shade(mu_PI, d2$year, col = col.alpha("black", 0.5))
}

Let’s start by increasing the number of knots:

cherry_spline(n_Knots = 15, StdV = 1)

cherry_spline(n_Knots = 20, StdV = 1)

cherry_spline(n_Knots = 30, StdV = 1)

The more knots we use, the more flexible the resulting function becomes. It fits the data better, but might overfit if we try to do predictions.

Now, we change the prior weights:

cherry_spline(n_Knots = 15, StdV = 1) # base standard deviation

cherry_spline(n_Knots = 15, StdV = .1) # decreased standard deviation

cherry_spline(n_Knots = 15, StdV = 100) # increased standard deviation

As I decrease the standard deviation for the prior or the weights, I see that the resulting function becomes less flexible. I expected our function to become less flexible as I lower the StdV parameter since a lower standard deviation here will increase the weights and thus give each base function more of say in determining the overall function globally, making the result smoother.

Practice 3

Question: Return to data(cherry_blossoms) and model the association between blossom date (doy) and March temperature (temp). Note that there are many missing values in both variables. You may consider a linear model, a polynomial, or a spline on temperature. How well does temperature rend predict the blossom trend?

Answer:

library(rethinking)
library(splines)
data(cherry_blossoms)
d <- cherry_blossoms[, 2:3]
d2 <- na.omit(d)
d2$temps <- scale(d2$temp)
with(d2, plot(temps, doy,
  xlab = "Centred Temperature in March", ylab = "Day in Year"
))

There is a seemingly negative relationship here, but there is also a lot of noise. I expect polynomial or spline approaches to capture too much of that noise and opt for a simple linear regression instead:

# define average temp
xbar <- mean(d2$temp)

# fit modell
cherry_linear <- quap(
  alist(
    doy ~ dnorm(mu, sigma),
    mu <- a + b * (temp - xbar),
    a ~ dnorm(115, 30),
    b ~ dnorm(-2, 5),
    sigma ~ dunif(0, 50)
  ),
  data = d2
)

# output
precis(cherry_linear)
##             mean        sd       5.5%     94.5%
## a     104.921730 0.2106744 104.585031 105.25843
## b      -2.990424 0.3078875  -3.482488  -2.49836
## sigma   5.910304 0.1489844   5.672198   6.14841

With average temperatures in March, cherries blossom on day 105 of the year. With every increase of 1°C in temperature in March, cherries blossom - on average - 3 earlier. Our PI shows that we are pretty certain of this relationship. Let’s plot this to finish:

plot.predictions(X = "temp", Y = "doy", data = d2, model = cherry_linear, main = "Cherry Blossoms")

Off, that’s quite some uncertainty there. I guess we aren’t doing a tremendous job at predicting cherry blossom dates depending on temperature in March with this model.

Practice 4

Question: Simulate the prior predictive distribution for the cherry blossom spline in the chapter. Adjust the prior on the weights and observe what happens. What do you think the prior on the weight is doing?

Answer:

I haven’t solved this myself (yet). In the meantime, you can consult the answer provided by Gregor Mathes.

Practice 5

Question: The cherry blossom spline in the chapter used an intercept a, but technically it doesn’t require one. The first basis function could substitute for the intercept. Try refitting the cherry blossom spline without the intercept. What else about the model do you need to change to make this work?

Answer:

library(rethinking)
library(splines)
data(cherry_blossoms)
d <- cherry_blossoms
d2 <- d[complete.cases(d$temp), ] # complete cases on temp

n_Knots <- 15
# knot list
knot_list <- quantile(d2$year, probs = seq(0, 1, length.out = n_Knots))[-c(1, n_Knots)]
# basis function
B <- bs(d2$year,
  knots = knot_list,
  degree = 3, intercept = FALSE
)
# Run quap model
m4.7 <- quap(alist(
  T ~ dnorm(mu, sigma),
  mu <- B %*% w,
  a ~ dnorm(6, 10),
  w ~ dnorm(0, 1),
  sigma ~ dexp(1)
),
data = list(T = d2$temp, B = B),
start = list(w = rep(0, ncol(B)))
)
# get 97% posterior interval for mean and plot
mu <- link(m4.7)
mu_PI <- apply(mu, 2, PI, 0.97)
plot(d2$year, d2$temp,
  col = col.alpha(rangi2, 0.3), pch = 16,
  main = "No Intercept"
)
shade(mu_PI, d2$year, col = col.alpha("black", 0.5))

We need to change the deterministic formula in the model as well as the creation of basis functions by setting Intercept = FALSE in the bs() function call.

Session Info

sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18363)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
## [5] LC_TIME=English_United Kingdom.1252    
## 
## attached base packages:
## [1] splines   parallel  stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] rethinking_2.13      rstan_2.21.2         ggplot2_3.3.2        StanHeaders_2.21.0-6
## 
## loaded via a namespace (and not attached):
##  [1] styler_1.3.2       shape_1.4.5        tidyselect_1.1.0   xfun_0.20          rematch2_2.1.2     purrr_0.3.4        lattice_0.20-41    V8_3.2.0           colorspace_1.4-1   vctrs_0.3.4       
## [11] generics_0.0.2     htmltools_0.5.0    stats4_4.0.2       loo_2.4.1          yaml_2.2.1         rlang_0.4.10       pkgbuild_1.1.0     R.oo_1.24.0        pillar_1.4.6       glue_1.4.2        
## [21] withr_2.3.0        R.utils_2.10.1     matrixStats_0.56.0 R.cache_0.14.0     lifecycle_0.2.0    stringr_1.4.0      munsell_0.5.0      blogdown_1.0.2     gtable_0.3.0       R.methodsS3_1.8.1 
## [31] mvtnorm_1.1-1      coda_0.19-4        codetools_0.2-16   evaluate_0.14      inline_0.3.16      knitr_1.30         callr_3.4.4        ps_1.3.4           curl_4.3           fansi_0.4.1       
## [41] Rcpp_1.0.5         scales_1.1.1       backports_1.1.10   RcppParallel_5.0.2 jsonlite_1.7.2     gridExtra_2.3      digest_0.6.27      stringi_1.5.3      bookdown_0.21      processx_3.4.4    
## [51] dplyr_1.0.2        grid_4.0.2         cli_2.0.2          tools_4.0.2        magrittr_2.0.1     tibble_3.0.3       crayon_1.3.4       pkgconfig_2.0.3    MASS_7.3-51.6      ellipsis_0.3.1    
## [61] prettyunits_1.1.1  assertthat_0.2.1   rmarkdown_2.6      rstudioapi_0.11    R6_2.5.0           compiler_4.0.2
Erik Kusch
Erik Kusch
PhD Student

In my research, I focus on statistical approaches to understanding complex processes and patterns in biology using a variety of data banks.

Related