Julian Faraway 2024-10-14
See the introduction for an overview.
This example is discussed in more detail in the book Bayesian Regression Modeling with INLA.
Packages used:
library(ggplot2)
library(lme4)
library(INLA)
library(knitr)
library(brms)
library(mgcv)
library(here)
In Fitzmaurice and Laird (1993), data on 537 children aged 7–10 in six Ohio cities are reported. The response is binary: does the child suffer from wheezing (an indication of a pulmonary problem), where one indicates yes and zero no. This status is reported for each of four years, at ages 7, 8, 9 and 10. There is also an indicator variable for whether the mother of the child is a smoker. Because we have four binary responses for each child, we expect these to be correlated and our model needs to reflect this.
Read in and examine the first two subjects worth of data:
ohio = read.csv(here("data","ohio.csv"),header = TRUE)
kable(ohio[1:8,])
| resp | id | age | smoke |
|---|---|---|---|
| 0 | 0 | -2 | 0 |
| 0 | 0 | -1 | 0 |
| 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 |
| 0 | 1 | -2 | 0 |
| 0 | 1 | -1 | 0 |
| 0 | 1 | 0 | 0 |
| 0 | 1 | 1 | 0 |
We count the number of children with smoking and non-smoking mothers (each child contributes four rows, hence the division by four):
table(ohio$smoke)/4
0 1
350 187
We use this to produce the proportion of wheezing children classified by age and maternal smoking status:
xtabs(resp ~ smoke + age, ohio)/c(350,187)
age
smoke -2 -1 0 1
0 0.16000 0.14857 0.14286 0.10571
1 0.16578 0.20856 0.18717 0.13904
Age has been adjusted so that nine years old is zero. We see that wheezing appears to decline with age and that there may be more wheezing in children with mothers who smoke. But the effects are not clear and we need modeling to be sure about these conclusions.
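Since ggplot2 is loaded, we can also plot these observed proportions. This is a minimal sketch, computing the group means directly from the data:
# observed proportion wheezing for each age and maternal smoking combination
props = aggregate(resp ~ age + smoke, data=ohio, FUN=mean)
ggplot(props, aes(x=age, y=resp, linetype=factor(smoke))) +
  geom_line() +
  labs(y="proportion wheezing", linetype="smoke")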
A plausible model uses a logit link with a linear predictor of the form:
$$
\eta_{ij} = \beta_0 + \beta_1 \mathrm{age}_j + \beta_2 \mathrm{smoke}_i + u_i, \quad i=1,\dots,537, \quad j=1,2,3,4,
$$
with
$$
P(Y_{ij} = 1) = \frac{\exp(\eta_{ij})}{1+\exp(\eta_{ij})}.
$$
The random effect $u_i$, which we assume to be $N(0,\sigma_u^2)$, induces the correlation between the four responses on the same child.
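For a quick numerical sense of the link, here is the inverse logit in R (a small helper defined for illustration; the faraway package provides an equivalent ilogit function):
# inverse logit: maps a linear predictor value to a probability
ilogit = function(eta) exp(eta)/(1 + exp(eta))
ilogit(c(-2, 0, 2))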
Here is the model fit by adaptive Gauss-Hermite quadrature (nAGQ=25) using the lme4 package:
modagh <- glmer(resp ~ age + smoke + (1|id), nAGQ=25,
family=binomial, data=ohio)
summary(modagh, correlation = FALSE)
Generalized linear mixed model fit by maximum likelihood (Adaptive Gauss-Hermite Quadrature, nAGQ = 25) ['glmerMod']
Family: binomial ( logit )
Formula: resp ~ age + smoke + (1 | id)
Data: ohio
AIC BIC logLik deviance df.resid
1603.3 1626.0 -797.6 1595.3 2144
Scaled residuals:
Min 1Q Median 3Q Max
-1.373 -0.201 -0.177 -0.149 2.508
Random effects:
Groups Name Variance Std.Dev.
id (Intercept) 4.69 2.16
Number of obs: 2148, groups: id, 537
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.1015 0.2190 -14.16 <2e-16
age -0.1756 0.0677 -2.60 0.0095
smoke 0.3986 0.2731 1.46 0.1444
We see that there is no significant effect due to maternal smoking.
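Because the linear predictor is on the logit scale, exponentiating the fixed effects gives odds ratios (conditional on the subject random effect); for example:
# odds ratios for the fixed effects
exp(fixef(modagh))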
Suppose we do not take into account the correlated responses within individuals and fit a GLM ignoring the id random effect:
modglm <- glm(resp ~ age + smoke, family=binomial, data=ohio)
faraway::sumary(modglm)
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.8837 0.0838 -22.5 <2e-16
age -0.1134 0.0541 -2.1 0.036
smoke 0.2721 0.1235 2.2 0.028
n = 2148 p = 3
Deviance = 1819.889 Null Deviance = 1829.089 (Difference = 9.199)
We see that the effect of maternal smoking is significant, but this conclusion is unreliable: ignoring the within-child correlation understates the standard errors.
Integrated nested Laplace approximation is a method of Bayesian computation which uses approximation rather than simulation. More can be found on this topic in Bayesian Regression Modeling with INLA and the chapter on GLMMs.
We can fit this model in INLA as:
formula <- resp ~ age + smoke + f(id, model="iid")
imod <- inla(formula, family="binomial", data=ohio)
The id variable represents the child and we use an iid model, indicating that the random effects for each child are independent and identically distributed (normal).
imod$summary.fixed |> kable()
| | mean | sd | 0.025quant | 0.5quant | 0.975quant | mode | kld |
|---|---|---|---|---|---|---|---|
| (Intercept) | -2.94704 | 0.20002 | -3.36027 | -2.93971 | -2.57571 | -2.93953 | 0 |
| age | -0.17294 | 0.06297 | -0.29673 | -0.17283 | -0.04976 | -0.17283 | 0 |
| smoke | 0.38511 | 0.23960 | -0.08280 | 0.38418 | 0.85835 | 0.38419 | 0 |
The posterior means are similar to the lme4 estimates. We can get plots of the posteriors of the fixed effects:
fnames = names(imod$marginals.fixed)
par(mfrow=c(1,2))
for(i in 2:3){
plot(imod$marginals.fixed[[i]],
type="l",
ylab="density",
xlab=fnames[i])
abline(v=0)
}
par(mfrow=c(1,1))
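The stored marginals also let us compute tail probabilities directly; for example, a sketch of the posterior probability that the smoking effect is positive:
# P(beta_smoke > 0 | data)
1 - inla.pmarginal(0, imod$marginals.fixed$smoke)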
We can also see the summary for the random effect SD:
# transform the marginal for the precision to the SD scale
sdmarg = inla.tmarginal(function(x) 1/sqrt(x), imod$marginals.hyperpar[[1]])
inla.zmarginal(sdmarg)
Mean 1.92585
Stdev 0.161199
Quantile 0.025 1.62603
Quantile 0.25 1.81363
Quantile 0.5 1.91964
Quantile 0.75 2.03111
Quantile 0.975 2.25927
Again the result is similar to the lme4 output, although notice that INLA provides an assessment of the uncertainty in this value, whereas the lme4 summary reports only a point estimate. We can also see the posterior density:
plot(sdmarg, type="l", xlab="SD of the random effect", ylab="density")
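We can also extract a 95% highest posterior density interval for the SD from this marginal:
inla.hpdmarginal(0.95, sdmarg)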
BRMS stands for Bayesian Regression Models with Stan. It provides a convenient wrapper to Stan functionality.
Fitting the model is very similar to the glmer call seen above. There is a bernoulli option for the family which is appropriate for a 0-1 response.
bmod <- brm(resp ~ age + smoke + (1|id), family=bernoulli(), data=ohio, cores = 4)
We can check the MCMC diagnostics and the posterior densities with:
plot(bmod)
Looks quite similar to the INLA results.
We can look at the Stan code that brms used with:
stancode(bmod)
// generated with brms 2.21.0
functions {
}
data {
int<lower=1> N; // total number of observations
array[N] int Y; // response variable
int<lower=1> K; // number of population-level effects
matrix[N, K] X; // population-level design matrix
int<lower=1> Kc; // number of population-level effects after centering
// data for group-level effects of ID 1
int<lower=1> N_1; // number of grouping levels
int<lower=1> M_1; // number of coefficients per level
array[N] int<lower=1> J_1; // grouping indicator per observation
// group-level predictor values
vector[N] Z_1_1;
int prior_only; // should the likelihood be ignored?
}
transformed data {
matrix[N, Kc] Xc; // centered version of X without an intercept
vector[Kc] means_X; // column means of X before centering
for (i in 2:K) {
means_X[i - 1] = mean(X[, i]);
Xc[, i - 1] = X[, i] - means_X[i - 1];
}
}
parameters {
vector[Kc] b; // regression coefficients
real Intercept; // temporary intercept for centered predictors
vector<lower=0>[M_1] sd_1; // group-level standard deviations
array[M_1] vector[N_1] z_1; // standardized group-level effects
}
transformed parameters {
vector[N_1] r_1_1; // actual group-level effects
real lprior = 0; // prior contributions to the log posterior
r_1_1 = (sd_1[1] * (z_1[1]));
lprior += student_t_lpdf(Intercept | 3, 0, 2.5);
lprior += student_t_lpdf(sd_1 | 3, 0, 2.5)
- 1 * student_t_lccdf(0 | 3, 0, 2.5);
}
model {
// likelihood including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu = rep_vector(0.0, N);
mu += Intercept;
for (n in 1:N) {
// add more terms to the linear predictor
mu[n] += r_1_1[J_1[n]] * Z_1_1[n];
}
target += bernoulli_logit_glm_lpmf(Y | Xc, mu, b);
}
// priors including constants
target += lprior;
target += std_normal_lpdf(z_1[1]);
}
generated quantities {
// actual population-level intercept
real b_Intercept = Intercept - dot_product(means_X, b);
}
We can see that some half-t distributions are used as priors for the hyperparameters.
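These priors can also be listed without reading the generated code:
prior_summary(bmod)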
We examine the fit:
summary(bmod)
Family: bernoulli
Links: mu = logit
Formula: resp ~ age + smoke + (1 | id)
Data: ohio (Number of observations: 2148)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~id (Number of levels: 537)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 2.19 0.19 1.85 2.58 1.00 992 1613
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -3.11 0.22 -3.55 -2.71 1.00 1473 2209
age -0.18 0.07 -0.31 -0.05 1.00 5081 3181
smoke 0.40 0.28 -0.13 0.93 1.00 1346 2408
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results are consistent with previous results.
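As with INLA, tail probabilities are easy to compute from the posterior draws. A sketch, assuming the coefficient is named b_smoke (the usual brms naming convention):
# posterior probability that the maternal smoking effect is positive
draws = as_draws_df(bmod)
mean(draws$b_smoke > 0)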
It is possible to fit some GLMMs within the GAM framework of the mgcv package. An explanation of this can be found in this blog post.
We need to make a factor version of id, otherwise it would be treated as a numerical variable:
ohio$fid = factor(ohio$id)
gmod = gam(resp ~ age + smoke + s(fid,bs="re"),
family=binomial, data=ohio, method="REML")
and look at the summary output:
summary(gmod)
Family: binomial
Link function: logit
Formula:
resp ~ age + smoke + s(fid, bs = "re")
Parametric coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.3690 0.1508 -15.71 <2e-16
age -0.1523 0.0627 -2.43 0.015
smoke 0.2956 0.2405 1.23 0.219
Approximate significance of smooth terms:
edf Ref.df Chi.sq p-value
s(fid) 282 535 548 <2e-16
R-sq.(adj) = 0.393 Deviance explained = 46.1%
-REML = 814.01 Scale est. = 1 n = 2148
We get the fixed effect estimates. We also get a test on the random effect (as described in this article). The hypothesis of no variation between the ids is rejected.
We can get an estimate of the id SD:
gam.vcomp(gmod)
Standard deviations and 0.95 confidence intervals:
std.dev lower upper
s(fid) 1.9486 1.6539 2.2957
Rank: 1/1
which is close to the INLA posterior mean seen earlier, although somewhat smaller than the glmer estimate.
The estimated coefficients, including the random effects for the children, can be found with coef(); here are the first few:
head(coef(gmod))
(Intercept) age smoke s(fid).1 s(fid).2 s(fid).3
-2.36900 -0.15226 0.29557 -0.72026 -0.72026 -0.72026
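Fitted wheezing probabilities on the response scale are also available; a quick sketch for the first few observations:
head(predict(gmod, type="response"))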
In Wood (2019), a simplified version of INLA is proposed. We first construct the GAM model without fitting it and then use the ginla() function to perform the computation:
gmod = gam(resp ~ age + smoke + s(fid,bs="re"),
family=binomial, data=ohio, fit = FALSE)
gimod = ginla(gmod)
We get the posterior densities for the fixed effects as:
par(mfrow=c(1,2))
for(i in 2:3){
plot(gimod$beta[i,],gimod$density[i,],type="l",
xlab=gmod$term.names[i],ylab="density")
}
par(mfrow=c(1,1))
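Posterior summaries can be approximated from the stored grid of densities; for example, a sketch of the posterior mean of the age coefficient, assuming the grid points are (roughly) evenly spaced so the densities serve as weights:
# weighted average of the grid values for the age coefficient
with(gimod, sum(beta[2,]*density[2,])/sum(density[2,]))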
It is not straightforward to obtain the posterior densities of the hyperparameters.
- No strong differences in the results between the different methods. In all cases, we do not find strong evidence of an effect for maternal smoking.
- LME4 was very fast. INLA was fast. BRMS, MGCV and GINLA were slower. We have a large number of subject random effects, which slows down the mgcv approach considerably.
sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: x86_64-apple-darwin20
Running under: macOS Sonoma 14.7
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/London
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] foreach_1.5.2 here_1.0.1 mgcv_1.9-1 nlme_3.1-166 brms_2.21.0 Rcpp_1.0.13 knitr_1.48 INLA_24.06.27
[9] sp_2.1-4 lme4_1.1-35.5 Matrix_1.7-0 ggplot2_3.5.1
loaded via a namespace (and not attached):
[1] tidyselect_1.2.1 farver_2.1.2 dplyr_1.1.4 loo_2.8.0 fastmap_1.2.0
[6] tensorA_0.36.2.1 digest_0.6.37 estimability_1.5.1 lifecycle_1.0.4 Deriv_4.1.3
[11] sf_1.0-16 StanHeaders_2.32.10 processx_3.8.4 magrittr_2.0.3 posterior_1.6.0
[16] compiler_4.4.1 rlang_1.1.4 tools_4.4.1 utf8_1.2.4 yaml_2.3.10
[21] labeling_0.4.3 bridgesampling_1.1-2 pkgbuild_1.4.4 classInt_0.4-10 plyr_1.8.9
[26] abind_1.4-5 KernSmooth_2.23-24 withr_3.0.1 grid_4.4.1 stats4_4.4.1
[31] fansi_1.0.6 xtable_1.8-4 e1071_1.7-14 colorspace_2.1-1 inline_0.3.19
[36] iterators_1.0.14 emmeans_1.10.4 scales_1.3.0 MASS_7.3-61 cli_3.6.3
[41] mvtnorm_1.2-6 rmarkdown_2.28 generics_0.1.3 RcppParallel_5.1.9 rstudioapi_0.16.0
[46] reshape2_1.4.4 minqa_1.2.8 DBI_1.2.3 proxy_0.4-27 rstan_2.32.6
[51] stringr_1.5.1 splines_4.4.1 bayesplot_1.11.1 parallel_4.4.1 matrixStats_1.3.0
[56] vctrs_0.6.5 boot_1.3-31 jsonlite_1.8.8 callr_3.7.6 systemfonts_1.1.0
[61] units_0.8-5 faraway_1.0.8 glue_1.8.0 nloptr_2.1.1 ps_1.7.7
[66] codetools_0.2-20 distributional_0.4.0 stringi_1.8.4 gtable_0.3.5 QuickJSR_1.3.1
[71] munsell_0.5.1 tibble_3.2.1 pillar_1.9.0 htmltools_0.5.8.1 Brobdingnag_1.2-9
[76] R6_2.5.1 fmesher_0.1.7 rprojroot_2.0.4 evaluate_0.24.0 lattice_0.22-6
[81] backports_1.5.0 MatrixModels_0.5-3 rstantools_2.4.0 class_7.3-22 svglite_2.1.3
[86] coda_0.19-4.1 gridExtra_2.3 checkmate_2.3.2 xfun_0.47 pkgconfig_2.0.3