library(tidyverse)
library(tidymodels)
library(pROC)       # make ROC curves
library(knitr)
library(kableExtra)

# data
heart_disease <- read_csv("https://sta221-fa25.github.io/data/framingham.csv") |>
  select(age, education, TenYearCHD, totChol, currentSmoker) |>
  drop_na() |>
  mutate(high_risk = as_factor(TenYearCHD),
         education = as_factor(education),
         currentSmoker = as_factor(currentSmoker))

# set default theme in ggplot2
ggplot2::theme_set(ggplot2::theme_bw())
Topics
Comparing models using AIC and BIC
Test of significance for a subset of predictors
Risk of coronary heart disease
This data set is from an ongoing cardiovascular study on residents of the town of Framingham, Massachusetts. We want to examine the relationship between various health characteristics and the risk of having heart disease.
high_risk:
1: High risk of having heart disease in next 10 years
0: Not high risk of having heart disease in next 10 years
age: Age at exam time (in years)
totChol: Total cholesterol (in mg/dL)
currentSmoker: 0 = nonsmoker, 1 = smoker
education: 1 = Some High School, 2 = High School or GED, 3 = Some College or Vocational School, 4 = College
\[
\begin{aligned} & AIC = -2\log L + \color{blue}{2(p)} \\
& BIC = -2\log L + \color{blue}{\log(n)\times (p)}
\end{aligned}
\]
Second term: Increases as p increases
Using AIC & BIC
\[
\begin{aligned} & AIC = -2\log L + \color{red}{2(p)} \\
& BIC = -2 \log L + \color{red}{\log(n)\times(p)}
\end{aligned}
\]
Choose model with the smaller value of AIC or BIC
If \(n \geq 8\), the penalty for BIC is larger than that of AIC, so BIC tends to favor more parsimonious models (i.e. models with fewer terms)
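The cutoff comes from comparing the penalties per parameter directly: the BIC penalty is \(\log(n)\), which exceeds the AIC penalty of 2 once \(n > e^2 \approx 7.39\). A quick check in R:

```r
# BIC penalty per parameter (log(n)) vs. AIC penalty per parameter (2)
n <- c(5, 7, 8, 100, 1000)
data.frame(n = n, aic_penalty = 2, bic_penalty = round(log(n), 2))
```

For n = 7 the BIC penalty (log(7) ≈ 1.95) is still below 2; from n = 8 on (log(8) ≈ 2.08) it is larger, and the gap grows with n.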
AIC from the glance() function
Let’s look at the AIC for the model that includes age, totChol, and currentSmoker
glance(high_risk_fit)$AIC
[1] 3232.812
Calculating AIC
-2 * glance(high_risk_fit)$logLik + 2 * (3 + 1)
[1] 3232.812
Comparing the models using AIC
Let’s compare the full and reduced models using AIC.
glance(high_risk_fit_reduced)$AIC
[1] 3232.812
glance(high_risk_fit_full)$AIC
[1] 3231.6
Based on AIC, which model would you choose?
Comparing the models using BIC
Let’s compare the full and reduced models using BIC
glance(high_risk_fit_reduced)$BIC
[1] 3258.074
glance(high_risk_fit_full)$BIC
[1] 3275.807
Based on BIC, which model would you choose?
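BIC can be verified by hand the same way AIC was above, using \(BIC = -2\log L + \log(n)\times(p+1)\). A self-contained sketch using the built-in mtcars data (not the Framingham data):

```r
# Verify that the manual BIC formula matches R's BIC()
fit <- glm(am ~ mpg + wt, data = mtcars, family = "binomial")

k <- length(coef(fit))  # number of estimated parameters, including the intercept
n <- nrow(mtcars)

manual_bic <- -2 * as.numeric(logLik(fit)) + log(n) * k
all.equal(manual_bic, BIC(fit))  # TRUE
```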
Drop-in-deviance test
Drop-in-deviance test
We will use a drop-in-deviance test (aka Likelihood Ratio Test) to test
the overall statistical significance of a logistic regression model
the statistical significance of a subset of coefficients in the model
Deviance
The deviance is a measure of the degree to which the predicted values are different from the observed values (compares the current model to a “saturated” model)
In logistic regression,
\[
D = -2 \log L
\]
\(D \sim \chi^2_{n - p}\) ( \(D\) follows a Chi-square distribution with \(n - p\) degrees of freedom)
Note: \(n - p\) is the degrees of freedom associated with the error in the model (like the residual degrees of freedom)
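For binary (0/1) responses the saturated model has log-likelihood 0, so the residual deviance reported by glm() equals \(-2\log L\) exactly. A sketch on the built-in mtcars data (not the Framingham data):

```r
# Residual deviance equals -2 log L for a binary logistic model,
# since the saturated model's log-likelihood is 0 for 0/1 outcomes
fit <- glm(am ~ mpg + wt, data = mtcars, family = "binomial")
all.equal(deviance(fit), -2 * as.numeric(logLik(fit)))  # TRUE
```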
\(\chi^2\) distribution
Test for overall significance
We can test the overall significance for a logistic regression model, i.e., whether there is at least one predictor with a non-zero coefficient
\[
\begin{aligned}
&H_0: \beta_1 = \dots = \beta_p = 0 \\
&H_a: \beta_j \neq 0 \text{ for at least one } j
\end{aligned}
\]
The drop-in-deviance test for overall significance compares the fit of a model with no predictors to the current model.
Drop-in-deviance test statistic
Let \(L_0\) and \(L_a\) be the likelihood functions of the model under \(H_0\) and \(H_a\), respectively. The test statistic is

\[
G = -2(\log L_0 - \log L_a) = 2\sum_{i=1}^n\left[y_i \log\left(\frac{\hat{\pi}_i^a}{\hat{\pi}^0}\right) + (1 - y_i)\log\left(\frac{1 - \hat{\pi}_i^a}{1 - \hat{\pi}^0}\right)\right]
\]

where \(\hat{\pi}^0\) is the predicted probability under \(H_0\) and \(\hat{\pi}_i^a = \frac{\exp \{x_i^\mathsf{T}\boldsymbol{\beta}\}}{1 + \exp \{x_i^\mathsf{T}\boldsymbol{\beta}\}}\) is the predicted probability under \(H_a\)
When \(n\) is large, \(G \sim \chi^2_p\), ( \(G\) follows a Chi-square distribution with \(p\) degrees of freedom)
The p-value is calculated as \(P(\chi^2 > G)\)
Large values of \(G\) (small p-values) indicate at least one \(\beta_j\) is non-zero
Heart disease model: drop-in-deviance test
\[
\begin{aligned}
&H_0: \beta_{age} = \beta_{totChol} = \beta_{currentSmoker} = 0 \\
&H_a: \beta_j \neq 0 \text{ for at least one }j
\end{aligned}
\]
Fit the null model (we’ve already fit the alternative model)
null_model <- glm(high_risk ~ 1, data = heart_disease, family = "binomial")
term         estimate   std.error  statistic  p.value
(Intercept)  -1.72294   0.0436342  -39.486    0
Heart disease model: drop-in-deviance test
Calculate the log-likelihood for the null and alternative models
(L_0 <- glance(null_model)$logLik)
[1] -1737.735
(L_a <- glance(high_risk_fit)$logLik)
[1] -1612.406
Calculate the likelihood ratio test statistic
(G <- -2 * (L_0 - L_a))
[1] 250.6572
Heart disease model: likelihood ratio test
Calculate the p-value
(p_value <- pchisq(G, df = 3, lower.tail = FALSE))
[1] 4.717158e-54
Conclusion
The p-value is small, so we reject \(H_0\). The data provide evidence that at least one predictor in the model has a non-zero coefficient.
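In practice, the whole drop-in-deviance test can be run in one call with anova(). A self-contained sketch of the same workflow using mtcars (not the Framingham data):

```r
# Overall drop-in-deviance (likelihood ratio) test via anova()
null_fit <- glm(am ~ 1, data = mtcars, family = "binomial")
full_fit <- glm(am ~ mpg + wt, data = mtcars, family = "binomial")

# The Deviance column is G; Pr(>Chi) is the p-value
anova(null_fit, full_fit, test = "Chisq")

# Same values by hand
G <- -2 * (as.numeric(logLik(null_fit)) - as.numeric(logLik(full_fit)))
pchisq(G, df = 2, lower.tail = FALSE)
```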
Why use overall test?
Why do we use a test for overall significance instead of just looking at the tests for individual coefficients?
Suppose we have a model such that \(p = 100\) and \(H_0: \beta_1 = \dots = \beta_{100} = 0\) is true
About 5% of the p-values for individual coefficients will be below 0.05 by chance.
So we expect to see about 5 small p-values even when no association actually exists.
Therefore, it is very likely we will see at least one small p-value by chance.
The overall test of significance does not have this problem. There is only a 5% chance we will get a p-value below 0.05, if a relationship truly does not exist.
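This multiple-testing intuition can be checked directly: under \(H_0\), p-values are uniform on (0, 1), so each of 100 independent tests has a 5% chance of falling below 0.05. A sketch:

```r
# Under H0, p-values are Uniform(0, 1); simulate 100 null "tests"
set.seed(221)
p_values <- runif(100)
sum(p_values < 0.05)  # roughly 5 fall below 0.05 by chance

# Probability that at least one of 100 null tests is "significant"
1 - 0.95^100  # about 0.994
```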
Test a subset of coefficients
Suppose there are two models:
Reduced Model: includes predictors \(x_1, \ldots, x_q\)
Full Model: includes predictors \(x_1, \ldots, x_q, x_{q+1}, \ldots, x_p\)
We can use a drop-in-deviance test to determine if any of the new predictors are useful
\[
\begin{aligned}
&H_0: \beta_{q+1} = \dots = \beta_p = 0\\
&H_a: \beta_j \neq 0 \text{ for at least one }j
\end{aligned}
\]
The p-value is calculated using a \(\chi_{\Delta df}^2\) distribution, where \(\Delta df\) is the number of parameters being tested (the difference in number of parameters between the full and reduced model).
Example: Include education?
Should we include education in the model?
Reduced model: age, totChol, currentSmoker
Full model: age, totChol, currentSmoker, education
\[
\begin{aligned}
&H_0: \beta_{ed2} = \beta_{ed3} = \beta_{ed4} = 0 \\
&H_a: \beta_j \neq 0 \text{ for at least one }j
\end{aligned}
\]
reduced_model <- glm(high_risk ~ age + totChol + currentSmoker,
                     data = heart_disease, family = "binomial")
full_model <- glm(high_risk ~ age + totChol + currentSmoker + education,
                  data = heart_disease, family = "binomial")
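With both models fit, the subset test is a drop in deviance compared to a \(\chi^2_3\) distribution (one df per education dummy). Since the Framingham data are not re-loaded here, this self-contained sketch shows the same pattern on mtcars, testing whether wt and hp (so \(\Delta df = 2\)) add to a model with mpg alone:

```r
# Drop-in-deviance test for a subset of coefficients:
# do wt and hp (2 extra parameters, so df = 2) improve on mpg alone?
reduced_fit <- glm(am ~ mpg, data = mtcars, family = "binomial")
full_fit <- glm(am ~ mpg + wt + hp, data = mtcars, family = "binomial")

# One-step version
anova(reduced_fit, full_fit, test = "Chisq")

# By-hand version: G is the drop in deviance
G <- deviance(reduced_fit) - deviance(full_fit)
pchisq(G, df = 2, lower.tail = FALSE)
```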