View libraries and data used in these notes:

library(tidyverse)
library(tidymodels)

Dr. Alexander Fisher
The maximum likelihood estimator (MLE) satisfies
\[ \sqrt{n}\left(\hat\beta - \beta\right) \overset{d}{\longrightarrow} N\left(0,\, I(\beta)^{-1}\right), \]
where \(I(\beta)\) is the Fisher information matrix, per observation, defined as
\[ I(\beta) = -\,\mathbb{E}\left[\frac{\partial^2 \ell(\beta)}{\partial \beta \partial \beta^\top}\right]. \] Notice that the asymptotic variance of the MLE is given by the inverse of the Fisher information.
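To see this asymptotic result in action, here is a minimal simulation sketch for the simplest case: i.i.d. Bernoulli(\(p\)) data, where the MLE is the sample mean, \(I(p) = 1/(p(1-p))\), and so the approximate variance of the MLE is \(p(1-p)/n\). (The sample size, number of replicates, and seed below are arbitrary choices.)

```r
# Simulation sketch: variance of the Bernoulli MLE vs. inverse Fisher information
set.seed(1)
n = 5000                                       # observations per replicate
p = 0.3                                        # true success probability
phat = replicate(2000, mean(rbinom(n, 1, p)))  # MLE = sample mean, 2000 replicates
var(phat)        # empirical variance of the MLE
p * (1 - p) / n  # asymptotic variance I(p)^{-1} / n = 4.2e-05
```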
Recall the logistic regression model,
\[ \Pr(Y_i = 1 \mid x_i) = p_i = \frac{\exp(x_i^\top \beta)}{1 + \exp(x_i^\top \beta)}. \]
The log-likelihood for \(n\) conditionally independent Bernoulli observations is:
\[ \ell(\beta) = \sum_{i=1}^n \big( y_i \log p_i + (1 - y_i)\log(1 - p_i) \big). \]
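As a one-line reminder of where the Hessian below comes from (the computation from homework 4): since \(\partial p_i / \partial \beta = p_i(1 - p_i)\,x_i\), differentiating the log-likelihood once gives the score
\[ \frac{\partial \ell(\beta)}{\partial \beta} = \sum_{i=1}^n (y_i - p_i)\, x_i = \boldsymbol{X}^\top (y - p), \]
and differentiating once more yields the Hessian stated next.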
From homework 4, you showed that the Hessian of \(\ell(\beta)\) simplifies to
\[ H = -\,\boldsymbol{X}^\top D \boldsymbol{X}, \]
which implies
\[ I(\beta) = \boldsymbol{X}^\top D \boldsymbol{X}. \]
Here, \(\boldsymbol{X}\) is the \(n \times p\) design matrix and \(D\) is the \(n \times n\) diagonal matrix with entries \(D_{ii} = p_i(1 - p_i)\).
Therefore, under the logistic regression model,
\[ \operatorname{Var}(\hat\beta) \approx (\boldsymbol{X}^\top D \boldsymbol{X})^{-1}. \]
We can construct an asymptotic confidence interval for each coefficient,
\[
\hat{\beta}_j \pm z^* \times SE(\hat{\beta}_j),
\]
where \(z^*\) is calculated from the \(N(0, 1)\) distribution (e.g. for a 95% CI, \(z^*\) can be found from qnorm(0.975)).

Let's fit a logistic regression model and examine the output:

fit = glm(am ~ mpg + wt, family = binomial, data = mtcars)
summary(fit)
Call:
glm(formula = am ~ mpg + wt, family = binomial, data = mtcars)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 25.8866 12.1935 2.123 0.0338 *
mpg -0.3242 0.2395 -1.354 0.1759
wt -6.4162 2.5466 -2.519 0.0118 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 43.230 on 31 degrees of freedom
Residual deviance: 17.184 on 29 degrees of freedom
AIC: 23.184
Number of Fisher Scoring iterations: 7
int_width = qnorm(0.975) * 0.2395
lb = -0.3242 - int_width
ub = -0.3242 + int_width
cat("95% CI: ", lb, ub)

95% CI:  -0.7936114 0.1452114
We are 95% confident that for every one-unit increase in mpg, the odds that the car has a manual transmission multiply by a factor between \(e^{-0.79}\) and \(e^{0.14}\), holding wt constant.
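As a sanity check on the hand computation, the same Wald interval is available from confint.default(), which computes estimate \(\pm\, z^* \times SE\) for every coefficient; exponentiating moves it to the odds-ratio scale. (A minimal sketch, refitting the same model so the chunk is self-contained.)

```r
# Wald CIs for all coefficients, then the mpg interval on the odds-ratio scale
fit = glm(am ~ mpg + wt, family = binomial, data = mtcars)
ci = confint.default(fit, level = 0.95)  # estimate +/- qnorm(0.975) * SE
ci["mpg", ]        # matches the interval computed by hand above
exp(ci["mpg", ])   # multiplicative change in odds per unit increase in mpg
```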
The standard error information is in the general summary() output. But if we want to validate the formula used to compute the standard error, we can do so manually: grab \(\operatorname{Var}[\hat{\beta}_{MLE}]\) with vcov(fit), or compute it directly:
X = model.matrix(fit)
p = fit$fitted.values
D = diag(p * (1 - p))
var_mle = solve(t(X) %*% D %*% X)
var_mle

            (Intercept)         mpg          wt
(Intercept)  148.682172 -2.73887794 -30.3420686
mpg           -2.738878  0.05735976   0.5171025
wt           -30.342069  0.51710248   6.4852006
and take the square root of the diagonal to get the standard errors:

sqrt(diag(var_mle))

This reproduces the Std. Error column of the summary(fit) output.
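Finally, a quick check (a sketch, refitting the model so it runs standalone) that the manual matrix agrees with what vcov() returns:

```r
# Verify the manual variance formula against R's built-in vcov()
fit = glm(am ~ mpg + wt, family = binomial, data = mtcars)
X = model.matrix(fit)
p = fit$fitted.values
var_mle = solve(t(X) %*% diag(p * (1 - p)) %*% X)
all.equal(var_mle, vcov(fit), tolerance = 1e-4)  # agree up to IRLS convergence error
```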