What is dot hat in a regression output?
https://stats.stackexchange.com/a/256364/154908
Q. The augment() function in the broom package for R creates a data frame of predicted values from a regression model. The columns it creates include the fitted values, the standard error of the fit, and Cook’s distance. They also include one I’m not familiar with: the column .hat.
library(broom)
data(mtcars)
m1 <- lm(mpg ~ wt, data = mtcars)
head(augment(m1))
#> # A tibble: 6 x 10
#>   .rownames    mpg    wt .fitted .se.fit .resid   .hat .sigma .cooksd .std.resid
#>   <chr>      <dbl> <dbl>   <dbl>   <dbl>  <dbl>  <dbl>  <dbl>   <dbl>      <dbl>
#> 1 Mazda RX4   21    2.62    23.3   0.634 -2.28  0.0433   3.07 1.33e-2    -0.766
#> 2 Mazda RX4…  21    2.88    21.9   0.571 -0.920 0.0352   3.09 1.72e-3    -0.307
#> 3 Datsun 710  22.8  2.32    24.9   0.736 -2.09  0.0584   3.07 1.54e-2    -0.706
#> 4 Hornet 4 …  21.4  3.22    20.1   0.538  1.30  0.0313   3.09 3.02e-3     0.433
#> 5 Hornet Sp…  18.7  3.44    18.9   0.553 -0.200 0.0329   3.10 7.60e-5    -0.0668
#> 6 Valiant     18.1  3.46    18.8   0.555 -0.693 0.0332   3.10 9.21e-4    -0.231
# .hat vector
augment(m1)$.hat
#> [1] 0.0433 0.0352 0.0584 0.0313 0.0329 0.0332 0.0354 0.0313 0.0314 0.0329
#> [11] 0.0329 0.0558 0.0401 0.0419 0.1705 0.1953 0.1838 0.0661 0.1177 0.0956
#> [21] 0.0503 0.0343 0.0328 0.0443 0.0445 0.0866 0.0704 0.1291 0.0313 0.0380
#> [31] 0.0354 0.0377
Can anyone explain what this value is, and is it different between linear regression and logistic regression?
A. Those are the diagonal elements of the hat matrix, which describe the leverage each observation has on its own fitted value.
If one fits:
\[\vec{Y} = \mathbf{X} \vec {\beta} + \vec {\epsilon}\]
then:
\[\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\]
In this example:
\[ \begin{pmatrix}Y_1\\ \vdots\\ Y_{32}\end{pmatrix} = \begin{pmatrix} 1 & 2.620\\ \vdots & \vdots\\ 1 & 2.780 \end{pmatrix} \cdot \begin{pmatrix} \beta_0\\ \beta_1 \end{pmatrix} + \begin{pmatrix}\epsilon_1\\ \vdots\\ \epsilon_{32}\end{pmatrix} \]
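The name comes from what \(\mathbf{H}\) does to \(\vec{Y}\). As a short aside using the standard least-squares result (the symbols \(\hat{\vec{\beta}}\), \(\hat{\vec{Y}}\) and \(h_{ij}\) are notation introduced here, not part of the original answer), substituting the estimated coefficients into the fitted values gives:
\[\hat{\vec{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\vec{Y}, \qquad \hat{\vec{Y}} = \mathbf{X}\hat{\vec{\beta}} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\vec{Y} = \mathbf{H}\vec{Y}\]
So \(\mathbf{H}\) "puts the hat on" \(\vec{Y}\). Row \(i\) reads \(\hat{Y}_i = \sum_{j=1}^{32} h_{ij} Y_j\), which means the diagonal entry \(h_{ii} = \partial \hat{Y}_i / \partial Y_i\) is the weight that observation \(i\) carries in its own fitted value.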
This \(\mathbf{H}\) matrix can then be calculated in R:
library(MASS)                          # for ginv()
wt <- mtcars$wt                        # car weight (column 6 of mtcars)
X <- cbind(1, wt)                      # design matrix: intercept column plus wt
H <- X %*% ginv(t(X) %*% X) %*% t(X)   # hat matrix
Here \(\mathbf{H}\) is a \(32 \times 32\) matrix with the hat values on its diagonal. The dimensions of the intermediate products are:
Expression                     Dimensions
X                              32 x 2
t(X)                           2 x 32
X %*% t(X)                     32 x 32
t(X) %*% X                     2 x 2
ginv(t(X) %*% X)               2 x 2
ginv(t(X) %*% X) %*% t(X)      2 x 32
X %*% ginv(t(X) %*% X)         32 x 2
# the diagonal of H holds the hat values
diag(H)
#> [1] 0.0433 0.0352 0.0584 0.0313 0.0329 0.0332 0.0354 0.0313 0.0314 0.0329
#> [11] 0.0329 0.0558 0.0401 0.0419 0.1705 0.1953 0.1838 0.0661 0.1177 0.0956
#> [21] 0.0503 0.0343 0.0328 0.0443 0.0445 0.0866 0.0704 0.1291 0.0313 0.0380
#> [31] 0.0354 0.0377
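On the second part of the question: for logistic regression (and GLMs in general) the hat matrix takes a weighted form, \(\mathbf{H} = \mathbf{W}^{1/2}\mathbf{X}(\mathbf{X}^T\mathbf{W}\mathbf{X})^{-1}\mathbf{X}^T\mathbf{W}^{1/2}\), where \(\mathbf{W}\) is the diagonal matrix of working weights from the iteratively reweighted least squares fit. The linear model is the special case where every weight is 1. The sketch below is an illustration rather than part of the original answer: the logistic model `am ~ wt` and the names `g`, `Xg`, `w`, `WX` and `Hg` are assumptions chosen for the example. It compares against base R's `hatvalues()`, which, as far as I know, reports the same leverages that `augment()` places in `.hat`.

```r
# Linear model: all weights are 1, so H = X (X'X)^{-1} X'.
m1 <- lm(mpg ~ wt, data = mtcars)
X  <- model.matrix(m1)                      # 32 x 2: intercept and wt
H  <- X %*% solve(crossprod(X)) %*% t(X)
all.equal(unname(diag(H)), unname(hatvalues(m1)))
sum(diag(H))                                # leverages sum to the number of coefficients

# Logistic model (hypothetical example): weight each row by its working weight.
g  <- glm(am ~ wt, data = mtcars, family = binomial)
Xg <- model.matrix(g)
w  <- g$weights                             # IWLS working weights, p * (1 - p) for the logit link
WX <- sqrt(w) * Xg                          # scales row i of Xg by sqrt(w[i])
Hg <- WX %*% solve(crossprod(WX)) %*% t(WX)

# Largest gap to base R's leverages; any difference should be small,
# on the order of the IWLS convergence tolerance.
max(abs(diag(Hg) - hatvalues(g)))
```

Because the weights depend on the fitted probabilities, leverage in a logistic regression depends on the response through the fit, whereas in the linear model it depends on \(\mathbf{X}\) alone.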