Day 40

Math 216: Statistical Thinking

Bastola

Blood Alcohol Content (BAC)

X ID_OSU Gender Weight Beers BAC
1 1 female 132 5 0.100
2 2 female 128 2 0.030
3 3 female 110 9 0.190
4 4 male 192 8 0.120
5 5 male 172 3 0.040
6 6 female 250 7 0.095
7 7 female 125 3 0.070
8 8 male 175 5 0.060
9 9 female 175 3 0.020
10 10 male 275 5 0.050
11 11 female 130 4 0.070
12 12 male 168 6 0.100
13 13 female 128 5 0.085
14 14 male 246 7 0.090
15 15 male 164 1 0.010
16 16 male 175 4 0.050
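
The slides do not show how `bac` is loaded. A minimal sketch, assuming the 16 rows above are simply typed in by hand (the index columns `X` and `ID_OSU` are rebuilt from the row order), gives a data frame with the same columns as the one used in the code that follows:

```r
# Rebuild the BAC data frame from the printed table (values copied row by row).
bac <- data.frame(
  X      = 1:16,
  ID_OSU = 1:16,
  Gender = c("female", "female", "female", "male", "male", "female", "female", "male",
             "female", "male", "female", "male", "female", "male", "male", "male"),
  Weight = c(132, 128, 110, 192, 172, 250, 125, 175, 175, 275, 130, 168, 128, 246, 164, 175),
  Beers  = c(5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4),
  BAC    = c(0.100, 0.030, 0.190, 0.120, 0.040, 0.095, 0.070, 0.060,
             0.020, 0.050, 0.070, 0.100, 0.085, 0.090, 0.010, 0.050)
)
str(bac)  # 16 observations of 6 variables
```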


Simple Linear Regression: Linear Regression of BAC (y) on Beers (x)

\[ \begin{align*} \widehat{BAC} &=-0.0127+0.0180(\text{Beers})\\ \hat{\sigma} &= 0.02044 \end{align*} \]


bac.lm <- lm(BAC ~ Beers, data = bac) # fit the model
summary(bac.lm)

Call:
lm(formula = BAC ~ Beers, data = bac)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.027118 -0.017350  0.001773  0.008623  0.041027 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.012701   0.012638  -1.005    0.332    
Beers        0.017964   0.002402   7.480 2.97e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.02044 on 14 degrees of freedom
Multiple R-squared:  0.7998,    Adjusted R-squared:  0.7855 
F-statistic: 55.94 on 1 and 14 DF,  p-value: 2.969e-06

Confidence Interval

\[ C\% \text{ confidence interval for } \beta_i \text{ is } \hat{\beta}_i \pm t^* \operatorname{SE}(\hat{\beta}_i) \]

  • Get CIs for the slope and intercept with the confint command, or compute them by hand using qt(.975, df= ) to get t* for a 95% CI, where df is the residual degrees of freedom (14 here)
confint(bac.lm)  # 95% C.I.
                  2.5 %     97.5 %
(Intercept) -0.03980535 0.01440414
Beers        0.01281262 0.02311490
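
As a check on the formula, the slope interval can also be computed by hand. This is a minimal sketch that copies the estimate, standard error, and degrees of freedom from the regression output above; the variable names are only illustrative.

```r
# Values copied from the summary() output for the Beers slope.
est <- 0.017964   # slope estimate
se  <- 0.002402   # standard error of the slope
df  <- 14         # residual degrees of freedom (n - 2 = 16 - 2)

t_star <- qt(0.975, df)               # critical value t* for a 95% CI
ci <- est + c(-1, 1) * t_star * se    # lower and upper limits
round(ci, 4)
```

Up to rounding of the inputs, the limits should agree with the Beers row of the confint output.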

Inference for slope (effect of Beers on BAC)

\[ \begin{align*} \mathrm{H}_0: &\ \beta_i = 0 & \text{(no effect for predictor i)} \\ \mathrm{H}_A: &\ \beta_i \neq 0 & \text{(predictor i has an effect on y)} \end{align*} \]

library(broom)
knitr::kable(tidy(bac.lm))
term estimate std.error statistic p.value
(Intercept) -0.0127006 0.0126375 -1.004993 0.3319551
Beers 0.0179638 0.0024017 7.479592 0.0000030
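
The test statistic and two-sided p-value in the Beers row can be reproduced from the estimate and its standard error. The sketch below assumes the usual t test with 14 residual degrees of freedom and copies its inputs from the table above.

```r
# Values copied from the tidy() output for the Beers slope.
est <- 0.0179638
se  <- 0.0024017
df  <- 14                              # residual degrees of freedom

t_stat  <- est / se                    # t = (estimate - 0) / SE under H0: beta = 0
p_value <- 2 * pt(-abs(t_stat), df)    # two-sided p-value
c(t = t_stat, p = p_value)
```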

Multiple Regression: Include More Predictors (Explanatory Variables)

Suppose you have \(p\) explanatory variables. The mean of \(Y\) is a linear function of \(X_1, X_2, \ldots, X_p\):

\[ E(Y \mid X) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p \]

Scatterplot Matrix for BAC Example

library(GGally)
ggpairs(bac[, -c(1,2,3)]) + theme(axis.text.x = element_text(size = 12))

Regression of BAC on Beers and Weight

The fitted model for BAC is:

lm1.bac <- lm(BAC ~ Beers + Weight, data = bac) # fit the model
summary(lm1.bac)

Call:
lm(formula = BAC ~ Beers + Weight, data = bac)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.0162968 -0.0067796  0.0003985  0.0085287  0.0155621 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.986e-02  1.043e-02   3.821  0.00212 ** 
Beers        1.998e-02  1.263e-03  15.817 7.16e-10 ***
Weight      -3.628e-04  5.668e-05  -6.401 2.34e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.01041 on 13 degrees of freedom
Multiple R-squared:  0.9518,    Adjusted R-squared:  0.9444 
F-statistic: 128.3 on 2 and 13 DF,  p-value: 2.756e-09

\[ \widehat{BAC} = 0.0399 + 0.0200 (\text{Beers}) - 0.00036 (\text{Weight}). \]

Regression of BAC on Beers and Weight

knitr::kable(tidy(lm1.bac))
term estimate std.error statistic p.value
(Intercept) 0.0398634 0.0104333 3.820787 0.0021219
Beers 0.0199757 0.0012629 15.817343 0.0000000
Weight -0.0003628 0.0000567 -6.401230 0.0000234
knitr::kable(confint(lm1.bac))
2.5 % 97.5 %
(Intercept) 0.0173236 0.0624031
Beers 0.0172474 0.0227040
Weight -0.0004853 -0.0002404

Regression of BAC on Beers, Weight, and Gender

\[ \widehat{BAC} = 0.039 + 0.020 (\text{Beers}) - 0.00034 (\text{Weight}) - 0.0032 \text{ (Male)} \]

In the model with Beers and Weight, both number of beers and weight are statistically significant predictors of BAC (each p-value < 0.0001). Holding weight constant, we are 95% confident that the true effect of drinking one more beer is a 0.017 to 0.023 unit increase in mean BAC.

Scatterplot Matrix for BAC Example

library(GGally)
ggpairs(bac[, -c(1,2)]) + theme(axis.text.x = element_text(size = 12))

Regression for BAC on Beers, Weight, and Gender

lm2.bac <- lm(BAC ~ Beers + Weight + Gender, data = bac) # fit the model
summary(lm2.bac)

Call:
lm(formula = BAC ~ Beers + Weight + Gender, data = bac)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.018125 -0.005713  0.001501  0.007896  0.014655 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.871e-02  1.097e-02   3.528 0.004164 ** 
Beers        1.990e-02  1.309e-03  15.196 3.35e-09 ***
Weight      -3.444e-04  6.842e-05  -5.034 0.000292 ***
Gendermale  -3.240e-03  6.286e-03  -0.515 0.615584    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.01072 on 12 degrees of freedom
Multiple R-squared:  0.9528,    Adjusted R-squared:  0.941 
F-statistic: 80.81 on 3 and 12 DF,  p-value: 3.162e-08
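
The large p-value for Gendermale can be read alongside a confidence interval for that coefficient. The following sketch applies the interval formula \(\hat{\beta}_i \pm t^* \operatorname{SE}(\hat{\beta}_i)\) to the Gendermale row, with the inputs copied from the output above and 12 residual degrees of freedom.

```r
# Values copied from the summary() output for the Gendermale coefficient.
est <- -3.240e-03
se  <-  6.286e-03
df  <- 12                              # residual degrees of freedom

t_star <- qt(0.975, df)                # critical value t* for a 95% CI
ci <- est + c(-1, 1) * t_star * se     # lower and upper limits
contains_zero <- ci[1] < 0 && ci[2] > 0
round(ci, 4)
contains_zero
```

An interval that includes 0 is consistent with the non-significant t test for Gendermale once Beers and Weight are in the model.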

Regression Details for BAC on Beers, Weight, and Gender

“Male” is an indicator variable that equals 1 when the person is male and 0 when the person is female.

  1. Barb’s Prediction
    • Context: Barb drank 4 beers, weighs 160 lbs, and is female.
    • Equation: \[ \widehat{BAC} = 0.039 + 0.020(4) - 0.00034(160) - 0.0032(0) = 0.0646 \]
  2. John’s Prediction
    • Context: John drank 4 beers, weighs 160 lbs, and is male.
    • Equation: \[ \widehat{BAC} = 0.039 + 0.020(4) - 0.00034(160) - 0.0032(1) = 0.0614 \]
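
The two predictions can be checked with a few lines of R. This sketch uses the rounded coefficients from the fitted equation; `predict_bac()` is a hypothetical helper written for illustration, not a function from the slides or from any package.

```r
# Rounded coefficients from the fitted equation for BAC on Beers, Weight, and Gender.
b <- c(intercept = 0.039, beers = 0.020, weight = -0.00034, male = -0.0032)

# Hypothetical helper: evaluate the fitted equation for one person.
# male is the indicator variable (1 = male, 0 = female).
predict_bac <- function(beers, weight, male) {
  unname(b["intercept"] + b["beers"] * beers + b["weight"] * weight + b["male"] * male)
}

barb <- predict_bac(beers = 4, weight = 160, male = 0)
john <- predict_bac(beers = 4, weight = 160, male = 1)
c(Barb = barb, John = john, difference = john - barb)
```

With the same beers and weight, the two predictions differ by exactly the Male coefficient. If the fitted object is available, `predict(lm2.bac, newdata = data.frame(Beers = 4, Weight = 160, Gender = c("female", "male")))` uses the unrounded coefficients, so its values will differ slightly from the hand calculation.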

Model Diagnostics

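plot(lm2.bac, which = 1)  # residuals vs. fitted values: check linearity and constant variance

plot(lm2.bac, which = 2)  # normal Q-Q plot of standardized residuals: check normality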