Day 37

Math 216: Statistical Thinking

Bastola

Key Assumptions for Linear Regression

Simple Linear Regression Model:
- \(y = \beta_0 + \beta_1 x + \varepsilon\)
Assumptions:
1. Mean of Errors (\(\varepsilon\)): The probability distribution of \(\varepsilon\) has mean 0, so the expected value of \(y\) equals \(\beta_0 + \beta_1 x\) for every \(x\).
2. Constant Variance: The variance of \(\varepsilon\) is constant across all values of \(x\), denoted as \(\sigma^2\).
3. Normal Distribution of Errors: \(\varepsilon\) follows a normal distribution.
4. Independence of Errors: The errors associated with different \(y\) values are independent.
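
To make the four assumptions concrete, here is a minimal simulation sketch that generates data satisfying all of them and then fits the model. The parameter values (`beta0`, `beta1`, `sigma`), the sample size, the seed, and the names `x_sim`, `y_sim`, and `sim_mod` are illustrative assumptions, not values from the course.

```r
# Simulate data that satisfy the model assumptions (all values are hypothetical)
set.seed(216)          # arbitrary seed, for reproducibility only
n     <- 200           # hypothetical sample size
beta0 <- 1             # hypothetical true intercept
beta1 <- 2             # hypothetical true slope
sigma <- 0.5           # hypothetical error standard deviation

x_sim <- runif(n, min = 0, max = 10)

# Errors: mean 0, constant variance sigma^2, normal, independent draws
eps   <- rnorm(n, mean = 0, sd = sigma)
y_sim <- beta0 + beta1 * x_sim + eps

# Least squares fit; the estimates should land near beta0 and beta1
sim_mod <- lm(y_sim ~ x_sim)
coef(sim_mod)
```

Because the errors are drawn with one fixed `sd`, the scatter around the line should look equally wide at every value of `x_sim`.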

Constant Variance

# Define the data
x <- c(1, 2, 3, 4, 5)
y <- c(1, 1, 2, 2, 4)

# Fit a linear model
mod <- lm(y ~ x)
summary(mod)


Call:
lm(formula = y ~ x)

Residuals:
         1          2          3          4          5 
 4.000e-01 -3.000e-01  6.478e-17 -7.000e-01  6.000e-01 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  -0.1000     0.6351  -0.157   0.8849  
x             0.7000     0.1915   3.656   0.0354 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6055 on 3 degrees of freedom
Multiple R-squared:  0.8167,    Adjusted R-squared:  0.7556 
F-statistic: 13.36 on 1 and 3 DF,  p-value: 0.03535
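
The "Residual standard error" line in the output is the estimate of \(\sigma\), the common standard deviation of the errors under the constant-variance assumption. As a rough sketch, it can be reproduced from the residuals; the helper names `sse`, `df_resid`, and `s` are chosen here for illustration.

```r
# Same data and model as above
x <- c(1, 2, 3, 4, 5)
y <- c(1, 1, 2, 2, 4)
mod <- lm(y ~ x)

sse      <- sum(resid(mod)^2)     # sum of squared residuals
df_resid <- df.residual(mod)      # n - 2 = 3 degrees of freedom
s        <- sqrt(sse / df_resid)  # estimate of sigma
s                                 # about 0.6055, as in the summary output
```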

Making Inferences About the Slope \(\beta_1\)

Objective: Assess the significance of the slope \(\beta_1\) to understand the contribution of \(x\) in predicting \(y\).
Statistical Test:
- Null Hypothesis (\(H_0\)): \(\beta_1 = 0\) (no linear relationship between \(x\) and \(y\))
- Alternative Hypothesis (\(H_a\)): \(\beta_1 \neq 0\) (a linear relationship exists)
Using R for Hypothesis Testing:
- Perform a t-test on the slope to decide whether to reject \(H_0\). A \(p\)-value below the significance level \(\alpha\) leads us to reject \(H_0\) and conclude that \(x\) contributes to predicting \(y\).
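
The statistic behind this t-test can be written out explicitly. The notation here is an assumption, since these slides do not define it: \(s\) is the residual standard error, \(SS_{xx} = \sum (x_i - \bar{x})^2\), and \(s_{\hat{\beta}_1}\) is the standard error of the slope.

\[
t = \frac{\hat{\beta}_1}{s_{\hat{\beta}_1}}, \qquad s_{\hat{\beta}_1} = \frac{s}{\sqrt{SS_{xx}}}, \qquad \text{df} = n - 2
\]

For the five data points used in this section, \(SS_{xx} = 10\) and \(s = 0.6055\), so

\[
s_{\hat{\beta}_1} = \frac{0.6055}{\sqrt{10}} \approx 0.1915, \qquad t = \frac{0.7}{0.1915} \approx 3.656 \quad \text{on } 3 \text{ degrees of freedom.}
\]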

Practical Steps Using R

Conducting the Test:
1. Estimate \(\hat{\beta}_0\) and \(\hat{\beta}_1\) using the least squares method.
2. Compute the standard error and perform a t-test to check the significance of \(\hat{\beta}_1\).
3. Interpret the results: A significant test provides evidence that \(y\) changes systematically, on average, as \(x\) changes.
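
The three steps above can be sketched by hand in R, without `lm()`, for the five data points used in this section. The variable names (`ss_xx`, `ss_xy`, `b0`, `b1`, `se_b1`, `t_stat`, `p_val`) are illustrative choices, not course notation.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(1, 1, 2, 2, 4)
n <- length(x)

# Step 1: least squares estimates of the intercept and slope
ss_xx <- sum((x - mean(x))^2)
ss_xy <- sum((x - mean(x)) * (y - mean(y)))
b1 <- ss_xy / ss_xx
b0 <- mean(y) - b1 * mean(x)

# Step 2: standard error of the slope, t statistic, two-sided p-value
s      <- sqrt(sum((y - (b0 + b1 * x))^2) / (n - 2))
se_b1  <- s / sqrt(ss_xx)
t_stat <- b1 / se_b1
p_val  <- 2 * pt(-abs(t_stat), df = n - 2)

# Step 3: collect the results for interpretation
c(b0 = b0, b1 = b1, se_b1 = se_b1, t = t_stat, p = p_val)
```

These values should agree with the `x` row that `summary(lm(y ~ x))` reports for the same data.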

Hypothesis Testing

How can we use R to reach a decision for this hypothesis test?

             Estimate  Std. Error     t value   Pr(>|t|)
(Intercept)      -0.1   0.6350853  -0.1574592  0.8848840
x                 0.7   0.1914854   3.6556308  0.0353528
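
One way to turn this coefficient table into a decision is to pull out the slope's p-value and compare it with \(\alpha\). This is a minimal sketch: the significance level `alpha <- 0.05` is an assumption, and `coef_table`, `p_slope`, and `decision` are illustrative names.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(1, 1, 2, 2, 4)
mod <- lm(y ~ x)

alpha <- 0.05                        # assumed significance level

# Coefficient table as a numeric matrix (same numbers as shown above)
coef_table <- coef(summary(mod))
p_slope    <- coef_table["x", "Pr(>|t|)"]

# Reject H0: beta_1 = 0 when the p-value falls below alpha
decision <- if (p_slope < alpha) "reject H0" else "fail to reject H0"
decision
```

With \(p \approx 0.0354 < 0.05\), this gives "reject H0". At a stricter level such as \(\alpha = 0.01\), the same data would not lead to rejection.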

Confidence Intervals

Confidence Intervals in R

confint(mod, level = 0.95)

                  2.5 %   97.5 %
(Intercept) -2.12112485 1.921125
x            0.09060793 1.309392
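
The interval for the slope can also be rebuilt by hand as estimate \(\pm\) critical value \(\times\) standard error, which shows what `confint()` computes for this model. The names `est`, `se`, `t_crit`, and `ci_slope` are illustrative.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(1, 1, 2, 2, 4)
mod <- lm(y ~ x)

coef_table <- coef(summary(mod))
est <- coef_table["x", "Estimate"]
se  <- coef_table["x", "Std. Error"]

# Critical value from the t distribution with n - 2 = 3 degrees of freedom
t_crit <- qt(0.975, df = df.residual(mod))

ci_slope <- est + c(-1, 1) * t_crit * se
ci_slope                              # about 0.0906 and 1.3094
```

Because this 95% interval excludes 0, it agrees with rejecting \(H_0: \beta_1 = 0\) at \(\alpha = 0.05\).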