Day 29

Math 216: Statistical Thinking

Dr. Bastola

Independent Sampling Overview

When the observations in one sample have no natural pairing with those in the other, the samples are treated as independent. For two independent samples:

Sample 1: Size \(n_1\), mean \(\bar{x}_1\), variance \(s_1^2\)
Sample 2: Size \(n_2\), mean \(\bar{x}_2\), variance \(s_2^2\)

Conditions to Check:

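Sample Sizes: \(n_1 \geq 30\) and \(n_2 \geq 30\) (large samples), so the CLT applies.
Normality: For smaller samples, check that the data in each sample are approximately normally distributed.
Variances: Either the population variances are known, or, for small samples from normal populations, decide whether equal variances can be assumed (pooled) or not (Welch).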

Sampling Distribution Properties

For \(\bar{X}_1 - \bar{X}_2\):

Expected Value: \(E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2\)
Standard Error:

\[\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{ \underbrace{\frac{\sigma_1^2}{n_1}}_{\text{Group 1 variability}} + \underbrace{\frac{\sigma_2^2}{n_2}}_{\text{Group 2 variability}}}\]
Distribution Shape:
- Exactly normal if populations are normal
- Approximately normal via the CLT when \(n_1 \geq 30\) and \(n_2 \geq 30\)
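The standard error formula above can be sketched in a few lines of Python. This is only an illustration: the function name `se_diff_means` and the numbers (population standard deviations 6 and 8, sample sizes 36 and 64) are made up for the example and do not come from the lecture.

```python
import math

def se_diff_means(sigma1, n1, sigma2, n2):
    """Standard error of X1bar - X2bar for two independent samples.

    sigma1, sigma2 are population standard deviations (assumed known here).
    """
    return math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# Hypothetical values chosen so each group contributes variance 1 to the sum.
se = se_diff_means(sigma1=6.0, n1=36, sigma2=8.0, n2=64)
print(se)
```

Each group adds its own \(\sigma^2/n\) term, so here the result is \(\sqrt{1 + 1} = \sqrt{2}\). The variances add even though the means are subtracted, because the samples are independent.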

Variance Homogeneity Testing

Formal Tests: Levene’s test
Practical Approach:
- Compare variance ratios (\(s_1^2/s_2^2\))
- Consider sample sizes (with unequal \(n\), the pooled test is more sensitive to unequal variances)
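The variance-ratio check can be sketched with Python's standard library. The data, the helper name `variance_ratio`, and the cutoff of 4 are all assumptions for illustration. The cutoff is a common rule of thumb, not a value stated in the lecture. A formal test such as Levene's would normally come from a statistics package rather than this hand calculation.

```python
from statistics import variance

def variance_ratio(sample1, sample2):
    """Ratio of the larger sample variance to the smaller one (always >= 1)."""
    v1, v2 = variance(sample1), variance(sample2)
    return max(v1, v2) / min(v1, v2)

# Made-up data for illustration only.
group1 = [12.0, 15.0, 11.0, 14.0, 13.0]   # sample variance 2.5
group2 = [10.0, 18.0, 9.0, 20.0, 13.0]    # sample variance 23.5

ratio = variance_ratio(group1, group2)

# Hypothetical rule-of-thumb cutoff; adjust to the course's own guideline.
RATIO_CUTOFF = 4.0
equal_variance_plausible = ratio < RATIO_CUTOFF
print(ratio, equal_variance_plausible)
```

Here the ratio is well above the assumed cutoff, which would point toward Welch's procedure rather than the pooled one.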

Pooled Variance

Used when assuming equal population variances (\(\sigma_1^2 = \sigma_2^2\)):

\[s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}\]

Weighted Average: Combines the two sample variances, weighting each by its degrees of freedom (\(n_i - 1\))
Degrees of Freedom: \(df = n_1 + n_2 - 2\) (total observations minus groups)

Component	Meaning
\((n_1-1)s_1^2\)	Sum of squared deviations in Group 1
\((n_2-1)s_2^2\)	Sum of squared deviations in Group 2
Denominator	Total degrees of freedom
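A minimal sketch of the pooled variance calculation, using summary statistics that are invented for the example (\(s_1^2 = 4\), \(n_1 = 10\), \(s_2^2 = 9\), \(n_2 = 20\)). The function name `pooled_variance` is likewise just illustrative.

```python
def pooled_variance(s1_sq, n1, s2_sq, n2):
    """Pooled estimate of a common variance from two sample variances."""
    numerator = (n1 - 1) * s1_sq + (n2 - 1) * s2_sq   # combined sums of squares
    df = n1 + n2 - 2                                   # total degrees of freedom
    return numerator / df

# Hypothetical summary statistics.
sp_sq = pooled_variance(s1_sq=4.0, n1=10, s2_sq=9.0, n2=20)
print(sp_sq)  # (9*4 + 19*9) / 28 = 207/28
```

The result always lies between the two sample variances, and it sits closer to the variance of the larger sample because that sample carries more degrees of freedom.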

Hypothesis Testing Framework

Null Hypothesis: \(H_0: \mu_1 - \mu_2 = 0\)
Alternatives:
- \(H_a: \mu_1 - \mu_2 \neq 0\) (Two-tailed)
- \(H_a: \mu_1 - \mu_2 > 0\) or \(H_a: \mu_1 - \mu_2 < 0\) (One-tailed)

Case 1: Equal Variances (Pooled)

\[t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\]
\[s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}\]
\[df = n_1 + n_2 - 2\]

Case 2: Unequal Variances (Welch)

\[t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}\]
\[df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}\]
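The two test statistics and their degrees of freedom can be computed side by side from summary statistics. This is a sketch under assumed inputs: the function names `pooled_t` and `welch_t` and the numbers (means 52 and 48, both variances 16, both sample sizes 10) are hypothetical. It stops at \(t\) and \(df\). A p-value would still need a \(t\) table or a statistics package.

```python
import math

def pooled_t(x1, s1_sq, n1, x2, s2_sq, n2):
    """Case 1: t statistic and df assuming equal population variances."""
    df = n1 + n2 - 2
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df
    t = (x1 - x2) / math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    return t, df

def welch_t(x1, s1_sq, n1, x2, s2_sq, n2):
    """Case 2: t statistic and approximate df without assuming equal variances."""
    a, b = s1_sq / n1, s2_sq / n2          # per-group variance of the sample mean
    t = (x1 - x2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return t, df

# Hypothetical summary statistics: equal n and equal sample variances.
t_p, df_p = pooled_t(52.0, 16.0, 10, 48.0, 16.0, 10)
t_w, df_w = welch_t(52.0, 16.0, 10, 48.0, 16.0, 10)
print(t_p, df_p)
print(t_w, df_w)
```

With equal sample sizes and equal sample variances the two cases agree exactly (here \(t = \sqrt{5}\) and \(df = 18\)). The results diverge as the variances or sample sizes become unequal, and Welch's \(df\) is then generally not an integer.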

Confidence Interval

General Form

\[(\bar{x}_1 - \bar{x}_2) \pm t^*_{\alpha/2} \cdot SE\]

Pooled Variance CI: \[SE = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\]

Welch’s CI: \[SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\]
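A sketch of the interval calculation for both standard errors. The critical value is passed in by the caller because the Python standard library has no \(t\) quantile function. The value 2.101 used below is the usual table entry for 95% confidence with \(df = 18\). The function name `mean_diff_ci` and the summary statistics are assumptions for illustration.

```python
import math

def mean_diff_ci(x1, s1_sq, n1, x2, s2_sq, n2, t_star, pooled=False):
    """Confidence interval for mu1 - mu2 from summary statistics.

    t_star is the critical value t*_{alpha/2}, looked up by the caller
    for the degrees of freedom that match the chosen method.
    """
    if pooled:
        sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
        se = math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    else:
        se = math.sqrt(s1_sq / n1 + s2_sq / n2)
    diff = x1 - x2
    return diff - t_star * se, diff + t_star * se

# Hypothetical inputs; t_star = 2.101 is the table value for df = 18 at 95%.
lower, upper = mean_diff_ci(52.0, 16.0, 10, 48.0, 16.0, 10,
                            t_star=2.101, pooled=True)
print(lower, upper)
```

The interval is centered on the observed difference \(\bar{x}_1 - \bar{x}_2 = 4\). Only the standard error and the degrees of freedom used to look up \(t^*\) change between the pooled and Welch versions.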

Interpretation Example

“With 95% confidence, the true mean difference \(\mu_1 - \mu_2\) lies between −3.2 and 5.8. Because this interval contains 0, we fail to reject the null hypothesis at \(\alpha = 0.05\) (two-tailed test).”
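The link between the interval and the two-tailed test can be written as a one-line check. The helper name `reject_null_from_ci` is hypothetical. The endpoints are the ones quoted in the interpretation example.

```python
def reject_null_from_ci(lower, upper, null_value=0.0):
    """Two-tailed decision: reject H0 only if the CI excludes the null value."""
    return not (lower <= null_value <= upper)

# Endpoints from the interpretation example: the interval contains 0.
decision = reject_null_from_ci(-3.2, 5.8)
print("reject H0" if decision else "fail to reject H0")
```

This shortcut applies only when the confidence level and the significance level match (95% with \(\alpha = 0.05\)) and the alternative is two-sided.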