Math 216: Statistical Thinking
graph TD
    A[Start] --> B{"σ known?"}
    B -->|Yes| C["Use z-test/z-interval"]
    B -->|No| D{"n ≥ 30?"}
    D -->|Yes| E["CLT: Use t-test (z ≈ t)"]
    D -->|No| F["Normal? QQ-plot/test"]
    F -->|Yes| G[Use t-test]
    F -->|No| H[Non-parametric test]
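The decision flow can also be written as a small R function, which makes the order of the checks explicit. This is only a sketch: `choose_procedure()` and its arguments are hypothetical names introduced for illustration, and `looks_normal` stands in for whatever QQ-plot or normality check is used in practice.

```r
# Hypothetical helper mirroring the flowchart; names are illustrative only.
choose_procedure <- function(sigma_known, n, looks_normal = NA) {
  if (sigma_known) return("z-test/z-interval")
  if (n >= 30) return("t-test (CLT)")
  # Small sample with unknown sigma: the normality check decides.
  if (is.na(looks_normal)) return("check normality first (QQ-plot/test)")
  if (looks_normal) "t-test" else "non-parametric test"
}

# A small, clearly non-normal sample with unknown sigma:
choose_procedure(sigma_known = FALSE, n = 15, looks_normal = FALSE)
```

The sample size is checked before normality, so the normality question only arises for small samples, as in the diagram.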
What if the population data is decidedly non-normal?
Small Sample Sizes and Non-normality: When the sample is small (\(n < 30\)) and the data are clearly non-normal, the nominal significance level of a t-test is no longer trustworthy. The actual Type I error rate, the probability of rejecting the null hypothesis (\(H_0\)) when it is true, can exceed the stated \(\alpha\).
Nonparametric Statistics: These tests do not assume a normal distribution. Instead, they work with the signs or ranks of the observations, which makes them robust to outliers and extreme values.
Example: 15-weight sample from the Davis dataset:
Population Context: The full dataset (N = 200) has a median of 57 kg, but our sample (the first 15 observations) has a median of 68 kg:
[1] 0.03515625
[1] 0.05803929
Resolution: Sign test detects true median shift (68 vs 57) while t-test is confused by:
Population: Lognormal distribution (median=7.38, mean=12.18)
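The two summaries are close to what a lognormal distribution with `meanlog = 2` and `sdlog = 1` would give, since its median is \(e^{\mu}\) and its mean is \(e^{\mu + \sigma^2/2}\). Those parameter values are inferred from the stated median and mean rather than given in the slides, and the simulated population below (its name, size, and seed) is purely an assumption for illustration.

```r
# Assumed parameters, inferred from the stated median and mean.
meanlog <- 2
sdlog   <- 1

pop_median <- exp(meanlog)                # same as qlnorm(0.5, meanlog, sdlog)
pop_mean   <- exp(meanlog + sdlog^2 / 2)  # mean sits above the median (right skew)
c(median = pop_median, mean = pop_mean)

# Hypothetical finite population drawn from that distribution.
set.seed(1)
pop_sketch <- rlnorm(1e5, meanlog = meanlog, sdlog = sdlog)
c(median = median(pop_sketch), mean = mean(pop_sketch))
```

The gap between the mean and the median is what makes this population a hard case for the t-test at \(n = 15\).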
When H₀ is TRUE (testing median=7.38 in lognormal population):
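library(BSDA)  # provides SIGN.test()

# skewed_pop (the lognormal population above) is assumed to be defined earlier.
set.seed(456)
err_rates <- replicate(10000, {
  samp <- sample(skewed_pop, 15)  # draw n = 15 from the population
  c(
    t    = t.test(samp, mu = 7.38)$p.value < 0.05,
    sign = SIGN.test(samp, md = 7.38)$p.value < 0.05
  )
})
# Get one error rate per method (proportion of false rejections):
rowMeans(err_rates)
     t   sign
0.0956 0.0354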
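Results: At \(\alpha = 0.05\), the t-test rejects a true \(H_0\) in 9.56% of samples, nearly double the nominal rate, while the sign test rejects in 3.54% of samples, below the nominal rate.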
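Binomial Foundation: Under \(H_0\): median \(= \eta_0\), each observation has a 50% chance of falling above \(\eta_0\) and a 50% chance of falling below it (for a continuous distribution, where ties have probability zero). The number of observations above \(\eta_0\), \(X\), therefore follows a Binomial\((n, 0.5)\) distribution.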
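The binomial argument translates directly into a few lines of base R. The sketch below is one possible implementation, not the course's own code: `sign_test_p()` is a hypothetical helper, dropping observations equal to \(\eta_0\) is a common convention assumed here, and the data vector is made up for illustration.

```r
# Hypothetical helper: exact two-sided sign test p-value for H0: median = eta0.
sign_test_p <- function(x, eta0) {
  x <- x[x != eta0]           # drop ties with eta0 (assumed convention)
  n <- length(x)
  above <- sum(x > eta0)      # X = number of observations above eta0
  k <- max(above, n - above)  # the more extreme of the two counts
  # P(X >= k) under Binomial(n, 0.5), doubled for a two-sided test, capped at 1
  min(1, 2 * pbinom(k - 1, size = n, prob = 0.5, lower.tail = FALSE))
}

# Made-up data: 6 of the 8 values lie above eta0 = 2.
x <- c(1.2, 3.4, 0.8, 5.1, 2.2, 4.0, 6.3, 2.9)
sign_test_p(x, eta0 = 2)
```

Because the null distribution is symmetric, doubling the upper tail gives the same p-value as `binom.test()` with `p = 0.5`.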
Davis Example (\(H_0\): \(\eta = 57\) kg):
\[ \begin{aligned} \text{p-value} &= 2 \times P(X \geq 12) \\ &= 2 \times \sum_{k=12}^{15} \binom{15}{k} (0.5)^{15} \\ &= 2 \times (0.01389 + 0.00320 + 0.00046 + 0.00003) \\ &= 0.03516 \end{aligned} \]
R Calculation:
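A minimal base-R sketch of this calculation, assuming that 12 of the 15 sampled weights exceed 57 kg (the count implied by the \(P(X \geq 12)\) term). The variable names are illustrative, and the course's own code may differ, for example by calling `SIGN.test()` from the BSDA package.

```r
n     <- 15  # sample size
above <- 12  # observations above 57 kg (assumed from the P(X >= 12) term)

# Sum the upper-tail binomial probabilities and double them.
p_manual <- 2 * sum(dbinom(above:n, size = n, prob = 0.5))

# The same p-value from the exact binomial test.
p_binom <- binom.test(above, n, p = 0.5)$p.value

c(manual = p_manual, binom.test = p_binom)
```

Both approaches agree with the value 0.03515625 reported for the sign test above.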