Day 34

Math 216: Statistical Thinking

Bastola

Chi-Square Test & Contingency Table

  • Objective: Assess independence between two categorical variables (e.g., flu contraction vs. vaccination status).
  • Analysis: Compare observed vs. expected frequencies in a contingency table to detect significant associations.

Table 1: Flu by Vaccination Status

Status Vaccinated Unvaccinated Total
Flu 20 35 55
No Flu 80 65 145
Total 100 100 200

Table 2: Flu by Vaccine Type

Status No Vaccine One Shot Two Shot Total
Flu 24 9 13 46
No Flu 289 100 565 954
Total 313 109 578 1000

Contingency Table

Column 1 Column 2 \(\cdots\) Column c Total
Row 1 \(n_{11}\) \(n_{12}\) \(\cdots\) \(n_{1c}\) \(r_1\)
Row 2 \(n_{21}\) \(n_{22}\) \(\cdots\) \(n_{2c}\) \(r_2\)
\(\vdots\) \(\vdots\) \(\vdots\) \(\ddots\) \(\vdots\) \(\vdots\)
Row r \(n_{r1}\) \(n_{r2}\) \(\cdots\) \(n_{rc}\) \(r_r\)
Total \(c_1\) \(c_2\) \(\cdots\) \(c_c\) \(n\)
  • \(n=\sum_{i=1}^{r} r_i = \sum_{j=1}^{c} c_j\)
  • Rows (\(r\)) and columns (\(c\)) represent different categorical classifications.
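
As a small illustration of this notation, the sketch below enters Table 1 in R and recovers the row totals \(r_i\), the column totals \(c_j\), and the grand total \(n\). The object names (`flu_tab`, `r_tot`, `c_tot`) and the dimension labels are choices made here for illustration only.

```r
# Observed counts from Table 1 (rows: flu status, columns: vaccination status)
flu_tab <- matrix(c(20, 35,
                    80, 65),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(Status = c("Flu", "No Flu"),
                                  Vaccination = c("Vaccinated", "Unvaccinated")))

r_tot <- rowSums(flu_tab)  # row totals r_i
c_tot <- colSums(flu_tab)  # column totals c_j
n <- sum(flu_tab)          # grand total n

# Both sets of margins add up to the same grand total
stopifnot(sum(r_tot) == n, sum(c_tot) == n)

# Print the table with a "Sum" row and column appended
addmargins(flu_tab)
```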

Hypothesis Testing Using Contingency Tables

  • Determine if there is independence or dependence between row and column classifications.
  • Hypotheses:
    • \(H_0\): Row and column classifications are independent.
    • \(H_a\): Row and column classifications are dependent.
  • Under \(H_0\), the probability of belonging to both the \(i^{th}\) row and \(j^{th}\) column is estimated from the margins:
    • \(\operatorname{Prob}(r_i \cap c_j) = P(r_i) \cdot P(c_j) = \left(\frac{r_i}{n}\right) \cdot \left(\frac{c_j}{n}\right)\)
  • Expected count for cell \((i, j)\) under \(H_0\):
    • \(e_{ij} = n \cdot \left(\frac{r_i}{n}\right) \cdot \left(\frac{c_j}{n}\right) = \frac{r_i c_j}{n}\)
  • Test Statistic:
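    • \(T = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(n_{ij}-e_{ij})^2}{e_{ij}}\), which under \(H_0\) approximately follows a \(\chi^2\) distribution with df \(= (r-1) \cdot (c-1)\).
    • Reject \(H_0\) at level \(\alpha\) when \(T\) exceeds \(\chi^2_{1-\alpha}\), the \(1-\alpha\) quantile of that distribution.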
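
The formulas above can be traced step by step in R. The sketch below applies them to the counts in Table 2 without calling `chisq.test`. The variable names and the 5% level are assumptions made for illustration, not part of the original slides.

```r
# Observed counts from Table 2 (flu status by vaccine type)
obs <- rbind(flu    = c(24, 9, 13),
             no_flu = c(289, 100, 565))
colnames(obs) <- c("no_vaccine", "one_shot", "two_shot")

n <- sum(obs)

# Expected counts under independence: e_ij = r_i * c_j / n
expected <- outer(rowSums(obs), colSums(obs)) / n

# Pearson statistic: sum over all cells of (n_ij - e_ij)^2 / e_ij
T_stat <- sum((obs - expected)^2 / expected)

# Degrees of freedom: (r - 1) * (c - 1)
df <- (nrow(obs) - 1) * (ncol(obs) - 1)

# Upper-tail p-value, and the critical value at an assumed alpha of 0.05
p_value <- pchisq(T_stat, df = df, lower.tail = FALSE)
crit <- qchisq(0.95, df = df)

round(c(T = T_stat, df = df, p = p_value, crit = crit), 4)
```

The statistic computed this way should agree with the `X-squared` value that `chisq.test(tab, correct = FALSE)` reports for the same table.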

A survey was conducted to evaluate the effectiveness of a new flu vaccine that had been administered in a small community. The vaccine was provided free of charge in a two-shot sequence over a period of 2 weeks to those wishing to avail themselves of it. Some people received the two-shot sequence, some appeared only for the first shot, and the others received neither.

A survey of 1000 local inhabitants in the following spring provided the information shown in the following table.

Status No Vaccine One Shot Two Shot Total
Flu 24 9 13 46
No Flu 289 100 565 954
Total 313 109 578 1000

R-code Solutions

flu <- c(24, 9, 13)
no_flu <- c(289, 100, 565)
# Create a matrix from the data
tab <- rbind(flu, no_flu)
chisq.test(tab, correct = FALSE)

    Pearson's Chi-squared test

data:  tab
X-squared = 17.313, df = 2, p-value = 0.000174
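
To go one step beyond the printed output, the object returned by `chisq.test` can be stored and inspected. The sketch below assumes a significance level of 0.05, which the slides do not state, and checks the common rule of thumb that every expected count is at least 5.

```r
flu <- c(24, 9, 13)
no_flu <- c(289, 100, 565)
tab <- rbind(flu, no_flu)

# Store the test result instead of only printing it
res <- chisq.test(tab, correct = FALSE)

res$expected            # expected counts e_ij under independence
all(res$expected >= 5)  # rule-of-thumb check for the chi-square approximation
res$residuals           # Pearson residuals (n_ij - e_ij) / sqrt(e_ij)

alpha <- 0.05           # assumed significance level
res$p.value < alpha     # TRUE means H0 (independence) is rejected at this level
```

With a p-value of 0.000174, the data give strong evidence that flu status and vaccine type are not independent at this level.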

A survey is taken to determine whether there is a relationship between political affiliation and strength of support for space exploration. We randomly select 100 individuals and ask their political affiliation and their support level to obtain the data in the following table.

Support Level Republican Democrat Independent Total
Strong 8 10 12 30
Moderate 12 17 6 35
Weak 10 13 12 35
Total 30 40 30 100

R-code Solutions

strong <- c(8, 10, 12)
moderate <- c(12, 17, 6)
weak <- c(10, 13, 12)
# Create a matrix from the data
tab <- rbind(strong, moderate, weak)
chisq.test(tab, correct = FALSE)

    Pearson's Chi-squared test

data:  tab
X-squared = 4.5397, df = 4, p-value = 0.3379
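
The same conclusion can be reached by comparing the statistic with a critical value rather than reading the p-value. The sketch below does this for the space exploration data. The 5% level and the column labels are assumptions added for illustration.

```r
strong <- c(8, 10, 12)
moderate <- c(12, 17, 6)
weak <- c(10, 13, 12)
tab <- rbind(strong, moderate, weak)
colnames(tab) <- c("Republican", "Democrat", "Independent")

res <- chisq.test(tab, correct = FALSE)

alpha <- 0.05                       # assumed significance level
df <- unname(res$parameter)         # (3 - 1) * (3 - 1) = 4
crit <- qchisq(1 - alpha, df = df)  # 1 - alpha quantile of chi-square with df = 4

# Compare the observed statistic with the critical value
c(statistic = unname(res$statistic), critical = crit)
unname(res$statistic) > crit        # FALSE here, so H0 is not rejected
```

Because the statistic falls below the critical value (equivalently, the p-value of 0.3379 exceeds 0.05), the data do not show a significant association between political affiliation and support level.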