Day 38

Math 216: Statistical Thinking

Bastola

Coefficient of Correlation and Determination

  • Coefficient of Correlation (\(r\)):
    • Measures the strength of the linear relationship between \(x\) and \(y\).
    • Computed as \(r = \frac{\text{SS}_{xy}}{\sqrt{\text{SS}_{xx} \text{SS}_{yy}}}\), providing a scale-free measure ranging from -1 to +1.
  • Coefficient of Determination (\(r^2\), also written \(R^2\)):
    • Represents the proportion of variability in \(y\) explained by the linear relationship with \(x\).
    • Total variability (SST) splits into two parts, SST = SSR + SSE:
      • Explained (SSR): how far the model’s predictions stray from the mean, and
      • Unexplained (SSE): how far the actual points stray from the model’s predictions.
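
The formula for \(r\) can be checked by hand or with a few lines of code. The sketch below is only an illustration: the `hours` and `gpa` values are hypothetical (not the lecture's data set), and the helper names `sums_of_squares` and `correlation` are made up for this example.

```python
import math

def sums_of_squares(x, y):
    """Return (SS_xx, SS_yy, SS_xy) for paired samples x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return ss_xx, ss_yy, ss_xy

def correlation(x, y):
    """r = SS_xy / sqrt(SS_xx * SS_yy)."""
    ss_xx, ss_yy, ss_xy = sums_of_squares(x, y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

# Hypothetical data for illustration only (not the lecture's GPA data)
hours = [5, 10, 15, 20, 25]
gpa = [2.6, 2.9, 3.0, 3.4, 3.7]

r = correlation(hours, gpa)
print(f"r = {r:.3f}, r^2 = {r ** 2:.3f}")

# Scale-free: measuring study time in minutes instead of hours leaves r unchanged
minutes = [60 * h for h in hours]
r_minutes = correlation(minutes, gpa)
print(f"r with x in minutes = {r_minutes:.3f}")
```

Changing the units of \(x\) rescales both \(\text{SS}_{xy}\) and \(\sqrt{\text{SS}_{xx}}\) by the same factor, which is why the ratio stays the same.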

Total Variability (SST)

[Figure: Total Variability (SST) in GPA]

Explained Variability (SSR)

[Figure: Explained Variability (SSR) with Regression]

Unexplained Variability (SSE)

[Figure: Unexplained Variability (SSE)]

Coefficient Relationships

  • Red dashed line: Mean model (ȳ = 3.39)
  • Blue line: Regression model (ŷ = 2.26 + 0.056x)
  • Green segments: Unexplained variability (SSE = 1.038)

\[ \begin{aligned} R^2 &= \frac{\mathrm{SSR}}{\mathrm{SST}}=1-\frac{\mathrm{SSE}}{\mathrm{SST}}\\ &=\frac{\displaystyle\sum_i \bigl(\hat y_i - \bar y\bigr)^2}{\displaystyle\sum_i \bigl(y_i - \bar y\bigr)^2} = 1 \;-\;\frac{\displaystyle\sum_i \bigl(y_i - \hat y_i\bigr)^2}{\displaystyle\sum_i \bigl(y_i - \bar y\bigr)^2}. \end{aligned} \]
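
To see that the two forms of \(R^2\) agree, one can fit a least-squares line and compute the three sums of squares directly. This is a minimal sketch under stated assumptions: the data are hypothetical (not the lecture's GPA data), and `fit_line` and `variability` are illustrative helper names.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def variability(x, y):
    """Return (SST, SSR, SSE) for the least-squares line."""
    b0, b1 = fit_line(x, y)
    y_bar = sum(y) / len(y)
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                # actual vs. mean
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # prediction vs. mean
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # actual vs. prediction
    return sst, ssr, sse

# Hypothetical data for illustration only
hours = [5, 10, 15, 20, 25]
gpa = [2.6, 2.9, 3.0, 3.4, 3.7]

sst, ssr, sse = variability(hours, gpa)
r2_from_ssr = ssr / sst
r2_from_sse = 1 - sse / sst
print(f"SST = {sst:.3f}, SSR = {ssr:.3f}, SSE = {sse:.3f}")
print(f"SSR/SST = {r2_from_ssr:.4f}, 1 - SSE/SST = {r2_from_sse:.4f}")
```

The two ratios match because SST = SSR + SSE holds for a least-squares line with an intercept; for a line fitted some other way the identity can fail.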

Calculations

  • Formula Breakdown:

    • SST = 5.958 (total sum of squared deviations of GPA from its mean)
    • SSE = 1.038 (sum of squared residuals, the unexplained part)
    • R² = 1 - SSE/SST = 1 - 1.038/5.958 ≈ 0.826
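
The same arithmetic can be reproduced from the slide's numbers. The sketch below uses only SST, SSE, and the slope of the fitted line (0.056) as stated in the slides; the variable names are illustrative. Taking the sign of \(r\) from the slope is valid for simple linear regression.

```python
import math

sst = 5.958    # total variability, from the slide
sse = 1.038    # unexplained variability, from the slide
slope = 0.056  # slope of the fitted line, from the slide

ssr = sst - sse            # explained variability
r_squared = 1 - sse / sst  # same as ssr / sst

# In simple linear regression, r has the same sign as the slope
r = math.copysign(math.sqrt(r_squared), slope)

print(f"SSR = {ssr:.3f}")
print(f"R^2 = {r_squared:.3f}")
print(f"r   = {r:.3f}")
```

This gives SSR = 4.920, R² ≈ 0.826, and r ≈ 0.909.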

Practical Interpretation

  • With R² ≈ 0.83, about 83% of the variation in GPA is explained by its linear relationship with study hours.

  • The remaining 17% is left unexplained by the model; it may reflect other factors (e.g., course difficulty, prior knowledge).

  • Caveat:

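    • R² describes association in the data, not causation: a high value does not show that more study hours cause a higher GPA.
    • R² is unitless, so it does not tell you how large the prediction errors are in GPA points.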
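
One way to illustrate the last caveat: rescaling \(y\) leaves R² unchanged but changes the size of the residuals. The sketch below is an illustration under assumptions, not part of the lecture's analysis: the data are hypothetical, `r_squared_and_rms_residual` is a made-up helper, and the root mean squared residual \(\sqrt{\text{SSE}/n}\) is used as one simple measure of typical error size.

```python
import math

def r_squared_and_rms_residual(x, y):
    """Fit a least-squares line; return (R^2, root mean squared residual)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst, math.sqrt(sse / n)

# Hypothetical data for illustration only
hours = [5, 10, 15, 20, 25]
gpa = [2.6, 2.9, 3.0, 3.4, 3.7]
gpa_times_10 = [10 * g for g in gpa]  # same pattern on a 10x larger scale

r2_a, rms_a = r_squared_and_rms_residual(hours, gpa)
r2_b, rms_b = r_squared_and_rms_residual(hours, gpa_times_10)

print(f"original: R^2 = {r2_a:.3f}, RMS residual = {rms_a:.3f}")
print(f"rescaled: R^2 = {r2_b:.3f}, RMS residual = {rms_b:.3f}")
```

Both fits report the same R², while the typical residual is ten times larger in the rescaled case. To judge how far off predictions are, look at the residuals in the units of \(y\).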