Math 216: Statistical Thinking
Data: each case \(i\) has two measurements, \(x_i\) and \(y_i\)
A scatterplot plots each pair \((x_i, y_i)\) as a point, with \(x\) on the horizontal axis and \(y\) on the vertical axis
positive association: as \(x\) increases, \(y\) tends to increase
negative association: as \(x\) increases, \(y\) tends to decrease
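The correlation coefficient, denoted \(r\) (sample) or \(\rho\) (population), measures the strength and direction of the linear relationship between two quantitative variables.
Strength: \(r\) always lies between \(-1\) and \(1\); \(r \approx \pm 1\) indicates a strong linear association, \(r \approx 0\) a weak one.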
Direction: Positive (\(r > 0\)) or negative (\(r < 0\)) linear association.
Formula: \[ r = \frac{\sum_{i=1}^n \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)}{n-1} \] where \(\bar{x}, \bar{y}\) are the sample means and \(s_x, s_y\) the sample standard deviations.
Visualization: Scatterplots show how tightly the points cluster around a straight line; a single outlier can heavily influence \(r\).
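The formula for \(r\) can be sketched in a few lines of Python. This is a minimal illustration, not course code: the function name `correlation` and the toy values are assumptions made up for the example. It standardizes each variable and averages the products of the z-scores using \(n - 1\).

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation r: sum of z-score products divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    # If either variable is constant, s is 0 and r is undefined.
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    z_products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                  for x, y in zip(xs, ys))
    return sum(z_products) / (n - 1)

# Toy values (hypothetical, chosen only to illustrate the formula)
x = [1, 2, 3, 4, 5]
r_up = correlation(x, [2, 4, 6, 8, 10])        # perfectly increasing line
r_down = correlation(x, [10, 8, 6, 4, 2])      # perfectly decreasing line
r_outlier = correlation(x, [2, 4, 6, 8, -10])  # last point replaced by an outlier

print(r_up, r_down, r_outlier)
```

The first two calls give \(r\) equal to \(1\) and \(-1\) up to rounding. Replacing one point with an outlier turns a perfect positive correlation into a negative one, which is why the scatterplot should be checked alongside \(r\).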
Goal: To find a straight line that best fits the data in a scatterplot.
Observation | Temperature (°F) | Chirp Rate (chirps/15 sec) |
---|---|---|
1 | 89 | 20 |
2 | 72 | 16 |
3 | 93 | 20 |
4 | 84 | 18 |
5 | 81 | 17 |
6 | 75 | 16 |
7 | 70 | 15 |
8 | 82 | 17 |
9 | 69 | 15 |
10 | 83 | 16 |
11 | 80 | 15 |
12 | 83 | 17 |
13 | 81 | 16 |
14 | 84 | 17 |
15 | 76 | 14 |
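As a sketch of how a best-fitting line could be computed for the table above, the Python below assumes that "best fits" means the least-squares line, with slope \(b_1 = r \, s_y / s_x\) and intercept \(b_0 = \bar{y} - b_1 \bar{x}\). Treating temperature as \(x\) and chirp rate as \(y\) is also an assumption, taken from the column order; the variable names are illustrative.

```python
from statistics import mean, stdev

# Data copied from the table: temperature (°F) and chirps per 15 seconds
temps = [89, 72, 93, 84, 81, 75, 70, 82, 69, 83, 80, 83, 81, 84, 76]
chirps = [20, 16, 20, 18, 17, 16, 15, 17, 15, 16, 15, 17, 16, 17, 14]

n = len(temps)
x_bar, y_bar = mean(temps), mean(chirps)
s_x, s_y = stdev(temps), stdev(chirps)  # sample standard deviations (n - 1)

# Correlation: sum of z-score products divided by n - 1
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(temps, chirps)) / (n - 1)

# Least-squares line (assumed criterion): y-hat = intercept + slope * x
slope = r * s_y / s_x
intercept = y_bar - slope * x_bar

print(f"r = {r:.3f}")
print(f"chirps = {intercept:.3f} + {slope:.3f} * temperature")
```

For these 15 observations the sketch gives \(r \approx 0.83\) and a line of roughly \(\hat{y} = -0.37 + 0.21x\), so each additional degree Fahrenheit is associated with about 0.21 more chirps per 15 seconds.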