KEY CONCEPTS

- Correlation and regression are statistical methods to examine the linear relationship between two numerical variables measured on the same subjects. Correlation describes a relationship, whereas regression both describes a relationship and predicts an outcome.
- Correlation coefficients range from –1 to +1, and both extremes indicate a perfect relationship between the two variables. A correlation equal to 0 indicates no linear relationship.
- Scatterplots provide a visual display of the relationship between two numerical variables. They are recommended for checking whether the relationship is linear and whether there are extreme values.
- The coefficient of determination, or *r*², is simply the squared correlation. It is the preferred statistic for describing the strength of the relationship between two numerical variables.
- The *t* test can be used to test the hypothesis that the population correlation is zero.
- The Fisher *z* transformation is used to form confidence intervals for the correlation or to test a hypothesis that the correlation equals any specified value.
- The Fisher *z* transformation can also be used to form confidence intervals for the difference between correlations in two independent groups.
- It is possible to test whether the correlation between a first variable and a second variable is the same as the correlation between a third variable and that second variable.
- When one or both of the variables in a correlation are skewed, the Spearman rho nonparametric correlation is advised.
- Linear regression is called *linear* because it measures only straight-line relationships.
- The least squares method is the one used in almost all regression examples in medicine. With one independent and one dependent variable, the regression equation can be given as a straight line.
- The standard error of the estimate is a statistic that can be used to test hypotheses or form confidence intervals about both the intercept and the regression coefficient (slope).
- One important use of regression is to predict outcomes in a future group of subjects.
- When predicting outcomes, the confidence limits are called confidence bands about the regression line. Predictions are most precise for values of the independent variable *X* close to its mean, and they become less precise as *X* moves away from the mean.
- It is possible to test whether the regression line is the same (i.e., has the same slope and intercept) in two different groups.
- A residual is the difference between the actual and the predicted outcome. Examining the distribution of residuals helps statisticians decide whether the linear regression model is the best approach to analyzing the data.
- Regression toward the mean can make a treatment or procedure appear to be of value when it has had no actual effect. Having a control group helps to guard against this problem.
- Correlation and regression should not be used unless observations are independent. It is not appropriate to include multiple measurements of the same subjects.
- Mixing two populations can also make the correlation and regression coefficient larger than they should be.
- The use of correlation versus regression should be dictated by the purpose of ...
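The correlation coefficient, its square *r*², and the *t* test of a zero population correlation all follow directly from their standard formulas. What follows is a minimal sketch in plain Python. The function names and the five-point data set are hypothetical and exist only for illustration. The *t* statistic is t = r√(n − 2)/√(1 − r²), and it is referred to a *t* distribution with n − 2 degrees of freedom.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two paired samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two paired samples of equal length (n >= 3)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_zero_correlation(r, n):
    """t statistic for H0: population correlation = 0, with n - 2 degrees of freedom.
    Undefined when |r| = 1."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


x = [1, 2, 3, 4, 5]  # hypothetical measurements
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
r_squared = r ** 2  # coefficient of determination
t = t_for_zero_correlation(r, len(x))
print(f"r = {r:.4f}, r^2 = {r_squared:.4f}, t = {t:.4f} on {len(x) - 2} df")
```

For these data r is about 0.775, so *r*² = 0.60. This means about 60% of the variation in y is accounted for by its linear relationship with x.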
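The Fisher *z* transformation, z = ½ ln[(1 + r)/(1 − r)], makes the sampling distribution of *r* approximately normal with standard error 1/√(n − 3). The sketch below builds a confidence interval on the *z* scale and back-transforms its endpoints. It also tests a hypothesized correlation ρ₀. It assumes the usual normal approximation. The function names, the 1.96 default, and the example values (r = 0.5 from 28 subjects) are illustrative choices, not taken from the text.

```python
import math


def fisher_z(r):
    """Fisher z transformation, z = 0.5 * ln((1 + r) / (1 - r)), i.e. atanh(r)."""
    return math.atanh(r)


def correlation_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for the population correlation.

    The interval is built on the z scale, where the standard error is
    1 / sqrt(n - 3), and its endpoints are back-transformed with tanh.
    z_crit = 1.96 gives an approximate 95% interval.
    """
    se = 1 / math.sqrt(n - 3)
    z = fisher_z(r)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def z_test_correlation(r, n, rho0):
    """Approximately standard normal statistic for H0: population correlation = rho0."""
    return (fisher_z(r) - fisher_z(rho0)) * math.sqrt(n - 3)


lower, upper = correlation_ci(0.5, 28)  # hypothetical r = 0.5 from 28 subjects
z_stat = z_test_correlation(0.5, 28, 0.3)
print(f"95% CI: ({lower:.3f}, {upper:.3f}); z for rho0 = 0.3: {z_stat:.3f}")
```

The back-transformed interval is not symmetric about *r*. This reflects the skewed sampling distribution of *r* when the population correlation is not zero.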
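To compare correlations from two independent groups, each correlation is converted to Fisher's *z*. The standard errors combine as √[1/(n₁ − 3) + 1/(n₂ − 3)]. This sketch reports the difference and its approximate confidence interval on the *z* scale, together with a *z* statistic for equal population correlations. The function name and the example groups are hypothetical.

```python
import math


def compare_independent_correlations(r1, n1, r2, n2, z_crit=1.96):
    """Compare correlations from two independent groups via Fisher's z.

    Returns the difference z1 - z2, an approximate confidence interval for
    that difference (on the z scale), and a z statistic for H0: rho1 = rho2.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    diff = z1 - z2
    ci = (diff - z_crit * se, diff + z_crit * se)
    return diff, ci, diff / se


# Hypothetical example: r = 0.60 in 50 subjects versus r = 0.30 in 60 subjects
diff, (lo, hi), z = compare_independent_correlations(0.60, 50, 0.30, 60)
print(f"z-scale difference {diff:.3f}, 95% CI ({lo:.3f}, {hi:.3f}), z = {z:.2f}")
```

An interval on the *z* scale that excludes 0 corresponds to rejecting equal population correlations at the matching significance level.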
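The two correlations in this comparison share a variable and come from the same subjects, so they are not independent. The text does not name the test it uses. One standard choice is Hotelling's *t*, which uses the correlation between the two non-shared variables and has n − 3 degrees of freedom. Treat the choice of test, the function name, and the example values below as assumptions.

```python
import math


def hotelling_dependent_t(r_xy, r_zy, r_xz, n):
    """Hotelling's t for H0: corr(X, Y) = corr(Z, Y) in one sample of n subjects.

    Y is the variable shared by both correlations; r_xz is the correlation
    between the other two variables. Refer the result to a t distribution
    with n - 3 degrees of freedom.
    """
    # Determinant of the 3 x 3 correlation matrix of X, Y, and Z
    det = 1 - r_xy ** 2 - r_zy ** 2 - r_xz ** 2 + 2 * r_xy * r_zy * r_xz
    if det <= 0:
        raise ValueError("correlations are not mutually consistent")
    return (r_xy - r_zy) * math.sqrt((n - 3) * (1 + r_xz) / (2 * det))


t = hotelling_dependent_t(r_xy=0.6, r_zy=0.4, r_xz=0.5, n=50)  # hypothetical values
print(f"t = {t:.3f} on {50 - 3} df")
```

The independent-groups Fisher *z* comparison does not fit this design, because it ignores the correlation between the two coefficients that arises from measuring the same subjects.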
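Spearman rho is the Pearson correlation computed on the ranks of the observations rather than on the raw values. This makes it insensitive to skewness and extreme values. In this sketch, tied values receive the average of the ranks they span, which is a common convention. The helper names and the skewed example data are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across a run of tied values
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions start..end hold ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation coefficient for two paired samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rho: the Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 3, 4, 6, 250]  # hypothetical, strongly right-skewed outcome
print(f"Pearson r = {pearson_r(x, y):.3f}, Spearman rho = {spearman_rho(x, y):.3f}")
```

The single extreme value pulls Pearson's *r* well below 1. Spearman rho sees only a perfectly monotonic ordering.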
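For one independent variable, least squares gives slope b = Σ(X − X̄)(Y − Ȳ)/Σ(X − X̄)² and intercept a = Ȳ − bX̄. The standard error of the estimate, s_y·x = √[Σ(residual²)/(n − 2)], is then the basis for the standard errors of both the slope and the intercept. This sketch computes all of these quantities and the residuals. The `LineFit` container, the `fit_line` name, and the data are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class LineFit:
    intercept: float
    slope: float
    residuals: list
    se_estimate: float  # standard error of the estimate, s_y.x
    se_intercept: float
    se_slope: float


def fit_line(x, y):
    """Least squares fit of y = intercept + slope * x for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual = actual outcome minus predicted outcome
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    se_estimate = math.sqrt(sse / (n - 2))
    se_slope = se_estimate / math.sqrt(sxx)
    se_intercept = se_estimate * math.sqrt(1 / n + mean_x ** 2 / sxx)
    return LineFit(intercept, slope, residuals, se_estimate, se_intercept, se_slope)


fit = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical data
t_slope = fit.slope / fit.se_slope  # t test of H0: slope = 0, n - 2 df
print(f"y = {fit.intercept:.2f} + {fit.slope:.2f}x, s_y.x = {fit.se_estimate:.3f}, t = {t_slope:.3f}")
print("residuals:", [round(e, 2) for e in fit.residuals])
```

With one predictor, the *t* for the slope equals the *t* for testing a zero correlation on the same data. Least squares residuals from a model with an intercept always sum to zero, so their pattern matters more than their total when judging model fit.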
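Confidence bands widen because of the standard error of a predicted mean, s_y·x √[1/n + (x₀ − mean of X)²/Σ(X − mean of X)²]. This quantity is smallest when x₀ equals the mean of *X*. Limits for a single new subject add 1 under the square root. This sketch computes both kinds of limits. The caller supplies the critical *t* value with n − 2 degrees of freedom from a table (3.182 is the two-sided 95% value for 3 df). The function name and the data are hypothetical.

```python
import math


def confidence_band(x, y, x0, t_crit, individual=False):
    """Confidence limits for the predicted outcome at x0.

    individual=False: limits for the mean outcome of subjects with X = x0.
    individual=True:  wider limits for a single new subject with X = x0.
    t_crit: two-sided critical t value with n - 2 degrees of freedom.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se_estimate = math.sqrt(sse / (n - 2))
    # The (x0 - mean_x)^2 term makes the band widen away from the mean of X
    extra = 1.0 if individual else 0.0
    half_width = t_crit * se_estimate * math.sqrt(extra + 1 / n + (x0 - mean_x) ** 2 / sxx)
    y_hat = intercept + slope * x0
    return y_hat - half_width, y_hat + half_width


x, y = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]  # hypothetical data, mean of X = 3
T_CRIT = 3.182  # two-sided 95% critical t value for 3 df
for x0 in (3, 4, 5):
    lo, hi = confidence_band(x, y, x0, T_CRIT)
    print(f"x0 = {x0}: ({lo:.2f}, {hi:.2f}), width {hi - lo:.2f}")
```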
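Regression toward the mean can be seen in a small simulation. In this setup, each subject has a stable true value, and each measurement adds independent error. Subjects are selected because their first measurement is high, and no treatment is given. The distributions, the cutoff, and the sample size are arbitrary assumptions chosen for illustration.

```python
import random


def regression_toward_mean(n=10_000, cutoff=1.5, seed=1):
    """Simulate subjects measured twice with no treatment between measurements.

    Each observed value is a stable true value plus independent measurement
    error. Only subjects whose first value exceeds `cutoff` are selected.
    Returns the mean first and mean second values of the selected subjects.
    """
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n):
        true_value = rng.gauss(0, 1)
        m1 = true_value + rng.gauss(0, 1)
        m2 = true_value + rng.gauss(0, 1)  # nothing has changed for this subject
        if m1 > cutoff:
            first.append(m1)
            second.append(m2)
    return sum(first) / len(first), sum(second) / len(second)


before, after = regression_toward_mean()
print(f"selected subjects: mean before {before:.2f}, mean after {after:.2f}")
```

The second mean falls back toward the population mean of 0 even though nothing was done to the subjects. A control group selected in the same way would show the same drop. For this reason, a treatment effect should be judged by comparing the change in treated subjects with the change in controls.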
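The inflation caused by mixing populations can be shown with two small hypothetical groups. Within each group, *X* and *Y* are exactly uncorrelated, but the groups differ in their means on both variables. The data are constructed for illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two paired samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Two hypothetical populations; within each, X and Y are uncorrelated
group_a_x, group_a_y = [0, 1, 0, 1], [0, 0, 1, 1]
group_b_x, group_b_y = [10, 11, 10, 11], [10, 10, 11, 11]

r_a = pearson_r(group_a_x, group_a_y)
r_b = pearson_r(group_b_x, group_b_y)
r_mixed = pearson_r(group_a_x + group_b_x, group_a_y + group_b_y)
print(f"within A: {r_a:.2f}, within B: {r_b:.2f}, mixed: {r_mixed:.3f}")
```

The pooled correlation of about 0.99 reflects only the difference between the group means. A scatterplot with the groups marked, or a separate analysis within each group, shows that *X* and *Y* are unrelated within each population.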
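This sketch covers only the slope half of comparing regression lines in two independent groups. It assumes a common residual variance pooled across the groups, which gives a *t* statistic with n₁ + n₂ − 4 degrees of freedom. The text does not specify its procedure, and a full test of identical lines also involves the intercepts, which are not handled here. The function names and the data are hypothetical.

```python
import math


def slope_and_sse(x, y):
    """Least squares slope, residual sum of squares, and Sxx for one group."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return slope, sse, sxx


def compare_slopes(x1, y1, x2, y2):
    """t statistic for H0: equal slopes in two independent groups.

    Assumes a common residual variance, pooled over both groups, and uses
    n1 + n2 - 4 degrees of freedom. Returns (t, df).
    """
    b1, sse1, sxx1 = slope_and_sse(x1, y1)
    b2, sse2, sxx2 = slope_and_sse(x2, y2)
    df = len(x1) + len(x2) - 4
    pooled_var = (sse1 + sse2) / df
    se_diff = math.sqrt(pooled_var * (1 / sxx1 + 1 / sxx2))
    return (b1 - b2) / se_diff, df


x = [1, 2, 3, 4, 5]
y_group1 = [2, 4, 5, 4, 5]            # hypothetical data, slope 0.6
y_group2 = [v + 3 for v in y_group1]  # same slope, higher intercept
y_group3 = [2 * v for v in y_group1]  # slope 1.2
print(compare_slopes(x, y_group1, x, y_group2))
print(compare_slopes(x, y_group1, x, y_group3))
```

Groups 1 and 2 have identical slopes, so *t* is 0 even though their intercepts differ. This is why equal slopes alone do not establish that two regression lines are the same.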