General Linear Model (GLM)

Model Summary - S, R-Sq, R-Sq (adj) and R-Sq (pred) Values

S, R², adjusted R², and predicted R² are measures of how well the model fits the data. These values can help you select the model with the best fit.

· S is measured in the units of the response variable and represents the standard distance that data values fall from the regression line. For a given study, the better the equation predicts the response, the lower S is.

· R² (R-Sq) describes the amount of variation in the observed response values that is explained by the predictor(s). R² never decreases when you add predictors. For example, the best five-predictor model will always have an R² at least as high as that of the best four-predictor model. Therefore, R² is most useful when comparing models of the same size.

· Adjusted R² is a modified R² that has been adjusted for the number of terms in the model. If you include unnecessary terms, R² can be artificially high. Unlike R², adjusted R² may get smaller when you add terms to the model. Use adjusted R² to compare models with different numbers of predictors.

· Predicted R² (R-Sq (pred)) is a measure of how well the model predicts the response for new observations. Large differences between predicted R² and the other two R² statistics can indicate that the model is overfit. An overfit model does not predict new observations nearly as well as the model fits the existing data. Predicted R² is more useful than adjusted R² for comparing models because it is calculated with observations not included in the model calculation.
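The four statistics above can be sketched in a few lines of Python. This is a minimal illustration, not the software's own implementation. It assumes a least-squares line with a single predictor and uses the common textbook definitions: S as the square root of the mean square error, and predicted R² as 1 - PRESS/SST, where PRESS is the sum of squared leave-one-out prediction errors. The function name model_summary and the data are hypothetical.

```python
import math


def model_summary(x, y):
    """S, R-sq, R-sq(adj), R-sq(pred) for a least-squares line y = b0 + b1*x."""
    n = len(y)
    p = 2  # estimated coefficients: intercept and slope

    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)           # error sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares

    # Leverage of each point. Dividing a residual by (1 - leverage) gives the
    # error made when that point is predicted from a fit that excludes it.
    lev = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    press = sum((e / (1 - h)) ** 2 for e, h in zip(resid, lev))

    return {
        "S": math.sqrt(sse / (n - p)),
        "R-sq": 1 - sse / sst,
        "R-sq(adj)": 1 - (sse / (n - p)) / (sst / (n - 1)),
        "R-sq(pred)": 1 - press / sst,
    }


# Made-up illustrative data, not the salary data from the example output.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
stats = model_summary(x, y)
for name, value in stats.items():
    print(name, value)
```

Because each leave-one-out error is at least as large in magnitude as the ordinary residual, PRESS is never smaller than the error sum of squares, so predicted R² computed this way never exceeds R².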

Example Output

S         R-sq    R-sq(adj)  R-sq(pred)
0.147504  94.61%  92.81%     88.01%

Interpretation

For the salary data, S is 0.147504, R² is 94.61%, and adjusted R² is 92.81%. Predicted R² is 88.01%, which indicates that the model explains 88.01% of the variation in Salary when you use it for prediction. If you are comparing different salary models, then you generally look for models that minimize S and maximize the R² values.