Partial least squares (PLS) is a technique that reduces the predictors to a smaller set of uncorrelated components and performs least squares regression on these components instead of on the original data. PLS is particularly useful when your predictors are highly collinear, or when you have more predictors than observations and ordinary least squares regression either produces coefficients with high standard errors or fails completely.
PLS is primarily used in the chemical, drug, food, and plastic industries. A common application is to model the relationship between spectral measurements (NIR, IR, UV), which include many variables that are often correlated with one another, and chemical composition or other physicochemical properties. In PLS, the emphasis is on developing predictive models. For this reason, PLS is not usually used to screen out variables that are not useful in explaining the response.
PLS can calculate as many components as there are predictors; often, cross-validation is used to identify the smaller set of components that provide the greatest predictive ability. If you calculate all possible components, the resulting model is equivalent to the model you would obtain using least squares regression. In PLS, components are selected based on how much variance they explain in the predictors and between the predictors and the response(s). If the predictors are highly correlated, or if a smaller number of components perfectly models the response, then the number of components in the PLS model may be much smaller than the number of predictors.
Unlike least squares regression, PLS can fit multiple response variables in a single model. Because PLS models the responses in a multivariate way, the results may differ significantly from those calculated for the responses individually. Include multiple responses in a single model only if the responses are correlated; if not, fit a separate model for each response.