PLSDA#

Overview#

Partial Least Squares can also be applied to classification problems. The general idea is to perform a PLS2 decomposition between \(X\) and \(Z\), where \(Z\) is the dummy-encoded matrix of class memberships. The scores from the PLS decomposition are then used as input to a classification model. In other words, PLSDA uses PLS to find a good subspace and then performs classification on the coordinates transformed into that space. This classification step is the "discrimination", and it can be carried out by any number of methods.

Description of the method#

Partial least squares discriminant analysis (PLSDA) can handle multiclass problems, i.e. the target variable can have \(K\) (\(K \geq 2\)) classes. It relies on the same principle as CPLS for determining the number of components.

In this approach, we create \(K\) indicator variables (as many as there are classes) using the following coding scheme:

\[\begin{split}Z_{k} = \begin{cases} 1 & \text{if}\quad y=k \\ 0 & \text{otherwise}\end{cases}\end{split}\]
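The coding scheme can be illustrated with a few lines of plain Python. This is only a sketch: the helper name `dummy_encode` and the example labels are hypothetical and are not part of the PLSDA procedure itself.

```python
def dummy_encode(y):
    """Return (classes, Z), where Z[i][k] is 1 if y[i] equals classes[k], else 0."""
    classes = sorted(set(y))  # one indicator column per distinct class
    Z = [[1 if label == c else 0 for c in classes] for label in y]
    return classes, Z


# Hypothetical labels, for illustration only.
y = ["setosa", "virginica", "setosa", "versicolor"]
classes, Z = dummy_encode(y)

for label, row in zip(y, Z):
    print(label, row)
```

Each row of `Z` contains exactly one 1, in the column of the class the observation belongs to.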

The PLS algorithm is then fitted with the \(K\) indicator variables \(Z\) as targets and \(X\) as features. We obtain \(K\) classification functions, one per class:

\[d\left(y_{k},X\right) = \beta_{k}^{T}X = \beta_{0,k} + \beta_{1,k}X_{1} + \cdots + \beta_{p,k}X_{p}\]

Predictive idea#

The classification rule used in PLSDA assigns each individual \(i\) to the class \(\mathcal{C}_{k}\) whose classification function takes the largest value:

\[\widehat{y} = \underset{l \in \{1,\ldots,K\}}{\arg\max}\left\{d\left(y_{l},X\right)\right\}\]
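As a minimal sketch of this rule, the snippet below evaluates the \(K\) linear classification functions for one observation and returns the class with the largest value. The function names and all numeric values are made up for illustration. Coefficients are stored as `coefs[j][k]` (feature \(j\), class \(k\)), which is an assumed layout.

```python
def classification_scores(x, intercepts, coefs):
    """Evaluate d(y_k, x) = beta_{0,k} + sum_j beta_{j,k} * x_j for every class k."""
    n_classes = len(intercepts)
    return [
        intercepts[k] + sum(coefs[j][k] * x[j] for j in range(len(x)))
        for k in range(n_classes)
    ]


def predict_class(x, intercepts, coefs, classes):
    """Assign x to the class whose classification function is largest."""
    scores = classification_scores(x, intercepts, coefs)
    best = max(range(len(scores)), key=scores.__getitem__)
    return classes[best]


# Hypothetical model with p = 2 features and K = 3 classes.
classes = ["a", "b", "c"]
intercepts = [0.2, 0.5, 0.3]
coefs = [[0.4, -0.1, -0.3],   # coefficients of feature 1 for classes a, b, c
         [-0.2, 0.3, -0.1]]   # coefficients of feature 2 for classes a, b, c
x = [1.0, 2.0]

scores = classification_scores(x, intercepts, coefs)
predicted = predict_class(x, intercepts, coefs, classes)
print(scores, predicted)
```

With these made-up values, class `b` has the largest classification function, so the observation is assigned to it.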

Number of components#

In the PLSDA procedure, the number of components can be specified explicitly with the parameter n_components for the NIPALS algorithm [1].

VIP#

You can use VIP (variable importance in the projection) to select predictor variables when multicollinearity exists among them. The VIP coefficients reflect the relative importance of each variable for the selected factors.

Description#

The VIP for a feature \(j\) in a PLSDA model with \(H\) components is given by:

\[VIP_{j} = \sqrt{\dfrac{p}{\displaystyle \sum_{h=1}^{H}R^{2}\left(y,t_{h}\right)}\displaystyle \sum_{h=1}^{H}R^{2}\left(y,t_{h}\right) w_{j,h}^{2}}\]

where \(p\) is the number of features, \(R^{2}\left(y,t_{h}\right)\) is the squared correlation coefficient between \(y\) and the score \(t_{h}\) of component \(h\), and \(w_{j,h}\) is the \(x\)-weight of feature \(j\) on component \(h\).

Variables with a VIP score greater than \(1\) (the default threshold in the PLSDA procedure) are considered important for the projection of the PLS regression.

Note

This selection rule must be used with caution, because the VIP only reflects the importance of the input variables relative to each other. A low VIP does not mean that a variable is irrelevant for the classification.
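The VIP formula can be sketched in plain Python as follows. The function name `vip_scores`, the weights and the \(R^{2}\) values are all hypothetical. The sketch also assumes that each column of the weight matrix has unit norm, which the text above does not state. Under that assumption the squared VIP values average to \(1\), which is what makes \(1\) a natural threshold.

```python
from math import sqrt


def vip_scores(weights, r2):
    """Compute VIP_j for every feature.

    weights[j][h]: x-weight of feature j on component h
                   (assumption: every column has unit norm).
    r2[h]:         R^2(y, t_h) for component h.
    """
    p = len(weights)
    total = sum(r2)
    return [
        sqrt(p * sum(r2[h] * weights[j][h] ** 2 for h in range(len(r2))) / total)
        for j in range(p)
    ]


# Made-up values: p = 3 features, H = 2 components, unit-norm weight columns.
weights = [[0.8, 0.0],
           [0.6, 0.6],
           [0.0, 0.8]]
r2 = [0.6, 0.2]

vip = vip_scores(weights, r2)
important = [j for j, v in enumerate(vip) if v > 1.0]  # threshold of 1
print(vip, important)
```

In this toy example the first two features exceed the threshold, while the third, which only loads on the weakly correlated second component, does not.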

Coefficients#

Coefficients are the parameters in a regression equation. The estimated coefficients are used with the predictors to calculate the fitted value of the response variable and the predicted response of new observations. In contrast to least squares, the PLS coefficients are nonlinear estimators. Standardized coefficients indicate the importance of each predictor in the model and correspond to the standardized \(x\)- and \(z\)-variables. In PLS, the coefficient matrix of shape \((p,K)\) is calculated from the weights and loadings.

The formula for standardized coefficients is:

\[\beta^{std} = W\left(P^{T}W\right)^{-1}Q^{T}\]

To calculate the nonstandardized coefficients and intercept, use these formulas:

\[\begin{split}\beta_{j,k} & = \beta_{j,k}^{std} \dfrac{\sigma_{Z_{k}}}{\sigma_{j}} \\ \beta_{0,k} & = \mu_{Z_{k}} - \displaystyle \sum_{j} \mu_{j}\beta_{j,k}\end{split}\]

where:

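Term	Description
\(W\)	the \(x\)-weight matrix
\(P\)	the \(x\)-loading matrix
\(Q\)	the \(Z\)-loading matrix
\(j\)	the index of a feature (\(j = 1, \ldots, p\))
\(p\)	the number of features
\(K\)	the number of classes (targets)
\(\mu_{j}\), \(\sigma_{j}\)	the mean and standard deviation of feature \(j\)
\(\mu_{Z_{k}}\), \(\sigma_{Z_{k}}\)	the mean and standard deviation of the indicator variable \(Z_{k}\)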
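The conversion from standardized to nonstandardized coefficients can be sketched as below. It starts from an already computed standardized coefficient matrix, stored as `beta_std[j][k]`. The helper name `unstandardize` and every number are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def unstandardize(beta_std, x_mean, x_std, z_mean, z_std):
    """Return (intercepts, beta) on the original scale of X and Z.

    beta[j][k]    = beta_std[j][k] * z_std[k] / x_std[j]
    intercepts[k] = z_mean[k] - sum_j x_mean[j] * beta[j][k]
    """
    p, n_classes = len(beta_std), len(beta_std[0])
    beta = [
        [beta_std[j][k] * z_std[k] / x_std[j] for k in range(n_classes)]
        for j in range(p)
    ]
    intercepts = [
        z_mean[k] - sum(x_mean[j] * beta[j][k] for j in range(p))
        for k in range(n_classes)
    ]
    return intercepts, beta


# Made-up example with p = 2 features and K = 2 classes.
beta_std = [[0.5, -0.5],
            [0.25, -0.25]]
x_mean, x_std = [10.0, 4.0], [2.0, 0.5]
z_mean, z_std = [0.5, 0.5], [0.5, 0.5]

intercepts, beta = unstandardize(beta_std, x_mean, x_std, z_mean, z_std)
print(intercepts, beta)
```

A quick sanity check on the intercept formula: evaluating the nonstandardized model at the feature means returns the mean of each indicator variable.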

Explained variance of \(X\)#

The explained variance ratio is defined by the following formula:

\[\text{Explained variance ratio} = \dfrac{\text{Variance explained by component}}{\text{Total variance}}\]

For component \(h\), with \(x\)-score vector \(t_{h}\) and \(x\)-loading vector \(p_{h}\), this is equal to:

\[\text{Explained variance ratio}(h) = \dfrac{\lVert t_{h}p_{h}^{T}\rVert_{F}^{2}}{\lVert X\rVert_{F}^{2}}\]
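The ratio can be computed directly from the Frobenius norms, as in this sketch. The function name is hypothetical, and the scores and loadings are made-up vectors rather than the output of a fitted model. The toy \(X\) is built as the sum of two rank-one terms with orthogonal score vectors, so here (and only because of that construction) the two ratios add up to \(1\).

```python
def explained_variance_ratio(X, t, p):
    """Return ||t p^T||_F^2 / ||X||_F^2 for one component.

    X: data matrix as a list of rows (assumed already centered).
    t: x-score vector of the component (one value per observation).
    p: x-loading vector of the component (one value per feature).
    """
    component = sum((ti * pj) ** 2 for ti in t for pj in p)
    total = sum(v ** 2 for row in X for v in row)
    return component / total


# Made-up scores and loadings for two components (t1 and t2 are orthogonal).
t1, p1 = [1.0, -1.0, 1.0, -1.0], [2.0, 1.0]
t2, p2 = [1.0, 1.0, -1.0, -1.0], [0.5, -1.0]

# Toy centered data matrix: X = t1 p1^T + t2 p2^T.
X = [[a * p1[j] + b * p2[j] for j in range(2)] for a, b in zip(t1, t2)]

ratio_1 = explained_variance_ratio(X, t1, p1)
ratio_2 = explained_variance_ratio(X, t2, p2)
print(ratio_1, ratio_2)
```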