discrimintools.PLSLDA

class discrimintools.PLSLDA(n_components=2, scale=True, priors=None, classes=None, max_iter=500, tol=1e-10, var_select=False, threshold=1.0, warn_message=True)

Partial Least Squares Linear Discriminant Analysis (PLSLDA)

Performs partial least squares linear discriminant analysis (PLSLDA): a classical linear discriminant analysis carried out on the scores of a partial least squares (PLS) regression fitted on the explanatory variables. Partial least squares linear discriminant analysis consists of three steps:

  1. Recode the target variable into n_classes dummy variables.

  2. Fit a partial least squares regression of the dummy variables on the explanatory variables using PLSRegression.

  3. Run a linear discriminant analysis on the x_scores extracted in step 2 using DISCRIM.

Parameters:
  • n_components (int or None, default = 2) – Number of components to keep. Should be in [1, n_features].

  • scale (bool, default = True) – Whether to scale X and y.

  • classes (None, tuple or list, default = None) – Names of the class levels, in the order in which they are returned. If None, classes are the sorted unique values of y.

  • max_iter (int, default = 500) – The maximum number of iterations of the NIPALS algorithm.

  • tol (float, default = 1e-10) – The tolerance used as convergence criterion in the NIPALS algorithm.

  • var_select (bool, default = False) – Whether to apply feature selection based on the Variable Importance in Projection (VIP) scores of the partial least squares regression.

  • threshold (float, default = 1.0) – VIP cutoff used when var_select is True. VIP can be used to select predictor variables when multicollinearity exists among them: variables with a VIP score greater than the threshold (conventionally 1) are considered important for the projection of the PLS regression.

  • warn_message (bool, default = True) – Whether to show warning messages.

Attributes:

  • call_ (NamedTuple) – Call information:

    • Xtot : DataFrame of shape (n_samples, n_columns)

      Input data.

    • X : DataFrame of shape (n_samples, n_columns)

      Training data.

    • y : Series of shape (n_samples,)

      Target values (true class labels for X).

    • target : str

      Name of the target.

    • features : list

      Names of the features seen during fit.

    • classes : list

      Names of the classes.

    • priors : Series of shape (n_classes,)

      Prior probabilities.

    • n_samples : int

      Number of samples.

    • n_features : int

      Number of features.

    • n_classes : int

      Number of classes.

    • max_components : int

      Maximum number of components.

    • n_components : int

      Number of components kept.

  • cancoef_ (NamedTuple) – Canonical coefficients:

    • standardized : DataFrame of shape (n_variables, n_components)

      The standardized canonical coefficients.

    • raw : DataFrame of shape (n_variables+1, n_components)

      The raw canonical coefficients.

  • classes_ (NamedTuple) – Class information:

    • infos : DataFrame of shape (n_classes, 3)

      Class level information (frequency, proportion, prior probability).

    • coord : DataFrame of shape (n_classes, n_components)

      Class coordinates.

    • eucl : DataFrame of shape (n_classes, n_classes)

      The squared Euclidean distances between classes.

    • gen : DataFrame of shape (n_classes, n_classes)

      The generalized squared distances between classes.

  • coef_ (NamedTuple) – Partial least squares linear discriminant analysis coefficients:

    • standardized : DataFrame of shape (n_variables, n_classes)

      The standardized coefficients.

    • raw : DataFrame of shape (n_variables+1, n_classes)

      The raw coefficients.

  • explained_variance_ (DataFrame of shape (n_components, 2)) – The explained variance and the cumulative explained variance.

  • ind_ (NamedTuple) – Information on individuals:

    • coord : DataFrame of shape (n_samples, n_components)

      The transformed training samples.

    • scores : DataFrame of shape (n_samples,) or (n_samples, n_classes - 1)

      The total scores of individuals.

    • eucl : DataFrame of shape (n_samples, n_classes)

      The squared Euclidean distances of individuals to each class.

    • gen : DataFrame of shape (n_samples, n_classes)

      The generalized squared distances of individuals to each class.

  • lda_ (class) – An object of class DISCRIM.

  • model_ (str, default = ‘plslda’) – The model fitted name.

  • var_ (NamedTuple) – Variable information:

    • weights : DataFrame of shape (n_features, n_components)

      The left singular vectors of the cross-covariance matrices of each iteration.

    • loadings : DataFrame of shape (n_features, n_components)

      The loadings of X.

    • rotations : DataFrame of shape (n_features, n_components)

      The projection matrix used to transform X.

See also

PLSLOGIT

Partial Least Squares Logistic Regression

summaryPLSLDA

Printing summaries of Partial Least Squares Linear Discriminant Analysis model.

summaryDA

Printing summaries of Discriminant Analysis model.


Examples

>>> from discrimintools.datasets import load_dataset
>>> from discrimintools import PLSLDA
>>> D = load_dataset("breast")
>>> y, X = D["Class"], D.drop(columns=["Class"])
>>> clf = PLSLDA()
>>> clf.fit(X, y)
PLSLDA()

__init__(n_components=2, scale=True, priors=None, classes=None, max_iter=500, tol=1e-10, var_select=False, threshold=1.0, warn_message=True)

Methods

__init__([n_components, scale, priors, ...])

decision_function(X)

Apply the decision function to input data.

eval_predict(X, y[, verbose])

Evaluate the quality of the predictions.

fit(X, y)

Fit Partial Least Squares Linear Discriminant Analysis Model

fit_transform(X, y)

Fit to data, then transform it

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

pred_table(X, y)

Prediction table

predict(X)

Predict class labels for samples in X

predict_log_proba(X)

Return the log of the posterior probabilities.

predict_proba(X)

Estimate the posterior class probabilities.

score(X, y)

Return accuracy on the given input data

set_output(*[, transform])

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Project data to maximize class separation