discrimintools.PLSLOGIT

class discrimintools.PLSLOGIT(n_components=2, scale=True, classes=None, max_iter=500, tol=1e-10, var_select=False, threshold=1.0, multi_class=None, warn_message=True, **kwargs)

Partial Least Squares Logistic Regression (PLSLOGIT)

Performs partial least squares logistic regression (PLSLOGIT): a classical logistic regression (binary, multinomial, or ordinal) carried out on the scores of a partial least squares regression of the explanatory variables. The procedure consists of three steps:

  1. Recode the target variable into n_classes dummy variables.

  2. Fit a partial least squares regression using PLSRegression.

  3. Fit a logistic regression (binary, multinomial, or ordinal) on the x_scores extracted in step 2, using statsmodels.

Parameters:
  • n_components (int or None, default = 2) – Number of components to keep. Should be in [1, n_features].

  • scale (bool, default = True) – Whether to scale X and y.

  • classes (None, tuple or list, default = None) – Class labels, in the order in which they should be returned. If None, the classes are the sorted unique values of y.

  • max_iter (int, default = 500) – The maximum number of iterations for the NIPALS method.

  • tol (float, default = 1e-10) – The tolerance used as the convergence criterion in the NIPALS method.

  • var_select (bool, default = False) – Whether to apply feature selection based on the variable importance in projection (VIP) of the partial least squares regression.

  • threshold (float, default = 1.0) – Threshold on the variable importance in projection (VIP) score used for feature selection. VIP can be used to select predictor variables when multicollinearity exists among them; variables with a VIP score greater than 1 are considered important for the projection of the PLS regression.

  • multi_class (None or str, default = None) – Choice between multinomial and ordinal logistic regression. Only used for multiclass problems.

  • warn_message (bool, default = True) – Whether to show warning messages.

  • kwargs – Additional parameters passed to the fit of the logistic regression. See statsmodels.
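
The VIP-based selection controlled by `var_select` and `threshold` can be illustrated with the usual textbook definition of VIP. This is a sketch under assumptions: the source does not show the formula the library uses, the function `vip_scores` and the arrays `W`, `T`, and `Q` are hypothetical, and the strict `>` comparison is inferred from the wording "greater than 1".

```python
import numpy as np

def vip_scores(weights, scores, y_loadings):
    """VIP per feature from PLS weights (p, A), X scores (n, A), Y loadings (m, A)."""
    p, _ = weights.shape
    # Variation of Y captured by each component: ||t_a||^2 * ||q_a||^2.
    ss = np.sum(scores ** 2, axis=0) * np.sum(y_loadings ** 2, axis=0)
    # Squared weights, normalised so that each component's column sums to 1.
    w_norm2 = (weights / np.linalg.norm(weights, axis=0)) ** 2
    return np.sqrt(p * (w_norm2 @ ss) / ss.sum())

rng = np.random.default_rng(0)
W = rng.normal(size=(6, 2))   # hypothetical weights: 6 features, 2 components
T = rng.normal(size=(50, 2))  # hypothetical X scores
Q = rng.normal(size=(1, 2))   # hypothetical Y loadings

vip = vip_scores(W, T, Q)
selected = np.flatnonzero(vip > 1.0)  # features kept with threshold = 1.0
```

With this definition the squared VIP scores average to 1 across features, which is why 1.0 is the conventional cut-off.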

Returns:

  • call_ (NamedTuple) – Call information:

    • Xtot (DataFrame of shape (n_samples, n_columns)) – Input data.

    • X (DataFrame of shape (n_samples, n_features)) – Training data.

    • y (Series of shape (n_samples,)) – Target values (true labels for X).

    • target (str) – Name of the target.

    • features (list) – Names of features seen during fit.

    • classes (list) – Names of classes.

    • priors (Series of shape (n_classes,)) – Prior probabilities.

    • center (Series of shape (n_features,)) – The mean of X.

    • scale (Series of shape (n_features,)) – The standard deviation of X.

    • n_samples (int) – Number of samples.

    • n_features (int) – Number of features.

    • max_components (int) – Maximum number of components.

    • n_components (int) – Number of components kept.

    • n_classes (int) – Number of classes.

    • max_iter (int) – Maximum number of iterations.

    • tol (float) – The tolerance used as the convergence criterion.

    • threshold (float) – The threshold for variable importance in projection (VIP).

    • multi_class (None or str) – The multiclass logistic regression applied.

  • cancoef_ (NamedTuple) – Canonical coefficients:

    • standardized (DataFrame of shape (n_variables, n_components)) – The standardized canonical coefficients.

    • raw (DataFrame of shape (n_variables + 1, n_components)) – The raw canonical coefficients.

  • classes_ (NamedTuple) – Class information:

    • infos (DataFrame of shape (n_classes, 3)) – Class-level information (frequency, proportion, prior probability).

    • coord (DataFrame of shape (n_classes, n_components)) – Class coordinates.

    • eucl (DataFrame of shape (n_classes, n_classes)) – The squared Euclidean distance to origin.

    • gen (DataFrame of shape (n_classes, n_classes)) – The generalized squared distance to origin.

  • coef_ (NamedTuple) – Partial least squares logit model coefficients:

    • standardized (DataFrame of shape (n_variables, n_classes - 1)) – The standardized coefficients.

    • raw (DataFrame of shape (n_variables + 1, n_classes - 1)) – The raw coefficients.

  • explained_variance_ (DataFrame of shape (n_components, 2)) – The explained variance and the cumulative explained variance.

  • ind_ (NamedTuple) – Individual information:

    • coord (DataFrame of shape (n_samples, n_components)) – The transformed training samples.

    • scores (DataFrame of shape (n_samples,) or (n_samples, n_classes - 1)) – The total scores of individuals.

    • eucl (DataFrame of shape (n_samples, n_classes)) – The squared Euclidean distance to origin.

    • gen (DataFrame of shape (n_samples, n_classes)) – The generalized squared distance to origin.

  • logit_ (class) – An object of class Logit.

  • logit_coef_ (DataFrame of shape (n_components + 1,) or (n_components + 1, n_classes - 1)) – Logistic regression model coefficients.

  • model_ (str, default = ‘plslogit’) – The name of the fitted model.

  • var_ (NamedTuple) – Variable information:

    • weights (DataFrame of shape (n_features, n_components)) – The left singular vectors of the cross-covariance matrices of each iteration.

    • loadings (DataFrame of shape (n_features, n_components)) – The loadings of X.

    • rotations (DataFrame of shape (n_features, n_components)) – The projection matrix used to transform X.
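
The `standardized` and `raw` entries of `coef_` and `cancoef_` differ by one row, which suggests that the raw version carries an intercept. The sketch below shows the usual conversion between the two scales. It is an assumption, not the library's code: the helper `raw_from_standardized`, the intercept-first row order, and the use of the sample standard deviation are all hypothetical choices.

```python
import numpy as np

def raw_from_standardized(coef_std, center, scale, intercept_std=0.0):
    """Convert coefficients on standardised features to the original scale."""
    coef_raw = coef_std / scale[:, None]
    intercept_raw = intercept_std - center @ coef_raw
    # Intercept first, then one row per variable: (n_variables + 1, k).
    return np.vstack([np.atleast_2d(intercept_raw), coef_raw])

rng = np.random.default_rng(0)
X = rng.normal(loc=3.0, scale=2.0, size=(20, 4))
center, scale = X.mean(axis=0), X.std(axis=0, ddof=1)
coef_std = rng.normal(size=(4, 2))  # hypothetical standardised coefficients

raw = raw_from_standardized(coef_std, center, scale)

# Both parameterisations give the same linear scores.
Z = (X - center) / scale
scores_std = Z @ coef_std
scores_raw = raw[0] + X @ raw[1:]
```

Raw coefficients apply directly to unscaled data, while standardized coefficients are comparable across variables measured in different units.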

See also

PLSDA

Partial Least Squares Linear Discriminant Analysis

summaryPLSLOGIT

Prints summaries of a Partial Least Squares Logistic Regression model.

summaryDA

Prints summaries of a Discriminant Analysis model.

References

[1] Droesbeke J. J., Lejeune M., Saporta G. (2005), « Modèles statistiques pour données qualitatives », Editions TECHNIP.

[2] Tuffery S. (2017), « Data Mining et Statistique décisionnelle : La science des données », Editions TECHNIP.

[3] Tuffery S. (2024), « Modélisation prédictive et Apprentissage statistique avec R », Editions TECHNIP, 5ed.

[4] Tuffery S. (2025), « Data Science, Statistique et Machine Learning », Editions TECHNIP, 6ed.

Examples

>>> from discrimintools.datasets import load_dataset, load_vins
>>> from discrimintools import PLSLOGIT
>>> #pls + logit
>>> D = load_dataset("breast")
>>> y, X = D["Class"], D.drop(columns=["Class"])
>>> clf = PLSLOGIT()
>>> clf.fit(X,y)
PLSLOGIT()
>>> D = load_vins("train")
>>> y, X = D["Qualite"], D.drop(columns=["Qualite"])
>>> #pls + multinomial
>>> clf = PLSLOGIT(classes=('Mediocre','Moyen','Bon'))
>>> clf.fit(X,y)
PLSLOGIT(classes=('Mediocre','Moyen','Bon'))
>>> #pls + ordinal
>>> clf = PLSLOGIT(multi_class="ordinal",classes=('Mediocre','Moyen','Bon'),method='bfgs')
>>> clf.fit(X,y)
PLSLOGIT(multi_class="ordinal",classes=('Mediocre','Moyen','Bon'),method='bfgs')
__init__(n_components=2, scale=True, classes=None, max_iter=500, tol=1e-10, var_select=False, threshold=1.0, multi_class=None, warn_message=True, **kwargs)

Methods

__init__([n_components, scale, classes, ...])

decision_function(X)

Apply the decision function to input data

eval_predict(X, y[, verbose])

Evaluate the quality of the predictions

fit(X, y)

Fit Partial Least Squares Logistic Regression Model

fit_transform(X[, y])

Fit to data, then transform it.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

pred_table(X, y)

Prediction table

predict(X)

Predict class labels for samples in X

predict_log_proba(X)

Predict logarithm of probability estimates.

predict_proba(X)

Probability estimates.

score(X, y)

Return accuracy on the given input data

set_output(*[, transform])

Set output container.

set_params(**params)

Set the parameters of this estimator.
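
The prediction methods listed above are related to one another in the usual scikit-learn way. The sketch below shows that relationship on a hypothetical probability matrix. It assumes, without confirmation from the source, that `predict` returns the most probable class, that `predict_log_proba` is the logarithm of `predict_proba`, and that `score` is plain accuracy.

```python
import numpy as np

classes = np.array(["Mediocre", "Moyen", "Bon"])  # class order, as in `classes`

# Hypothetical output of predict_proba for four samples (rows sum to 1).
proba = np.array([
    [0.70, 0.20, 0.10],
    [0.10, 0.30, 0.60],
    [0.25, 0.50, 0.25],
    [0.05, 0.15, 0.80],
])
y_true = np.array(["Mediocre", "Bon", "Bon", "Bon"])

log_proba = np.log(proba)                    # what predict_log_proba would hold
y_pred = classes[proba.argmax(axis=1)]       # most probable class per sample
accuracy = float(np.mean(y_pred == y_true))  # fraction of correct predictions
```

Here the third sample is predicted as 'Moyen' while its true label is 'Bon', so three of the four predictions are correct.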