discrimintools.PLSLOGIT

class discrimintools.PLSLOGIT(n_components=2, scale=True, classes=None, max_iter=500, tol=1e-10, var_select=False, threshold=1.0, multi_class=None, warn_message=True, **kwargs)

Partial Least Squares Logistic Regression (PLSLOGIT)

Performs partial least squares logistic regression (PLSLOGIT): a classical logistic regression (binary, multinomial or ordinal) carried out on the scores of a partial least squares regression of the explanatory variables. The procedure consists of three steps:

Recode the target variable into n_classes dummy variables.
Compute a partial least squares regression using PLSRegression.
Fit a logistic regression (binary, multinomial or ordinal) on the x_scores extracted in step 2, using statsmodels.

Parameters:

n_components (int or None, default = 2) – Number of components to keep. Should be in [1, n_features].
scale (bool, default = True) – Whether to scale X and y.
classes (None, tuple or list, default = None) – Class labels, in the order in which they should be returned. If None, the classes are the sorted unique values of y.
max_iter (int, default = 500) – The maximum number of iterations of the NIPALS method.
tol (float, default = 1e-10) – The tolerance used as the convergence criterion in the NIPALS method.
var_select (bool, default = False) – Whether to apply feature selection based on variable importance in projection (VIP) for partial least squares regression.
threshold (float, default = 1.0) – VIP threshold used to select predictor variables, which is useful when multicollinearity exists among them. Variables with a VIP score greater than 1 are considered important for the projection of the PLS regression.
multi_class (None or str, default = None) – The type of multiclass logistic regression to fit: multinomial or ordinal. Only used for multiclass problems.
warn_message (bool, default = True) – Whether to show warning messages.
kwargs – Additional parameters passed to the fit of the logistic regression. See statsmodels.
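
As an illustration of the `var_select` and `threshold` parameters, the usual VIP formula can be written in a few lines of NumPy. This is a sketch under stated assumptions, not the code used by the class: `vip_scores` is a hypothetical helper, and the random arrays merely stand in for the weights, scores and y-loadings of a fitted PLS model (for example the `x_weights_`, `x_scores_` and `y_loadings_` attributes of scikit-learn's `PLSRegression`).

```python
import numpy as np

def vip_scores(x_weights, x_scores, y_loadings):
    """Variable importance in projection, one value per feature.

    x_weights  : (n_features, n_components)
    x_scores   : (n_samples, n_components)
    y_loadings : (n_targets, n_components)
    """
    n_features = x_weights.shape[0]
    # Y-variance carried by each component: ||t_a||^2 * ||q_a||^2
    ss = np.sum(x_scores ** 2, axis=0) * np.sum(y_loadings ** 2, axis=0)
    # Squared weights, normalized so each component's column sums to 1
    w_norm_sq = x_weights ** 2 / np.sum(x_weights ** 2, axis=0)
    return np.sqrt(n_features * (w_norm_sq @ ss) / ss.sum())

# Stand-in arrays (hypothetical; a real use would take them from a PLS fit)
rng = np.random.default_rng(0)
n_samples, n_features, n_components, n_targets = 50, 4, 2, 2
W = rng.normal(size=(n_features, n_components))
T = rng.normal(size=(n_samples, n_components))
Q = rng.normal(size=(n_targets, n_components))

vip = vip_scores(W, T, Q)
threshold = 1.0
selected = np.flatnonzero(vip > threshold)  # indices of retained features
```

By construction the squared VIP scores average to 1 across features, which is why 1.0 is the conventional cut-off: a variable above it contributes more than the average variable.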

Returns:

call_ (NamedTuple) – Call information:
- Xtot : DataFrame of shape (n_samples, n_columns)

  Input data.
- X : DataFrame of shape (n_samples, n_features)

  Training data.
- y : Series of shape (n_samples,)

  Target values. True values for X.
- target : str

  Name of the target.
- features : list

  Names of features seen during fit.
- classes : list

  Names of classes.
- priors : Series of shape (n_classes,)

  Prior probabilities.
- center : Series of shape (n_features,)

  The mean of X.
- scale : Series of shape (n_features,)

  The standard deviation of X.
- n_samples : int

  Number of samples.
- n_features : int

  Number of features.
- max_components : int

  Maximum number of components.
- n_components : int

  Number of components kept.
- n_classes : int

  Number of target values.
- max_iter : int

  Maximum number of iterations.
- tol : float

  The tolerance used as the convergence criterion.
- threshold : float

  The threshold for variable importance in projection.
- multi_class : None or str

  The multiclass logistic regression applied.
cancoef_ (NamedTuple) – Canonical coefficients:
- standardized : DataFrame of shape (n_variables, n_components)

  The standardized canonical coefficients.
- raw : DataFrame of shape (n_variables + 1, n_components)

  The raw canonical coefficients.
classes_ (NamedTuple) – Class information:
- infos : DataFrame of shape (n_classes, 3)

  Class level information (frequency, proportion, prior probability).
- coord : DataFrame of shape (n_classes, n_components)

  Class coordinates.
- eucl : DataFrame of shape (n_classes, n_classes)

  The squared Euclidean distance to origin.
- gen : DataFrame of shape (n_classes, n_classes)

  The generalized squared distance to origin.
coef_ (NamedTuple) – Partial least squares logit model coefficients:
- standardized : DataFrame of shape (n_variables, n_classes - 1)

  The standardized coefficients.
- raw : DataFrame of shape (n_variables + 1, n_classes - 1)

  The raw coefficients.
explained_variance_ (DataFrame of shape (n_components, 2)) – The explained variance and the cumulative explained variance.
ind_ (NamedTuple) – Information on individuals:
- coord : DataFrame of shape (n_samples, n_components)

  The transformed training samples.
- scores : DataFrame of shape (n_samples,) or (n_samples, n_classes - 1)

  The total scores of individuals.
- eucl : DataFrame of shape (n_samples, n_classes)

  The squared Euclidean distance to origin.
- gen : DataFrame of shape (n_samples, n_classes)

  The generalized squared distance to origin.
logit_ (class) – An object of class Logit.
logit_coef_ (DataFrame of shape (n_components + 1,) or (n_components + 1, n_classes - 1)) – Logistic regression model coefficients.
model_ (str, default = ‘plslogit’) – The name of the fitted model.
var_ (NamedTuple) – Information on variables:
- weights : DataFrame of shape (n_features, n_components)

  The left singular vectors of the cross-covariance matrices of each iteration.
- loadings : DataFrame of shape (n_features, n_components)

  The loadings of X.
- rotations : DataFrame of shape (n_features, n_components)

  The projection matrix used to transform X.
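
The relationship between the weights, loadings and rotations listed under `var_` can be illustrated with a small NumPy sketch. It is an assumption-laden illustration rather than the library's code: `pls_svd` is a hypothetical helper, it takes each weight vector directly from an SVD of the cross-covariance matrix instead of running NIPALS iterations, and it centers but does not scale the data.

```python
import numpy as np

def pls_svd(X, Y, n_components):
    """PLS with deflation; weights come from an SVD (not NIPALS)."""
    Xk = X - X.mean(axis=0)
    Yk = Y - Y.mean(axis=0)
    X0 = Xk.copy()  # centered X, kept to check the rotations
    W, P, T = [], [], []
    for _ in range(n_components):
        # Weight: first left singular vector of the cross-covariance X'Y
        w = np.linalg.svd(Xk.T @ Yk, full_matrices=False)[0][:, 0]
        t = Xk @ w               # X scores for this component
        p = Xk.T @ t / (t @ t)   # X loadings
        q = Yk.T @ t / (t @ t)   # Y loadings
        Xk = Xk - np.outer(t, p)  # deflate X
        Yk = Yk - np.outer(t, q)  # deflate Y
        W.append(w)
        P.append(p)
        T.append(t)
    W, P, T = (np.column_stack(m) for m in (W, P, T))
    # Rotations map the *undeflated* centered X straight to the scores
    R = W @ np.linalg.pinv(P.T @ W)
    return X0, W, P, T, R

# Hypothetical data: 5 features, 3 classes recoded as dummies
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
Y = np.eye(3)[rng.integers(0, 3, size=60)]
X0, W, P, T, R = pls_svd(X, Y, n_components=2)
```

The weights apply to the deflated matrices of each iteration, whereas the rotations apply to the original centered data, so `X0 @ R` reproduces the scores `T` in a single matrix product. This is what makes the rotations usable for transforming new samples.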

See also

PLSDA: Partial Least Squares Linear Discriminant Analysis
summaryPLSLOGIT: Printing summaries of Partial Least Squares Logistic Regression model.
summaryDA: Printing summaries of Discriminant Analysis model.

References

[1] Droesbeke J. J., Lejeune M., Saporta G. (2005), « Modèles statistiques pour données qualitatives », Editions TECHNIP.

[2] Tuffery S. (2017), « Data Mining et Statistique décisionnelle : La science des données », Editions TECHNIP.

[3] Tuffery S. (2024), « Modélisation prédictive et Apprentissage statistique avec R », Editions TECHNIP, 5ed.

[4] Tuffery S. (2025), « Data Science, Statistique et Machine Learning », Editions TECHNIP, 6ed.

Examples

>>> from discrimintools.datasets import load_dataset, load_vins
>>> from discrimintools import PLSLOGIT
>>> #pls + logit
>>> D = load_dataset("breast")
>>> y, X = D["Class"], D.drop(columns=["Class"])
>>> clf = PLSLOGIT()
>>> clf.fit(X,y)
PLSLOGIT()
>>> D = load_vins("train")
>>> y, X = D["Qualite"], D.drop(columns=["Qualite"])
>>> #pls + multinomial
>>> clf = PLSLOGIT(classes=('Mediocre','Moyen','Bon'))
>>> clf.fit(X,y)
PLSLOGIT(classes=('Mediocre','Moyen','Bon'))
>>> #pls + ordinal
>>> clf = PLSLOGIT(multi_class="ordinal",classes=('Mediocre','Moyen','Bon'),method='bfgs')
>>> clf.fit(X,y)
PLSLOGIT(multi_class="ordinal",classes=('Mediocre','Moyen','Bon'),method='bfgs')

__init__(n_components=2, scale=True, classes=None, max_iter=500, tol=1e-10, var_select=False, threshold=1.0, multi_class=None, warn_message=True, **kwargs)

Methods

`__init__`([n_components, scale, classes, ...])
`decision_function`(X)	Apply the decision function to input data.
`eval_predict`(X, y[, verbose])	Evaluate the quality of the predictions.
`fit`(X, y)	Fit Partial Least Squares Logistic Regression Model
`fit_transform`(X[, y])	Fit to data, then transform it.
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`pred_table`(X, y)	Prediction table
`predict`(X)	Predict class labels for samples in X
`predict_log_proba`(X)	Predict logarithm of probability estimates.
`predict_proba`(X)	Probability estimates.
`score`(X, y)	Return accuracy on the given input data
`set_output`(*[, transform])	Set output container.
`set_params`(**params)	Set the parameters of this estimator.
