discrimintools.PLSLOGIT
- class discrimintools.PLSLOGIT(n_components=2, scale=True, classes=None, max_iter=500, tol=1e-10, var_select=False, threshold=1.0, multi_class=None, warn_message=True, **kwargs)
-
Partial Least Squares Logistic Regression (PLSLOGIT)
Performs partial least squares logistic regression (PLSLOGIT): a classical logistic regression (binary, multinomial, or ordinal) fitted on the scores of a partial least squares regression of the explanatory variables. The procedure consists of three steps:
1. Recode the target variable into n_classes dummy variables.
2. Compute a partial least squares regression using PLSRegression.
3. Fit a logistic regression (binary, multinomial, or ordinal) on the x_scores extracted in step 2, using statsmodels.
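Step 1 can be pictured with a small, library-independent sketch. The helper `dummy_encode` below is hypothetical (it is not part of discrimintools) and only illustrates what recoding a target into n_classes dummy variables means, including the sorted-labels fallback described for `classes=None`.

```python
def dummy_encode(y, classes=None):
    """Recode a sequence of labels into one 0/1 indicator column per class.

    Hypothetical helper for illustration only; not the library's code.
    """
    # Without an explicit order, fall back to the sorted unique labels,
    # which mirrors the documented behaviour of classes=None.
    levels = list(classes) if classes is not None else sorted(set(y))
    rows = [[1 if label == level else 0 for level in levels] for label in y]
    return levels, rows


y = ["Bon", "Moyen", "Mediocre", "Bon"]
levels, dummies = dummy_encode(y, classes=("Mediocre", "Moyen", "Bon"))
for label, row in zip(y, dummies):
    print(label, row)
```

Each row contains exactly one 1, in the column of the observed class. This indicator matrix is the kind of response a PLS regression can then be fitted against in step 2.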
- Parameters:
-
n_components (int or None, default = 2) – Number of components to keep. Should be in [1, n_features].
scale (bool, default = True) – Whether to scale X and y.
classes (None, tuple or list, default = None) – Class levels, in the order in which they should be returned. If None, classes are sorted using the unique values in y.
max_iter (int, default = 500) – The maximum number of iterations of the NIPALS method.
tol (float, default = 1e-10) – The tolerance used as the convergence criterion in the NIPALS method.
var_select (bool, default = False) – Whether to apply feature selection based on Variable Importance in Projection (VIP) for partial least squares regression.
threshold (float, default = 1.0) – VIP cut-off used for feature selection. VIP can be used to select predictor variables when multicollinearity exists among them; variables with a VIP score greater than 1 are considered important for the projection of the PLS regression.
multi_class (None or str, default = None) – Type of logistic regression to fit: 'multinomial' or 'ordinal'. Only used for multiclass problems.
warn_message (bool, default = True) – Whether to show warning messages.
kwargs – Additional parameters passed to fit for the logistic regression. See statsmodels.
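To make the var_select and threshold parameters concrete, here is a minimal sketch of the commonly used VIP definition, VIP_j = sqrt(p * sum_a(SS_a * w_ja^2 / ||w_a||^2) / sum_a(SS_a)), where p is the number of features, w_a the weight vector of component a, and SS_a the response variance explained by that component. The function name, the toy weights, and the toy SS values are assumptions for illustration; whether discrimintools computes VIP in exactly this way is not confirmed by this page.

```python
import math


def vip_scores(weights, explained_ss):
    """Textbook VIP scores (illustrative sketch, not the library's code).

    weights      : n_features rows, each with n_components PLS weights
    explained_ss : response variance explained by each component
    """
    n_features = len(weights)
    n_components = len(explained_ss)
    # Squared norm of each component's weight vector.
    norms_sq = [sum(row[a] ** 2 for row in weights) for a in range(n_components)]
    total_ss = sum(explained_ss)
    scores = []
    for row in weights:
        acc = sum(explained_ss[a] * row[a] ** 2 / norms_sq[a]
                  for a in range(n_components))
        scores.append(math.sqrt(n_features * acc / total_ss))
    return scores


names = ["x1", "x2", "x3"]                        # toy feature names
weights = [[0.9, 0.1], [0.4, 0.3], [0.1, 0.9]]    # toy weights, 2 components
explained_ss = [3.0, 1.0]                         # toy explained variance

scores = vip_scores(weights, explained_ss)
threshold = 1.0
# Keep the variables whose VIP is greater than the threshold.
selected = [name for name, v in zip(names, scores) if v > threshold]
print(selected)
```

With this definition the squared VIP scores sum to the number of features, so their mean square is 1, which is why 1.0 is a natural default cut-off.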
- Returns:
-
call_ (NamedTuple) – Call information:
- Xtot (DataFrame of shape (n_samples, n_columns)) – Input data.
- X (DataFrame of shape (n_samples, n_features)) – Training data.
- y (Series of shape (n_samples,)) – Target values. True values for X.
- target (str) – Name of the target.
- features (list) – Names of features seen during fit.
- classes (list) – Names of classes.
- priors (Series of shape (n_classes,)) – Prior probabilities.
- center (Series of shape (n_features,)) – The mean of X.
- scale (Series of shape (n_features,)) – The standard deviation of X.
- n_samples (int) – Number of samples.
- n_features (int) – Number of features.
- max_components (int) – Maximum number of components.
- n_components (int) – Number of components kept.
- n_classes (int) – Number of classes in the target.
- max_iter (int) – Maximum number of iterations.
- tol (float) – The tolerance used as the convergence criterion.
- threshold (float) – The threshold for variable importance in projection (VIP).
- multi_class (None or str) – The multiclass logistic regression applied.
cancoef_ (NamedTuple) – Canonical coefficients:
- standardized (DataFrame of shape (n_variables, n_components)) – The standardized canonical coefficients.
- raw (DataFrame of shape (n_variables + 1, n_components)) – The raw canonical coefficients.
classes_ (NamedTuple) – Class information:
- infos (DataFrame of shape (n_classes, 3)) – Class-level information (frequency, proportion, prior probability).
- coord (DataFrame of shape (n_classes, n_components)) – Class coordinates.
- eucl (DataFrame of shape (n_classes, n_classes)) – The squared Euclidean distance to origin.
- gen (DataFrame of shape (n_classes, n_classes)) – The generalized squared distance to origin.
coef_ (NamedTuple) – Partial least squares logit model coefficients:
- standardized (DataFrame of shape (n_variables, n_classes - 1)) – The standardized coefficients.
- raw (DataFrame of shape (n_variables + 1, n_classes - 1)) – The raw coefficients.
explained_variance_ (DataFrame of shape (n_components, 2)) – The explained variance and the cumulative explained variance.
ind_ (NamedTuple) – Individual information:
- coord (DataFrame of shape (n_samples, n_components)) – The transformed training samples.
- scores (DataFrame of shape (n_samples,) or (n_samples, n_classes - 1)) – The total scores of individuals.
- eucl (DataFrame of shape (n_samples, n_classes)) – The squared Euclidean distance to origin.
- gen (DataFrame of shape (n_samples, n_classes)) – The generalized squared distance to origin.
logit_ (class) – An object of class Logit.
logit_coef_ (DataFrame of shape (n_components + 1,) or (n_components + 1, n_classes - 1)) – Logistic regression model coefficients.
model_ (str, default = ‘plslogit’) – The model fitted name.
var_ (NamedTuple) – Variable information:
- weights (DataFrame of shape (n_features, n_components)) – The left singular vectors of the cross-covariance matrices of each iteration.
- loadings (DataFrame of shape (n_features, n_components)) – The loadings of X.
- rotations (DataFrame of shape (n_features, n_components)) – The projection matrix used to transform X.
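The raw coefficient tables listed above have one more row than the standardized ones, which is consistent with the usual back-transformation from centered and scaled variables to the original units, where the extra row holds the intercept. The sketch below shows that generic algebra with a hypothetical helper and toy numbers; that discrimintools derives its raw coefficients exactly like this is an assumption.

```python
def to_raw_coefficients(standardized, center, scale, intercept=0.0):
    """Convert coefficients on standardized variables to original units.

    Illustrative helper, not the library's code. Returns the intercept
    first, followed by one raw coefficient per variable.
    """
    raw = [b / s for b, s in zip(standardized, scale)]
    # The centering terms are absorbed into the intercept.
    raw_intercept = intercept - sum(r * m for r, m in zip(raw, center))
    return [raw_intercept] + raw


center = [2.0, 8.0]          # toy per-variable means
scale = [0.5, 4.0]           # toy per-variable standard deviations
standardized = [1.5, -2.0]   # toy coefficients on standardized variables
raw = to_raw_coefficients(standardized, center, scale, intercept=0.25)

# Both parameterizations give the same linear predictor for a sample x.
x = [3.0, 10.0]
z = [(xi - m) / s for xi, m, s in zip(x, center, scale)]
eta_standardized = 0.25 + sum(b * zi for b, zi in zip(standardized, z))
eta_raw = raw[0] + sum(b * xi for b, xi in zip(raw[1:], x))
print(eta_standardized, eta_raw)
```

Standardized coefficients are comparable across variables, while raw coefficients can be applied directly to unscaled data.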
See also
PLSDA – Partial Least Squares Linear Discriminant Analysis.
summaryPLSLOGIT – Prints summaries of a Partial Least Squares Logistic Regression model.
summaryDA – Prints summaries of a Discriminant Analysis model.
References
[1] Droesbeke J. J., Lejeune M., Saporta G. (2005), « Modèles statistiques pour données qualitatives », Editions TECHNIP.
[2] Tuffery S. (2017), « Data Mining et Statistique décisionnelle : La science des données », Editions TECHNIP.
[3] Tuffery S. (2024), « Modélisation prédictive et Apprentissage statistique avec R », Editions TECHNIP, 5ed.
[4] Tuffery R. (2025), « Data Science, Statistique et Machine Learning », Editions TECHNIP, 6ed.
Examples
>>> from discrimintools.datasets import load_dataset, load_vins
>>> from discrimintools import PLSLOGIT
>>> # pls + logit
>>> D = load_dataset("breast")
>>> y, X = D["Class"], D.drop(columns=["Class"])
>>> clf = PLSLOGIT()
>>> clf.fit(X, y)
PLSLOGIT()
>>> D = load_vins("train")
>>> y, X = D["Qualite"], D.drop(columns=["Qualite"])
>>> # pls + multinomial
>>> clf = PLSLOGIT(classes=('Mediocre','Moyen','Bon'))
>>> clf.fit(X, y)
PLSLOGIT(classes=('Mediocre','Moyen','Bon'))
>>> # pls + ordinal
>>> clf = PLSLOGIT(multi_class="ordinal", classes=('Mediocre','Moyen','Bon'), method='bfgs')
>>> clf.fit(X, y)
PLSLOGIT(multi_class="ordinal",classes=('Mediocre','Moyen','Bon'),method='bfgs')
- __init__(n_components=2, scale=True, classes=None, max_iter=500, tol=1e-10, var_select=False, threshold=1.0, multi_class=None, warn_message=True, **kwargs)
Methods
__init__([n_components, scale, classes, ...])
decision_function(X) – Apply the decision function to input data.
eval_predict(X, y[, verbose]) – Evaluate the quality of the predictions.
fit(X, y) – Fit the Partial Least Squares Logistic Regression model.
fit_transform(X[, y]) – Fit to data, then transform it.
get_metadata_routing() – Get metadata routing of this object.
get_params([deep]) – Get parameters for this estimator.
pred_table(X, y) – Prediction table.
predict(X) – Predict class labels for samples in X.
predict_log_proba(X) – Predict logarithm of probability estimates.
predict_proba(X) – Probability estimates.
score(X, y) – Return accuracy on the given input data.
set_output(*[, transform]) – Set output container.
set_params(**params) – Set the parameters of this estimator.
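The prediction-related methods can be read through the generic multinomial-logit relationships sketched below: n_classes - 1 linear scores become probabilities through a softmax with a reference class, the predicted label is the most probable class, accuracy is the share of correct labels, and a prediction table cross-tabulates true against predicted labels. All function names and numbers here are hypothetical, and treating the first class as the reference is an assumption; this is not the library's implementation.

```python
import math
from collections import Counter


def multinomial_proba(scores):
    """Probabilities from the n_classes - 1 non-reference linear scores."""
    full = [0.0] + list(scores)      # reference class has score 0 (assumption)
    top = max(full)                  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in full]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(scores, classes):
    """Return the class with the highest estimated probability."""
    proba = multinomial_proba(scores)
    return classes[proba.index(max(proba))]


def accuracy(y_true, y_pred):
    """Share of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


classes = ("Mediocre", "Moyen", "Bon")
score_rows = [[-1.0, -2.0], [0.5, 0.1], [0.2, 1.5]]   # toy scores
y_true = ["Mediocre", "Moyen", "Moyen"]               # toy labels

y_pred = [predict_label(row, classes) for row in score_rows]
table = Counter(zip(y_true, y_pred))   # (true, predicted) -> count
print(y_pred, accuracy(y_true, y_pred))
```

In the binary case the same formula reduces to the logistic function applied to a single score.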