discrimintools.DISCRIM
- class discrimintools.DISCRIM(method='linear', priors=None, classes=None, var_select=False, level=None, tol=None, warn_message=True)
-
Discriminant Analysis (DISCRIM)
Performs a discriminant analysis (linear or quadratic) on a set of observations (training data) containing one or more numeric variables and a classification variable that defines groups of observations. The discriminant criterion derived from the training data can then be applied to a testing dataset.
- Parameters:
-
-
method ({‘linear’, ‘quad’}, default = ‘linear’) – The discriminant analysis method to perform. Possible values:
‘linear’ for linear discriminant analysis (LDA).
‘quad’ for quadratic discriminant analysis (QDA).
-
priors (str or array-like or Series of shape (n_classes,), default = None) – The prior probabilities of group membership. Possible values:
‘equal’ to set all prior probabilities equal.
‘prop’ to set the prior probabilities proportional to the class sample sizes.
A 1-D numpy array or Series that specifies the prior probability for each level of the classification variable.
classes (None, tuple or list, default = None) – Class labels, in the order in which they should be returned. If None, the sorted unique values of y are used.
var_select (bool, default = False) – Whether to apply feature selection based on variable importance (contribution) in prediction. Applies to linear discriminant analysis.
level (float, default = None) – Significance level for the variable importance critical probability. You can specify level only when both method = ‘linear’ and var_select=True are also specified. If you specify both method = ‘linear’ and var_select=True but omit level, DISCRIM uses 0.05 as the significance level for the variable importance.
tol (float, default = None) – Significance level for the test of homogeneity. You can specify tol only when method = ‘quad’ is also specified. If you specify method = ‘quad’ but omit tol, DISCRIM uses 0.1 as the significance level for the test.
warn_message (bool, default = True) – Whether to show warning messages. Warnings are raised without interrupting the program.
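The two string options for priors can be illustrated with a small standalone sketch. It does not call discrimintools; resolve_priors is a hypothetical helper written only to show what ‘equal’ and ‘prop’ mean numerically, and the library's internal handling may differ.

```python
from collections import Counter

def resolve_priors(y, priors="prop"):
    """Hypothetical helper: map each class label to its prior probability."""
    counts = Counter(y)
    classes = sorted(counts)
    if priors == "equal":
        # every class gets the same prior, 1 / n_classes
        return {c: 1.0 / len(classes) for c in classes}
    if priors == "prop":
        # priors proportional to the class sample sizes
        n = len(y)
        return {c: counts[c] / n for c in classes}
    raise ValueError("this sketch only handles 'equal' and 'prop'")

y = ["a", "a", "a", "b"]
equal = resolve_priors(y, "equal")
prop = resolve_priors(y, "prop")
print(equal)  # {'a': 0.5, 'b': 0.5}
print(prop)   # {'a': 0.75, 'b': 0.25}
```

With unbalanced classes, ‘prop’ favours the larger class, while ‘equal’ treats all classes alike regardless of their frequency.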
-
- Returns:
-
-
call_ (NamedTuple) – Call information:
- Xtot (DataFrame of shape (n_samples, n_columns)) – Input data.
- X (DataFrame of shape (n_samples, n_features)) – Training data.
- y (Series of shape (n_samples,)) – Target values. True values for X.
- target (str) – Name of target.
- features (list) – Names of features seen during fit.
- classes (list) – Names of classes.
- priors (Series of shape (n_classes,)) – Prior probabilities.
- n_samples (int) – Number of samples.
- n_features (int) – Number of features.
- n_classes (int) – Number of classes (distinct target values).
-
-
classes_ (NamedTuple) – Class information:
- infos (DataFrame of shape (n_classes, 3)) – Class-level information (frequency, proportion, prior probability).
- center (DataFrame of shape (n_classes, n_features)) – Class means.
- total (DataFrame of shape (n_features, n_classes)) – Total-sample standardized class means.
- pooled (DataFrame of shape (n_features, n_classes)) – Pooled within-class standardized class means.
- mahal (DataFrame of shape (n_classes, n_classes)) – Squared Mahalanobis distances between classes.
- gen (DataFrame of shape (n_classes, n_classes)) – Generalized squared distances between classes.
-
coef_ (DataFrame of shape (n_features, n_classes)) – Coefficients of the linear classification functions.
-
corr_ (NamedTuple) – Correlation coefficient tests:
- total (DataFrame of shape (C^{2}_{n_features}, 7)) – Total-sample correlation coefficient tests.
- within (dict) – Within-class correlation coefficient tests.
- pooled (DataFrame of shape (C^{2}_{n_features}, 7)) – Pooled within-class correlation coefficient tests.
- between (DataFrame of shape (C^{2}_{n_features}, 7)) – Between-class correlation coefficient tests.
-
-
cov_ (NamedTuple) – Covariance matrices:
- total (DataFrame of shape (n_features, n_features)) – Total-sample covariance matrix.
- btotal (DataFrame of shape (n_features, n_features)) – Biased total-sample covariance matrix.
- within (dict) – Within-class covariance matrices.
- bwithin (dict) – Biased within-class covariance matrices.
- pooled (DataFrame of shape (n_features, n_features)) – Pooled within-class covariance matrix.
- bpooled (DataFrame of shape (n_features, n_features)) – Biased pooled within-class covariance matrix.
- between (DataFrame of shape (n_features, n_features)) – Between-class covariance matrix.
- bbetween (DataFrame of shape (n_features, n_features)) – Biased between-class covariance matrix.
- test (DataFrame of shape (1, 7)) – Box’s M test.
-
-
ind_ (NamedTuple) – Information on individuals:
- scores (DataFrame of shape (n_samples, n_classes)) – The total scores of individuals.
- mahal (DataFrame of shape (n_samples, n_classes)) – Squared Mahalanobis distance to origin.
- gen (DataFrame of shape (n_samples, n_classes)) – Generalized squared distance to origin.
-
model_ (str, default = “discrim”) – Name of model fitted.
-
sscp_ (NamedTuple) – Sums of squares and cross-products (SSCP) matrices:
- total (DataFrame of shape (n_features, n_features)) – Total-sample SSCP matrix.
- within (dict) – Within-class SSCP matrices.
- pooled (DataFrame of shape (n_features, n_features)) – Pooled within-class SSCP matrix.
- between (DataFrame of shape (n_features, n_features)) – Between-class SSCP matrix.
-
-
statistics_ (NamedTuple) – Statistics results:
- anova (DataFrame of shape (n_features, 11)) – Analysis of variance test.
- manova (DataFrame of shape (4, 5)) – Multivariate analysis of variance test.
- average_rsq (DataFrame of shape (1, 2)) – Average R-square.
- performance (DataFrame of shape (3, 3)) – The global performance of the model. Only for linear discriminant analysis.
-
-
summary_ (NamedTuple) – Summary information:
- infos (DataFrame of shape (3, 4)) – Summary information (total sample size, number of features, number of classes, total degrees of freedom, within-class degrees of freedom, between-class degrees of freedom).
- total (DataFrame of shape (n_features, 8)) – Total-sample statistics, see https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.describe.html
- within (dict) – Within-class statistics.
-
-
vip_ (NamedTuple) – Variable importance for prediction:
- vip (DataFrame of shape (n_features, 6)) – Variable importance for prediction.
- selected (list) – Selected variables.
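The SSCP and covariance attributes listed above are related by the usual decomposition of the total SSCP matrix into a pooled within-class part and a between-class part. The NumPy sketch below illustrates that relationship on a tiny made-up dataset; it is not the library's code, and the divisors shown (n - 1 for the total covariance, n - n_classes for the pooled covariance) are the standard unbiased choices, assumed here rather than taken from discrimintools.

```python
import numpy as np

# toy data (made up for illustration): 6 samples, 2 features, 2 classes
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0],
              [6.0, 4.0], [7.0, 8.0], [8.0, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])
n, p = X.shape
classes = np.unique(y)

def sscp(A, center):
    """Sums of squares and cross-products of A around a given center."""
    D = A - center
    return D.T @ D

grand_mean = X.mean(axis=0)
T = sscp(X, grand_mean)  # total-sample SSCP

# pooled within-class SSCP: sum of the per-class SSCP matrices
W = sum(sscp(X[y == k], X[y == k].mean(axis=0)) for k in classes)

# between-class SSCP: class-size-weighted outer products of mean deviations
B = np.zeros((p, p))
for k in classes:
    d = X[y == k].mean(axis=0) - grand_mean
    B += (y == k).sum() * np.outer(d, d)

total_cov = T / (n - 1)              # unbiased total-sample covariance
pooled_cov = W / (n - len(classes))  # unbiased pooled within-class covariance

print(np.allclose(T, W + B))  # True
```

The biased variants differ only in the divisor, typically n instead of the degrees of freedom.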
-
-
See also
GFALDA – General Factor Analysis Linear Discriminant Analysis
CPLS – Partial Least Squares for Classification
PLSDA – Partial Least Squares Discriminant Analysis
PLSLDA – Partial Least Squares Linear Discriminant Analysis
summaryDISCRIM – Printing summaries of Discriminant Analysis (linear & quadratic) model.
summaryDA – Printing summaries of Discriminant Analysis model.
References
[1] Bardos M. (2001), « Analyse discriminante - Application au risque et scoring financier », Dunod.
[2] Lebart Ludovic, Piron Marie, & Morineau Alain (2006), « Statistique Exploratoire Multidimensionnelle », Dunod, Paris 4ed.
[3] Ricco Rakotomalala (2020), « Pratique de l’Analyse Discriminante Linéaire », Université Lumière Lyon 2, Version 1.0.
[4] Saporta Gilbert (2011), « Probabilités, Analyse des données et Statistiques », Editions TECHNIP, 3ed.
[5] Tenenhaus Michel (1996), « Méthodes statistiques en gestion », Dunod.
[6] Tuffery Stephane (2017), « Data Mining et Statistique décisionnelle », Editions TECHNIP, 5ed.
[7] Tuffery Stephane (2025), « Data Science, Statistique et Machine learning », Editions TECHNIP, 6ed.
[8] SAS/STAT 13.2 User’s Guide (2014), « The DISCRIM Procedure », Chapter 35.
Examples
>>> from discrimintools.datasets import load_alcools
>>> from discrimintools import DISCRIM
>>> D = load_alcools()  # load training data
>>> y, X = D["TYPE"], D.drop(columns=["TYPE"])  # split into X and y
>>> # linear discriminant analysis (LDA)
>>> clf = DISCRIM()
>>> clf.fit(X, y)
DISCRIM(priors='prop')
>>> # quadratic discriminant analysis (QDA)
>>> clf2 = DISCRIM(method='quad')
>>> clf2.fit(X, y)
DISCRIM(method='quad',priors='prop')
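To make the idea of linear classification functions more concrete, the following NumPy sketch builds them by hand for a toy two-class dataset, using the textbook formulas (coefficients from the pooled within-class covariance and the class means, constants including the log prior). The data and variable names are made up, and the layout is only assumed to resemble coef_; this is not how DISCRIM is necessarily implemented.

```python
import numpy as np

# toy data (made up for illustration)
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0],
              [6.0, 4.0], [7.0, 8.0], [8.0, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])
classes = np.unique(y)
n, p = X.shape

means = np.array([X[y == k].mean(axis=0) for k in classes])  # (n_classes, n_features)
priors = np.array([(y == k).mean() for k in classes])        # proportional priors

# pooled within-class covariance matrix
W = sum((X[y == k] - means[i]).T @ (X[y == k] - means[i])
        for i, k in enumerate(classes))
pooled = W / (n - len(classes))

# textbook linear classification functions:
#   score_k(x) = x' S^-1 mu_k - 0.5 * mu_k' S^-1 mu_k + log(prior_k)
coef = np.linalg.solve(pooled, means.T)                       # (n_features, n_classes)
intercept = -0.5 * np.sum(means.T * coef, axis=0) + np.log(priors)

scores = X @ coef + intercept
pred = classes[scores.argmax(axis=1)]  # assign each sample to its highest score

# posterior probabilities via a numerically stable softmax of the scores
shifted = scores - scores.max(axis=1, keepdims=True)
posterior = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
```

Each sample is assigned to the class with the highest score, and exponentiating and normalizing the scores row by row gives posterior probabilities that sum to one.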
- __init__(method='linear', priors=None, classes=None, var_select=False, level=None, tol=None, warn_message=True)
Methods
__init__([method, priors, classes, ...])
decision_function(X) – Apply the decision function to input data.
eval_predict(X, y[, verbose]) – Evaluate the quality of the predictions.
feature_importance([level, all_vars]) – Variable importance for prediction in linear discriminant analysis (LDAVIP).
fit(X, y) – Fit the discriminant analysis model.
fit_transform(X, y) – Fit to data, then transform it.
get_metadata_routing() – Get metadata routing of this object.
get_params([deep]) – Get parameters for this estimator.
pred_table(X, y) – Prediction table.
predict(X) – Predict class labels for samples in X.
predict_log_proba(X) – Return the log of posterior probabilities.
predict_proba(X) – Estimate posterior probabilities.
score(X, y) – Return accuracy on the given input data.
set_output(*[, transform]) – Set output container.
set_params(**params) – Set the parameters of this estimator.
transform(X) – Project data to maximize class separation.
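As a rough illustration of what a prediction table and an accuracy score summarize, the sketch below cross-tabulates observed against predicted labels in plain Python. It assumes that pred_table reports a table of this kind and that score is the share of correct predictions; the labels are made up and no discrimintools call is involved.

```python
from collections import Counter

# made-up observed and predicted labels
y_true = ["a", "a", "b", "b", "b"]
y_pred = ["a", "b", "b", "b", "a"]

# cross-tabulation: (observed, predicted) -> count
table = Counter(zip(y_true, y_pred))

# accuracy: share of samples whose prediction matches the observed class
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

for (obs, prd), count in sorted(table.items()):
    print(f"observed={obs} predicted={prd} count={count}")
print(accuracy)  # 0.6
```

The diagonal cells of such a table (observed equal to predicted) are the correct classifications, so accuracy is their sum divided by the sample size.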