discrimintools.DISCRIM

class discrimintools.DISCRIM(method='linear', priors=None, classes=None, var_select=False, level=None, tol=None, warn_message=True)

Discriminant Analysis (DISCRIM)

Performs a discriminant analysis (linear or quadratic) on a set of observations (training data) containing one or more numeric variables and a classification variable that defines groups of observations. The discriminant criterion derived from the training data can then be applied to a test dataset.
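To make the linear rule concrete, here is a minimal NumPy sketch of the textbook linear classification functions: for class \(k\) the coefficients are \(S^{-1}\bar{x}_k\) and the constant is \(-\tfrac{1}{2}\bar{x}_k^\top S^{-1}\bar{x}_k + \ln \pi_k\), where \(S\) is the pooled within-class covariance matrix and \(\pi_k\) the prior. The helpers `lda_fit` and `lda_predict` are hypothetical illustrations, not part of discrimintools. The sketch assumes `priors` is ‘equal’, ‘prop’, or an array aligned with the sorted class labels, and (unlike DISCRIM) it defaults to ‘prop’ rather than None.

```python
import numpy as np


def lda_fit(X, y, priors="prop"):
    """Fit textbook linear classification functions (hypothetical helper).

    priors: "equal", "prop", or one probability per sorted class label.
    Returns (classes, coef, intercept); coef has shape (n_features, n_classes).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)                      # sorted unique labels
    n, k = len(y), len(classes)
    means = np.array([X[y == c].mean(axis=0) for c in classes])  # (k, p)
    # Pooled within-class covariance: within-class SSCP divided by n - k.
    within = sum((X[y == c] - m).T @ (X[y == c] - m)
                 for c, m in zip(classes, means))
    s_inv = np.linalg.inv(within / (n - k))
    if isinstance(priors, str):
        if priors == "equal":
            priors = np.full(k, 1.0 / k)
        elif priors == "prop":
            priors = np.array([np.mean(y == c) for c in classes])
        else:
            raise ValueError(f"unknown priors option: {priors!r}")
    priors = np.asarray(priors, dtype=float)
    coef = s_inv @ means.T                      # one column per class
    # Constant term: -1/2 m_k' S^-1 m_k + ln(prior_k)
    intercept = -0.5 * np.sum(means * coef.T, axis=1) + np.log(priors)
    return classes, coef, intercept


def lda_predict(X, classes, coef, intercept):
    """Assign each row to the class with the largest classification score."""
    scores = np.asarray(X, dtype=float) @ coef + intercept   # (n, k)
    return classes[np.argmax(scores, axis=1)]
```

Because the constant term carries \(\ln \pi_k\), changing the priors shifts the decision boundaries without changing the coefficients.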

Parameters:

method ({‘linear’, ‘quad’}, default = ‘linear’) – The discriminant analysis method to perform. Possible values:
- ‘linear’ for linear discriminant analysis (LDA).
- ‘quad’ for quadratic discriminant analysis (QDA).
priors (str or array-like or Series of shape (n_classes,), default = None) – The class prior probabilities of group membership. Possible values:
- ‘equal’ to set the prior probabilities equal.
- ‘prop’ to set the prior probabilities proportional to the sample sizes.
- A NumPy 1-D array or Series that specifies the prior probability for each level of the classification variable.
classes (None, tuple or list, default = None) – Names of the class levels, in the order in which they are returned. If None, the classes are the sorted unique values of y.
var_select (bool, default = False) – Whether to apply feature selection based on variable importance for prediction (contribution). Applies to linear discriminant analysis only.
level (float, default = None) – Significance level for the variable importance critical probability. You can specify the level option only when both method = ‘linear’ and var_select = True are also specified. If you specify both method = ‘linear’ and var_select = True but omit the level option, DISCRIM uses \(0.05\) as the significance level for the variable importance.
tol (float, default = None) – Significance level for the test of homogeneity of the within-class covariance matrices. You can specify the tol option only when method = ‘quad’ is also specified. If you specify method = ‘quad’ but omit the tol option, DISCRIM uses \(0.1\) as the significance level for the test.
warn_message (bool, default = True) – Whether to show warning messages. Warnings are emitted without interrupting execution.
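The tol option sets the significance level of a homogeneity test. The sketch below assumes that test is Box’s M test for the equality of the within-class covariance matrices (the test reported in cov_.test) and uses Box’s usual chi-square approximation. `box_m_test` is a hypothetical helper, not discrimintools’ implementation, and it relies on SciPy’s `chi2` distribution for the p-value.

```python
import numpy as np
from scipy.stats import chi2


def box_m_test(X, y):
    """Box's M test (chi-square approximation) for equal class covariances.

    Hypothetical helper; returns (chi2_statistic, df, p_value).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, p = X.shape
    k = len(classes)
    sizes = np.array([np.sum(y == c) for c in classes])
    # Unbiased within-class covariance matrices (divisor n_k - 1).
    covs = [np.cov(X[y == c], rowvar=False) for c in classes]
    pooled = sum((nk - 1) * s for nk, s in zip(sizes, covs)) / (n - k)

    def logdet(m):
        return np.linalg.slogdet(m)[1]          # ln|m| for a positive-definite m

    m_stat = (n - k) * logdet(pooled) - sum(
        (nk - 1) * logdet(s) for nk, s in zip(sizes, covs))
    # Box's small-sample correction factor.
    c = (np.sum(1.0 / (sizes - 1)) - 1.0 / (n - k)) * (
        (2 * p**2 + 3 * p - 1) / (6.0 * (p + 1) * (k - 1)))
    statistic = (1.0 - c) * m_stat
    df = p * (p + 1) * (k - 1) / 2
    return statistic, df, chi2.sf(statistic, df)
```

Reading the result against tol: a p-value below tol (0.1 by default when method = ‘quad’) is evidence against equal covariance matrices, the situation in which a quadratic rule is typically preferred. How DISCRIM itself acts on the outcome is not shown here.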

Attributes:

call_ (NamedTuple) – Call information:
- Xtot: DataFrame of shape (n_samples, n_columns)
  
  Input data.
- X: DataFrame of shape (n_samples, n_features)
  
  Training data.
- y: Series of shape (n_samples,)
  
  Target values (true labels for X).
- target: str
  
  Name of the target.
- features: list
  
  Names of the features seen during fit.
- classes: list
  
  Names of the classes.
- priors: Series of shape (n_classes,)
  
  Prior probabilities.
- n_samples: int
  
  Number of samples.
- n_features: int
  
  Number of features.
- n_classes: int
  
  Number of classes.
classes_ (NamedTuple) – Class information:
- infos: DataFrame of shape (n_classes, 3)
  
  Class-level information (frequency, proportion, prior probability).
- center: DataFrame of shape (n_classes, n_features)
  
  Class means.
- total: DataFrame of shape (n_features, n_classes)
  
  Total-sample standardized class means.
- pooled: DataFrame of shape (n_features, n_classes)
  
  Pooled within-class standardized class means.
- mahal: DataFrame of shape (n_classes, n_classes)
  
  Squared Mahalanobis distances between classes.
- gen: DataFrame of shape (n_classes, n_classes)
  
  Generalized squared distances between classes.
coef_ (DataFrame of shape (n_features, n_classes)) – Coefficients of the linear classification functions.
corr_ (NamedTuple) – Correlation coefficient tests:
- total: DataFrame of shape (\(\binom{n_{features}}{2}\), 7)
  
  Total-sample correlation coefficient tests.
- within: dict
  
  Within-class correlation coefficient tests.
- pooled: DataFrame of shape (\(\binom{n_{features}}{2}\), 7)
  
  Pooled within-class correlation coefficient tests.
- between: DataFrame of shape (\(\binom{n_{features}}{2}\), 7)
  
  Between-class correlation coefficient tests.
cov_ (NamedTuple) – Covariance matrices:
- total: DataFrame of shape (n_features, n_features)
  
  Total-sample covariance matrix.
- btotal: DataFrame of shape (n_features, n_features)
  
  Biased total-sample covariance matrix.
- within: dict
  
  Within-class covariance matrices.
- bwithin: dict
  
  Biased within-class covariance matrices.
- pooled: DataFrame of shape (n_features, n_features)
  
  Pooled within-class covariance matrix.
- bpooled: DataFrame of shape (n_features, n_features)
  
  Biased pooled within-class covariance matrix.
- between: DataFrame of shape (n_features, n_features)
  
  Between-class covariance matrix.
- bbetween: DataFrame of shape (n_features, n_features)
  
  Biased between-class covariance matrix.
- test: DataFrame of shape (1, 7)
  
  Box’s M test for the equality of the within-class covariance matrices.
ind_ (NamedTuple) – Individual (sample) information:
- scores: DataFrame of shape (n_samples, n_classes)
  
  Total scores of the individuals.
- mahal: DataFrame of shape (n_samples, n_classes)
  
  Squared Mahalanobis distance from each individual to each class center.
- gen: DataFrame of shape (n_samples, n_classes)
  
  Generalized squared distance from each individual to each class center.
model_ (str, default = “discrim”) – Name of the fitted model.
sscp_ (NamedTuple) – Sums of squares and cross-products (SSCP) matrices:
- total: DataFrame of shape (n_features, n_features)
  
  Total-sample SSCP matrix.
- within: dict
  
  Within-class SSCP matrices.
- pooled: DataFrame of shape (n_features, n_features)
  
  Pooled within-class SSCP matrix.
- between: DataFrame of shape (n_features, n_features)
  
  Between-class SSCP matrix.
statistics_ (NamedTuple) – Statistical results:
- anova: DataFrame of shape (n_features, 11)
  
  Univariate analysis of variance (ANOVA) tests.
- manova: DataFrame of shape (4, 5)
  
  Multivariate analysis of variance (MANOVA) tests.
- average_rsq: DataFrame of shape (1, 2)
  
  Average R-square.
- performance: DataFrame of shape (3, 3)
  
  Global performance of the model (linear discriminant analysis only).
summary_ (NamedTuple) – Summary information:
- infos: DataFrame of shape (3, 4)
  
  Summary information (total sample size, number of features, number of classes, total degrees of freedom, within-class degrees of freedom, between-class degrees of freedom).
- total: DataFrame of shape (n_features, 8)
  
  Total-sample descriptive statistics, see https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.describe.html
- within: dict
  
  Within-class descriptive statistics.
vip_ (NamedTuple) – Variable importance for prediction:
- vip: DataFrame of shape (n_features, 6)
  
  Variable importance for prediction.
- selected: list
  
  Selected variables.
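Several of these attributes rest on the classical decomposition of the total SSCP matrix into within-class and between-class parts, \(T = W + B\), with the pooled within-class covariance matrix \(W / (n - K)\) used for the squared Mahalanobis distances between class means. The NumPy sketch below computes these textbook quantities. `sscp_decomposition`, `class_mahalanobis` and `average_r_square` are hypothetical helpers; the two R-square averages assume the SAS definitions (unweighted, and weighted by variance), and the divisors discrimintools uses for its biased and between-class covariance matrices are not reproduced here.

```python
import numpy as np


def sscp_decomposition(X, y):
    """Return (classes, total, within, pooled, between) SSCP matrices."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    grand_mean = X.mean(axis=0)
    centered = X - grand_mean
    total = centered.T @ centered                  # total-sample SSCP
    within = {}                                    # one SSCP matrix per class
    between = np.zeros_like(total)
    for c in classes:
        Xk = X[y == c]
        dev = Xk - Xk.mean(axis=0)
        within[c] = dev.T @ dev
        shift = (Xk.mean(axis=0) - grand_mean)[:, None]
        between += len(Xk) * (shift @ shift.T)     # weighted by class size
    pooled = sum(within.values())                  # pooled within-class SSCP
    return classes, total, within, pooled, between


def class_mahalanobis(X, y):
    """Squared Mahalanobis distances between class means (pooled covariance)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes, _, _, pooled, _ = sscp_decomposition(X, y)
    n, k = len(y), len(classes)
    s_inv = np.linalg.inv(pooled / (n - k))
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    diff = means[:, None, :] - means[None, :, :]   # (k, k, n_features)
    return np.einsum("ijp,pq,ijq->ij", diff, s_inv, diff)


def average_r_square(total, between):
    """Unweighted and variance-weighted averages of per-feature R-squares."""
    r2 = np.diag(between) / np.diag(total)         # B_jj / T_jj per feature
    return r2.mean(), np.trace(between) / np.trace(total)
```

The identity \(T = W + B\) holds exactly, so it is a convenient sanity check when comparing the sscp_ matrices returned by a fitted model.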

See also

GFALDA: General Factor Analysis Linear Discriminant Analysis
CPLS: Partial Least Squares for Classification
PLSDA: Partial Least Squares Discriminant Analysis
PLSLDA: Partial Least Squares Linear Discriminant Analysis
summaryDISCRIM: Print summaries of a Discriminant Analysis (linear & quadratic) model.
summaryDA: Print summaries of a Discriminant Analysis model.

References

[1] Bardos M. (2001), « Analyse discriminante - Application au risque et scoring financier », Dunod.

[2] Lebart Ludovic, Piron Marie, & Morineau Alain (2006), « Statistique Exploratoire Multidimensionnelle », Dunod, Paris, 4th ed.

[3] Rakotomalala Ricco (2020), « Pratique de l’Analyse Discriminante Linéaire », Université Lumière Lyon 2, Version 1.0.

[4] Saporta Gilbert (2011), « Probabilités, Analyse des données et Statistiques », Editions TECHNIP, 3rd ed.

[5] Tenenhaus Michel (1996), « Méthodes statistiques en gestion », Dunod.

[6] Tufféry Stéphane (2017), « Data Mining et Statistique décisionnelle », Editions TECHNIP, 5th ed.

[7] Tufféry Stéphane (2025), « Data Science, Statistique et Machine learning », Editions TECHNIP, 6th ed.

[8] SAS/STAT 13.2 User’s Guide (2014), « The DISCRIM Procedure », Chapter 35.

Examples

>>> from discrimintools.datasets import load_alcools
>>> from discrimintools import DISCRIM
>>> D = load_alcools() # load training data
>>> y, X = D["TYPE"], D.drop(columns=["TYPE"]) # split into X and y
>>> # linear discriminant analysis (LDA)
>>> clf = DISCRIM()
>>> clf.fit(X,y)
DISCRIM(priors='prop')
>>> # quadratic discriminant analysis (QDA)
>>> clf2 = DISCRIM(method='quad')
>>> clf2.fit(X,y)
DISCRIM(method='quad',priors='prop')

__init__(method='linear', priors=None, classes=None, var_select=False, level=None, tol=None, warn_message=True)

Methods

`__init__`([method, priors, classes, ...])
`decision_function`(X)	Apply the decision function to input data
`eval_predict`(X, y[, verbose])	Evaluate the quality of the predictions
`feature_importance`([level, all_vars])	Variable Importance for Prediction in Linear Discriminant Analysis (LDAVIP)
`fit`(X, y)	Fit the Discriminant Analysis model.
`fit_transform`(X, y)	Fit to data, then transform it
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`pred_table`(X, y)	Prediction table
`predict`(X)	Predict class labels for samples in X
`predict_log_proba`(X)	Return the log of the posterior probabilities
`predict_proba`(X)	Estimate the posterior probabilities
`score`(X, y)	Return the accuracy on the given input data
`set_output`(*[, transform])	Set output container.
`set_params`(**params)	Set the parameters of this estimator.
`transform`(X)	Project data to maximize class separation
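For the prediction methods, a common formulation (the one described in the SAS DISCRIM documentation [8]) assigns each observation to the class with the smallest generalized squared distance \(D^2_k(x) = (x - \bar{x}_k)^\top S_k^{-1}(x - \bar{x}_k) + g_1(k) - 2\ln \pi_k\), where \(g_1(k) = \ln|S_k|\) for the quadratic rule and \(g_1(k) = 0\) for the linear rule (with \(S_k\) replaced by the pooled matrix), and converts distances into posterior probabilities \(p(k \mid x) = \exp(-D^2_k/2) / \sum_j \exp(-D^2_j/2)\). The sketch below assumes predict, predict_proba and predict_log_proba follow this formulation; `generalized_sq_distances` and `posterior_log_proba` are hypothetical helpers, not discrimintools’ source.

```python
import numpy as np


def generalized_sq_distances(X, means, covs, priors, quadratic=False):
    """Generalized squared distance from each row of X to each class.

    covs: one (p, p) matrix per class; pass the pooled matrix for every class
    for the linear rule, or the within-class matrices for the quadratic rule.
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    d2 = np.empty((X.shape[0], len(means)))
    for k, (m, s) in enumerate(zip(means, covs)):
        diff = X - np.asarray(m, dtype=float)
        d2[:, k] = np.einsum("ip,pq,iq->i", diff, np.linalg.inv(s), diff)
        if quadratic:
            d2[:, k] += np.linalg.slogdet(s)[1]    # g1 = ln|S_k|
        d2[:, k] -= 2.0 * np.log(priors[k])        # prior term
    return d2


def posterior_log_proba(d2):
    """log p(k | x) from generalized squared distances (stable softmax)."""
    z = -0.5 * np.asarray(d2, dtype=float)
    z -= z.max(axis=1, keepdims=True)              # shift to avoid overflow
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))
```

The predicted class is `np.argmin(d2, axis=1)`, which is the same as the argmax of the posteriors. With a single pooled matrix, \(\ln|S|\) would be identical for every class and cancels in the softmax, which is why the linear rule can omit it without changing predictions.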
