discrimintools.DISCRIM

class discrimintools.DISCRIM(method='linear', priors=None, classes=None, var_select=False, level=None, tol=None, warn_message=True)

Discriminant Analysis (DISCRIM)

Performs a discriminant analysis (linear or quadratic) on a set of observations (training data) containing one or more numeric variables and a classification variable that defines groups of observations. The discriminant criterion derived from the training data can then be applied to a test dataset.
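The linear case can be illustrated with a minimal NumPy sketch: class means, a pooled within-class covariance matrix (assumed unbiased, with n - k degrees of freedom), and one linear classification function per class. The helper names `lda_fit` and `lda_predict` are hypothetical, proportional priors are this sketch's own default, and this is not the discrimintools implementation. `coef.T` has the (n_features, n_classes) layout documented for `coef_`, but whether DISCRIM scales its coefficients the same way is an assumption.

```python
import numpy as np

def lda_fit(X, y, priors=None):
    """Minimal LDA sketch (hypothetical helper): returns (classes, intercepts, coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)                                  # sorted unique labels
    n, k = len(X), len(classes)
    means = np.array([X[y == c].mean(axis=0) for c in classes])   # (k, p) class means
    # Pooled within-class SSCP, then covariance with n - k degrees of freedom
    W = sum((X[y == c] - m).T @ (X[y == c] - m) for c, m in zip(classes, means))
    S_inv = np.linalg.inv(W / (n - k))
    if priors is None:                                      # assumption: proportional priors
        priors = np.array([np.mean(y == c) for c in classes])
    coef = means @ S_inv                                    # row k is mu_k^T S^{-1}
    # Intercept of class k: log(pi_k) - 0.5 * mu_k^T S^{-1} mu_k
    intercept = np.log(priors) - 0.5 * np.sum(coef * means, axis=1)
    return classes, intercept, coef

def lda_predict(model, X):
    """Assign each row of X to the class with the largest linear score."""
    classes, intercept, coef = model
    scores = np.asarray(X, dtype=float) @ coef.T + intercept      # (n_samples, k)
    return classes[np.argmax(scores, axis=1)]

# Usage on a tiny two-class toy dataset
X = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.2],
              [5.0, 8.0], [6.0, 9.0], [5.5, 8.5]])
y = np.array(["a", "a", "a", "b", "b", "b"])
model = lda_fit(X, y)
print(lda_predict(model, [[1.1, 2.0], [5.8, 8.7]]))         # ['a' 'b']
```

Because every class shares the same covariance matrix, the quadratic term cancels and each score is linear in x; the quadratic method (`method='quad'`) drops that assumption.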

Parameters:
  • method ({‘linear’, ‘quad’}, default = ‘linear’) – The discriminant analysis method to perform. Possible values:

    • ‘linear’ for linear discriminant analysis (LDA).

    • ‘quad’ for quadratic discriminant analysis (QDA).

  • priors (str or array-like or Series of shape (n_classes,), default = None) – Class prior probabilities of group membership. Possible values:

    • ‘equal’ to set equal prior probabilities.

    • ‘prop’ to set prior probabilities proportional to the class sample sizes.

    • a NumPy 1-D array or Series specifying the prior probability of each level of the classification variable.

  • classes (None, tuple or list, default = None) – Names of the class levels, in the order in which they are returned. If None, the classes are the sorted unique values of y.

  • var_select (bool, default = False) – Whether to apply feature selection based on variable importance for prediction (contribution). Applies to linear discriminant analysis only.

  • level (float, default = None) – Significance level for the variable importance critical probability. The level option can be specified only when both method = ‘linear’ and var_select=True are specified. If both are specified but level is omitted, DISCRIM uses 0.05 as the significance level for variable importance.

  • tol (float, default = None) – Significance level for the test of homogeneity of the within-class covariance matrices. The tol option can be specified only when method = ‘quad’. If method = ‘quad’ is specified but tol is omitted, DISCRIM uses 0.1 as the significance level for the test.

  • warn_message (bool, default = True) – Whether to show warning messages. Warnings are raised without interrupting execution.
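The three documented forms of `priors` can be sketched as a small resolver that turns the setting into a probability vector aligned with the class order. `resolve_priors` is a hypothetical helper, not part of discrimintools; its checks on an explicit array (shape and sum to 1) and its refusal to guess what `priors=None` means are assumptions of this sketch, not documented DISCRIM behavior.

```python
import numpy as np

def resolve_priors(priors, y, classes):
    """Hypothetical helper: turn a `priors` setting into a probability vector."""
    if priors is None:
        # The default behaviour for None is not described above; this sketch does not guess it.
        raise ValueError("priors=None is not modelled by this sketch")
    y = np.asarray(y)
    counts = np.array([np.sum(y == c) for c in classes], dtype=float)
    if isinstance(priors, str):
        if priors == "equal":                 # same prior for every class
            return np.full(len(classes), 1.0 / len(classes))
        if priors == "prop":                  # proportional to class sample sizes
            return counts / counts.sum()
        raise ValueError(f"unknown priors option: {priors!r}")
    # Explicit 1-D array; assumed to follow the order of `classes`
    p = np.asarray(priors, dtype=float)
    if p.shape != (len(classes),):
        raise ValueError("priors must have shape (n_classes,)")
    if not np.isclose(p.sum(), 1.0):
        raise ValueError("priors must sum to 1")
    return p

y = ["a", "a", "a", "b"]
print(resolve_priors("prop", y, ["a", "b"]))    # [0.75 0.25]
```

A Series would be converted by value here, so its index is ignored; a real implementation would more likely align a Series on its index labels.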

Attributes:

  • call_ (NamedTuple) – Call information:

    • Xtot: DataFrame of shape (n_samples, n_columns)

      Input data.

    • X: DataFrame of shape (n_samples, n_features)

      Training data.

    • y: Series of shape (n_samples,)

      Target values (true class labels for X).

    • target: str

      Name of the target.

    • features: list

      Names of features seen during fit.

    • classes: list

      Names of classes.

    • priors: Series of shape (n_classes,)

      Prior probabilities.

    • n_samples: int

      Number of samples.

    • n_features: int

      Number of features.

    • n_classes: int

      Number of classes (distinct target values).

  • classes_ (NamedTuple) – Class information:

    • infos: DataFrame of shape (n_classes, 3)

      Class-level information (frequency, proportion, prior probability).

    • center: DataFrame of shape (n_classes, n_features)

      Class means.

    • total: DataFrame of shape (n_features, n_classes)

      Total-sample standardized class means.

    • pooled: DataFrame of shape (n_features, n_classes)

      Pooled within-class standardized class means.

    • mahal: DataFrame of shape (n_classes, n_classes)

      Squared Mahalanobis distances between classes.

    • gen: DataFrame of shape (n_classes, n_classes)

      Generalized squared distances between classes.

  • coef_ (DataFrame of shape (n_features, n_classes)) – Coefficients of the linear classification functions.

  • corr_ (NamedTuple) – Correlation coefficient tests:

    • total: DataFrame of shape (n_features * (n_features - 1) / 2, 7)

      Total-sample correlation coefficient tests.

    • within: dict

      Within-class correlation coefficient tests.

    • pooled: DataFrame of shape (n_features * (n_features - 1) / 2, 7)

      Pooled within-class correlation coefficient tests.

    • between: DataFrame of shape (n_features * (n_features - 1) / 2, 7)

      Between-class correlation coefficient tests.

  • cov_ (NamedTuple) – Covariance matrices:

    • total: DataFrame of shape (n_features, n_features)

      Total-sample covariance matrix.

    • btotal: DataFrame of shape (n_features, n_features)

      Biased total-sample covariance matrix.

    • within: dict

      Within-class covariance matrices.

    • bwithin: dict

      Biased within-class covariance matrices.

    • pooled: DataFrame of shape (n_features, n_features)

      Pooled within-class covariance matrix.

    • bpooled: DataFrame of shape (n_features, n_features)

      Biased pooled within-class covariance matrix.

    • between: DataFrame of shape (n_features, n_features)

      Between-class covariance matrix.

    • bbetween: DataFrame of shape (n_features, n_features)

      Biased between-class covariance matrix.

    • test: DataFrame of shape (1, 7)

      Box’s M test of homogeneity of the within-class covariance matrices.

  • ind_ (NamedTuple) – Individual (sample) information:

    • scores: DataFrame of shape (n_samples, n_classes)

      Total scores of the individuals.

    • mahal: DataFrame of shape (n_samples, n_classes)

      Squared Mahalanobis distances from each individual to each class mean.

    • gen: DataFrame of shape (n_samples, n_classes)

      Generalized squared distances from each individual to each class mean.

  • model_ (str, default = “discrim”) – Name of the fitted model.

  • sscp_ (NamedTuple) – Sums of squares and cross-products (SSCP) matrices:

    • total: DataFrame of shape (n_features, n_features)

      Total-sample SSCP matrix.

    • within: dict

      Within-class SSCP matrices.

    • pooled: DataFrame of shape (n_features, n_features)

      Pooled within-class SSCP matrix.

    • between: DataFrame of shape (n_features, n_features)

      Between-class SSCP matrix.

  • statistics_ (NamedTuple) – Statistical test results:

    • anova: DataFrame of shape (n_features, 11)

      Analysis of variance (ANOVA) test for each feature.

    • manova: DataFrame of shape (4, 5)

      Multivariate analysis of variance (MANOVA) test.

    • average_rsq: DataFrame of shape (1, 2)

      Average R-square.

    • performance: DataFrame of shape (3, 3)

      Global performance of the model. Available only for linear discriminant analysis.

  • summary_ (NamedTuple) – Summary information:

    • infos: DataFrame of shape (3, 4)

      Summary information (total sample size, number of features, number of classes, total degrees of freedom, within-class degrees of freedom, between-class degrees of freedom).

    • total: DataFrame of shape (n_features, 8)

      Total-sample descriptive statistics; see https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.describe.html

    • within: dict

      Within-class descriptive statistics.

  • vip_ (NamedTuple) – Variable importance for prediction:

    • vip: DataFrame of shape (n_features, 6)

      Variable importance for prediction.

    • selected: list

      Selected variables.
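Several of the matrices listed above are linked by the classical decomposition of the total SSCP matrix into pooled within-class and between-class parts, T = W + B. The NumPy sketch below computes these matrices from raw arrays and derives the between-class squared Mahalanobis distances from the pooled covariance. It assumes the unbiased denominators n - 1 (total) and n - k (pooled); how DISCRIM scales its between-class and biased matrices is not stated here, so the sketch does not attempt to reproduce them. The function names are hypothetical.

```python
import numpy as np

def sscp_decomposition(X, y):
    """Hypothetical helper: total, within-class, pooled and between-class SSCP matrices."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    grand_mean = X.mean(axis=0)
    Xc = X - grand_mean
    total = Xc.T @ Xc                                   # T
    within = {}                                         # W_k for each class
    between = np.zeros_like(total)                      # B
    for c in np.unique(y):
        Xk = X[y == c]
        mk = Xk.mean(axis=0)
        within[c] = (Xk - mk).T @ (Xk - mk)
        d = (mk - grand_mean)[:, None]                  # column vector mu_k - mu
        between += len(Xk) * (d @ d.T)
    pooled = sum(within.values())                       # W = sum of the W_k
    return total, within, pooled, between

def between_class_mahalanobis(X, y):
    """Squared Mahalanobis distances between class means, using the pooled covariance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    _, _, pooled, _ = sscp_decomposition(X, y)
    S_inv = np.linalg.inv(pooled / (len(X) - len(classes)))   # pooled covariance, n - k df
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    diff = means[:, None, :] - means[None, :, :]              # (k, k, p) pairwise differences
    return np.einsum("ijp,pq,ijq->ij", diff, S_inv, diff)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = np.repeat(["a", "b", "c"], 10)
T, Wk, W, B = sscp_decomposition(X, y)
print(np.allclose(T, W + B))                            # True
```

Dividing `T` by n - 1 gives the usual sample covariance (the same as `np.cov(X, rowvar=False)`), which is one common definition of a total-sample covariance matrix.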

See also

GFALDA

General Factor Analysis Linear Discriminant Analysis

CPLS

Partial Least Squares for Classification

PLSDA

Partial Least Squares Discriminant Analysis

PLSLDA

Partial Least Squares Linear Discriminant Analysis

summaryDISCRIM

Print summaries of a Discriminant Analysis (linear and quadratic) model.

summaryDA

Print summaries of a Discriminant Analysis model.


Examples

>>> from discrimintools.datasets import load_alcools
>>> from discrimintools import DISCRIM
>>> D = load_alcools() # load training data
>>> y, X = D["TYPE"], D.drop(columns=["TYPE"]) # split into X and y
>>> # linear discriminant analysis (LDA)
>>> clf = DISCRIM()
>>> clf.fit(X, y)
DISCRIM(priors='prop')
>>> # quadratic discriminant analysis (QDA)
>>> clf2 = DISCRIM(method='quad')
>>> clf2.fit(X, y)
DISCRIM(method='quad',priors='prop')
__init__(method='linear', priors=None, classes=None, var_select=False, level=None, tol=None, warn_message=True)

Methods

__init__([method, priors, classes, ...])

decision_function(X)

Apply the decision function to input data.

eval_predict(X, y[, verbose])

Evaluate the quality of the predictions.

feature_importance([level, all_vars])

Variable Importance for Prediction in Linear Discriminant Analysis (LDAVIP).

fit(X, y)

Fit the Discriminant Analysis model.

fit_transform(X, y)

Fit to data, then transform it.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

pred_table(X, y)

Prediction table.

predict(X)

Predict class labels for samples in X.

predict_log_proba(X)

Return the log of the posterior probabilities.

predict_proba(X)

Estimate the posterior class probabilities.

score(X, y)

Return the accuracy on the given input data and labels.

set_output(*[, transform])

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Project data to maximize class separation.
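For the linear method, `predict`, `predict_proba` and `predict_log_proba` can be related through generalized squared distances, the quantity reported per sample in `ind_.gen`. Following the SAS DISCRIM convention for the linear case, D_k^2(x) = (x - mu_k)^T S^{-1} (x - mu_k) - 2 log(pi_k), and the posterior of class k is exp(-D_k^2 / 2) normalized over classes. The sketch below is an illustration under that assumption; the helper names are hypothetical, and whether DISCRIM computes its outputs exactly this way is not confirmed here.

```python
import numpy as np

def generalized_sq_distance(X, means, pooled_cov, priors):
    """Hypothetical helper: D_k^2(x) for every row of X and every class k (linear case)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    S_inv = np.linalg.inv(pooled_cov)
    diff = X[:, None, :] - means[None, :, :]                 # (n, k, p)
    mahal = np.einsum("nkp,pq,nkq->nk", diff, S_inv, diff)   # squared Mahalanobis distances
    return mahal - 2.0 * np.log(priors)

def posterior_from_gen_dist(gen_dist):
    """Log-posterior and posterior probabilities from generalized squared distances."""
    z = -0.5 * np.asarray(gen_dist, dtype=float)
    z_max = z.max(axis=1, keepdims=True)                     # log-sum-exp for stability
    log_proba = z - (z_max + np.log(np.exp(z - z_max).sum(axis=1, keepdims=True)))
    return log_proba, np.exp(log_proba)

means = np.array([[0.0, 0.0], [3.0, 3.0]])
gen = generalized_sq_distance([[0.0, 0.0], [3.0, 3.0], [1.5, 1.5]],
                              means, np.eye(2), np.array([0.5, 0.5]))
log_p, p = posterior_from_gen_dist(gen)
pred = p.argmax(axis=1)        # predicted class = largest posterior = smallest D_k^2
```

The third point is equidistant from both class means under equal priors, so its posterior is 0.5 for each class. Working in log space keeps `log_p` finite even when one class is very far away.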