discrimintools.CANDISC

class discrimintools.CANDISC(n_components=2, classes=None, warn_message=True)

Canonical Discriminant Analysis (CANDISC)

Canonical discriminant analysis is a dimension-reduction technique related to principal component analysis and canonical correlation. The methodology that is used in deriving the canonical coefficients parallels that of a one-way multivariate analysis of variance (MANOVA). MANOVA tests for equality of the mean vector across class levels. Canonical discriminant analysis finds linear combinations of the quantitative variables that provide maximal separation between classes or groups. Given a classification variable and several quantitative variables, the CANDISC procedure derives canonical variables, which are linear combinations of the quantitative variables that summarize between-class variation in much the same way that principal components summarize total variation.
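
As a rough illustration of the idea above (a NumPy-only sketch, not the discrimintools implementation), the canonical directions can be obtained by solving the generalized eigenproblem B v = λ W v, where W and B are the pooled within-class and between-class SSCP matrices. The function name `canonical_variables` and the toy data are hypothetical, and the scaling of the directions (unit pooled within-class SSCP) is an assumption made here for convenience.

```python
import numpy as np

def canonical_variables(X, y):
    """Basic canonical discriminant analysis via the eigenproblem B v = lambda W v."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    W = np.zeros((p, p))  # pooled within-class SSCP
    B = np.zeros((p, p))  # between-class SSCP
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        W += (Xc - mc).T @ (Xc - mc)
        d = (mc - grand_mean)[:, None]
        B += len(Xc) * (d @ d.T)
    # Symmetrize the problem with the Cholesky factor of W (W = L L^T)
    L_inv = np.linalg.inv(np.linalg.cholesky(W))
    eigval, U = np.linalg.eigh(L_inv @ B @ L_inv.T)
    order = np.argsort(eigval)[::-1]
    k = min(p, len(classes) - 1)  # at most min(n_features, n_classes - 1) dimensions
    eigval = eigval[order][:k]
    eigvec = (L_inv.T @ U[:, order])[:, :k]  # directions v with v' W v = 1
    scores = (X - grand_mean) @ eigvec       # canonical variables
    return eigval, eigvec, scores, W, B

# Hypothetical toy data: three classes, three quantitative variables
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 1.0], [0.0, 3.0, 1.0]])
X = np.vstack([rng.normal(loc=m, size=(30, 3)) for m in centers])
y = np.repeat(["a", "b", "c"], 30)
eigval, eigvec, scores, W, B = canonical_variables(X, y)
```

With three classes and three variables, only two canonical dimensions carry between-class variation, which is why the sketch keeps `min(n_features, n_classes - 1)` of them.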

The CANDISC procedure performs a canonical discriminant analysis, computes squared Mahalanobis distances between class means, and performs both univariate and multivariate one-way analyses of variance.
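
The squared Mahalanobis distance between two class means can be sketched as follows. This is a minimal NumPy illustration, assuming the distance is measured with the unbiased pooled within-class covariance matrix (divisor n_samples - n_classes); the helper name `class_mahalanobis` and the toy data are hypothetical, and the library may compute the quantity differently.

```python
import numpy as np

def class_mahalanobis(X, y):
    """Squared Mahalanobis distances between class means (pooled covariance)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, p = X.shape
    means = np.vstack([X[y == c].mean(axis=0) for c in classes])
    W = np.zeros((p, p))  # pooled within-class SSCP
    for i, c in enumerate(classes):
        Xc = X[y == c] - means[i]
        W += Xc.T @ Xc
    S_inv = np.linalg.inv(W / (n - len(classes)))  # inverse pooled covariance
    diff = means[:, None, :] - means[None, :, :]   # shape (k, k, p)
    # D2[i, j] = (m_i - m_j)' S^{-1} (m_i - m_j)
    D2 = np.einsum("ijp,pq,ijq->ij", diff, S_inv, diff)
    return classes, D2

# Hypothetical toy data: three classes, two quantitative variables
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=m, size=(25, 2)) for m in (0.0, 1.5, 3.0)])
y = np.repeat(["low", "mid", "high"], 25)
classes, D2 = class_mahalanobis(X, y)
```

The resulting matrix is symmetric with a zero diagonal, and larger entries indicate class means that are further apart relative to the within-class spread.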

Parameters:

n_components (int or None, default = 2) – Number of components to keep. If None, all components are kept.
classes (None, tuple or list, default = None) – Names of the class levels, in the order in which they should be returned. If None, the classes are the sorted unique values of y.
warn_message (bool, default = True) – Whether to show warning messages. Problems are reported as warnings instead of stopping the program.

Returns:

call_ (NamedTuple) – Call information:
- Xtot: DataFrame of shape (n_samples, n_columns)

  Input data.
- X: DataFrame of shape (n_samples, n_features)

  Training data.
- y: Series of shape (n_samples,)

  Target values. True values for X.
- target: str

  Name of target.
- features: list

  Names of features seen during fit.
- classes: list

  Names of classes.
- priors: Series of shape (n_classes,)

  Prior probabilities.
- n_samples: int

  Number of samples.
- n_features: int

  Number of features.
- n_classes: int

  Number of target values.
- max_components: int

  Maximum number of components.
- n_components: int

  Number of components kept.
cancoef_ (NamedTuple) – Canonical coefficients:
- raw: DataFrame of shape (n_features + 1, n_components)

  Raw canonical coefficients.
- total: DataFrame of shape (n_features, n_components)

  Total canonical coefficients.
- pooled: DataFrame of shape (n_features, n_components)

  Pooled canonical coefficients.
cancorr_ (DataFrame of shape (n_components, 10)) – The canonical correlations test.
classes_ (NamedTuple) – Class information:
- infos: DataFrame of shape (n_classes, 3)

  Class level information (frequency, proportion, prior probability).
- center: DataFrame of shape (n_classes, n_features)

  Class means.
- total: DataFrame of shape (n_features, n_classes)

  Total-sample standardized class means.
- pooled: DataFrame of shape (n_features, n_classes)

  Pooled within-class standardized class means.
- mahal: DataFrame of shape (n_classes, n_classes)

  Squared Mahalanobis distances between classes.
- coord: DataFrame of shape (n_classes, n_components)

  Class coordinates.
- eucl: DataFrame of shape (n_classes, n_classes)

  The squared Euclidean distance to origin.
- gen: DataFrame of shape (n_classes, n_classes)

  The generalized squared distance to origin.
coef_ (DataFrame of shape (n_features + 1, n_classes)) – Linear classification functions coefficients.
corr_ (NamedTuple) – Correlation coefficients test:
- total: DataFrame of shape (C^{2}_{n_features}, 7)

  Total-sample correlation coefficients test.
- within: dict

  Within-class correlation coefficients test.
- pooled: DataFrame of shape (C^{2}_{n_features}, 7)

  Pooled within-class correlation coefficients test.
- between: DataFrame of shape (C^{2}_{n_features}, 7)

  Between-class correlation coefficients test.
cov_ (NamedTuple) – Covariance matrices:
- total: DataFrame of shape (n_features, n_features)

  Total-sample covariance matrix.
- btotal: DataFrame of shape (n_features, n_features)

  Biased total-sample covariance matrix.
- within: dict

  Within-class covariance matrices.
- bwithin: dict

  Biased within-class covariance matrices.
- pooled: DataFrame of shape (n_features, n_features)

  Pooled within-class covariance matrix.
- bpooled: DataFrame of shape (n_features, n_features)

  Biased pooled within-class covariance matrix.
- between: DataFrame of shape (n_features, n_features)

  Between-class covariance matrix.
- bbetween: DataFrame of shape (n_features, n_features)

  Biased between-class covariance matrix.
eig_ (DataFrame of shape (n_components, 4)) – The eigenvalues, the differences between consecutive eigenvalues, the percentage of variance and the cumulative percentage of variance.
ind_ (NamedTuple) – Individual information:
- coord: DataFrame of shape (n_samples, n_components)

  The coordinates of individuals.
- mahal: DataFrame of shape (n_samples, n_classes)

  The squared Mahalanobis distance to origin.
- eucl: DataFrame of shape (n_samples, n_classes)

  The squared Euclidean distance to origin.
- gen: DataFrame of shape (n_samples, n_classes)

  The generalized squared distance to origin.
- scores: DataFrame of shape (n_samples, n_classes)

  The total scores of individuals.
model_ (str, default = “candisc”) – Name of model fitted.
sscp_ (NamedTuple) – Sums of squares and cross-products (SSCP) matrices:
- total: DataFrame of shape (n_features, n_features)

  Total-sample SSCP matrix.
- within: dict

  Within-class SSCP matrices.
- pooled: DataFrame of shape (n_features, n_features)

  Pooled within-class SSCP matrix.
- between: DataFrame of shape (n_features, n_features)

  Between-class SSCP matrix.
statistics_ (NamedTuple) – Statistics results:
- anova: DataFrame of shape (n_features, 11)

  Analysis of variance test.
- manova: DataFrame of shape (4, 5)

  Multivariate analysis of variance test.
- average_rsq: DataFrame of shape (1, 2)

  Average R-square.
- performance: DataFrame of shape (3, 3)

  The model global performance.
summary_ (NamedTuple) – Summary information:
- infos: DataFrame of shape (3, 4)

  Summary information (total sample size, number of features, number of classes, total degrees of freedom, within-class degrees of freedom, between-class degrees of freedom).
- total: DataFrame of shape (n_features, 8)

  Total-sample statistics, see pandas.DataFrame.describe.
- within: dict

  Within-class statistics.
svd_ (NamedTuple) – Singular value decomposition:
- value: 1D array of shape (n_components,)

  The eigenvalues.
- vectors: 2D array of shape (n_features, n_components)

  The eigenvectors.
var_ (NamedTuple) – Variable information (correlation):
- total: DataFrame of shape (n_features, n_components)

  The total-sample correlation of variables with canonical dimensions.
- pooled: DataFrame of shape (n_features, n_components)

  The pooled within-class correlation of variables with canonical dimensions.
- between: DataFrame of shape (n_features, n_components)

  The between-class correlation of variables with canonical dimensions.
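
The four columns described for `eig_`, together with the canonical correlations reported in `cancorr_`, can be derived from the eigenvalues alone. The sketch below is a NumPy illustration and not the library's code: it assumes the eigenvalues are those of W⁻¹B (pooled within-class and between-class SSCP matrices), in which case each squared canonical correlation equals λ / (1 + λ). The function name `eigen_summary` and the input values are hypothetical.

```python
import numpy as np

def eigen_summary(eigval):
    """Summarize eigenvalues sorted in decreasing order."""
    eigval = np.asarray(eigval, dtype=float)
    # Gap between each eigenvalue and the next one; undefined for the last
    difference = np.append(-np.diff(eigval), np.nan)
    proportion = 100.0 * eigval / eigval.sum()  # percentage of variance
    cumulative = np.cumsum(proportion)          # cumulative percentage
    # Canonical correlation, assuming eigenvalues of W^{-1} B
    cancorr = np.sqrt(eigval / (1.0 + eigval))
    return {
        "eigenvalue": eigval,
        "difference": difference,
        "proportion": proportion,
        "cumulative": cumulative,
        "cancorr": cancorr,
    }

# Hypothetical eigenvalues for a model with two canonical dimensions
summary = eigen_summary([9.0, 1.0])
```

For these illustrative values the first dimension accounts for 90 % of the between-class discrimination and has a canonical correlation of about 0.95.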

See also

fviz_candisc: Visualize Canonical Discriminant Analysis.
fviz_candisc_biplot: Visualize Canonical Discriminant Analysis (CANDISC) - Biplot of individuals and variables.
fviz_candisc_ind: Visualize Canonical Discriminant Analysis (CANDISC) - Graph of individuals.
fviz_candisc_var: Visualize Canonical Discriminant Analysis (CANDISC) - Graph of variables.
fviz_dist: Visualize the distances between barycenters.
summaryCANDISC: Printing summaries of Canonical Discriminant Analysis model.
summaryDA: Printing summaries of Discriminant Analysis model.

References

[1] Lebart Ludovic, Piron Marie, & Morineau Alain (2006), « Statistique Exploratoire Multidimensionnelle », Dunod, Paris 4ed.

[2] Ricco Rakotomalala (2020), « Pratique de l’Analyse Discriminante Linéaire », Version 1.0, Université Lumière Lyon 2.

[3] Saporta Gilbert (2011), « Probabilités, Analyse de données et Statistiques », Editions TECHNIP, 3ed.

[4] Tenenhaus Michel (2007), « Statistique - Méthodes pour décrire, expliquer et prévoir », Dunod.

[5] Tenenhaus Michel (1996), « Méthodes statistiques en gestion », Dunod.

[6] Tuffery Stephane (2017), « Data Mining et Statistique décisionnelle », Editions TECHNIP, 5ed.

[7] Tuffery Stephane (2025), « Data Science, Statistique et Machine learning », Editions TECHNIP, 6ed.

[8] SAS/STAT User’s Guide (2013), « The CANDISC Procedure », Chapter 31.

Examples

>>> from discrimintools.datasets import load_wine
>>> from discrimintools import CANDISC
>>> D = load_wine() # load training data
>>> y, X = D["Quality"], D.drop(columns=["Quality"]) # split into X and y
>>> clf = CANDISC()
>>> clf.fit(X,y)
CANDISC()
>>> XTest = load_wine("test") # load test data
>>> print(clf.predict(XTest))
1958    bad
Name: prediction, dtype: object

__init__(n_components=2, classes=None, warn_message=True)

Methods

`__init__`([n_components, classes, warn_message])
`decision_function`(X)	Apply the decision function to input data
`eval_predict`(X, y[, verbose])	Evaluate the quality of the predictions
`fit`(X, y)	Fit the Canonical Discriminant Analysis model
`fit_transform`(X, y)	Fit to data, then transform it
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`pred_table`(X, y)	Prediction table
`predict`(X)	Predict class labels for samples in X
`predict_log_proba`(X)	Return log of posterior probabilities
`predict_proba`(X)	Estimate posterior probabilities
`score`(X, y)	Return accuracy on the given input data
`set_output`(*[, transform])	Set output container.
`set_params`(**params)	Set the parameters of this estimator.
`transform`(X)	Apply the dimensionality reduction on X
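
To show how decision scores, posterior probabilities and predicted labels relate in a linear discriminant setting, here is a minimal NumPy sketch. It is not taken from discrimintools: it assumes the classical linear classification functions (intercept -½ m_k' S⁻¹ m_k + log prior_k, slopes S⁻¹ m_k, with S the pooled within-class covariance) and a softmax over the scores. The helper names and the toy inputs are hypothetical.

```python
import numpy as np

def linear_classification_functions(means, cov, priors):
    """Intercepts and slopes of the classical linear classification functions."""
    means = np.asarray(means, dtype=float)
    coef = np.linalg.solve(cov, means.T)  # shape (n_features, n_classes)
    intercept = -0.5 * np.sum(means.T * coef, axis=0) + np.log(priors)
    return intercept, coef

def posterior_probabilities(X, intercept, coef):
    """Scores of each class and the posterior probabilities they imply."""
    scores = np.asarray(X, dtype=float) @ coef + intercept
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    proba = np.exp(shifted)
    proba /= proba.sum(axis=1, keepdims=True)
    return scores, proba

# Hypothetical two-class problem with identity covariance and equal priors
means = [[0.0, 0.0], [2.0, 0.0]]
intercept, coef = linear_classification_functions(means, np.eye(2), [0.5, 0.5])
X_new = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 0.0]])
scores, proba = posterior_probabilities(X_new, intercept, coef)
labels = proba.argmax(axis=1)  # index of the most probable class
```

In this toy setting the third point lies halfway between the two class means, so both classes receive the same posterior probability.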
