discrimintools.CANDISC
class discrimintools.CANDISC(n_components=2, classes=None, warn_message=True)
Canonical Discriminant Analysis (CANDISC)
Canonical discriminant analysis is a dimension-reduction technique related to principal component analysis and canonical correlation. The methodology that is used in deriving the canonical coefficients parallels that of a one-way multivariate analysis of variance (MANOVA). MANOVA tests for equality of the mean vector across class levels. Canonical discriminant analysis finds linear combinations of the quantitative variables that provide maximal separation between classes or groups. Given a classification variable and several quantitative variables, the CANDISC procedure derives canonical variables, which are linear combinations of the quantitative variables that summarize between-class variation in much the same way that principal components summarize total variation.
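The derivation described above can be sketched with NumPy. This is a minimal illustration, not the discrimintools implementation: the function name `candisc_sketch`, the Cholesky-based solver, and the scaling to unit pooled within-class variance are all assumptions made for the example.

```python
import numpy as np

def candisc_sketch(X, y, n_components=2):
    """Illustrative canonical discriminant analysis (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, p = X.shape
    k = classes.size
    grand_mean = X.mean(axis=0)

    # Pooled within-class (W) and between-class (B) SSCP matrices.
    W = np.zeros((p, p))
    B = np.zeros((p, p))
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        W += (Xc - mean_c).T @ (Xc - mean_c)
        d = mean_c - grand_mean
        B += Xc.shape[0] * np.outer(d, d)

    # Solve W^{-1} B v = lambda v through a symmetric problem:
    # with W = L L^T, the matrix L^{-1} B L^{-T} has the same eigenvalues.
    L = np.linalg.cholesky(W)
    L_inv = np.linalg.inv(L)
    eigvals, U = np.linalg.eigh(L_inv @ B @ L_inv.T)

    # At most min(p, k - 1) canonical variables carry between-class variation.
    max_components = min(p, k - 1)
    m = max_components if n_components is None else min(n_components, max_components)
    order = np.argsort(eigvals)[::-1][:m]
    eigvals = eigvals[order]

    # Back-transform and scale so that each canonical variable has unit
    # pooled within-class variance (assumed normalization).
    coef = (L_inv.T @ U[:, order]) * np.sqrt(n - k)
    scores = (X - grand_mean) @ coef
    return eigvals, coef, scores

# Toy data: three classes, four quantitative variables.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(loc=m, size=(30, 4)) for m in (0.0, 1.5, 3.0)])
y_demo = np.repeat(["a", "b", "c"], 30)
eigvals, coef, scores = candisc_sketch(X_demo, y_demo)
print(eigvals.shape, scores.shape)  # (2,) (90, 2)
```

With three classes, at most two canonical variables exist, which is why the toy example returns two columns whatever the number of input variables.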
The CANDISC procedure performs a canonical discriminant analysis, computes squared Mahalanobis distances between class means, and performs both univariate and multivariate one-way analyses of variance.
Parameters:
n_components (int or None, default = 2) – Number of components to keep. If None, all components are kept.
classes (None, tuple or list, default = None) – Names of the class levels, in the order in which they should be returned. If None, the classes are the sorted unique values of y.
warn_message (bool, default = True) – Whether to show warning messages. Warnings are raised without interrupting the program.
Returns:
call_ (NamedTuple) – Call information:
- Xtot (DataFrame of shape (n_samples, n_columns)) – Input data.
- X (DataFrame of shape (n_samples, n_features)) – Training data.
- y (Series of shape (n_samples,)) – Target values. True values for X.
- target (str) – Name of the target.
- features (list) – Names of the features seen during fit.
- classes (list) – Names of the classes.
- priors (Series of shape (n_classes,)) – Prior probabilities.
- n_samples (int) – Number of samples.
- n_features (int) – Number of features.
- n_classes (int) – Number of distinct target values.
- max_components (int) – Maximum number of components.
- n_components (int) – Number of components kept.
cancoef_ (NamedTuple) – Canonical coefficients:
- raw (DataFrame of shape (n_features + 1, n_components)) – Raw canonical coefficients.
- total (DataFrame of shape (n_features, n_components)) – Total canonical coefficients.
- pooled (DataFrame of shape (n_features, n_components)) – Pooled canonical coefficients.
cancorr_ (DataFrame of shape (n_components, 10)) – Canonical correlation tests.
classes_ (NamedTuple) – Class information:
- infos (DataFrame of shape (n_classes, 3)) – Class level information (frequency, proportion, prior probability).
- center (DataFrame of shape (n_classes, n_features)) – Class means.
- total (DataFrame of shape (n_features, n_classes)) – Total-sample standardized class means.
- pooled (DataFrame of shape (n_features, n_classes)) – Pooled within-class standardized class means.
- mahal (DataFrame of shape (n_classes, n_classes)) – Squared Mahalanobis distances between classes.
- coord (DataFrame of shape (n_classes, n_components)) – Class coordinates.
- eucl (DataFrame of shape (n_classes, n_classes)) – The squared Euclidean distance to origin.
- gen (DataFrame of shape (n_classes, n_classes)) – The generalized squared distance to origin.
coef_ (DataFrame of shape (n_features + 1, n_classes)) – Coefficients of the linear classification functions.
corr_ (NamedTuple) – Correlation coefficient tests:
- total (DataFrame of shape (C^{2}_{n_features}, 7)) – Total-sample correlation coefficient tests.
- within (dict) – Within-class correlation coefficient tests.
- pooled (DataFrame of shape (C^{2}_{n_features}, 7)) – Pooled within-class correlation coefficient tests.
- between (DataFrame of shape (C^{2}_{n_features}, 7)) – Between-class correlation coefficient tests.
cov_ (NamedTuple) – Covariance matrices:
- total (DataFrame of shape (n_features, n_features)) – Total-sample covariance matrix.
- btotal (DataFrame of shape (n_features, n_features)) – Biased total-sample covariance matrix.
- within (dict) – Within-class covariance matrices.
- bwithin (dict) – Biased within-class covariance matrices.
- pooled (DataFrame of shape (n_features, n_features)) – Pooled within-class covariance matrix.
- bpooled (DataFrame of shape (n_features, n_features)) – Biased pooled within-class covariance matrix.
- between (DataFrame of shape (n_features, n_features)) – Between-class covariance matrix.
- bbetween (DataFrame of shape (n_features, n_features)) – Biased between-class covariance matrix.
eig_ (DataFrame of shape (n_components, 4)) – The eigenvalues, the differences between successive eigenvalues, the percentage of variance, and the cumulative percentage of variance.
ind_ (NamedTuple) – Information on individuals:
- coord (DataFrame of shape (n_samples, n_components)) – The coordinates of individuals.
- mahal (DataFrame of shape (n_samples, n_classes)) – The squared Mahalanobis distance to origin.
- eucl (DataFrame of shape (n_samples, n_classes)) – The squared Euclidean distance to origin.
- gen (DataFrame of shape (n_samples, n_classes)) – The generalized squared distance to origin.
- scores (DataFrame of shape (n_samples, n_classes)) – The total scores of individuals.
model_ (str, default = “candisc”) – Name of the fitted model.
sscp_ (NamedTuple) – Sums of squares and cross products (SSCP) matrices:
- total (DataFrame of shape (n_features, n_features)) – Total-sample SSCP matrix.
- within (dict) – Within-class SSCP matrices.
- pooled (DataFrame of shape (n_features, n_features)) – Pooled within-class SSCP matrix.
- between (DataFrame of shape (n_features, n_features)) – Between-class SSCP matrix.
statistics_ (NamedTuple) – Statistical results:
- anova (DataFrame of shape (n_features, 11)) – Analysis of variance test.
- manova (DataFrame of shape (4, 5)) – Multivariate analysis of variance test.
- average_rsq (DataFrame of shape (1, 2)) – Average R-square.
- performance (DataFrame of shape (3, 3)) – Global performance of the model.
summary_ (NamedTuple) – Summary information:
- infos (DataFrame of shape (3, 4)) – Summary information (total sample size, number of features, number of classes, total degrees of freedom, within-class degrees of freedom, between-class degrees of freedom).
- total (DataFrame of shape (n_features, 8)) – Total-sample statistics, see pandas.DataFrame.describe.
- within (dict) – Within-class statistics.
svd_ (NamedTuple) – Singular value decomposition:
- value (1D array of shape (n_components,)) – The eigenvalues.
- vectors (2D array of shape (n_features, n_components)) – The eigenvectors.
var_ (NamedTuple) – Information on variables (correlations):
- total (DataFrame of shape (n_features, n_components)) – The total-sample correlation of variables with canonical dimensions.
- pooled (DataFrame of shape (n_features, n_components)) – The pooled within-class correlation of variables with canonical dimensions.
- between (DataFrame of shape (n_features, n_components)) – The between-class correlation of variables with canonical dimensions.
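Several of the quantities listed above are linked by simple identities, which the following NumPy sketch illustrates on toy data. It is not taken from the library: the divisors used for the biased and unbiased covariance matrices (n versus n - 1 or n - n_classes) and the layout of the eigenvalue table are assumptions, and the variable names only loosely mirror the attribute fields.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=m, size=(20, 3)) for m in (0.0, 1.0, 2.5)])
y = np.repeat([0, 1, 2], 20)
n, p = X.shape
classes = np.unique(y)
k = classes.size
grand_mean = X.mean(axis=0)

# SSCP matrices: total (T), pooled within-class (W), between-class (B).
T = (X - grand_mean).T @ (X - grand_mean)
W = np.zeros((p, p))
B = np.zeros((p, p))
means = np.zeros((k, p))
for i, c in enumerate(classes):
    Xc = X[y == c]
    means[i] = Xc.mean(axis=0)
    W += (Xc - means[i]).T @ (Xc - means[i])
    d = means[i] - grand_mean
    B += Xc.shape[0] * np.outer(d, d)
print(np.allclose(T, W + B))  # True: total = within + between

# Covariance matrices (assumed divisors; "b" prefix marks the biased version).
cov_total, bcov_total = T / (n - 1), T / n
cov_pooled, bcov_pooled = W / (n - k), W / n

# Squared Mahalanobis distances between class means, pooled covariance metric.
S_inv = np.linalg.inv(cov_pooled)
diff = means[:, None, :] - means[None, :, :]          # shape (k, k, p)
mahal = np.einsum("ijp,pq,ijq->ij", diff, S_inv, diff)

# Eigenvalue table: eigenvalue, difference to the next one, % and cumulative %.
eigvals = np.sort(np.linalg.eigvals(np.linalg.solve(W, B)).real)[::-1]
eigvals = eigvals[:min(p, k - 1)]
difference = np.append(-np.diff(eigvals), np.nan)
proportion = 100 * eigvals / eigvals.sum()
cumulative = np.cumsum(proportion)
eig_table = np.column_stack([eigvals, difference, proportion, cumulative])
```

The last row of the difference column is left as NaN because there is no following eigenvalue to compare against.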
See also
fviz_candisc – Visualize Canonical Discriminant Analysis.
fviz_candisc_biplot – Visualize Canonical Discriminant Analysis (CANDISC): biplot of individuals and variables.
fviz_candisc_ind – Visualize Canonical Discriminant Analysis (CANDISC): graph of individuals.
fviz_candisc_var – Visualize Canonical Discriminant Analysis (CANDISC): graph of variables.
fviz_dist – Visualize distances between barycenters.
summaryCANDISC – Print summaries of a Canonical Discriminant Analysis model.
summaryDA – Print summaries of a Discriminant Analysis model.
References
[1] Lebart Ludovic, Piron Marie, & Morineau Alain (2006), « Statistique Exploratoire Multidimensionnelle », Dunod, Paris 4ed.
[2] Ricco Rakotomalala (2020), « Pratique de l’Analyse Discriminante Linéaire », Version 1.0, Université Lumière Lyon 2.
[3] Saporta Gilbert (2011), « Probabilités, Analyse de données et Statistiques », Editions TECHNIP, 3ed.
[4] Tenenhaus Michel (2007), « Statistique - Méthodes pour décrire, expliquer et prévoir », Dunod.
[5] Tenenhaus Michel (1996), « Méthodes statistiques en gestion », Dunod.
[6] Tuffery Stephane (2017), « Data Mining et Statistique décisionnelle », Editions TECHNIP, 5ed.
[7] Tuffery Stephane (2025), « Data Science, Statistique et Machine learning », Editions TECHNIP, 6ed.
[8] SAS/STAT User’s Guide (2013), « The CANDISC Procedure », Chapter 31.
Examples
>>> from discrimintools.datasets import load_wine
>>> from discrimintools import CANDISC
>>> D = load_wine()  # load training data
>>> y, X = D["Quality"], D.drop(columns=["Quality"])  # split into X and y
>>> clf = CANDISC()
>>> clf.fit(X, y)
CANDISC()
>>> XTest = load_wine("test")  # load test data
>>> print(clf.predict(XTest))
1958    bad
Name: prediction, dtype: object
Methods
__init__([n_components, classes, warn_message])
decision_function(X) – Apply the decision function to input data.
eval_predict(X, y[, verbose]) – Evaluate the quality of the predictions.
fit(X, y) – Fit the Canonical Discriminant Analysis model.
fit_transform(X, y) – Fit to data, then transform it.
get_metadata_routing() – Get metadata routing of this object.
get_params([deep]) – Get parameters for this estimator.
pred_table(X, y) – Prediction table.
predict(X) – Predict class labels for samples in X.
predict_log_proba(X) – Return the log of the posterior probabilities.
predict_proba(X) – Estimate posterior probabilities.
score(X, y) – Return the accuracy on the given input data.
set_output(*[, transform]) – Set output container.
set_params(**params) – Set the parameters of this estimator.
transform(X) – Apply the dimensionality reduction on X.
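To give an idea of how linear classification functions such as those stored in coef_ can lead to scores, posterior probabilities and predicted labels, here is a generic sketch based on the textbook linear discriminant rule. It is an assumption that the library follows this scheme: the helper names, the convention that row 0 of the coefficient array holds the intercepts, and the softmax step are all hypothetical and may differ from what decision_function, predict_proba and predict actually compute.

```python
import numpy as np

def classification_functions(means, pooled_cov, priors):
    """Textbook linear classification functions, shape (n_features + 1, n_classes).

    Row 0 holds the intercepts (assumed layout), the other rows the slopes.
    """
    slopes = np.linalg.solve(pooled_cov, means.T)                  # (p, k)
    intercepts = -0.5 * np.sum(means.T * slopes, axis=0) + np.log(priors)
    return np.vstack([intercepts, slopes])

def linear_scores(X, coef):
    """One score per sample and class: intercept + X @ slopes."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]

def posterior(scores):
    """Softmax of the scores, computed stably row by row."""
    z = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Toy setting: three classes in two dimensions with equal priors.
means = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
pooled_cov = np.array([[1.0, 0.2], [0.2, 1.0]])
priors = np.full(3, 1.0 / 3.0)

coef = classification_functions(means, pooled_cov, priors)
proba = posterior(linear_scores(means, coef))
print(proba.argmax(axis=1))  # [0 1 2]: each class mean is assigned to its own class
```

Taking the logarithm of `proba` would give the analogue of log posterior probabilities, and comparing the argmax with known labels gives an accuracy of the kind a score method reports.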