discrimintools.DiCA
- class discrimintools.DiCA(n_components=2, classes=None)
-
Discriminant Correspondence Analysis (DiCA)
Performs Discriminant Correspondence Analysis to classify each observation into groups.
This component performs a canonical discriminant analysis (CANDISC) that characterizes groups of individuals, defined by a discrete target attribute, from a set of discrete descriptors. The approach is based on a correspondence analysis (CA) of an overall crosstab, which is the concatenation of the individual crosstabs between the target attribute and each predictive attribute (see [4]). It yields factor scores for the values of both the target attribute and the input attributes, which help explain the relationships between the variables. It also yields factor score coefficients, which are used to compute the coordinates of new individuals from their indicator-vector description.
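The overall crosstab described above can be sketched with pandas alone. This is a minimal illustration on made-up data, not the library's internal code: the toy columns (`color`, `size`, `shape`) and the prefixing of category labels are assumptions introduced here.

```python
import pandas as pd

# Hypothetical toy data: three discrete descriptors and a discrete target.
X = pd.DataFrame({
    "color": ["red", "red", "blue", "blue", "green", "green"],
    "size": ["S", "L", "S", "L", "L", "S"],
    "shape": ["round", "round", "square", "square", "round", "square"],
})
y = pd.Series(["A", "A", "B", "B", "A", "B"], name="group")

# One crosstab (classes x categories) per descriptor, concatenated column-wise.
# Category labels are prefixed with the descriptor name to keep them unique.
N = pd.concat(
    [pd.crosstab(y, X[col]).add_prefix(f"{col}_") for col in X.columns],
    axis=1,
)

# Every individual is counted once per descriptor, so the grand total
# equals n_samples * n_features.
total = int(N.to_numpy().sum())
print(N)
print(total)
```

Each row of `N` is a class and each column is one category of one descriptor, which matches the (n_classes, n_categories) shape documented for the contingency table.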
- Parameters:
-
n_components (int, default = 2) – Number of components to keep. If None, keep all the components.
classes (None, tuple or list, default = None) – Names of the class levels, in the order in which they should be returned. If None, the classes are the sorted unique values in y.
- Returns:
-
-
call_ (NamedTuple) – Call information:
- Xtot (DataFrame of shape (n_samples, n_columns)) – Input data.
- X (DataFrame of shape (n_samples, n_features)) – Training data.
- y (Series of shape (n_samples,)) – Target values. True values for X.
- target (str) – Name of the target.
- features (list) – Names of the features seen during fit.
- classes (list) – Names of the classes.
- priors (Series of shape (n_classes,)) – Prior probabilities.
- n_samples (int) – Number of samples.
- n_features (int) – Number of features.
- n_classes (int) – Number of target values.
- N (DataFrame of shape (n_classes, n_categories)) – The contingency table for the correspondence analysis.
- Z (DataFrame of shape (n_classes, n_categories)) – The standardized data.
- total (int) – The total size: total_size = n_samples * n_features.
- row_marge (Series of shape (n_classes,)) – The row margins of the frequency table.
- col_marge (Series of shape (n_categories,)) – The column margins of the frequency table.
- max_components (int) – Maximum number of components.
- n_components (int) – Number of components kept.
-
-
cancoef_ (NamedTuple) – Coefficients of discriminant correspondence analysis:
- standardize (DataFrame of shape (n_categories, n_components)) – The standardized coefficients.
- projection (DataFrame of shape (n_categories, n_components)) – The projection coefficients.
-
cancorr_ (DataFrame of shape (2, 4)) – The canonical correlations test.
-
classes_ (NamedTuple) – Class information:
- infos (DataFrame of shape (n_classes, 3)) – Class-level information (frequency, proportion, prior probability).
- coord (DataFrame of shape (n_classes, n_components)) – The class coordinates.
- eucl (DataFrame of shape (n_classes, n_classes)) – The squared Euclidean distances between classes.
- gen (DataFrame of shape (n_classes, n_classes)) – The squared generalized distances between classes.
-
eig_ (DataFrame of shape (n_components, 4)) – The eigenvalues, the differences between successive eigenvalues, the percentage of variance and the cumulative percentage of variance.
-
ind_ (NamedTuple) – Individual information:
- coord (DataFrame of shape (n_samples, n_components)) – The coordinates of individuals.
- eucl (DataFrame of shape (n_samples, n_classes)) – The squared Euclidean distance to origin.
- gen (DataFrame of shape (n_samples, n_classes)) – The generalized squared distance to origin.
-
model_ (str, default = ‘dica’) – The model fitted.
-
svd_ (NamedTuple) – Generalized singular value decomposition (GSVD):
- svd (1D array of shape (n_components,)) – The singular values.
- U (2D array of shape (n_samples, n_components)) – The left singular vectors of the generalized singular value decomposition.
- V (2D array of shape (n_categories, n_components)) – The right singular vectors of the generalized singular value decomposition.
-
-
var_ (NamedTuple) – Variable information:
- coord (DataFrame of shape (n_categories, n_components)) – The coordinates of variables.
- eta2 (DataFrame of shape (n_features, n_components)) – The squared correlation ratio (eta2).
-
-
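To make the relationship between the contingency table, the standardized data, the singular values and the eigenvalue table more concrete, here is a minimal NumPy sketch of a standard correspondence analysis of a stacked crosstab. It assumes the textbook CA formulation (standardized residuals followed by a plain SVD); the table values are made up, and whether the library stores margins as proportions or counts is not stated above, so the variable names below only loosely mirror the documented attributes.

```python
import numpy as np

# Hypothetical stacked crosstab: 3 classes (rows) x 5 categories (columns),
# built from 2 descriptors with 3 and 2 categories respectively.
N = np.array([
    [8.0, 1.0, 1.0, 7.0, 3.0],
    [1.0, 6.0, 3.0, 4.0, 6.0],
    [2.0, 2.0, 6.0, 2.0, 8.0],
])

total = N.sum()                # n_samples * n_features = 30 * 2
P = N / total                  # correspondence matrix
row_marge = P.sum(axis=1)      # row margins (assumed to be proportions here)
col_marge = P.sum(axis=0)      # column margins (assumed to be proportions here)

# Standardized residuals: (P - r c^T) / sqrt(r c^T)
expected = np.outer(row_marge, col_marge)
Z = (P - expected) / np.sqrt(expected)

# Plain SVD of Z; the stacked crosstab has at most n_classes - 1
# non-trivial dimensions.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
k = N.shape[0] - 1
U, s, V = U[:, :k], s[:k], Vt[:k].T

# Eigenvalues and principal coordinates of classes and categories.
eigvals = s ** 2
class_coord = (U * s) / np.sqrt(row_marge)[:, None]
cat_coord = (V * s) / np.sqrt(col_marge)[:, None]

# Eigenvalue table: eigenvalue, difference, proportion (%), cumulative (%).
difference = np.append(-np.diff(eigvals), np.nan)
proportion = 100 * eigvals / eigvals.sum()
eig = np.column_stack([eigvals, difference, proportion, np.cumsum(proportion)])
print(eig)
```

The eigenvalues sum to the total inertia of the table, and the weighted mean of the class coordinates is zero on every axis, which is a quick sanity check for any implementation of this decomposition.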
See also
fviz_dica – Visualize Discriminant Correspondence Analysis.
fviz_dica_biplot – Visualize Discriminant Correspondence Analysis (DiCA): biplot of individuals and variables.
fviz_dica_ind – Visualize Discriminant Correspondence Analysis (DiCA): graph of individuals.
fviz_dica_quali_var – Visualize Discriminant Correspondence Analysis (DiCA): graph of qualitative variables.
fviz_dica_var – Visualize Discriminant Correspondence Analysis (DiCA): graph of variables/categories.
fviz_dist – Visualize distances between barycenters.
summaryDiCA – Print summaries of a Discriminant Correspondence Analysis model.
summaryDA – Print summaries of a Discriminant Analysis model.
References
[1] Abdi, H., and Williams, L.J. (2010). Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 2, 433-459.
[2] Abdi, H., and Williams, L.J. (2010). Correspondence analysis. In N.J. Salkind, D.M. Dougherty, & B. Frey (Eds.): Encyclopedia of Research Design. Thousand Oaks (CA): Sage. pp. 267-278.
[3] Abdi, H. (2007). Singular Value Decomposition (SVD) and Generalized Singular Value Decomposition (GSVD). In N.J. Salkind (Ed.): Encyclopedia of Measurement and Statistics. Thousand Oaks (CA): Sage. pp. 907-912.
[5] Ricco Rakotomalala (2020), Pratique de l’Analyse Discriminante Linéaire, Version 1.0, Université Lumière Lyon 2.
Examples
>>> from discrimintools.datasets import load_divay
>>> from discrimintools import DiCA
>>> D = load_divay()  # load training data
>>> y, X = D["Region"], D.drop(columns=["Region"])  # split into X and y
>>> clf = DiCA()
>>> clf.fit(X, y)
DiCA()
Methods
__init__([n_components, classes])
decision_function(X) – Apply the decision function to input data.
eval_predict(X, y[, verbose]) – Evaluate the quality of the predictions.
fit(X, y) – Fit the Discriminant Correspondence Analysis model.
fit_transform(X, y) – Fit to data, then transform it.
get_metadata_routing() – Get metadata routing of this object.
get_params([deep]) – Get parameters for this estimator.
pred_table(X, y) – Prediction table.
predict(X) – Predict class labels for samples in X.
predict_log_proba(X) – Return log of posterior probabilities.
predict_proba(X) – Estimate probability.
score(X, y) – Return accuracy on the given input data.
set_output(*[, transform]) – Set output container.
set_params(**params) – Set the parameters of this estimator.
transform(X) – Apply the dimensionality reduction on X.
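As a rough illustration of what transforming and predicting can mean in this setting, the sketch below projects individuals from their indicator vectors onto the CA axes of the stacked crosstab and assigns each one to the class with the nearest centroid. It is an assumption-laden sketch, not the library's implementation: the helper names `project` and `nearest_class` are hypothetical, the data are made up, and the distance is a plain squared Euclidean distance that ignores priors and the generalized distances listed among the fitted attributes.

```python
import numpy as np
import pandas as pd

# Hypothetical toy training data (not one of the library's datasets).
X = pd.DataFrame({
    "color": ["red", "red", "red", "blue", "blue", "blue", "red", "blue"],
    "size": ["S", "S", "L", "L", "L", "S", "S", "L"],
})
y = pd.Series(["A", "A", "A", "B", "B", "B", "A", "B"], name="group")

# Indicator coding of the descriptors, then the stacked crosstab by class.
D = pd.get_dummies(X).astype(float)      # (n_samples, n_categories)
N = D.groupby(y).sum()                   # (n_classes, n_categories)

# Standard CA of the stacked crosstab.
P = N.to_numpy() / N.to_numpy().sum()
r, c = P.sum(axis=1), P.sum(axis=0)
Z = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
k = N.shape[0] - 1                       # non-trivial dimensions
U, s, V = U[:, :k], s[:k], Vt[:k].T

class_coord = (U * s) / np.sqrt(r)[:, None]   # class centroids on the axes
proj = V / np.sqrt(c)[:, None]                # category standard coordinates


def project(indicators):
    """Project rows of an indicator matrix as supplementary row profiles."""
    profiles = indicators / indicators.sum(axis=1, keepdims=True)
    return profiles @ proj


def nearest_class(indicators):
    """Assign each row to the class with the closest centroid."""
    F = project(indicators)
    d2 = ((F[:, None, :] - class_coord[None, :, :]) ** 2).sum(axis=2)
    return N.index.to_numpy()[d2.argmin(axis=1)]


coords = project(D.to_numpy())
labels = nearest_class(D.to_numpy())
print(coords.ravel())
print(labels)
```

Projecting the class rows of `N` themselves with `project` returns the class coordinates, which is the CA transition formula and explains why a single set of category coefficients is enough to place both classes and new individuals in the same space.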