discrimintools.DiCA

class discrimintools.DiCA(n_components=2, classes=None)

Discriminant Correspondence Analysis (DiCA)

Performs Discriminant Correspondence Analysis to classify each observation into groups.

This component performs a canonical discriminant analysis (CANDISC) that characterizes groups of individuals, defined by a discrete target attribute, from a set of discrete descriptors. The approach applies a correspondence analysis (CA) to an overall crosstab, which is the concatenation of the individual crosstabs between the target attribute and each predictive attribute (see [4]). It yields factor scores for the values of both the target attribute and the input attributes, which help explain the relationships between the variables. It also yields factor score coefficients, which are used to compute the coordinates of new individuals from their indicator-vector description.
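To make the construction concrete, here is a minimal sketch of the overall crosstab and the correspondence analysis applied to it, written with plain NumPy and pandas. It is not the library's code: the toy data, the variable names, and the choice of an ordinary SVD of the standardized residuals (a common way to compute CA) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one target column and two discrete descriptors.
df = pd.DataFrame({
    "target": ["a", "a", "b", "b", "b", "c"],
    "color":  ["red", "blue", "red", "red", "blue", "blue"],
    "size":   ["s", "s", "l", "l", "s", "l"],
})
y = df["target"]
X = df.drop(columns="target")

# One crosstab (target x descriptor) per descriptor, concatenated column-wise.
# The result has shape (n_classes, n_categories).
N = pd.concat(
    [pd.crosstab(y, X[col]).add_prefix(f"{col}_") for col in X.columns],
    axis=1,
)

total = N.to_numpy().sum()   # each sample is counted once per descriptor
P = N.to_numpy() / total     # correspondence matrix
r = P.sum(axis=1)            # row margins (classes)
c = P.sum(axis=0)            # column margins (categories)

# Standardized residuals, then an ordinary SVD.
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

n_components = 2
# Factor scores of the classes (row principal coordinates).
class_coord = (U[:, :n_components] * sv[:n_components]) / np.sqrt(r)[:, None]
# Factor scores of the categories (column principal coordinates).
cat_coord = (Vt[:n_components].T * sv[:n_components]) / np.sqrt(c)[:, None]
eigenvalues = sv[:n_components] ** 2
```

In this sketch `total` equals `n_samples * n_features`, and the class coordinates are centered once weighted by the row margins, as expected of a correspondence analysis.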

Parameters:

n_components (int, default = 2) – Number of components to keep. If None, keep all the components.
classes (None, tuple or list, default = None) – Names of the class levels, in the order in which they should be returned. If None, the classes are the sorted unique values of y.

Returns:

call_ (NamedTuple) – Call information:
- Xtot (DataFrame of shape (n_samples, n_columns)) – Input data.
- X (DataFrame of shape (n_samples, n_features)) – Training data.
- y (Series of shape (n_samples,)) – Target values. True values for X.
- target (str) – Name of the target.
- features (list) – Names of features seen during fit.
- classes (list) – Names of classes.
- priors (Series of shape (n_classes,)) – Prior probabilities.
- n_samples (int) – Number of samples.
- n_features (int) – Number of features.
- n_classes (int) – Number of target values.
- N (DataFrame of shape (n_classes, n_categories)) – The contingency table for the correspondence analysis.
- Z (DataFrame of shape (n_classes, n_categories)) – The standardized data.
- total (int) – The total size: total = n_samples * n_features.
- row_marge (Series of shape (n_classes,)) – The row margins of the frequency table.
- col_marge (Series of shape (n_categories,)) – The column margins of the frequency table.
- max_components (int) – Maximum number of components.
- n_components (int) – Number of components kept.
cancoef_ (NamedTuple) – Coefficients of discriminant correspondence analysis:
- standardize (DataFrame of shape (n_categories, n_components)) – The standardized coefficients.
- projection (DataFrame of shape (n_categories, n_components)) – The projection coefficients.
cancorr_ (DataFrame of shape (2, 4)) – The canonical correlations test.
classes_ (NamedTuple) – Class information:
- infos (DataFrame of shape (n_classes, 3)) – Class level information (frequency, proportion, prior probability).
- coord (DataFrame of shape (n_classes, n_components)) – The class coordinates.
- eucl (DataFrame of shape (n_classes, n_classes)) – The squared Euclidean distances between classes.
- gen (DataFrame of shape (n_classes, n_classes)) – The squared generalized distances between classes.
eig_ (DataFrame of shape (n_components, 4)) – The eigenvalues, the differences between successive eigenvalues, the percentage of variance, and the cumulative percentage of variance.
ind_ (NamedTuple) – Individual information:
- coord (DataFrame of shape (n_samples, n_components)) – The coordinates of individuals.
- eucl (DataFrame of shape (n_samples, n_classes)) – The squared Euclidean distance to origin.
- gen (DataFrame of shape (n_samples, n_classes)) – The generalized squared distance to origin.
model_ (str, default = ‘dica’) – The model fitted.
svd_ (NamedTuple) – Generalized singular value decomposition (GSVD):
- svd (1D array of shape (n_components,)) – The singular values.
- U (2D array of shape (n_samples, n_components)) – The left singular vectors of the generalized singular value decomposition.
- V (2D array of shape (n_categories, n_components)) – The right singular vectors of the generalized singular value decomposition.
var_ (NamedTuple) – Variable information:
- coord (DataFrame of shape (n_categories, n_components)) – The coordinates of variables.
- eta2 (DataFrame of shape (n_features, n_components)) – The squared correlation ratio (eta2).
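As a rough illustration of how several of the quantities listed above relate to one another, the sketch below recomputes analogues of the class coordinates, the projection coefficients, and the individual coordinates with plain NumPy and pandas. It is not the library's implementation: the toy data, the variable names, and the use of the standard CA transition formula (row profile times column standard coordinates) are assumptions. The distance table and the nearest-class rule at the end are likewise only one simple way to turn coordinates into a classification.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one target column and two discrete descriptors.
df = pd.DataFrame({
    "target": ["a", "a", "b", "b", "b", "c"],
    "color":  ["red", "blue", "red", "red", "blue", "blue"],
    "size":   ["s", "s", "l", "l", "s", "l"],
})
y, X = df["target"], df.drop(columns="target")

# Indicator (one-hot) matrix of the descriptors: (n_samples, n_categories).
Z = pd.get_dummies(X, prefix_sep="_").astype(float)

# Class-by-category table: sum the indicator rows within each class.
N = Z.groupby(y).sum()

# Correspondence analysis of N via an SVD of the standardized residuals.
P = N.to_numpy() / N.to_numpy().sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

k = 2  # number of components kept
# Class coordinates (row principal coordinates).
class_coord = pd.DataFrame(
    (U[:, :k] * sv[:k]) / np.sqrt(r)[:, None], index=N.index
)
# Assumed projection coefficients: column standard coordinates.
proj = Vt[:k].T / np.sqrt(c)[:, None]

# Individuals are projected as supplementary rows of the table:
# each indicator row is turned into a profile, then multiplied by proj.
profiles = Z.to_numpy() / Z.to_numpy().sum(axis=1, keepdims=True)
ind_coord = profiles @ proj

# Squared Euclidean distance from each individual to each class,
# followed by a nearest-class assignment.
diff = ind_coord[:, None, :] - class_coord.to_numpy()[None, :, :]
dist2 = pd.DataFrame((diff ** 2).sum(axis=2), index=X.index, columns=N.index)
pred = dist2.idxmin(axis=1)
```

Under these assumptions, the coordinates of a class coincide with the mean of the coordinates of its individuals, which is what makes the projected individuals comparable with the class points.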

See also

fviz_dica: Visualize Discriminant Correspondence Analysis.
fviz_dica_biplot: Visualize Discriminant Correspondence Analysis (DiCA) - Biplot of individuals and variables.
fviz_dica_ind: Visualize Discriminant Correspondence Analysis (DiCA) - Graph of individuals.
fviz_dica_quali_var: Visualize Discriminant Correspondence Analysis (DiCA) - Graph of qualitative variables.
fviz_dica_var: Visualize Discriminant Correspondence Analysis (DiCA) - Graph of variables/categories.
fviz_dist: Visualize distances between barycenters.
summaryDiCA: Printing summaries of Discriminant Correspondence Analysis model.
summaryDA: Printing summaries of Discriminant Analysis model.

References

[1] Abdi, H., and Williams, L.J. (2010). Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 2, 433-459.

[2] Abdi, H., and Williams, L.J. (2010). Correspondence analysis. In N.J. Salkind, D.M. Dougherty, & B. Frey (Eds.): Encyclopedia of Research Design. Thousand Oaks (CA): Sage. pp. 267-278.

[3] Abdi, H. (2007). Singular Value Decomposition (SVD) and Generalized Singular Value Decomposition (GSVD). In N.J. Salkind (Ed.): Encyclopedia of Measurement and Statistics. Thousand Oaks (CA): Sage. pp. 907-912.

[5] Ricco Rakotomalala (2020), Pratique de l’Analyse Discriminante Linéaire, Version 1.0, Université Lumière Lyon 2.

Examples

>>> from discrimintools.datasets import load_divay
>>> from discrimintools import DiCA
>>> D = load_divay() # load training data
>>> y, X = D["Region"], D.drop(columns=["Region"]) # split into X and y
>>> clf = DiCA()
>>> clf.fit(X,y)
DiCA()

__init__(n_components=2, classes=None)

Methods

`__init__`([n_components, classes])
`decision_function`(X)	Apply the decision function to input data.
`eval_predict`(X, y[, verbose])	Evaluate the quality of the predictions.
`fit`(X, y)	Fit the Discriminant Correspondence Analysis model.
`fit_transform`(X, y)	Fit to data, then transform it.
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`pred_table`(X, y)	Prediction table.
`predict`(X)	Predict class labels for samples in X.
`predict_log_proba`(X)	Return the log of the posterior probabilities.
`predict_proba`(X)	Estimate the posterior probabilities.
`score`(X, y)	Return the accuracy on the given input data.
`set_output`(*[, transform])	Set output container.
`set_params`(**params)	Set the parameters of this estimator.
`transform`(X)	Apply the dimensionality reduction on X.
