discrimintools.DiCA

class discrimintools.DiCA(n_components=2, classes=None)

Discriminant Correspondence Analysis (DiCA)

Performs Discriminant Correspondence Analysis to classify each observation into groups.

This component performs a canonical discriminant analysis (CANDISC) that characterizes groups of individuals, defined by a discrete target attribute, from a set of discrete descriptors. The approach is based on a correspondence analysis (CA) of an overall crosstab, which is the concatenation of the individual crosstabs between the target attribute and each predictive attribute (see [4]). It yields factor scores for the values of both the target attribute and the input attributes, which help explain the relationships between the variables. It also yields factor score coefficients, which are used to compute the coordinates of new individuals from their indicator-vector description.
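As an illustration of the overall crosstab described above, here is a minimal sketch in pandas. The toy data and the `column_category` naming of the concatenated columns are assumptions made for this example; the estimator may build and label its table differently.

```python
import pandas as pd

# Toy data (made up): a discrete target and two discrete descriptors.
y = pd.Series(["A", "A", "B", "B", "C", "C"], name="target")
X = pd.DataFrame({
    "colour": ["red", "red", "blue", "blue", "red", "blue"],
    "size": ["small", "large", "small", "large", "large", "large"],
})

# One crosstab per descriptor (target levels in rows, categories in columns),
# concatenated side by side into a single table for the CA.
tables = []
for col in X.columns:
    tab = pd.crosstab(y, X[col])
    # Hypothetical naming scheme so categories of different columns stay distinct.
    tab.columns = [f"{col}_{cat}" for cat in tab.columns]
    tables.append(tab)
N = pd.concat(tables, axis=1)

print(N)
```

Each individual contributes exactly one count per descriptor, so the grand total of such a table equals n_samples * n_features.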

Parameters:
  • n_components (int, default = 2) – Number of components to keep. If None, keep all the components.

  • classes (None, tuple or list, default = None) – Names of the class levels, in the order in which they should be returned. If None, the classes are the sorted unique values of y.
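The ordering rule for classes can be pictured with plain pandas. This is only a sketch of the rule as stated above, with made-up labels; it does not show how the estimator handles the argument internally.

```python
import pandas as pd

# Made-up target values.
y = pd.Series(["south", "north", "east", "north", "south"])

# classes=None: the levels are the sorted unique values of y.
default_classes = sorted(y.unique())

# Explicit classes: the caller fixes the order in which levels are reported.
custom_classes = ["south", "north", "east"]
counts = y.value_counts().reindex(custom_classes)

print(default_classes)
print(counts)
```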

Returns:

  • call_ (NamedTuple) – Call information:

    • Xtot (DataFrame of shape (n_samples, n_columns)) – Input data.

    • X (DataFrame of shape (n_samples, n_features)) – Training data.

    • y (Series of shape (n_samples,)) – Target values. True values for X.

    • target (str) – Name of the target.

    • features (list) – Names of the features seen during fit.

    • classes (list) – Names of the classes.

    • priors (Series of shape (n_classes,)) – Prior probabilities.

    • n_samples (int) – Number of samples.

    • n_features (int) – Number of features.

    • n_classes (int) – Number of target values.

    • N (DataFrame of shape (n_classes, n_categories)) – The contingency table used for the correspondence analysis.

    • Z (DataFrame of shape (n_classes, n_categories)) – The standardized data.

    • total (int) – The total size: total = n_samples * n_features.

    • row_marge (Series of shape (n_classes,)) – The row margins of the frequency table.

    • col_marge (Series of shape (n_categories,)) – The column margins of the frequency table.

    • max_components (int) – Maximum number of components.

    • n_components (int) – Number of components kept.

  • cancoef_ (NamedTuple) – Coefficients of the discriminant correspondence analysis:

    • standardize (DataFrame of shape (n_categories, n_components)) – The standardized coefficients.

    • projection (DataFrame of shape (n_categories, n_components)) – The projection coefficients.

  • cancorr_ (DataFrame of shape (2, 4)) – The canonical correlations test.

  • classes_ (NamedTuple) – Class information:

    • infos (DataFrame of shape (n_classes, 3)) – Class-level information (frequency, proportion, prior probability).

    • coord (DataFrame of shape (n_classes, n_components)) – The class coordinates.

    • eucl (DataFrame of shape (n_classes, n_classes)) – The squared Euclidean distances between classes.

    • gen (DataFrame of shape (n_classes, n_classes)) – The squared generalized distances between classes.

  • eig_ (DataFrame of shape (n_components, 4)) – The eigenvalues, the differences between successive eigenvalues, the percentage of variance and the cumulative percentage of variance.

  • ind_ (NamedTuple) – Information about individuals:

    • coord (DataFrame of shape (n_samples, n_components)) – The coordinates of individuals.

    • eucl (DataFrame of shape (n_samples, n_classes)) – The squared Euclidean distance to origin.

    • gen (DataFrame of shape (n_samples, n_classes)) – The generalized squared distance to origin.

  • model_ (str, default = ‘dica’) – The model fitted.

  • svd_ (NamedTuple) – Generalized singular value decomposition (GSVD):

    • svd (1D array of shape (n_components,)) – The singular values.

    • U (2D array of shape (n_samples, n_components)) – The left singular vectors of the generalized singular value decomposition.

    • V (2D array of shape (n_categories, n_components)) – The right singular vectors of the generalized singular value decomposition.

  • var_ (NamedTuple) – Information about variables:

    • coord (DataFrame of shape (n_categories, n_components)) – The coordinates of variables.

    • eta2 (DataFrame of shape (n_features, n_components)) – The squared correlation ratio (eta2).
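To make several of the quantities listed above more concrete (margins, standardized table, singular values, eigenvalue table, class coordinates, distances between classes), the following sketch runs a textbook correspondence analysis on a made-up contingency table with NumPy. It assumes the standardized table is the matrix of standardized residuals and uses a plain SVD; the column names of the eigenvalue table are hypothetical, and the estimator's own definitions and scalings may differ.

```python
import numpy as np
import pandas as pd

# Made-up contingency table: 3 classes (rows) x 4 categories (columns).
N = np.array([[2.0, 0.0, 1.0, 1.0],
              [0.0, 2.0, 1.0, 1.0],
              [1.0, 1.0, 2.0, 0.0]])

P = N / N.sum()        # relative frequencies
r = P.sum(axis=1)      # row margins
c = P.sum(axis=0)      # column margins

# Standardized residuals (assumed definition of the standardized table).
Z = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

# SVD of Z; a CA has at most min(n_rows, n_cols) - 1 non-trivial dimensions.
U, sv, Vt = np.linalg.svd(Z, full_matrices=False)
k = min(N.shape) - 1
U, sv, V = U[:, :k], sv[:k], Vt[:k].T

# Eigenvalue table: eigenvalue, difference, percentage, cumulative percentage.
eigval = sv ** 2
prop = 100 * eigval / eigval.sum()
eig = pd.DataFrame({
    "eigenvalue": eigval,
    "difference": np.append(-np.diff(eigval), np.nan),
    "proportion": prop,
    "cumulative": np.cumsum(prop),
})

# Class (row) principal coordinates.
class_coord = (U * sv) / np.sqrt(r)[:, None]

# Squared Euclidean distances between classes in the factor space.
diff = class_coord[:, None, :] - class_coord[None, :, :]
eucl = (diff ** 2).sum(axis=2)

print(eig)
print(eucl)
```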

See also

fviz_dica

Visualize Discriminant Correspondence Analysis (DiCA).

fviz_dica_biplot

Visualize Discriminant Correspondence Analysis (DiCA) - Biplot of individuals and variables.

fviz_dica_ind

Visualize Discriminant Correspondence Analysis (DiCA) - Graph of individuals.

fviz_dica_quali_var

Visualize Discriminant Correspondence Analysis (DiCA) - Graph of qualitative variables.

fviz_dica_var

Visualize Discriminant Correspondence Analysis (DiCA) - Graph of variables/categories.

fviz_dist

Visualize the distances between barycenters.

summaryDiCA

Printing summaries of Discriminant Correspondence Analysis model.

summaryDA

Printing summaries of Discriminant Analysis model.

References

[1] Abdi, H., and Williams, L.J. (2010). Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 2, 433-459.

[2] Abdi, H., and Williams, L.J. (2010). Correspondence analysis. In N.J. Salkind, D.M. Dougherty, & B. Frey (Eds.): Encyclopedia of Research Design. Thousand Oaks (CA): Sage. pp. 267-278.

[3] Abdi, H. (2007). Singular Value Decomposition (SVD) and Generalized Singular Value Decomposition (GSVD). In N.J. Salkind (Ed.): Encyclopedia of Measurement and Statistics. Thousand Oaks (CA): Sage. pp. 907-912.

[5] Ricco Rakotomalala (2020), Pratique de l’Analyse Discriminante Linéaire, Version 1.0, Université Lumière Lyon 2.

Examples

>>> from discrimintools.datasets import load_divay
>>> from discrimintools import DiCA
>>> D = load_divay() # load training data
>>> y, X = D["Region"], D.drop(columns=["Region"]) # split into X and y
>>> clf = DiCA()
>>> clf.fit(X, y)
DiCA()

__init__(n_components=2, classes=None)

Methods

__init__([n_components, classes])

decision_function(X)

Apply the decision function to input data

eval_predict(X, y[, verbose])

Evaluate the quality of the predictions

fit(X, y)

Fit the Discriminant Correspondence Analysis model

fit_transform(X, y)

Fit to data, then transform it

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

pred_table(X, y)

Prediction table

predict(X)

Predict class labels for samples in X

predict_log_proba(X)

Return log of posterior probabilities

predict_proba(X)

Estimate probability

score(X, y)

Return accuracy on the given input data

set_output(*[, transform])

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Apply the dimensionality reduction on X
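
The idea behind projecting individuals from their indicator vectors can be sketched with the standard CA transition formula for supplementary rows. This is a self-contained illustration on made-up data, not the estimator's code: it assumes the projection coefficients are the standard column coordinates and that an individual is represented by its indicator vector divided by the number of features.

```python
import numpy as np
import pandas as pd

# Toy data (made up).
y = pd.Series(["A", "A", "B", "B", "C", "C"])
X = pd.DataFrame({
    "colour": ["red", "red", "blue", "blue", "red", "blue"],
    "size": ["small", "large", "small", "large", "large", "large"],
})

# Indicator (one-hot) coding; every row sums to n_features.
D = pd.get_dummies(X).astype(float)

# The concatenated crosstab is the class-wise sum of the indicator columns.
N = D.groupby(y).sum()

# Correspondence analysis of N via the SVD of standardized residuals.
P = N.to_numpy() / N.to_numpy().sum()
r, c = P.sum(axis=1), P.sum(axis=0)
Z = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(Z, full_matrices=False)
k = min(N.shape) - 1
class_coord = (U[:, :k] * sv[:k]) / np.sqrt(r)[:, None]

# Assumed projection coefficients: standard column coordinates.
coef = Vt[:k].T / np.sqrt(c)[:, None]

# Coordinates of an individual: its row profile times the coefficients.
profiles = D.div(D.sum(axis=1), axis=0)
ind_coord = pd.DataFrame(profiles.to_numpy() @ coef, index=X.index)

print(ind_coord)
```

Under these assumptions the projection is linear, so the mean of the individual coordinates within each class coincides with that class's coordinates.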