discrimintools.GFALDA

class discrimintools.GFALDA(n_components=2, priors=None, classes=False)

General Factor Analysis Linear Discriminant Analysis (GFALDA)

Performs a linear discriminant analysis on principal components: a classical linear discriminant analysis (LDA) is carried out on the principal components obtained from a general factor analysis (GFA) of the explanatory variables. General factor analysis linear discriminant analysis (GFALDA) consists of two steps:

  1. Computation of general factor analysis (GFA) of explanatory variables:
    • if all features are numeric, the general factor analysis (GFA) is a principal component analysis (PCA),

    • if all features are categorical, the general factor analysis (GFA) is a multiple correspondence analysis (MCA),

    • if the features are mixed, the general factor analysis (GFA) is a factor analysis of mixed data (FAMD).

  2. Computation of linear discriminant analysis (LDA) on the principal components extracted in step 1.
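
The two steps above can be sketched for the all-numeric case (PCA followed by LDA). This is a minimal NumPy illustration of the idea, not the discrimintools implementation: the helper names `pca_scores`, `lda_fit` and `lda_predict` are hypothetical, the data are synthetic, and the use of standardized PCA and sample-proportion priors are assumptions.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Step 1 (numeric case): standardized PCA via SVD, returning component scores."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T

def lda_fit(F, y):
    """Step 2: linear discriminant functions fitted on the component scores F."""
    classes = np.unique(y)
    n = F.shape[0]
    means = np.array([F[y == k].mean(axis=0) for k in classes])
    # Pooled within-class covariance matrix.
    W = sum((F[y == k] - m).T @ (F[y == k] - m) for k, m in zip(classes, means))
    W = W / (n - len(classes))
    # Assumption: priors proportional to the class sample sizes.
    priors = np.array([(y == k).mean() for k in classes])
    coef = np.linalg.solve(W, means.T).T              # shape (n_classes, n_components)
    intercept = -0.5 * np.sum(coef * means, axis=1) + np.log(priors)
    return classes, coef, intercept

def lda_predict(F, classes, coef, intercept):
    """Assign each row of F to the class with the highest linear score."""
    scores = F @ coef.T + intercept
    return classes[np.argmax(scores, axis=1)]

# Synthetic two-class data with four numeric features (illustration only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 4)), rng.normal(3, 1, (30, 4))])
y = np.array(["a"] * 30 + ["b"] * 30)

F = pca_scores(X, n_components=2)                     # step 1
classes, coef, intercept = lda_fit(F, y)              # step 2
pred = lda_predict(F, classes, coef, intercept)
accuracy = (pred == y).mean()
```

For categorical or mixed features, step 1 would be replaced by MCA or FAMD respectively, while step 2 stays the same.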

Parameters:
  • n_components (int or None, default = 2) – Number of components to keep. If None, keep all the components.

  • priors (str, 1-D array or Series of shape (n_classes,), default = None) – The class prior probabilities of group membership. Possible values:

    • ‘equal’ to set the prior probabilities equal.

    • ‘prop’ to set the prior probabilities proportional to the sample sizes.

    • 1-D array or Series which specify the prior probability for each level of the classification variable.

  • classes (None, tuple or list, default = None) – Names of the class levels, in the order in which they should be returned. If None, the classes are the sorted unique values of y.
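
The priors options described above can be illustrated with a short sketch. The function `resolve_priors` is hypothetical and is not part of discrimintools. Its default of 'prop' and the renormalization of explicit values are assumptions made for the example, since the behaviour for priors=None is not stated here.

```python
from collections import Counter

def resolve_priors(y, priors="prop"):
    """Return a {class: prior} mapping for the labels in y (hypothetical helper)."""
    counts = Counter(y)
    classes = sorted(counts)                  # sorted unique values of y
    if isinstance(priors, str):
        if priors == "equal":
            # Same prior probability for every class.
            return {k: 1.0 / len(classes) for k in classes}
        if priors == "prop":
            # Priors proportional to the class sample sizes.
            return {k: counts[k] / len(y) for k in classes}
        raise ValueError("priors must be 'equal', 'prop' or a sequence")
    # Explicit probabilities, one per class, given in sorted class order.
    values = [float(p) for p in priors]
    if len(values) != len(classes) or min(values) < 0:
        raise ValueError("need one non-negative prior per class")
    total = sum(values)
    # Assumption: values are rescaled so that they sum to 1.
    return {k: v / total for k, v in zip(classes, values)}

y = ["a", "a", "a", "b"]
prop = resolve_priors(y, "prop")              # {'a': 0.75, 'b': 0.25}
equal = resolve_priors(y, "equal")            # {'a': 0.5, 'b': 0.5}
explicit = resolve_priors(y, [0.2, 0.8])
```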

Returns:

  • call_ (NamedTuple) – Call information:

    • Xtot : DataFrame of shape (n_samples, n_columns)

      Input data.

    • X : DataFrame of shape (n_samples, n_features)

      Training data.

    • y : Series of shape (n_samples,)

      Target values. True values for X.

    • target : str

      Name of target.

    • features : list

      Names of features seen during fit.

    • classes : list

      Names of classes.

    • priors : Series of shape (n_classes,)

      Prior probabilities.

    • n_samples : int

      Number of samples.

    • n_features : int

      Number of features.

    • n_classes : int

      Number of classes.

    • max_components : int

      Maximum number of components.

    • n_components : int

      Number of components kept.

  • cancoef_ (NamedTuple) – Canonical coefficients:

    • standardized : DataFrame of shape (n_variables, n_components)

      The standardized canonical coefficients.

    • raw : DataFrame of shape (n_variables+1, n_components)

      The raw canonical coefficients.

    • projection : DataFrame of shape (n_variables+1, n_components)

      The projection canonical coefficients.

  • classes_ (NamedTuple) – Class information:

    • coord : DataFrame of shape (n_classes, n_components)

      Class coordinates.

    • eucl : DataFrame of shape (n_classes, n_classes)

      The squared Euclidean distance to origin.

    • gen : DataFrame of shape (n_classes, n_classes)

      The generalized squared distance to origin.

  • coef_ (NamedTuple) – Linear discriminant coefficients:

    • standardized : DataFrame of shape (n_variables, n_classes)

      The standardized coefficients.

    • raw : DataFrame of shape (n_variables+1, n_classes)

      The raw coefficients.

    • projection : DataFrame of shape (n_variables+1, n_classes)

      The projection coefficients.

  • ind_ (NamedTuple) – Individual information:

    • coord : DataFrame of shape (n_samples, n_components)

      Individual coordinates.

    • scores : DataFrame of shape (n_samples, n_classes)

      The scores of individuals.

    • projection : DataFrame of shape (n_samples, n_classes)

      The projection of individuals.

    • eucl : DataFrame of shape (n_samples, n_classes)

      The squared Euclidean distance to origin.

    • gen : DataFrame of shape (n_samples, n_classes)

      The generalized squared distance to origin.

  • model_ (str, default = ‘gfalda’) – The model fitted.

  • pipe_ (sequence of data transformers) – Pipeline with two named_steps:

    • gfa : general factor analysis (GFA)

    • lda : linear discriminant analysis (LDA)

See also

GFA

General Factor Analysis (GFA)

MDA

Mixed Discriminant Analysis (MDA)

MPCA

Mixed Principal Component Analysis (MPCA)

summaryGFA

Printing summaries of General Factor Analysis model.

summaryMDA

Printing summaries of Mixed Discriminant Analysis model.

summaryMPCA

Printing summaries of Mixed Principal Component Analysis model.

References

[1] Ricco Rakotomalala (2020), « Pratique de l’Analyse Discriminante Linéaire », Université Lumière Lyon 2, Version 1.0.

Examples

>>> from discrimintools.datasets import load_alcools, load_vote, load_heart
>>> from discrimintools import GFALDA
>>> #PCA + LDA = PCALDA
>>> D = load_alcools("train")
>>> y, X = D["TYPE"], D.drop(columns=["TYPE"])
>>> clf = GFALDA()
>>> clf.fit(X,y)
GFALDA()
>>> #MCA + LDA = DISQUAL
>>> D = load_vote("train")
>>> y, X = D["group"], D.drop(columns=["group"])
>>> clf = GFALDA()
>>> clf.fit(X,y)
GFALDA()
>>> #FAMD + LDA = DISMIX
>>> D = load_heart("subset")
>>> y, X = D["disease"], D.drop(columns=["disease"])
>>> clf = GFALDA()
>>> clf.fit(X,y)
GFALDA()
__init__(n_components=2, priors=None, classes=False)

Methods

__init__([n_components, priors, classes])

decision_function(X)

Apply the decision function to input data

eval_predict(X, y[, verbose])

Evaluate the quality of the predictions

fit(X, y)

Fit the General Factor Analysis Linear Discriminant Analysis Model

fit_transform(X, y)

Fit to data, then transform it

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

pred_table(X, y)

Prediction table

predict(X)

Predict class labels for samples in X

predict_log_proba(X)

Return log of posterior probabilities

predict_proba(X)

Estimate posterior class probabilities

score(X, y)

Return accuracy on the given input data

set_output(*[, transform])

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Project data to maximize class separation
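
In standard LDA, the posterior probabilities are the softmax of the linear classification scores (when those scores already include the log prior), and the predicted class is the one with the highest score. The sketch below illustrates that relationship for a single sample. It assumes, without confirmation from this page, that decision_function returns such scores; the helper names are hypothetical and are not part of discrimintools.

```python
import math

def posterior_from_scores(scores):
    """Softmax of linear discriminant scores, computed in a numerically stable way."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def log_posterior_from_scores(scores):
    """Log of the posterior probabilities via the log-sum-exp trick."""
    m = max(scores)
    log_total = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_total for s in scores]

# Hypothetical scores of one sample for three classes.
scores = [2.0, 0.5, -1.0]
proba = posterior_from_scores(scores)
log_proba = log_posterior_from_scores(scores)
# The predicted class is the index of the largest score (and largest posterior).
predicted = max(range(len(scores)), key=lambda k: scores[k])
```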