discrimintools.GFA

class discrimintools.GFA(n_components=2)

General Factor Analysis (GFA)

General factor analysis covers dimensionality reduction techniques such as PCA, CA, MCA and FAMD, which reduce the number of features in a dataset while retaining most of its information. For more about generalized factor analysis, see scientisttools.

Parameters:

n_components (int or None, default = 2) – Number of components to keep. If None, keep all the components.

Returns:

  • call_ (NamedTuple) – Call information:

    • Xtot : DataFrame of shape (n_samples, n_columns)

      Input data.

    • X : DataFrame of shape (n_samples, n_features)

      Training data.

    • dummies : None or DataFrame

      Disjunctive data.

    • k1 : int

      Number of numeric columns.

    • k2 : int

      Number of categorical columns.

    • Z : DataFrame of shape (n_samples, n_vars)

      Standardized data.

    • Zc : DataFrame of shape (n_samples, n_vars)

      Centered standardized data.

    • ind_weights : Series of shape (n_samples,)

      Individuals weights.

    • var_weights : Series of shape (n_vars,)

      Variables weights.

    • center : Series of shape (n_vars,)

      Mean of variables.

    • z_center : Series of shape (n_vars,)

      Mean of recoded variables.

    • scale : Series of shape (n_vars,)

      Scale of variables.

    • denom : Series of shape (n_vars,)

      Number of variables.

    • max_components : int

      Maximum number of components.

    • n_components : int

      Number of components kept.

  • eig_ (DataFrame of shape (max_components, 4)) – The eigenvalues, the differences between consecutive eigenvalues, the percentages of variance and the cumulative percentages of variance.

  • ind_ (NamedTuple) – Individuals information:

    • coord : DataFrame of shape (n_samples, n_components)

      The individuals coordinates.

  • model_ (str, default = ‘gfa’) – The model fitted.

  • svd_ (NamedTuple) – Generalized singular value decomposition:

    • vs : 1-D array of shape (max_components,)

      The singular values.

    • U : 2-D array of shape (n_samples, n_components)

      The left singular vectors.

    • V : 2-D array of shape (n_vars, n_components)

      The right singular vectors.

  • var_ (NamedTuple) – Variables information:

    • coord : DataFrame of shape (n_vars, n_components)

      The variables coordinates.
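The svd_, eig_, ind_ and var_ fields listed above fit the usual weighted (generalized) SVD formulation of factor analysis. The following is a minimal NumPy sketch of that textbook formulation, not the library's actual code: the helper names weighted_svd and eigen_table, the uniform weights and the random data are all assumptions made for illustration.

```python
import numpy as np

def weighted_svd(Z, row_w, col_w, n_components=2):
    """Generalized SVD of Z under row weights row_w and column weights col_w."""
    Z = np.asarray(Z, dtype=float)
    # Fold the square roots of the weights into Z so a plain SVD applies.
    Zw = Z * np.sqrt(row_w)[:, None] * np.sqrt(col_w)[None, :]
    U, s, Vt = np.linalg.svd(Zw, full_matrices=False)
    # Undo the weighting on the singular vectors.
    U = U / np.sqrt(row_w)[:, None]
    V = Vt.T / np.sqrt(col_w)[:, None]
    return s, U[:, :n_components], V[:, :n_components]

def eigen_table(s):
    """Columns: eigenvalue, difference, % of variance, cumulative %."""
    eig = s ** 2
    diff = np.append(-np.diff(eig), np.nan)  # eig[i] - eig[i + 1]
    prop = 100 * eig / eig.sum()
    return np.column_stack([eig, diff, prop, np.cumsum(prop)])

# Illustrative data: 20 individuals, 4 numeric variables (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
row_w = np.full(20, 1 / 20)   # uniform individual weights
col_w = np.ones(4)            # unit variable weights

# Weighted standardization, as in a normalized PCA.
center = row_w @ X
scale = np.sqrt(row_w @ (X - center) ** 2)
Z = (X - center) / scale

s, U, V = weighted_svd(Z, row_w, col_w, n_components=2)
eig = eigen_table(s)
ind_coord = U * s[:2]   # individuals coordinates
var_coord = V * s[:2]   # variables coordinates
```

With standardized numeric data and unit variable weights, the eigenvalues sum to the number of variables, and the individuals coordinates equal the projection of Z onto the weighted right singular vectors.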

See also

GFALDA

General Factor Analysis Linear Discriminant Analysis (GFALDA)

MDA

Mixed Discriminant Analysis (MDA)

MPCA

Mixed Principal Component Analysis (MPCA)

summaryGFA

Printing summaries of General Factor Analysis model.

summaryGFALDA

Printing summaries of General Factor Analysis Linear Discriminant Analysis model.

summaryMDA

Printing summaries of Mixed Discriminant Analysis model.

summaryMPCA

Printing summaries of Mixed Principal Component Analysis model.

Examples

>>> from discrimintools.datasets import load_alcools, load_canines, load_heart
>>> from discrimintools import GFA
>>> #principal components analysis (PCA)
>>> D = load_alcools("train") # load training data
>>> X = D.drop(columns=["TYPE"]) # extract X
>>> clf = GFA()
>>> clf.fit(X)
GFA()
>>> #multiple correspondence analysis (MCA)
>>> D = load_canines("train") # load training data
>>> X = D.drop(columns=["Fonction"]) # extract X
>>> clf = GFA()
>>> clf.fit(X)
GFA()
>>> #factor analysis of mixed data (FAMD)
>>> D = load_heart("subset") # load subset data
>>> X = D.drop(columns=["disease"]) # extract X
>>> clf = GFA()
>>> clf.fit(X)
GFA()
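The three examples above pass a numeric table, a categorical table and a mixed table to the same class. As a rough sketch of how the documented k1, k2 and dummies fields of call_ could be derived from such input, the snippet below uses pandas only. The helper describe_columns, the toy data and the mapping from column types to PCA, MCA or FAMD are assumptions inferred from the example comments, not confirmed library behavior.

```python
import pandas as pd

def describe_columns(X):
    """Hypothetical helper: count column types and build the disjunctive table."""
    numeric = X.select_dtypes(include="number")
    categorical = X.select_dtypes(exclude="number")
    k1, k2 = numeric.shape[1], categorical.shape[1]
    # One indicator column per category; None when there is nothing to recode.
    dummies = pd.get_dummies(categorical, dtype=int) if k2 > 0 else None
    if k2 == 0:
        method = "PCA"    # numeric columns only
    elif k1 == 0:
        method = "MCA"    # categorical columns only
    else:
        method = "FAMD"   # mixed data
    return k1, k2, dummies, method

# Toy mixed table (illustrative values, not one of the bundled datasets).
X = pd.DataFrame({
    "age": [52, 61, 47],
    "pressure": [130.0, 145.0, 120.0],
    "sex": ["m", "f", "f"],
})
k1, k2, dummies, method = describe_columns(X)
```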
__init__(n_components=2)

Methods

__init__([n_components])

fit(X[, y]) – Fit the General Factor Analysis model.

fit_transform(X[, y]) – Fit the model with X and apply the dimensionality reduction to X.

get_metadata_routing() – Get metadata routing of this object.

get_params([deep]) – Get parameters for this estimator.

set_output(*[, transform]) – Set output container.

set_params(**params) – Set the parameters of this estimator.

transform(X) – Apply the dimensionality reduction to X.
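The relationship between fit, transform and fit_transform follows the usual scikit-learn estimator contract. The sketch below illustrates that contract with a hypothetical stand-in class, TinyReducer, built on a plain NumPy SVD. It is not discrimintools.GFA and makes no claim about how GFA is implemented.

```python
import numpy as np

class TinyReducer:
    """Hypothetical stand-in illustrating the fit/transform contract."""

    def __init__(self, n_components=2):
        self.n_components = n_components

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Learn the centering and the projection axes from the training data.
        self.center_ = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - self.center_, full_matrices=False)
        # Slicing with None keeps every axis, mirroring n_components=None.
        self.axes_ = Vt[: self.n_components].T
        return self

    def transform(self, X):
        # Project any data with the same columns onto the learned axes.
        return (np.asarray(X, dtype=float) - self.center_) @ self.axes_

    def fit_transform(self, X, y=None):
        # Equivalent to calling fit and then transform on the same data.
        return self.fit(X).transform(X)

rng = np.random.default_rng(1)
X_train = rng.normal(size=(10, 3))
X_new = rng.normal(size=(4, 3))

model = TinyReducer(n_components=2)
train_coord = model.fit_transform(X_train)  # coordinates of training rows
new_coord = model.transform(X_new)          # coordinates of unseen rows
```

The point of the split is that transform reuses the quantities learned in fit, so new rows are projected into the same component space as the training rows.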