discrimintools.GFALDA

class discrimintools.GFALDA(n_components=2, priors=None, classes=False)

General Factor Analysis Linear Discriminant Analysis (GFALDA)

Performs a linear discriminant analysis on principal components: a classical linear discriminant analysis (LDA) carried out on the principal components of a general factor analysis (GFA) of the explanatory variables. General factor analysis linear discriminant analysis (GFALDA) consists of two steps:

1. Compute a general factor analysis (GFA) of the explanatory variables:
   - if all features are numeric, the GFA is a principal component analysis (PCA),
   - if all features are categorical, the GFA is a multiple correspondence analysis (MCA),
   - if the features are mixed, the GFA is a factor analysis of mixed data (FAMD).
2. Compute a linear discriminant analysis (LDA) on the principal components extracted in step 1.
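The dispatch rule in step 1 can be sketched with pandas dtype checks. This is a minimal illustration, not the discrimintools implementation: `choose_gfa` is a hypothetical helper, and treating every pandas numeric dtype as quantitative and every other dtype as categorical is an assumption.

```python
import pandas as pd


def choose_gfa(X: pd.DataFrame) -> str:
    """Hypothetical helper: name the factor analysis the GFA step would run."""
    # Assumption: any pandas numeric dtype counts as a quantitative feature;
    # object, string and category columns count as categorical features.
    is_numeric = [pd.api.types.is_numeric_dtype(X[col]) for col in X.columns]
    if all(is_numeric):
        return "PCA"   # all features numeric
    if not any(is_numeric):
        return "MCA"   # all features categorical
    return "FAMD"      # mixed features


num = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4, 5, 6]})
cat = pd.DataFrame({"c": ["x", "y", "x"], "d": pd.Categorical(["u", "u", "v"])})
mixed = pd.concat([num, cat], axis=1)
print(choose_gfa(num), choose_gfa(cat), choose_gfa(mixed))  # PCA MCA FAMD
```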

Parameters:

n_components (int or None, default = 2) – Number of components to keep. If None, keep all the components.
priors (str, 1-D array or Series of shape (n_classes,), default = None) – Class prior probabilities of group membership. Possible values:
- ‘equal’ to set equal prior probabilities.
- ‘prop’ to set prior probabilities proportional to the class sample sizes.
- a 1-D array or Series specifying the prior probability of each level of the classification variable.
classes (None, tuple or list, default = None) – Names of the class levels, in the order in which they are returned. If None, classes are the sorted unique values of y.
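A minimal sketch of how the `priors` and `classes` options could resolve to one probability per class. `resolve_priors` is a hypothetical helper, not part of discrimintools; it assumes that ‘prop’ means the relative class frequencies in y and that an explicit array is given in the same order as the classes.

```python
import numpy as np
import pandas as pd


def resolve_priors(y: pd.Series, priors, classes=None) -> pd.Series:
    """Hypothetical helper mirroring the documented priors/classes options."""
    # Default class order: the sorted unique values of y.
    levels = sorted(y.unique()) if classes is None else list(classes)
    if isinstance(priors, str):
        if priors == "equal":
            values = np.full(len(levels), 1.0 / len(levels))
        elif priors == "prop":
            # Relative class frequencies, reordered to match `levels`.
            values = y.value_counts(normalize=True).reindex(levels).to_numpy()
        else:
            raise ValueError("priors must be 'equal', 'prop' or array-like")
    else:
        # Explicit priors: assumed to follow the order of `levels`.
        values = np.asarray(priors, dtype=float)
        if values.shape != (len(levels),):
            raise ValueError("priors must have one entry per class")
    return pd.Series(values, index=levels, name="priors")


y = pd.Series(["b", "a", "a", "c", "a", "b"])
print(resolve_priors(y, "equal"))
print(resolve_priors(y, "prop"))
print(resolve_priors(y, [0.2, 0.3, 0.5], classes=("c", "b", "a")))
```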

Returns:

call_ (NamedTuple) – Call information:
- Xtot : DataFrame of shape (n_samples, n_columns)

  Input data.
- X : DataFrame of shape (n_samples, n_features)

  Training data.
- y : Series of shape (n_samples,)

  Target values (true labels for X).
- target : str

  Name of the target.
- features : list

  Names of the features seen during fit.
- classes : list

  Names of the classes.
- priors : Series of shape (n_classes,)

  Prior probabilities.
- n_samples : int

  Number of samples.
- n_features : int

  Number of features.
- n_classes : int

  Number of classes (distinct target values).
- max_components : int

  Maximum number of components.
- n_components : int

  Number of components kept.
cancoef_ (NamedTuple) – Canonical coefficients:
- standardized : DataFrame of shape (n_variables, n_components)

  The standardized canonical coefficients.
- raw : DataFrame of shape (n_variables+1, n_components)

  The raw canonical coefficients.
- projection : DataFrame of shape (n_variables+1, n_components)

  The projection canonical coefficients.
classes_ (NamedTuple) – Class information:
- coord : DataFrame of shape (n_classes, n_components)

  Class coordinates.
- eucl : DataFrame of shape (n_classes, n_classes)

  The squared Euclidean distance to origin.
- gen : DataFrame of shape (n_classes, n_classes)

  The generalized squared distance to origin.
coef_ (NamedTuple) – Linear discriminant coefficients:
- standardized : DataFrame of shape (n_variables, n_classes)

  The standardized coefficients.
- raw : DataFrame of shape (n_variables+1, n_classes)

  The raw coefficients.
- projection : DataFrame of shape (n_variables+1, n_classes)

  The projection coefficients.
ind_ (NamedTuple) – Individual information:
- coord : DataFrame of shape (n_samples, n_components)

  Coordinates of the individuals.
- scores : DataFrame of shape (n_samples, n_classes)

  The scores of the individuals.
- projection : DataFrame of shape (n_samples, n_classes)

  The projection of the individuals.
- eucl : DataFrame of shape (n_samples, n_classes)

  The squared Euclidean distance to origin.
- gen : DataFrame of shape (n_samples, n_classes)

  The generalized squared distance to origin.
model_ (str, default = ‘gfalda’) – The fitted model.
pipe_ – A sequence of data transformers with two named_steps:
- gfa : general factor analysis (GFA)
- lda : linear discriminant analysis (LDA)
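The two named steps of pipe_ can be mimicked with scikit-learn for the all-numeric case. In this sketch, a standardized PCA stands in for the gfa step and scikit-learn's `LinearDiscriminantAnalysis` for the lda step; the iris data and the nested pipeline are illustrative assumptions, not how discrimintools builds pipe_. It also shows the form of the linear classification functions, one constant plus one coefficient per input for each class; that the extra row (+1) in the raw coefficient tables holds this constant is an assumption.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Stand-in for the all-numeric case: a standardized PCA plays the role of GFA.
# discrimintools uses its own GFA estimator; this is only an analogue.
pipe = Pipeline([
    ("gfa", Pipeline([("scale", StandardScaler()), ("pca", PCA(n_components=2))])),
    ("lda", LinearDiscriminantAnalysis()),
])
pipe.fit(X, y)

# Step 1: principal component coordinates of the individuals.
Z = pipe.named_steps["gfa"].transform(X)

# Step 2: linear classification functions on those components
# (per class: one intercept plus one coefficient per component).
lda = pipe.named_steps["lda"]
scores = Z @ lda.coef_.T + lda.intercept_
assert np.allclose(scores, lda.decision_function(Z))

# The predicted class is the one with the largest score.
pred = lda.classes_[scores.argmax(axis=1)]
assert (pred == pipe.predict(X)).all()
print(Z.shape, scores.shape)  # (150, 2) (150, 3)
```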

See also

GFA: General Factor Analysis (GFA)
MDA: Mixed Discriminant Analysis (MDA)
MPCA: Mixed Principal Component Analysis (MPCA)
summaryGFA: Printing summaries of General Factor Analysis model.
summaryMDA: Printing summaries of Mixed Discriminant Analysis model.
summaryMPCA: Printing summaries of Mixed Principal Component Analysis model.

References

[1] Ricco Rakotomalala (2020), « Pratique de l’Analyse Discriminante Linéaire », Université Lumière Lyon 2, Version 1.0.

Examples

>>> from discrimintools.datasets import load_alcools, load_vote, load_heart
>>> from discrimintools import GFALDA
>>> # PCA + LDA = PCALDA
>>> D = load_alcools("train")
>>> y, X = D["TYPE"], D.drop(columns=["TYPE"])
>>> clf = GFALDA()
>>> clf.fit(X, y)
GFALDA()
>>> # MCA + LDA = DISQUAL
>>> D = load_vote("train")
>>> y, X = D["group"], D.drop(columns=["group"])
>>> clf = GFALDA()
>>> clf.fit(X, y)
GFALDA()
>>> # FAMD + LDA = DISMIX
>>> D = load_heart("subset")
>>> y, X = D["disease"], D.drop(columns=["disease"])
>>> clf = GFALDA()
>>> clf.fit(X, y)
GFALDA()

__init__(n_components=2, priors=None, classes=False)

Methods

`__init__`([n_components, priors, classes])
`decision_function`(X)	Apply the decision function to input data
`eval_predict`(X, y[, verbose])	Evaluate the quality of the predictions
`fit`(X, y)	Fit the General Factor Analysis Linear Discriminant Analysis model
`fit_transform`(X, y)	Fit to data, then transform it
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`pred_table`(X, y)	Prediction table
`predict`(X)	Predict class labels for samples in X
`predict_log_proba`(X)	Return the log of the posterior probabilities
`predict_proba`(X)	Estimate the posterior probabilities
`score`(X, y)	Return the accuracy on the given input data
`set_output`(*[, transform])	Set output container.
`set_params`(**params)	Set the parameters of this estimator.
`transform`(X)	Project data to maximize class separation
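The prediction methods listed above are related in the usual way for a probabilistic classifier. The sketch below checks these relations on scikit-learn's `LinearDiscriminantAnalysis` as a stand-in for GFALDA; that GFALDA behaves identically, and that pred_table reports an observed-versus-predicted cross-tabulation, are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
clf = LinearDiscriminantAnalysis().fit(X, y)

proba = clf.predict_proba(X)          # posterior probabilities, one column per class
log_proba = clf.predict_log_proba(X)  # log of the same posteriors
pred = clf.predict(X)

# Each row of posteriors sums to 1, and exp(log posterior) gives back the posterior.
assert np.allclose(proba.sum(axis=1), 1.0)
assert np.allclose(np.exp(log_proba), proba)

# The predicted label is the class with the largest posterior probability.
assert (clf.classes_[proba.argmax(axis=1)] == pred).all()

# score() is the accuracy: the share of correctly classified samples.
assert np.isclose(clf.score(X, y), (pred == y).mean())

# Assumed shape of a prediction table: observed labels vs predicted labels.
table = pd.crosstab(pd.Series(y, name="observed"), pd.Series(pred, name="predicted"))
print(table)
```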
