discrimintools.GFALDA
class discrimintools.GFALDA(n_components=2, priors=None, classes=False)
General Factor Analysis Linear Discriminant Analysis (GFALDA)
Performs a linear discriminant analysis on principal components: a classical linear discriminant analysis (LDA) carried out on the principal components of a general factor analysis (GFA) of the explanatory variables. GFALDA consists of two steps:
1. Computation of a general factor analysis (GFA) of the explanatory variables:
- if all features are numeric, the GFA is a principal component analysis (PCA);
- if all features are categorical, the GFA is a multiple correspondence analysis (MCA);
- if the features are mixed, the GFA is a factor analysis of mixed data (FAMD).
2. Computation of a linear discriminant analysis (LDA) on the principal components extracted in step 1.
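The rule that decides which factor method step 1 uses can be sketched in a few lines. This is only an illustration of the rule stated above, not the library's implementation: the helper `choose_gfa_method` and the feature names are hypothetical.

```python
def choose_gfa_method(is_numeric):
    """Return the factor method implied by the feature types.

    `is_numeric` maps each feature name to True (numeric) or
    False (categorical). Hypothetical helper, for illustration only.
    """
    if not is_numeric:
        raise ValueError("at least one feature is required")
    flags = list(is_numeric.values())
    if all(flags):
        return "PCA"   # every feature is numeric
    if not any(flags):
        return "MCA"   # every feature is categorical
    return "FAMD"      # numeric and categorical features are mixed


# Hypothetical feature names, one call per case described above.
numeric_only = choose_gfa_method({"alcohol": True, "sugar": True})
categorical_only = choose_gfa_method({"vote_1": False, "vote_2": False})
mixed = choose_gfa_method({"age": True, "chest_pain": False})
print(numeric_only, categorical_only, mixed)
```

Whatever the method, step 2 then receives only the retained component coordinates, so the LDA always works on numeric inputs.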
Parameters:
n_components (int or None, default = 2) – Number of components to keep. If None, keep all the components.
priors (str, 1-D array or Series of shape (n_classes,), default = None) – Prior probabilities of class membership. Possible values:
- ‘equal’: set the prior probabilities equal;
- ‘prop’: set the prior probabilities proportional to the sample sizes;
- 1-D array or Series: specify the prior probability for each level of the classification variable.
classes (None, tuple or list, default = None) – Names of the class levels, in the order in which they should be returned. If None, the classes are the sorted unique values of y.
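To make the `priors` and `classes` options concrete, here is a minimal sketch in plain Python. The helper `resolve_priors` is hypothetical and is not part of discrimintools. It only mirrors the three documented cases, and it leaves out `priors=None` because the text above does not say how that default is resolved.

```python
from collections import Counter
import math


def resolve_priors(y, priors, classes=None):
    """Return a {class: prior} mapping (hypothetical helper).

    y       : sequence of class labels
    priors  : 'equal', 'prop', or a sequence with one value per class
    classes : optional class order; defaults to the sorted unique labels
    """
    labels = sorted(set(y)) if classes is None else list(classes)
    if isinstance(priors, str):
        if priors == "equal":
            return {c: 1.0 / len(labels) for c in labels}
        if priors == "prop":
            counts = Counter(y)
            return {c: counts[c] / len(y) for c in labels}
        raise ValueError("priors must be 'equal', 'prop' or a sequence")
    values = [float(p) for p in priors]
    if len(values) != len(labels):
        raise ValueError("one prior per class is required")
    if not math.isclose(sum(values), 1.0):
        raise ValueError("priors must sum to 1")
    return dict(zip(labels, values))


y = ["a", "b", "a", "c", "a", "b"]
equal_priors = resolve_priors(y, "equal")
prop_priors = resolve_priors(y, "prop")
custom_priors = resolve_priors(y, [0.2, 0.3, 0.5], classes=("c", "a", "b"))
print(equal_priors)
print(prop_priors)
print(custom_priors)
```

Explicit priors are matched to classes by position, so the order given in `classes` matters when an array is passed.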
Returns:
call_ (NamedTuple) – Call information:
- Xtot (DataFrame of shape (n_samples, n_columns)) – Input data.
- X (DataFrame of shape (n_samples, n_features)) – Training data.
- y (Series of shape (n_samples,)) – Target values. True values for X.
- target (str) – Name of target.
- features (list) – Names of features seen during fit.
- classes (list) – Names of classes.
- priors (Series of shape (n_classes,)) – Prior probabilities.
- n_samples (int) – Number of samples.
- n_features (int) – Number of features.
- n_classes (int) – Number of distinct target values.
- max_components (int) – Maximum number of components.
- n_components (int) – Number of components kept.
cancoef_ (NamedTuple) – Canonical coefficients:
- standardized (DataFrame of shape (n_variables, n_components)) – The standardized canonical coefficients.
- raw (DataFrame of shape (n_variables+1, n_components)) – The raw canonical coefficients.
- projection (DataFrame of shape (n_variables+1, n_components)) – The projection canonical coefficients.
classes_ (NamedTuple) – Class information:
- coord (DataFrame of shape (n_classes, n_components)) – Class coordinates.
- eucl (DataFrame of shape (n_classes, n_classes)) – The squared Euclidean distance to origin.
- gen (DataFrame of shape (n_classes, n_classes)) – The generalized squared distance to origin.
coef_ (NamedTuple) – Linear discriminant coefficients:
- standardized (DataFrame of shape (n_variables, n_classes)) – The standardized coefficients.
- raw (DataFrame of shape (n_variables+1, n_classes)) – The raw coefficients.
- projection (DataFrame of shape (n_variables+1, n_classes)) – The projection coefficients.
ind_ (NamedTuple) – Individual information:
- coord (DataFrame of shape (n_samples, n_components)) – Coordinates of the individuals.
- scores (DataFrame of shape (n_samples, n_classes)) – The scores of the individuals.
- projection (DataFrame of shape (n_samples, n_classes)) – The projection of the individuals.
- eucl (DataFrame of shape (n_samples, n_classes)) – The squared Euclidean distance to origin.
- gen (DataFrame of shape (n_samples, n_classes)) – The generalized squared distance to origin.
model_ (str, default = ‘gfalda’) – The model fitted.
pipe_ – A sequence of data transformers with two named_steps:
- gfa: general factor analysis (GFA)
- lda: linear discriminant analysis (LDA)
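In the textbook formulation of LDA, posterior probabilities are obtained from per-class linear scores through a softmax. The sketch below shows that step for one individual. It assumes the scores already include the log prior term. Whether `ind_.scores` follows that convention is not stated here, and the helper `posterior_from_scores` and the score values are hypothetical.

```python
import math


def posterior_from_scores(scores):
    """Turn per-class linear scores into posterior probabilities.

    `scores` maps class name -> linear score for one individual
    (assumed to include the log prior). Returns (posterior, log_posterior).
    """
    # Log-sum-exp with the maximum subtracted for numerical stability.
    top = max(scores.values())
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    log_post = {c: s - log_norm for c, s in scores.items()}
    post = {c: math.exp(lp) for c, lp in log_post.items()}
    return post, log_post


# Hypothetical scores for a single individual and three classes.
scores = {"a": 2.0, "b": 1.0, "c": 0.0}
post, log_post = posterior_from_scores(scores)
predicted = max(post, key=post.get)
print(post, predicted)
```

The predicted class is the one with the largest posterior, which is also the one with the largest score.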
See also
GFA – General Factor Analysis (GFA)
MDA – Mixed Discriminant Analysis (MDA)
MPCA – Mixed Principal Component Analysis (MPCA)
summaryGFA – Printing summaries of General Factor Analysis model.
summaryMDA – Printing summaries of Mixed Discriminant Analysis model.
summaryMPCA – Printing summaries of Mixed Principal Component Analysis model.
References
[1] Ricco Rakotomalala (2020), « Pratique de l’Analyse Discriminante Linéaire », Université Lumière Lyon 2, Version 1.0.
Examples
>>> from discrimintools.datasets import load_alcools, load_vote, load_heart
>>> from discrimintools import GFALDA
>>> # PCA + LDA = PCALDA
>>> D = load_alcools("train")
>>> y, X = D["TYPE"], D.drop(columns=["TYPE"])
>>> clf = GFALDA()
>>> clf.fit(X, y)
GFALDA()
>>> # MCA + LDA = DISQUAL
>>> D = load_vote("train")
>>> y, X = D["group"], D.drop(columns=["group"])
>>> clf = GFALDA()
>>> clf.fit(X, y)
GFALDA()
>>> # FAMD + LDA = DISMIX
>>> D = load_heart("subset")
>>> y, X = D["disease"], D.drop(columns=["disease"])
>>> clf = GFALDA()
>>> clf.fit(X, y)
GFALDA()
Methods
__init__([n_components, priors, classes])
decision_function(X) – Apply the decision function to input data.
eval_predict(X, y[, verbose]) – Evaluate the quality of the prediction.
fit(X, y) – Fit the General Factor Analysis Linear Discriminant Analysis model.
fit_transform(X, y) – Fit to data, then transform it.
get_metadata_routing() – Get metadata routing of this object.
get_params([deep]) – Get parameters for this estimator.
pred_table(X, y) – Prediction table.
predict(X) – Predict class labels for samples in X.
predict_log_proba(X) – Return log of posterior probabilities.
predict_proba(X) – Estimate probabilities.
score(X, y) – Return accuracy on the given input data.
set_output(*[, transform]) – Set output container.
set_params(**params) – Set the parameters of this estimator.
transform(X) – Project data to maximize class separation.
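As a rough illustration of what `score` and `pred_table` summarize, the sketch below computes an accuracy and a cross-tabulation of true against predicted labels in plain Python. The helpers `prediction_table` and `accuracy`, the label values, and the table layout (rows are true classes, columns are predicted classes) are all assumptions made for this example. The actual output format of the library's methods is not described above.

```python
from collections import Counter


def prediction_table(y_true, y_pred, classes=None):
    """Cross-tabulate true labels (rows) against predicted labels (columns)."""
    if classes is None:
        labels = sorted(set(y_true) | set(y_pred))
    else:
        labels = list(classes)
    pairs = Counter(zip(y_true, y_pred))  # missing pairs count as 0
    return {t: {p: pairs[(t, p)] for p in labels} for t in labels}


def accuracy(y_true, y_pred):
    """Share of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Hypothetical labels; with a fitted model, y_pred would come from predict(X).
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]
table = prediction_table(y_true, y_pred)
acc = accuracy(y_true, y_pred)
for true_label, row in table.items():
    print(true_label, row)
print(acc)
```

The diagonal of the table holds the correctly classified counts, so the accuracy equals the diagonal sum divided by the number of samples.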