discrimintools.STEPDISC

class discrimintools.STEPDISC(method='forward', alpha=0.01, lambda_init=None, verbose=True)

Stepwise Discriminant Analysis (STEPDISC)

Given a classification variable and several quantitative variables, the STEPDISC class performs a stepwise discriminant analysis to select a subset of the quantitative variables for use in discriminating among the classes. Within each class, the quantitative variables are assumed to follow a multivariate normal distribution with a common covariance matrix. The STEPDISC class supports forward selection and backward elimination. Variable selection can be a useful prelude to further analyses with the CANDISC class or the DISCRIM class.

With STEPDISC, variables are chosen to enter or leave the model according to the significance level of an F test from an analysis of covariance, where the variables already chosen act as covariates and the variable under consideration is the dependent variable. Two selection methods are available, ‘forward’ and ‘backward’:

  1. Forward selection begins with no variables in the model. At each step, STEPDISC enters the variable that contributes most to the discriminatory power of the model as measured by Wilks’ lambda, the likelihood ratio criterion. When none of the unselected variables meet the entry criterion, the forward selection process stops.

  2. Backward elimination begins with all variables in the model except those that are linearly dependent on previous variables in the variable list. At each step, the variable that contributes least to the discriminatory power of the model as measured by Wilks’ lambda is removed. When all remaining variables meet the criterion to stay in the model, the backward elimination process stops.
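The two procedures above can be sketched with NumPy and SciPy. This is a minimal illustration, not the discrimintools implementation: the helper names (wilks_lambda, partial_f, forward_select, backward_eliminate) and the synthetic data are assumptions made for the example. It takes Wilks’ lambda as det(W) / det(T), with W the pooled within-class and T the total sums-of-squares-and-cross-products matrix, and uses the usual partial F statistic with (K - 1, n - K - q) degrees of freedom, where K is the number of classes and q the number of covariates already in the model. The sketch does not handle linearly dependent variables.

```python
import numpy as np
from scipy import stats


def wilks_lambda(X, y):
    """Wilks' lambda det(W) / det(T) for the columns of X (1.0 when X has no columns)."""
    if X.shape[1] == 0:
        return 1.0
    centered = X - X.mean(axis=0)
    total = centered.T @ centered            # total SSCP matrix T
    within = np.zeros_like(total)            # pooled within-class SSCP matrix W
    for label in np.unique(y):
        block = X[y == label]
        dev = block - block.mean(axis=0)
        within += dev.T @ dev
    return np.linalg.det(within) / np.linalg.det(total)


def partial_f(X, y, base, j):
    """F statistic and p-value for variable j, with the variables in `base` as covariates."""
    n, k, q = X.shape[0], len(np.unique(y)), len(base)
    partial = wilks_lambda(X[:, base + [j]], y) / wilks_lambda(X[:, base], y)
    df1, df2 = k - 1, n - k - q
    f_stat = df2 / df1 * (1.0 - partial) / partial
    return f_stat, stats.f.sf(f_stat, df1, df2)


def forward_select(X, y, alpha=0.01):
    """Enter the variable with the largest partial F until none is significant at alpha."""
    selected = []
    while len(selected) < X.shape[1]:
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        tests = {j: partial_f(X, y, selected, j) for j in candidates}
        best = max(tests, key=lambda j: tests[j][0])
        if tests[best][1] >= alpha:          # entry criterion not met: stop
            break
        selected.append(best)
    return selected


def backward_eliminate(X, y, alpha=0.01):
    """Remove the variable with the smallest partial F until all are significant at alpha."""
    kept = list(range(X.shape[1]))
    while kept:
        tests = {j: partial_f(X, y, [v for v in kept if v != j], j) for j in kept}
        worst = min(tests, key=lambda j: tests[j][0])
        if tests[worst][1] < alpha:          # every remaining variable meets the stay criterion
            break
        kept.remove(worst)
    return kept


# Synthetic data (an assumption for the demo): only column 0 separates the 3 classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 60)
X = rng.normal(size=(180, 3))
X[:, 0] += 3.0 * y

forward = forward_select(X, y, alpha=0.01)
backward = backward_eliminate(X, y, alpha=0.01)
print("forward selection order:", forward)
print("kept after backward elimination:", backward)
```

Because all candidates at a given step share the same degrees of freedom, picking the largest partial F is equivalent to picking the smallest partial Wilks’ lambda or the smallest p-value.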

Parameters:
  • method ({‘backward’, ‘forward’}, default=‘forward’) – The feature selection method to use. Possible values: “forward” for forward selection, “backward” for backward elimination.

  • alpha (float, default = 1e-2) – The significance level for adding or retaining variables in stepwise variable selection.

  • lambda_init (None or float, default = None) – Initial value of Wilks’ lambda.

  • verbose (bool, default=True) – If True, print intermediate steps during feature selection.

Returns:

  • call_ (NamedTuple) – Call information:

    • obj (class)

      An object of class CANDISC or DISCRIM.

    • alpha (float)

      The significance level for adding or retaining variables in stepwise variable selection.

    • target (str)

      Name of the target variable.

    • classes (list)

      Names of the classes.

    • priors (Series of shape (n_classes,))

      Prior probabilities.

  • disc_ (class) – An object of class CANDISC or DISCRIM

  • model_ (str, default = “stepdisc”) – Name of model fitted.

  • summary_ (NamedTuple) – Stepwise summary information:

    • summary (DataFrame of shape (n_selected, 6))

      Summary of the stepwise selection.

    • selected (list)

      Selected variables.

    • removed (list)

      Removed variables.

See also

CANDISC

Canonical Discriminant Analysis (CANDISC)

DISCRIM

Discriminant Analysis (linear and quadratic).

summaryCANDISC

Printing summaries of Canonical Discriminant Analysis model.

summaryDISCRIM

Printing summaries of Discriminant Analysis (linear and quadratic) model.

References

[1] Ricco Rakotomalala (2008), « STEPDISC - Feature selection for LDA », Université Lumière Lyon 2.

[2] Ricco Rakotomalala (2012), « Linear Discriminant Analysis - Tools comparison », Université Lumière Lyon 2.

[3] Ricco Rakotomalala (2014), « Linear discriminant analysis (slides) », Université Lumière Lyon 2.

[4] Ricco Rakotomalala (2020), « Pratique de l’Analyse Discriminante Linéaire », Version 1.0, Université Lumière Lyon 2.

[5] SAS/STAT 13.1 User’s Guide (2013), « The STEPDISC Procedure », Chapter 93.

Examples

>>> from discrimintools.datasets import load_heart
>>> from discrimintools import DISCRIM, STEPDISC
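>>> D = load_heart("train")  # load training data
>>> y, X = D["disease"], D.drop(columns=["disease"])  # split into target y and features X
>>> clf = DISCRIM(method="linear")  # discriminant model on which the selection is run
>>> clf.fit(X, y)
>>> clf2 = STEPDISC(method="forward", alpha=0.01, verbose=True)
>>> clf2.fit(clf)  # fit() takes the DISCRIM (or CANDISC) object, not X and y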
STEPDISC()

__init__(method='forward', alpha=0.01, lambda_init=None, verbose=True)

Methods

__init__([method, alpha, lambda_init, verbose])

decision_function(X)

Apply the decision function to input data

eval_predict(X, y[, verbose])

Evaluate the quality of the predictions

fit(obj)

Fit Stepwise Discriminant Analysis procedure

fit_transform(obj)

Fits transformer to X and returns a transformed version of samples.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

pred_table(X, y)

Prediction table

predict(X)

Predict class labels for samples in X

predict_log_proba(X)

Return log of posterior probabilities

predict_proba(X)

Estimate posterior probabilities

score(X, y)

Return accuracy on the given input data

set_fit_request(*[, obj])

Configure whether metadata should be requested to be passed to the fit method.

set_output(*[, transform])

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Project data to maximize class separation or to reduce dimensionality