GFALDA PCADA - alcools dataset#

[1]:

#disable warnings
from warnings import simplefilter, filterwarnings
simplefilter(action='ignore', category=FutureWarning)
filterwarnings("ignore")

alcools dataset#

[ ]:

#alcools dataset
from discrimintools.datasets import load_alcools
D = load_alcools()
D.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52 entries, 0 to 51
Data columns (total 9 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   TYPE    52 non-null     object
 1   MEOH    52 non-null     float64
 2   ACET    52 non-null     float64
 3   BU1     52 non-null     float64
 4   BU2     52 non-null     float64
 5   ISOP    52 non-null     int64
 6   MEPR    52 non-null     float64
 7   PRO1    52 non-null     float64
 8   ACAL    52 non-null     float64
dtypes: float64(7), int64(1), object(1)
memory usage: 3.8+ KB

[3]:

#split into X and y
y, X = D["TYPE"], D.drop(columns=["TYPE"])

Instantiation and training#

[4]:

#instantiation and training
from discrimintools import GFALDA
clf = GFALDA(n_components=2)

`fit` function#

[5]:

#fit function
clf.fit(X,y)

[5]:

GFALDA()


Canonical coefficients#

[6]:

#canonical coefficients
cancoef = clf.cancoef_
cancoef._fields

[6]:

('standardized', 'raw', 'projection')

Standardized canonical coefficients#

[7]:

#standardized canonical coefficients
print(cancoef.standardized)

          Can1      Can2
MEOH  0.520673  0.045841
ACET  0.060796  0.347537
BU1   0.456099 -0.026945
BU2  -0.029491  0.576889
ISOP  0.459790  0.009933
MEPR  0.482844  0.087627
PRO1 -0.195016  0.645653
ACAL  0.183659  0.344884

Projection canonical coefficients#

[8]:

#projection canonical coefficients
print(cancoef.projection)

          Can1      Can2
MEOH  0.065084  0.005730
ACET  0.007599  0.043442
BU1   0.057012 -0.003368
BU2  -0.003686  0.072111
ISOP  0.057474  0.001242
MEPR  0.060356  0.010953
PRO1 -0.024377  0.080707
ACAL  0.022957  0.043110

Raw canonical coefficients#

[9]:

#raw canonical coefficients
print(cancoef.raw)

              Can1      Can2
Constant -3.927165 -2.192304
MEOH      0.001422  0.000125
ACET      0.000503  0.002873
BU1       0.041763 -0.002467
BU2      -0.000547  0.010691
ISOP      0.009610  0.000208
MEPR      0.026395  0.004790
PRO1     -0.000314  0.001041
ACAL      0.023025  0.043237
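
The raw canonical coefficients can be read as a scoring rule on the unscaled variables. The sketch below assumes the usual convention (constant plus a linear combination of the raw predictors) and applies it to a made-up observation. The observation `x` is hypothetical and not taken from the alcools data, and the scoring rule itself is an assumption about how these coefficients are meant to be used, not a description of the library's internals.

```python
import numpy as np

# Raw canonical coefficients copied from the table above
# (rows: MEOH, ACET, BU1, BU2, ISOP, MEPR, PRO1, ACAL; columns: Can1, Can2).
constant = np.array([-3.927165, -2.192304])
raw = np.array([
    [0.001422, 0.000125],
    [0.000503, 0.002873],
    [0.041763, -0.002467],
    [-0.000547, 0.010691],
    [0.009610, 0.000208],
    [0.026395, 0.004790],
    [-0.000314, 0.001041],
    [0.023025, 0.043237],
])

# Hypothetical observation (made-up values, same variable order as above).
x = np.array([500.0, 200.0, 10.0, 50.0, 100.0, 30.0, 1000.0, 10.0])

# Assumed scoring rule: intercept + linear combination of the raw variables.
scores = constant + x @ raw
print(dict(zip(["Can1", "Can2"], scores.round(4))))
```

With a fitted model, the same coordinates would normally be obtained from the estimator itself; the manual product is only meant to show what the coefficients represent.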

Coefficients#

[10]:

#coefficients
coef = clf.coef_
coef._fields

[10]:

('standardized', 'raw', 'projection')

Standardized coefficients#

[11]:

#standardized coefficients
print(coef.standardized)

            KIRSCH     MIRAB     POIRE
Constant -2.199936 -1.311627 -1.560770
MEOH     -0.658592  0.081985  0.498314
ACET     -0.038280 -0.071141  0.085894
BU1      -0.584486  0.087645  0.431080
BU2       0.102704 -0.141331  0.018700
ISOP     -0.585029  0.079604  0.437572
MEPR     -0.605651  0.065386  0.465763
PRO1      0.321480 -0.187052 -0.132969
ACAL     -0.195208 -0.048617  0.202390

Projection coefficients#

[12]:

#projection coefficients
print(coef.projection)

            KIRSCH     MIRAB     POIRE
Constant -2.199936 -1.311627 -1.560770
MEOH     -0.082324  0.010248  0.062289
ACET     -0.004785 -0.008893  0.010737
BU1      -0.073061  0.010956  0.053885
BU2       0.012838 -0.017666  0.002337
ISOP     -0.073129  0.009950  0.054696
MEPR     -0.075706  0.008173  0.058220
PRO1      0.040185 -0.023381 -0.016621
ACAL     -0.024401 -0.006077  0.025299

Raw coefficients#

[13]:

#raw coefficients
print(coef.raw)

            KIRSCH     MIRAB     POIRE
Constant  2.559072 -1.494437 -5.468820
MEOH     -0.001798  0.000224  0.001361
ACET     -0.000316 -0.000588  0.000710
BU1      -0.053519  0.008025  0.039472
BU2       0.001903 -0.002619  0.000347
ISOP     -0.012228  0.001664  0.009146
MEPR     -0.033108  0.003574  0.025461
PRO1      0.000518 -0.000302 -0.000214
ACAL     -0.024473 -0.006095  0.025373
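
As a rough illustration of how classification functions are typically used in linear discriminant analysis, the sketch below scores a made-up observation with the raw coefficients above, picks the class with the highest score, and turns the scores into posterior-like probabilities with a softmax. The observation `x` is hypothetical, and the assumption that the constants already absorb the log priors (so that a softmax yields posteriors) follows the standard LDA convention rather than anything confirmed for this library.

```python
import numpy as np

classes = ["KIRSCH", "MIRAB", "POIRE"]

# Raw classification function coefficients copied from the table above
# (rows: MEOH, ACET, BU1, BU2, ISOP, MEPR, PRO1, ACAL).
constant = np.array([2.559072, -1.494437, -5.468820])
raw = np.array([
    [-0.001798, 0.000224, 0.001361],
    [-0.000316, -0.000588, 0.000710],
    [-0.053519, 0.008025, 0.039472],
    [0.001903, -0.002619, 0.000347],
    [-0.012228, 0.001664, 0.009146],
    [-0.033108, 0.003574, 0.025461],
    [0.000518, -0.000302, -0.000214],
    [-0.024473, -0.006095, 0.025373],
])

# Hypothetical observation (made-up values, same variable order as above).
x = np.array([500.0, 200.0, 10.0, 50.0, 100.0, 30.0, 1000.0, 10.0])

# One linear score per class; the largest score gives the predicted class.
scores = constant + x @ raw
predicted = classes[int(np.argmax(scores))]

# Numerically stable softmax (assumed to approximate the posteriors).
shifted = np.exp(scores - scores.max())
posterior = shifted / shifted.sum()

print(predicted, dict(zip(classes, posterior.round(4))))
```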

Summary#

[14]:

#summary
from discrimintools import summaryGFALDA
summaryGFALDA(clf,detailed=True)

                     General Factor Analysis Linear Discriminant Analysis - Results

Class Level Information:
        Frequency  Proportion  Prior Probability
KIRSCH         17      0.3269             0.3269
MIRAB          15      0.2885             0.2885
POIRE          20      0.3846             0.3846

Importance of components:
      Eigenvalue  Difference  Proportion (%)  Cumulative (%)
Can1      2.7988      1.0799         34.9848         34.9848
Can2      1.7188      0.3154         21.4856         56.4703

Raw Canonical Coefficients:
            Can1    Can2
Constant -3.9272 -2.1923
MEOH      0.0014  0.0001
ACET      0.0005  0.0029
BU1       0.0418 -0.0025
BU2      -0.0005  0.0107
ISOP      0.0096  0.0002
MEPR      0.0264  0.0048
PRO1     -0.0003  0.0010
ACAL      0.0230  0.0432

Projection functions coefficients:
        Can1    Can2
MEOH  0.0651  0.0057
ACET  0.0076  0.0434
BU1   0.0570 -0.0034
BU2  -0.0037  0.0721
ISOP  0.0575  0.0012
MEPR  0.0604  0.0110
PRO1 -0.0244  0.0807
ACAL  0.0230  0.0431

Multivariate Analysis of Variance (MANOVA) Summary:
          Statistic    Value  p-value
0     Wilks' Lambda   0.4279      NaN
1  Bartlett -- C(4)  41.1720      0.0
2    Rao -- F(4,96)  12.6901      0.0

LDA Classification functions & Statistical Evaluation:
          KIRSCH   MIRAB   POIRE  Wilks' Lambda  Partial R-Square  F Value  \
Constant -2.1999 -1.3116 -1.5608            NaN               NaN      NaN
Can1     -1.2748  0.1782  0.9499         0.9608            0.4454  29.8899
Can2      0.1129 -0.2359  0.0810         0.4455            0.9604   0.9908

          Num DF  Den DF    Pr>F
Constant     NaN     NaN     NaN
Can1         2.0    48.0  0.0000
Can2         2.0    48.0  0.3787

Classification Summary for Calibration Data:

Observation Profile:
                        Read  Used
Number of Observations    52    52

Number of Observations Classified into TYPE:
prediction  KIRSCH  MIRAB  POIRE  Total
TYPE
KIRSCH          17      0      0     17
MIRAB            1      9      5     15
POIRE            2      2     16     20
Total           20     11     21     52

Percent Classified into TYPE:
prediction    KIRSCH    MIRAB    POIRE  Total
TYPE
KIRSCH      100.0000   0.0000   0.0000  100.0
MIRAB         6.6667  60.0000  33.3333  100.0
POIRE        10.0000  10.0000  80.0000  100.0
Total        38.4615  21.1538  40.3846  100.0
Priors        0.3269   0.2885   0.3846    NaN

Error Count Estimates for TYPE:
        KIRSCH   MIRAB   POIRE   Total
Rate    0.0000  0.4000  0.2000  0.1923
Priors  0.3269  0.2885  0.3846     NaN

Classification Report for TYPE:
              precision  recall  f1-score  support
KIRSCH           0.8500  1.0000    0.9189  17.0000
MIRAB            0.8182  0.6000    0.6923  15.0000
POIRE            0.7619  0.8000    0.7805  20.0000
accuracy         0.8077  0.8077    0.8077   0.8077
macro avg        0.8100  0.8000    0.7972  52.0000
weighted avg     0.8069  0.8077    0.8003  52.0000
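
The printed tables suggest a simple link between the two sets of coefficients: the standardized classification coefficients appear to be the standardized canonical coefficients multiplied by the classification functions fitted on the components (the `Can1` and `Can2` rows of the summary). The check below uses only numbers copied from the outputs above. It is a numerical observation about this output, not a statement about how the library computes the coefficients.

```python
import numpy as np

# Standardized canonical coefficients (rows: MEOH ... ACAL; columns: Can1, Can2).
cancoef_std = np.array([
    [0.520673, 0.045841],
    [0.060796, 0.347537],
    [0.456099, -0.026945],
    [-0.029491, 0.576889],
    [0.459790, 0.009933],
    [0.482844, 0.087627],
    [-0.195016, 0.645653],
    [0.183659, 0.344884],
])

# Classification functions on the components, from the summary
# (rows: Can1, Can2; columns: KIRSCH, MIRAB, POIRE).
lda_on_components = np.array([
    [-1.2748, 0.1782, 0.9499],
    [0.1129, -0.2359, 0.0810],
])

# Standardized classification coefficients as printed (constant row excluded).
coef_std = np.array([
    [-0.658592, 0.081985, 0.498314],
    [-0.038280, -0.071141, 0.085894],
    [-0.584486, 0.087645, 0.431080],
    [0.102704, -0.141331, 0.018700],
    [-0.585029, 0.079604, 0.437572],
    [-0.605651, 0.065386, 0.465763],
    [0.321480, -0.187052, -0.132969],
    [-0.195208, -0.048617, 0.202390],
])

# Compose the two linear maps and compare with the printed table.
reconstructed = cancoef_std @ lda_on_components
max_gap = np.abs(reconstructed - coef_std).max()
print(f"largest absolute difference: {max_gap:.6f}")
```

The remaining gap comes from the four-decimal rounding of the summary output.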

Evaluation of prediction on testing dataset#

[15]:

#testing data
DTest = load_alcools("test")
DTest.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 9 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   TYPE    50 non-null     object
 1   MEOH    50 non-null     int64
 2   ACET    50 non-null     int64
 3   BU1     50 non-null     float64
 4   BU2     50 non-null     float64
 5   ISOP    50 non-null     int64
 6   MEPR    50 non-null     int64
 7   PRO1    50 non-null     int64
 8   ACAL    50 non-null     float64
dtypes: float64(3), int64(5), object(1)
memory usage: 3.6+ KB

[16]:

#split into X and y
yTest, XTest = DTest["TYPE"], DTest.drop(columns=["TYPE"])
#evaluation on testing data
evl_test = clf.eval_predict(XTest,yTest,verbose=True)

Observation Profile:
                        Read  Used
Number of Observations    50    50

Number of Observations Classified into TYPE:
prediction  KIRSCH  MIRAB  POIRE  Total
TYPE
KIRSCH          11      2      1     14
MIRAB            2     10      5     17
POIRE            2      3     14     19
Total           15     15     20     50

Percent Classified into TYPE:
prediction     KIRSCH      MIRAB      POIRE  Total
TYPE
KIRSCH      78.571429  14.285714   7.142857  100.0
MIRAB       11.764706  58.823529  29.411765  100.0
POIRE       10.526316  15.789474  73.684211  100.0
Total       30.000000  30.000000  40.000000  100.0
Priors       0.326923   0.288462   0.384615    NaN

Error Count Estimates for TYPE:
          KIRSCH     MIRAB     POIRE     Total
Rate    0.214286  0.411765  0.263158  0.290048
Priors  0.326923  0.288462  0.384615       NaN

Classification Report for TYPE:
              precision    recall  f1-score  support
KIRSCH         0.733333  0.785714  0.758621     14.0
MIRAB          0.666667  0.588235  0.625000     17.0
POIRE          0.700000  0.736842  0.717949     19.0
accuracy       0.700000  0.700000  0.700000      0.7
macro avg      0.700000  0.703597  0.700523     50.0
weighted avg   0.698000  0.700000  0.697734     50.0
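
The total error rate reported for the test data (0.290048) is not simply one minus the accuracy (0.30). The numbers are consistent with a prior-weighted average of the per-class error rates, using the priors estimated on the training data. The sketch below recomputes the per-class metrics and both error figures from the confusion matrix printed above; the prior-weighting interpretation is inferred from the output and is not confirmed by the source.

```python
import numpy as np

classes = ["KIRSCH", "MIRAB", "POIRE"]

# Test confusion matrix copied from the output above
# (rows: true TYPE, columns: predicted class).
cm = np.array([
    [11, 2, 1],
    [2, 10, 5],
    [2, 3, 14],
])

# Priors copied from the output above (training class proportions).
priors = np.array([0.326923, 0.288462, 0.384615])

correct = np.diag(cm)
recall = correct / cm.sum(axis=1)      # share of each true class recovered
precision = correct / cm.sum(axis=0)   # share of each predicted class that is right
class_error = 1.0 - recall             # per-class error rate ("Rate" row)

accuracy = correct.sum() / cm.sum()
weighted_error = float(priors @ class_error)  # prior-weighted total error

for name, p, r, e in zip(classes, precision, recall, class_error):
    print(f"{name}: precision={p:.4f} recall={r:.4f} error={e:.4f}")
print(f"accuracy={accuracy:.4f} prior-weighted error={weighted_error:.6f}")
```

On the training data the two figures coincide (0.1923) because the priors equal the observed class proportions; on the test data the class mix differs, so they diverge.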