DISCRIM (LDA) - alcools dataset#

[1]:
#disable warnings
from warnings import simplefilter, filterwarnings
simplefilter(action='ignore', category=FutureWarning)
filterwarnings("ignore")

alcools dataset#

[2]:
#alcools dataset (training split)
from discrimintools.datasets import load_alcools
D = load_alcools("train")
D.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52 entries, 0 to 51
Data columns (total 9 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   TYPE    52 non-null     object
 1   MEOH    52 non-null     float64
 2   ACET    52 non-null     float64
 3   BU1     52 non-null     float64
 4   BU2     52 non-null     float64
 5   ISOP    52 non-null     int64
 6   MEPR    52 non-null     float64
 7   PRO1    52 non-null     float64
 8   ACAL    52 non-null     float64
dtypes: float64(7), int64(1), object(1)
memory usage: 3.8+ KB
[3]:
#split into X and y
y, X = D["TYPE"], D.drop(columns=["TYPE"])

Instantiation and training#

[4]:
from discrimintools import DISCRIM
clf = DISCRIM()
clf.fit(X,y)
[4]:
DISCRIM(priors='prop')

Evaluation of prediction on training data#

[5]:
#eval_predict function
eval_train = clf.eval_predict(X,y,verbose=True)
Observation Profile:
                        Read  Used
Number of Observations    52    52

Number of Observations Classified into TYPE:
prediction  KIRSCH  MIRAB  POIRE  Total
TYPE
KIRSCH          17      0      0     17
MIRAB            0     14      1     15
POIRE            0      2     18     20
Total           17     16     19     52

Percent Classified into TYPE:
prediction      KIRSCH      MIRAB      POIRE  Total
TYPE
KIRSCH      100.000000   0.000000   0.000000  100.0
MIRAB         0.000000  93.333333   6.666667  100.0
POIRE         0.000000  10.000000  90.000000  100.0
Total        32.692308  30.769231  36.538462  100.0
Priors        0.326923   0.288462   0.384615    NaN

Error Count Estimates for TYPE:
          KIRSCH     MIRAB     POIRE     Total
Rate    0.000000  0.066667  0.100000  0.057692
Priors  0.326923  0.288462  0.384615       NaN

Classification Report for TYPE:
              precision    recall  f1-score    support
KIRSCH         1.000000  1.000000  1.000000  17.000000
MIRAB          0.875000  0.933333  0.903226  15.000000
POIRE          0.947368  0.900000  0.923077  20.000000
accuracy       0.942308  0.942308  0.942308   0.942308
macro avg      0.940789  0.944444  0.942101  52.000000
weighted avg   0.943699  0.942308  0.942499  52.000000
[6]:
#score function
print("Accuracy : {}%".format(100*round(clf.score(X,y),2)))
Accuracy : 94.0%
[7]:
#error rate
print("Error rate : {}%".format(100-100*round(clf.score(X,y),2)))
Error rate : 6.0%
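The accuracy and error rate printed above can be cross-checked against the confusion matrix reported by `eval_predict`. The following is a minimal sketch that uses only NumPy and the counts copied from that output. It assumes that the "Total" entry of the error count estimates is the prior-weighted average of the per-class error rates, which is consistent with the printed figures but is not confirmed here by the library documentation.

```python
import numpy as np

# Training confusion matrix copied from the eval_predict output
# (rows = true TYPE, columns = predicted TYPE).
classes = ["KIRSCH", "MIRAB", "POIRE"]
confusion = np.array([
    [17,  0,  0],
    [ 0, 14,  1],
    [ 0,  2, 18],
])

n_per_class = confusion.sum(axis=1)

# Overall accuracy: correctly classified observations over all observations.
accuracy = np.trace(confusion) / confusion.sum()

# Per-class error rate: share of each true class that was misclassified.
class_error = 1 - np.diag(confusion) / n_per_class

# Assumption: the reported "Total" error weights each class error by its prior.
priors = n_per_class / confusion.sum()
weighted_error = float(np.sum(priors * class_error))

print(f"Accuracy   : {accuracy:.4%}")
print(f"Error rate : {1 - accuracy:.4%}")
print(f"Prior-weighted error : {weighted_error:.6f}")
```

Note that the cells above round the score to two decimals before multiplying by 100, so they display 94.0% and 6.0% rather than the unrounded 94.23% and 5.77%.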

Linear Discriminant Function#

[8]:
#Linear Discriminant Function
print(clf.coef_)
            KIRSCH      MIRAB      POIRE
Constant -5.016453 -18.840685 -24.764879
MEOH      0.003428   0.029028   0.033390
ACET      0.006390   0.016413   0.007513
BU1      -0.063681   0.405390   0.318047
BU2      -0.000883   0.071352   0.114993
ISOP      0.023082   0.029763  -0.008486
MEPR      0.037494  -0.128942   0.061780
PRO1      0.001971  -0.005413  -0.008318
ACAL      0.066184  -0.226424  -0.130332
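To illustrate how these coefficients are used, the sketch below applies the usual linear discriminant rule: compute one score per class as the constant plus the weighted sum of the variables, then assign the class with the largest score. The coefficients are copied from the output above; the observation `x_new` and the helper `classify` are hypothetical and only serve the illustration. Whether `clf.predict` follows exactly this computation internally is an assumption.

```python
import numpy as np

classes = ["KIRSCH", "MIRAB", "POIRE"]
features = ["MEOH", "ACET", "BU1", "BU2", "ISOP", "MEPR", "PRO1", "ACAL"]

# Values copied from the clf.coef_ output (one column per class).
constant = np.array([-5.016453, -18.840685, -24.764879])
coef = np.array([
    [ 0.003428,  0.029028,  0.033390],  # MEOH
    [ 0.006390,  0.016413,  0.007513],  # ACET
    [-0.063681,  0.405390,  0.318047],  # BU1
    [-0.000883,  0.071352,  0.114993],  # BU2
    [ 0.023082,  0.029763, -0.008486],  # ISOP
    [ 0.037494, -0.128942,  0.061780],  # MEPR
    [ 0.001971, -0.005413, -0.008318],  # PRO1
    [ 0.066184, -0.226424, -0.130332],  # ACAL
])

def classify(x):
    """Hypothetical helper: return (scores, predicted class) for one observation."""
    scores = constant + np.asarray(x, dtype=float) @ coef
    return scores, classes[int(np.argmax(scores))]

# Hypothetical observation, values made up for illustration only.
x_new = [400.0, 200.0, 30.0, 50.0, 90.0, 40.0, 200.0, 10.0]
scores, label = classify(x_new)
print(dict(zip(classes, scores.round(3))), "->", label)
```

With a fitted model, the same scores could be compared with the classes returned by the classifier on a few rows of `X` to check that the two agree.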

Summary#

[9]:
from discrimintools import summaryDISCRIM
summaryDISCRIM(clf,detailed=True)
                     Discriminant Analysis - Results

Summary Information:
               Infos  Value                  DF  DF value
0  Total Sample Size     52            DF Total        51
1          Variables      8   DF Within Classes        49
2            Classes      3  DF Between Classes         2

Class Level Information:
        Frequency  Proportion  Prior Probability
KIRSCH         17      0.3269             0.3269
MIRAB          15      0.2885             0.2885
POIRE          20      0.3846             0.3846

Pooled Covariance Matrix Information:
        Rank  Natural Log of the Determinant
Pooled     8                         58.3267

Linear Discriminant Function for TYPE:
          KIRSCH    MIRAB    POIRE
Constant -5.0165 -18.8407 -24.7649
MEOH      0.0034   0.0290   0.0334
ACET      0.0064   0.0164   0.0075
BU1      -0.0637   0.4054   0.3180
BU2      -0.0009   0.0714   0.1150
ISOP      0.0231   0.0298  -0.0085
MEPR      0.0375  -0.1289   0.0618
PRO1      0.0020  -0.0054  -0.0083
ACAL      0.0662  -0.2264  -0.1303

Classification Summary for Calibration Data:

Observation Profile:
                        Read  Used
Number of Observations    52    52

Number of Observations Classified into TYPE:
prediction  KIRSCH  MIRAB  POIRE  Total
TYPE
KIRSCH          17      0      0     17
MIRAB            0     14      1     15
POIRE            0      2     18     20
Total           17     16     19     52

Percent Classified into TYPE:
prediction    KIRSCH    MIRAB    POIRE  Total
TYPE
KIRSCH      100.0000   0.0000   0.0000  100.0
MIRAB         0.0000  93.3333   6.6667  100.0
POIRE         0.0000  10.0000  90.0000  100.0
Total        32.6923  30.7692  36.5385  100.0
Priors        0.3269   0.2885   0.3846    NaN

Error Count Estimates for TYPE:
        KIRSCH   MIRAB   POIRE   Total
Rate    0.0000  0.0667  0.1000  0.0577
Priors  0.3269  0.2885  0.3846     NaN

Classification Report for TYPE:
              precision  recall  f1-score  support
KIRSCH           1.0000  1.0000    1.0000  17.0000
MIRAB            0.8750  0.9333    0.9032  15.0000
POIRE            0.9474  0.9000    0.9231  20.0000
accuracy         0.9423  0.9423    0.9423   0.9423
macro avg        0.9408  0.9444    0.9421  52.0000
weighted avg     0.9437  0.9423    0.9425  52.0000
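The summary information and class-level tables can be reproduced from the class counts alone. The sketch below assumes that `priors='prop'` means priors proportional to the class frequencies in the training sample, which matches the "Proportion" and "Prior Probability" columns of the summary, and it uses the standard degrees-of-freedom formulas for a one-way layout.

```python
# Class counts copied from the "Class Level Information" table.
counts = {"KIRSCH": 17, "MIRAB": 15, "POIRE": 20}

n = sum(counts.values())   # total sample size
k = len(counts)            # number of classes

# Proportional priors: each class prior is its share of the training sample.
priors = {c: round(m / n, 4) for c, m in counts.items()}

# Degrees of freedom as reported in the summary table.
df_total = n - 1
df_within = n - k
df_between = k - 1

print(priors)
print("DF total / within / between:", df_total, df_within, df_between)
```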

Evaluation of prediction on testing dataset#

Testing data#

[10]:
#testing data
DTest = load_alcools("test")
DTest.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 9 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   TYPE    50 non-null     object
 1   MEOH    50 non-null     int64
 2   ACET    50 non-null     int64
 3   BU1     50 non-null     float64
 4   BU2     50 non-null     float64
 5   ISOP    50 non-null     int64
 6   MEPR    50 non-null     int64
 7   PRO1    50 non-null     int64
 8   ACAL    50 non-null     float64
dtypes: float64(3), int64(5), object(1)
memory usage: 3.6+ KB
[11]:
#split into X and y
yTest, XTest = DTest["TYPE"], DTest.drop(columns=["TYPE"])
eval_test = clf.eval_predict(XTest,yTest,verbose=True)
Observation Profile:
                        Read  Used
Number of Observations    50    50

Number of Observations Classified into TYPE:
prediction  KIRSCH  MIRAB  POIRE  Total
TYPE
KIRSCH          14      0      0     14
MIRAB            0     14      3     17
POIRE            1      5     13     19
Total           15     19     16     50

Percent Classified into TYPE:
prediction      KIRSCH      MIRAB      POIRE  Total
TYPE
KIRSCH      100.000000   0.000000   0.000000  100.0
MIRAB         0.000000  82.352941  17.647059  100.0
POIRE         5.263158  26.315789  68.421053  100.0
Total        30.000000  38.000000  32.000000  100.0
Priors        0.326923   0.288462   0.384615    NaN

Error Count Estimates for TYPE:
          KIRSCH     MIRAB     POIRE     Total
Rate    0.000000  0.176471  0.315789  0.172362
Priors  0.326923  0.288462  0.384615       NaN

Classification Report for TYPE:
              precision    recall  f1-score  support
KIRSCH         0.933333  1.000000  0.965517    14.00
MIRAB          0.736842  0.823529  0.777778    17.00
POIRE          0.812500  0.684211  0.742857    19.00
accuracy       0.820000  0.820000  0.820000     0.82
macro avg      0.827558  0.835913  0.828717    50.00
weighted avg   0.820610  0.820000  0.817075    50.00
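The per-class figures of the test classification report follow directly from the test confusion matrix. The sketch below recomputes precision, recall and F1 with NumPy from the counts copied from the output above. It is an independent check and not the library's own implementation.

```python
import numpy as np

# Test confusion matrix copied from the eval_predict output
# (rows = true TYPE, columns = predicted TYPE).
classes = ["KIRSCH", "MIRAB", "POIRE"]
confusion_test = np.array([
    [14,  0,  0],
    [ 0, 14,  3],
    [ 1,  5, 13],
])

tp = np.diag(confusion_test)                 # correctly classified per class
precision = tp / confusion_test.sum(axis=0)  # column totals = predicted counts
recall = tp / confusion_test.sum(axis=1)     # row totals = true counts
f1 = 2 * precision * recall / (precision + recall)
accuracy_test = tp.sum() / confusion_test.sum()

for name, p, r, f in zip(classes, precision, recall, f1):
    print(f"{name:<7} precision={p:.4f} recall={r:.4f} f1={f:.4f}")
print(f"accuracy={accuracy_test:.4f}")
```

Test accuracy is 82%, against about 94.2% on the training data. Most of the drop comes from confusion between MIRAB and POIRE, while KIRSCH is still recovered in every case.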