DISCRIM (QDA) - alcools dataset#

[1]:
#disable warnings
from warnings import simplefilter, filterwarnings
simplefilter(action='ignore', category=FutureWarning)
filterwarnings("ignore")

alcools dataset#

[2]:
#alcools dataset (training split)
from discrimintools.datasets import load_alcools
D = load_alcools("train")
D.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52 entries, 0 to 51
Data columns (total 9 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   TYPE    52 non-null     object
 1   MEOH    52 non-null     float64
 2   ACET    52 non-null     float64
 3   BU1     52 non-null     float64
 4   BU2     52 non-null     float64
 5   ISOP    52 non-null     int64
 6   MEPR    52 non-null     float64
 7   PRO1    52 non-null     float64
 8   ACAL    52 non-null     float64
dtypes: float64(7), int64(1), object(1)
memory usage: 3.8+ KB
[3]:
#split into X and y
y, X = D["TYPE"], D.drop(columns=["TYPE"])

Instantiation and training#

[4]:
from discrimintools import DISCRIM
clf = DISCRIM(method="quad") #the warning below can be disabled using warn_message
clf.fit(X,y)

Since the Chi-Square value is significant at the 0.1 level, the within covariance matrices will be used in the discriminant function.
Reference: Morrison, D.F. (1976) Multivariate Statistical Methods p252.
[4]:
DISCRIM(method='quad', priors='prop')
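The message printed by `fit` says that each class keeps its own within-class covariance matrix. For reference, the classical quadratic rule can be written as below. This assumes discrimintools follows the same convention as the SAS-style output it prints, which this page does not confirm. Here $m_t$ is the mean of class $t$, $S_t$ its within-class covariance matrix, and $q_t$ its prior probability; an observation $x$ is assigned to the class with the largest posterior.

```latex
D_t^2(x) = (x - m_t)^\top S_t^{-1} (x - m_t) + \ln\lvert S_t \rvert - 2 \ln q_t,
\qquad
p(t \mid x) = \frac{\exp\!\left(-\tfrac{1}{2} D_t^2(x)\right)}
                   {\sum_u \exp\!\left(-\tfrac{1}{2} D_u^2(x)\right)}
```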

Evaluation on training data#

[5]:
#eval_predict function
eval_train = clf.eval_predict(X,y,verbose=True)
Observation Profile:
                        Read  Used
Number of Observations    52    52

Number of Observations Classified into TYPE:
prediction  KIRSCH  MIRAB  POIRE  Total
TYPE
KIRSCH          17      0      0     17
MIRAB            0     15      0     15
POIRE            0      0     20     20
Total           17     15     20     52

Percent Classified into TYPE:
prediction      KIRSCH       MIRAB       POIRE  Total
TYPE
KIRSCH      100.000000    0.000000    0.000000  100.0
MIRAB         0.000000  100.000000    0.000000  100.0
POIRE         0.000000    0.000000  100.000000  100.0
Total        32.692308   28.846154   38.461538  100.0
Priors        0.326923    0.288462    0.384615    NaN

Error Count Estimates for TYPE:
          KIRSCH     MIRAB     POIRE  Total
Rate    0.000000  0.000000  0.000000    0.0
Priors  0.326923  0.288462  0.384615    NaN

Classification Report for TYPE:
              precision  recall  f1-score  support
KIRSCH              1.0     1.0       1.0     17.0
MIRAB               1.0     1.0       1.0     15.0
POIRE               1.0     1.0       1.0     20.0
accuracy            1.0     1.0       1.0      1.0
macro avg           1.0     1.0       1.0     52.0
weighted avg        1.0     1.0       1.0     52.0
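The `Priors` rows above are consistent with `priors='prop'` meaning class proportions in the training sample. A minimal sketch that reproduces them from the class counts shown in the table (the `labels` list is rebuilt by hand here for illustration; it is not read from the dataset):

```python
from collections import Counter

# Class counts copied from the training confusion matrix above
# (hypothetical reconstruction of the TYPE column, for illustration only).
labels = ["KIRSCH"] * 17 + ["MIRAB"] * 15 + ["POIRE"] * 20

counts = Counter(labels)
n_total = sum(counts.values())

# Proportional priors: class frequency divided by the sample size.
priors = {cls: cnt / n_total for cls, cnt in counts.items()}

for cls, prior in priors.items():
    print(f"{cls:<8}{prior:.6f}")
```

The printed values should agree with the `Priors` row to six decimals.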
[6]:
#score function
print("Accuracy : {}%".format(100*round(clf.score(X,y),2)))
Accuracy : 100.0%
[7]:
#error rate
print("Error rate : {}%".format(100-100*round(clf.score(X,y),2)))
Error rate : 0.0%

Summary#

[8]:
from discrimintools import summaryDISCRIM
summaryDISCRIM(clf,detailed=True)
                     Discriminant Analysis - Results

Summary Information:
               Infos  Value                  DF  DF value
0  Total Sample Size     52            DF Total        51
1          Variables      8   DF Within Classes        49
2            Classes      3  DF Between Classes         2

Class Level Information:
        Frequency  Proportion  Prior Probability
KIRSCH         17      0.3269             0.3269
MIRAB          15      0.2885             0.2885
POIRE          20      0.3846             0.3846

Within Covariance Matrix Information:
        Rank  Natural Log of the Determinant
Pooled     8                         58.3267
KIRSCH     8                         49.0021
MIRAB      8                         48.9038
POIRE      8                         54.6744

Test of Homogeneity of Within Covariance Matrices:
         Bartlett Value  Num DF  Den DF  F value  Pr>F  Chi Sq. Value  Pr>Chi2
Box's M        350.5115      72    6010    3.679   0.0       269.0859      0.0

Since the Chi-Square value is significant at the 0.1 level, the within covariance matrices has been used in the discriminant function.
Reference: Morrison, D.F. (1976) Multivariate Statistical Methods p252.

Classification Summary for Calibration Data:

Observation Profile:
                        Read  Used
Number of Observations    52    52

Number of Observations Classified into TYPE:
prediction  KIRSCH  MIRAB  POIRE  Total
TYPE
KIRSCH          17      0      0     17
MIRAB            0     15      0     15
POIRE            0      0     20     20
Total           17     15     20     52

Percent Classified into TYPE:
prediction    KIRSCH     MIRAB     POIRE  Total
TYPE
KIRSCH      100.0000    0.0000    0.0000  100.0
MIRAB         0.0000  100.0000    0.0000  100.0
POIRE         0.0000    0.0000  100.0000  100.0
Total        32.6923   28.8462   38.4615  100.0
Priors        0.3269    0.2885    0.3846    NaN

Error Count Estimates for TYPE:
        KIRSCH   MIRAB   POIRE  Total
Rate    0.0000  0.0000  0.0000    0.0
Priors  0.3269  0.2885  0.3846    NaN

Classification Report for TYPE:
              precision  recall  f1-score  support
KIRSCH              1.0     1.0       1.0     17.0
MIRAB               1.0     1.0       1.0     15.0
POIRE               1.0     1.0       1.0     20.0
accuracy            1.0     1.0       1.0      1.0
macro avg           1.0     1.0       1.0     52.0
weighted avg        1.0     1.0       1.0     52.0
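The homogeneity test in the summary above can be checked by hand from the printed log-determinants. The sketch below uses the textbook Box's M statistic and its chi-square approximation; whether discrimintools computes it exactly this way is an assumption, although the numbers agree with the table up to rounding of the inputs.

```python
# Values copied from the summary tables above.
p = 8                                            # number of variables
n = {"KIRSCH": 17, "MIRAB": 15, "POIRE": 20}     # class frequencies
logdet = {"KIRSCH": 49.0021, "MIRAB": 48.9038, "POIRE": 54.6744}
logdet_pooled = 58.3267

k = len(n)               # number of classes
N = sum(n.values())      # total sample size

# Box's M: pooled log-determinant against the class log-determinants,
# each weighted by its degrees of freedom.
M = (N - k) * logdet_pooled - sum((n[c] - 1) * logdet[c] for c in n)

# Scaling factor of the chi-square approximation.
c1 = (2 * p**2 + 3 * p - 1) / (6 * (p + 1) * (k - 1)) * (
    sum(1 / (n[c] - 1) for c in n) - 1 / (N - k)
)
chi2 = (1 - c1) * M

# Degrees of freedom of the chi-square approximation.
df = p * (p + 1) * (k - 1) // 2

print(f"M = {M:.3f}, chi2 = {chi2:.3f}, df = {df}")
```

The result is close to the reported `Bartlett Value` (350.5115) and `Chi Sq. Value` (269.0859), with 72 degrees of freedom matching `Num DF`.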

Evaluation of prediction on testing dataset#

Testing data#

[9]:
#testing data
DTest = load_alcools("test")
DTest.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 9 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   TYPE    50 non-null     object
 1   MEOH    50 non-null     int64
 2   ACET    50 non-null     int64
 3   BU1     50 non-null     float64
 4   BU2     50 non-null     float64
 5   ISOP    50 non-null     int64
 6   MEPR    50 non-null     int64
 7   PRO1    50 non-null     int64
 8   ACAL    50 non-null     float64
dtypes: float64(3), int64(5), object(1)
memory usage: 3.6+ KB
[10]:
#split into X and y
yTest, XTest = DTest["TYPE"], DTest.drop(columns=["TYPE"])
#evaluate predictions on the testing data
eval_test = clf.eval_predict(XTest,yTest,verbose=True)
Observation Profile:
                        Read  Used
Number of Observations    50    50

Number of Observations Classified into TYPE:
prediction  KIRSCH  MIRAB  POIRE  Total
TYPE
KIRSCH          14      0      0     14
MIRAB            0     12      5     17
POIRE            0      2     17     19
Total           14     14     22     50

Percent Classified into TYPE:
prediction      KIRSCH      MIRAB      POIRE  Total
TYPE
KIRSCH      100.000000   0.000000   0.000000  100.0
MIRAB         0.000000  70.588235  29.411765  100.0
POIRE         0.000000  10.526316  89.473684  100.0
Total        28.000000  28.000000  44.000000  100.0
Priors        0.326923   0.288462   0.384615    NaN

Error Count Estimates for TYPE:
          KIRSCH     MIRAB     POIRE     Total
Rate    0.000000  0.294118  0.105263  0.125327
Priors  0.326923  0.288462  0.384615       NaN

Classification Report for TYPE:
              precision    recall  f1-score  support
KIRSCH         1.000000  1.000000  1.000000    14.00
MIRAB          0.857143  0.705882  0.774194    17.00
POIRE          0.772727  0.894737  0.829268    19.00
accuracy       0.860000  0.860000  0.860000     0.86
macro avg      0.876623  0.866873  0.867821    50.00
weighted avg   0.865065  0.860000  0.858348    50.00
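Note that the `Total` error rate (0.125327) is not `1 - accuracy` (0.14). The figures above are consistent with a total that weights each class error rate by its training prior. A minimal sketch that recomputes both from the test confusion matrix (the weighting scheme is inferred from the printed numbers, not from the library's source):

```python
# Test confusion matrix copied from the output above
# (rows: true class, columns: predicted class).
classes = ["KIRSCH", "MIRAB", "POIRE"]
confusion = [
    [14, 0, 0],
    [0, 12, 5],
    [0, 2, 17],
]

# Priors are the training class proportions (17, 15 and 20 out of 52).
train_counts = [17, 15, 20]
priors = [c / sum(train_counts) for c in train_counts]

# Per-class error rate: share of each true class that was misclassified.
row_totals = [sum(row) for row in confusion]
class_error = [1 - confusion[i][i] / row_totals[i] for i in range(len(classes))]

# Prior-weighted total error rate versus plain accuracy on the test set.
weighted_error = sum(q * e for q, e in zip(priors, class_error))
accuracy = sum(confusion[i][i] for i in range(len(classes))) / sum(row_totals)

for cls, err in zip(classes, class_error):
    print(f"{cls:<8}{err:.6f}")
print(f"weighted error = {weighted_error:.6f}")
print(f"accuracy       = {accuracy:.2f}")
```

Because MIRAB is the hardest class and has the smallest prior, the prior-weighted error is lower than the raw misclassification rate on the 50 test observations.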