DISCRIM (LDA) - alcools dataset
[1]:
#disable warnings
from warnings import simplefilter, filterwarnings
simplefilter(action='ignore', category=FutureWarning)
filterwarnings("ignore")
alcools dataset
[2]:
# alcools dataset (training split)
from discrimintools.datasets import load_alcools
D = load_alcools("train")
# info() prints its report directly and returns None, so it is not wrapped in print()
D.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52 entries, 0 to 51
Data columns (total 9 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   TYPE    52 non-null     object
 1   MEOH    52 non-null     float64
 2   ACET    52 non-null     float64
 3   BU1     52 non-null     float64
 4   BU2     52 non-null     float64
 5   ISOP    52 non-null     int64
 6   MEPR    52 non-null     float64
 7   PRO1    52 non-null     float64
 8   ACAL    52 non-null     float64
dtypes: float64(7), int64(1), object(1)
memory usage: 3.8+ KB
[3]:
#split into X and y
y, X = D["TYPE"], D.drop(columns=["TYPE"])
Instantiation and training
[4]:
from discrimintools import DISCRIM
clf = DISCRIM()
clf.fit(X,y)
[4]:
DISCRIM(priors='prop')
Parameters
| Parameter | Value |
| --- | --- |
| method | 'linear' |
| priors | 'prop' |
| classes | None |
| var_select | False |
| level | None |
| tol | None |
| warn_message | True |
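The `priors='prop'` setting appears to mean that prior probabilities are set proportional to the class frequencies in the training sample; this reading is inferred from the `Priors` rows printed in this notebook rather than from the library documentation. A minimal sketch of that computation, where `y_demo` is a hypothetical stand-in for `D["TYPE"]` built from the class counts reported in the outputs (17 KIRSCH, 15 MIRAB, 20 POIRE):

```python
import pandas as pd

# Hypothetical stand-in for D["TYPE"], rebuilt from the class counts
# printed in this notebook (17 KIRSCH, 15 MIRAB, 20 POIRE).
y_demo = pd.Series(["KIRSCH"] * 17 + ["MIRAB"] * 15 + ["POIRE"] * 20, name="TYPE")

# Proportional priors: the relative frequency of each class in the sample.
priors = y_demo.value_counts(normalize=True).sort_index()
print(priors.round(6))
```

With the real data, the same line applied to `y` should reproduce the `Priors` row shown in the evaluation outputs.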
Evaluation of prediction on training data
[5]:
#eval_predict function
eval_train = clf.eval_predict(X,y,verbose=True)
Observation Profile:
Read Used
Number of Observations 52 52
Number of Observations Classified into TYPE:
prediction KIRSCH MIRAB POIRE Total
TYPE
KIRSCH 17 0 0 17
MIRAB 0 14 1 15
POIRE 0 2 18 20
Total 17 16 19 52
Percent Classified into TYPE:
prediction KIRSCH MIRAB POIRE Total
TYPE
KIRSCH 100.000000 0.000000 0.000000 100.0
MIRAB 0.000000 93.333333 6.666667 100.0
POIRE 0.000000 10.000000 90.000000 100.0
Total 32.692308 30.769231 36.538462 100.0
Priors 0.326923 0.288462 0.384615 NaN
Error Count Estimates for TYPE:
KIRSCH MIRAB POIRE Total
Rate 0.000000 0.066667 0.100000 0.057692
Priors 0.326923 0.288462 0.384615 NaN
Classification Report for TYPE:
precision recall f1-score support
KIRSCH 1.000000 1.000000 1.000000 17.000000
MIRAB 0.875000 0.933333 0.903226 15.000000
POIRE 0.947368 0.900000 0.923077 20.000000
accuracy 0.942308 0.942308 0.942308 0.942308
macro avg 0.940789 0.944444 0.942101 52.000000
weighted avg 0.943699 0.942308 0.942499 52.000000
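The classification report can be rebuilt by hand from the table of counts, which makes it clear where each figure comes from. The sketch below uses only NumPy and the training counts printed above; the variable names are illustrative and are not part of `discrimintools`.

```python
import numpy as np

labels = ["KIRSCH", "MIRAB", "POIRE"]

# Rows are true classes, columns are predicted classes (training counts above).
cm = np.array([[17, 0, 0],
               [0, 14, 1],
               [0, 2, 18]])

correct = np.diag(cm)
recall = correct / cm.sum(axis=1)      # correct / row total (true class size)
precision = correct / cm.sum(axis=0)   # correct / column total (predicted class size)
f1 = 2 * precision * recall / (precision + recall)
accuracy = correct.sum() / cm.sum()

for name, p, r, f in zip(labels, precision, recall, f1):
    print(f"{name:<7} precision={p:.6f} recall={r:.6f} f1={f:.6f}")
print(f"accuracy={accuracy:.6f}")
```

Recall is the diagonal of the "Percent Classified" table divided by 100, and the per-class error rate is one minus recall.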
[6]:
#score function
print("Accuracy : {}%".format(100*round(clf.score(X,y),2)))
Accuracy : 94.0%
[7]:
#error rate
print("Error rate : {}%".format(100-100*round(clf.score(X,y),2)))
Error rate : 6.0%
Linear Discriminant Function
[8]:
#Linear Discriminant Function
print(clf.coef_)
KIRSCH MIRAB POIRE
Constant -5.016453 -18.840685 -24.764879
MEOH 0.003428 0.029028 0.033390
ACET 0.006390 0.016413 0.007513
BU1 -0.063681 0.405390 0.318047
BU2 -0.000883 0.071352 0.114993
ISOP 0.023082 0.029763 -0.008486
MEPR 0.037494 -0.128942 0.061780
PRO1 0.001971 -0.005413 -0.008318
ACAL 0.066184 -0.226424 -0.130332
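Each column of this table defines one linear score per class: the constant plus the weighted sum of the eight measurements. An observation is assigned to the class with the largest score. The sketch below applies the printed coefficients to a hypothetical observation `x` (its values are made up for illustration and do not come from the dataset). The softmax step assumes that the constants already include the log-priors, which is a common convention but is not confirmed here for `discrimintools`.

```python
import numpy as np
import pandas as pd

classes = ["KIRSCH", "MIRAB", "POIRE"]
features = ["MEOH", "ACET", "BU1", "BU2", "ISOP", "MEPR", "PRO1", "ACAL"]

# Coefficients copied from the clf.coef_ output above.
constant = pd.Series([-5.016453, -18.840685, -24.764879], index=classes)
coef = pd.DataFrame(
    [[0.003428, 0.029028, 0.033390],
     [0.006390, 0.016413, 0.007513],
     [-0.063681, 0.405390, 0.318047],
     [-0.000883, 0.071352, 0.114993],
     [0.023082, 0.029763, -0.008486],
     [0.037494, -0.128942, 0.061780],
     [0.001971, -0.005413, -0.008318],
     [0.066184, -0.226424, -0.130332]],
    index=features, columns=classes,
)

# Hypothetical observation (illustrative values only, not a row of the dataset).
x = pd.Series([500.0, 200.0, 20.0, 40.0, 90.0, 30.0, 300.0, 10.0], index=features)

# One linear score per class: constant_k + sum_j coef_jk * x_j.
scores = constant + x.dot(coef)
predicted = scores.idxmax()

# Assumption: constants include log-priors, so a softmax gives posteriors.
exp_scores = np.exp(scores - scores.max())
posterior = exp_scores / exp_scores.sum()

print(scores.round(4))
print("predicted class:", predicted)
print(posterior.round(4))
```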
Summary
[9]:
from discrimintools import summaryDISCRIM
summaryDISCRIM(clf,detailed=True)
Discriminant Analysis - Results
Summary Information:
Infos Value DF DF value
0 Total Sample Size 52 DF Total 51
1 Variables 8 DF Within Classes 49
2 Classes 3 DF Between Classes 2
Class Level Information:
Frequency Proportion Prior Probability
KIRSCH 17 0.3269 0.3269
MIRAB 15 0.2885 0.2885
POIRE 20 0.3846 0.3846
Pooled Covariance Matrix Information:
Rank Natural Log of the Determinant
Pooled 8 58.3267
Linear Discriminant Function for TYPE:
KIRSCH MIRAB POIRE
Constant -5.0165 -18.8407 -24.7649
MEOH 0.0034 0.0290 0.0334
ACET 0.0064 0.0164 0.0075
BU1 -0.0637 0.4054 0.3180
BU2 -0.0009 0.0714 0.1150
ISOP 0.0231 0.0298 -0.0085
MEPR 0.0375 -0.1289 0.0618
PRO1 0.0020 -0.0054 -0.0083
ACAL 0.0662 -0.2264 -0.1303
Classification Summary for Calibration Data:
Observation Profile:
Read Used
Number of Observations 52 52
Number of Observations Classified into TYPE:
prediction KIRSCH MIRAB POIRE Total
TYPE
KIRSCH 17 0 0 17
MIRAB 0 14 1 15
POIRE 0 2 18 20
Total 17 16 19 52
Percent Classified into TYPE:
prediction KIRSCH MIRAB POIRE Total
TYPE
KIRSCH 100.0000 0.0000 0.0000 100.0
MIRAB 0.0000 93.3333 6.6667 100.0
POIRE 0.0000 10.0000 90.0000 100.0
Total 32.6923 30.7692 36.5385 100.0
Priors 0.3269 0.2885 0.3846 NaN
Error Count Estimates for TYPE:
KIRSCH MIRAB POIRE Total
Rate 0.0000 0.0667 0.1000 0.0577
Priors 0.3269 0.2885 0.3846 NaN
Classification Report for TYPE:
precision recall f1-score support
KIRSCH 1.0000 1.0000 1.0000 17.0000
MIRAB 0.8750 0.9333 0.9032 15.0000
POIRE 0.9474 0.9000 0.9231 20.0000
accuracy 0.9423 0.9423 0.9423 0.9423
macro avg 0.9408 0.9444 0.9421 52.0000
weighted avg 0.9437 0.9423 0.9425 52.0000
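The summary ties together the pieces of a linear discriminant analysis: class means, a pooled within-class covariance matrix with n - K degrees of freedom (52 - 3 = 49 here), and one linear function per class. The sketch below reproduces that pipeline with NumPy on synthetic data. It is a textbook formulation written under stated assumptions (constants defined as -0.5 μ'Σ⁻¹μ + log prior), not the internal code of `discrimintools`, and all names and data in it are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: 3 well-separated classes, 2 features, unit variance.
true_means = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
sizes = [17, 15, 20]  # same class sizes as the training sample
X_demo = np.vstack([rng.normal(m, 1.0, size=(n, 2)) for m, n in zip(true_means, sizes)])
y_demo = np.repeat(np.arange(3), sizes)

n, K = len(y_demo), 3
priors = np.bincount(y_demo) / n  # proportional priors
class_means = np.array([X_demo[y_demo == k].mean(axis=0) for k in range(K)])

# Pooled within-class covariance, divided by the within-class DF (n - K).
centered = X_demo - class_means[y_demo]
pooled = centered.T @ centered / (n - K)

# Linear discriminant functions (assumed textbook form):
# coefficients Sigma^-1 mu_k, constant -0.5 mu_k' Sigma^-1 mu_k + log(prior_k).
coef = np.linalg.solve(pooled, class_means.T)  # shape (features, K)
constant = -0.5 * np.sum(class_means.T * coef, axis=0) + np.log(priors)

# Classify by the largest score and report resubstitution accuracy.
scores = X_demo @ coef + constant
pred = scores.argmax(axis=1)
accuracy = (pred == y_demo).mean()

# Counterpart of "Natural Log of the Determinant" in the summary.
sign, logdet = np.linalg.slogdet(pooled)

print("within-class DF:", n - K)
print("log-determinant of pooled covariance:", round(float(logdet), 4))
print("training accuracy:", round(float(accuracy), 4))
```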
Evaluation of prediction on testing dataset
Testing data
[10]:
# testing data
DTest = load_alcools("test")
DTest.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 9 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   TYPE    50 non-null     object
 1   MEOH    50 non-null     int64
 2   ACET    50 non-null     int64
 3   BU1     50 non-null     float64
 4   BU2     50 non-null     float64
 5   ISOP    50 non-null     int64
 6   MEPR    50 non-null     int64
 7   PRO1    50 non-null     int64
 8   ACAL    50 non-null     float64
dtypes: float64(3), int64(5), object(1)
memory usage: 3.6+ KB
[11]:
# split into X and y, then evaluate predictions on the testing data
yTest, XTest = DTest["TYPE"], DTest.drop(columns=["TYPE"])
eval_test = clf.eval_predict(XTest,yTest,verbose=True)
Observation Profile:
Read Used
Number of Observations 50 50
Number of Observations Classified into TYPE:
prediction KIRSCH MIRAB POIRE Total
TYPE
KIRSCH 14 0 0 14
MIRAB 0 14 3 17
POIRE 1 5 13 19
Total 15 19 16 50
Percent Classified into TYPE:
prediction KIRSCH MIRAB POIRE Total
TYPE
KIRSCH 100.000000 0.000000 0.000000 100.0
MIRAB 0.000000 82.352941 17.647059 100.0
POIRE 5.263158 26.315789 68.421053 100.0
Total 30.000000 38.000000 32.000000 100.0
Priors 0.326923 0.288462 0.384615 NaN
Error Count Estimates for TYPE:
KIRSCH MIRAB POIRE Total
Rate 0.000000 0.176471 0.315789 0.172362
Priors 0.326923 0.288462 0.384615 NaN
Classification Report for TYPE:
precision recall f1-score support
KIRSCH 0.933333 1.000000 0.965517 14.00
MIRAB 0.736842 0.823529 0.777778 17.00
POIRE 0.812500 0.684211 0.742857 19.00
accuracy 0.820000 0.820000 0.820000 0.82
macro avg 0.827558 0.835913 0.828717 50.00
weighted avg 0.820610 0.820000 0.817075 50.00
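On the testing data, the total error rate (0.172362) differs from one minus the accuracy (1 - 0.82 = 0.18). The printed figures are consistent with the total being a prior-weighted average of the per-class error rates, using the training priors, while accuracy weights classes by their frequency in the test sample. This interpretation is inferred from the numbers above, not from the library documentation. A short check with NumPy:

```python
import numpy as np

# Rows are true classes, columns are predicted classes (testing counts above),
# in the order KIRSCH, MIRAB, POIRE.
cm_test = np.array([[14, 0, 0],
                    [0, 14, 3],
                    [1, 5, 13]])

# Priors come from the training class counts (17, 15, 20 out of 52).
priors = np.array([17, 15, 20]) / 52

# Per-class error rate: share of each true class that was misclassified.
class_error = 1 - np.diag(cm_test) / cm_test.sum(axis=1)

# Prior-weighted total versus the plain misclassification rate.
weighted_error = float(class_error @ priors)
raw_error = 1 - np.trace(cm_test) / cm_test.sum()

print("per-class error:", class_error.round(6))
print("prior-weighted total:", round(weighted_error, 6))
print("1 - accuracy:", round(float(raw_error), 6))
```

The two totals coincide on the training data because the priors there equal the observed class proportions.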