DISCRIM (QDA) - alcools dataset#
[1]:
#disable warnings
from warnings import simplefilter, filterwarnings
simplefilter(action='ignore', category=FutureWarning)
filterwarnings("ignore")
alcools dataset#
[2]:
#alcools dataset
from discrimintools.datasets import load_alcools
D = load_alcools("train")
D.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52 entries, 0 to 51
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 TYPE 52 non-null object
1 MEOH 52 non-null float64
2 ACET 52 non-null float64
3 BU1 52 non-null float64
4 BU2 52 non-null float64
5 ISOP 52 non-null int64
6 MEPR 52 non-null float64
7 PRO1 52 non-null float64
8 ACAL 52 non-null float64
dtypes: float64(7), int64(1), object(1)
memory usage: 3.8+ KB
[3]:
#split into X and y
y, X = D["TYPE"], D.drop(columns=["TYPE"])
Instantiation and training#
[4]:
from discrimintools import DISCRIM
clf = DISCRIM(method="quad") #the warning below can be disabled using warn_message
clf.fit(X,y)
Since the Chi-Square value is significant at the 0.1 level, the within covariance matrices will be used in the discriminant function.
Reference: Morrison, D.F. (1976) Multivariate Statistical Methods p252.
[4]:
DISCRIM(method='quad', priors='prop')
Parameters
| method | 'quad' |
| priors | 'prop' |
| classes | None |
| var_select | False |
| level | None |
| tol | None |
| warn_message | True |
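The estimator reports `priors='prop'`. Since the summary output lists identical values under Proportion and Prior Probability, this most likely means priors proportional to the class frequencies in the training sample. Below is a minimal sketch under that assumption; `class_counts` and `proportional_priors` are illustrative names, not part of discrimintools.

```python
# Class frequencies of TYPE in the training sample (52 observations).
class_counts = {"KIRSCH": 17, "MIRAB": 15, "POIRE": 20}


def proportional_priors(counts):
    """Return priors proportional to class frequencies.

    This is the assumed meaning of priors='prop', not the library's code.
    """
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


priors = proportional_priors(class_counts)
for label, p in priors.items():
    print(f"{label}: {p:.6f}")
```

The printed values can be compared with the Priors rows in the evaluation tables.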
Evaluation on training data#
[5]:
#eval_predict function
eval_train = clf.eval_predict(X,y,verbose=True)
Observation Profile:
Read Used
Number of Observations 52 52
Number of Observations Classified into TYPE:
prediction KIRSCH MIRAB POIRE Total
TYPE
KIRSCH 17 0 0 17
MIRAB 0 15 0 15
POIRE 0 0 20 20
Total 17 15 20 52
Percent Classified into TYPE:
prediction KIRSCH MIRAB POIRE Total
TYPE
KIRSCH 100.000000 0.000000 0.000000 100.0
MIRAB 0.000000 100.000000 0.000000 100.0
POIRE 0.000000 0.000000 100.000000 100.0
Total 32.692308 28.846154 38.461538 100.0
Priors 0.326923 0.288462 0.384615 NaN
Error Count Estimates for TYPE:
KIRSCH MIRAB POIRE Total
Rate 0.000000 0.000000 0.000000 0.0
Priors 0.326923 0.288462 0.384615 NaN
Classification Report for TYPE:
precision recall f1-score support
KIRSCH 1.0 1.0 1.0 17.0
MIRAB 1.0 1.0 1.0 15.0
POIRE 1.0 1.0 1.0 20.0
accuracy 1.0 1.0 1.0 1.0
macro avg 1.0 1.0 1.0 52.0
weighted avg 1.0 1.0 1.0 52.0
[6]:
#score function
print("Accuracy : {}%".format(100*round(clf.score(X,y),2)))
Accuracy : 100.0%
[7]:
#error rate
print("Error rate : {}%".format(100-100*round(clf.score(X,y),2)))
Error rate : 0.0%
Summary#
[8]:
from discrimintools import summaryDISCRIM
summaryDISCRIM(clf,detailed=True)
Discriminant Analysis - Results
Summary Information:
Infos Value DF DF value
0 Total Sample Size 52 DF Total 51
1 Variables 8 DF Within Classes 49
2 Classes 3 DF Between Classes 2
Class Level Information:
Frequency Proportion Prior Probability
KIRSCH 17 0.3269 0.3269
MIRAB 15 0.2885 0.2885
POIRE 20 0.3846 0.3846
Within Covariance Matrix Information:
Rank Natural Log of the Determinant
Pooled 8 58.3267
KIRSCH 8 49.0021
MIRAB 8 48.9038
POIRE 8 54.6744
Test of Homogeneity of Within Covariance Matrices:
Bartlett Value Num DF Den DF F value Pr>F Chi Sq. Value Pr>Chi2
Box's M 350.5115 72 6010 3.679 0.0 269.0859 0.0
Since the Chi-Square value is significant at the 0.1 level, the within covariance matrices has been used in the discriminant function.
Reference: Morrison, D.F. (1976) Multivariate Statistical Methods p252.
Classification Summary for Calibration Data:
Observation Profile:
Read Used
Number of Observations 52 52
Number of Observations Classified into TYPE:
prediction KIRSCH MIRAB POIRE Total
TYPE
KIRSCH 17 0 0 17
MIRAB 0 15 0 15
POIRE 0 0 20 20
Total 17 15 20 52
Percent Classified into TYPE:
prediction KIRSCH MIRAB POIRE Total
TYPE
KIRSCH 100.0000 0.0000 0.0000 100.0
MIRAB 0.0000 100.0000 0.0000 100.0
POIRE 0.0000 0.0000 100.0000 100.0
Total 32.6923 28.8462 38.4615 100.0
Priors 0.3269 0.2885 0.3846 NaN
Error Count Estimates for TYPE:
KIRSCH MIRAB POIRE Total
Rate 0.0000 0.0000 0.0000 0.0
Priors 0.3269 0.2885 0.3846 NaN
Classification Report for TYPE:
precision recall f1-score support
KIRSCH 1.0 1.0 1.0 17.0
MIRAB 1.0 1.0 1.0 15.0
POIRE 1.0 1.0 1.0 20.0
accuracy 1.0 1.0 1.0 1.0
macro avg 1.0 1.0 1.0 52.0
weighted avg 1.0 1.0 1.0 52.0
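The homogeneity test in the summary can be checked by hand from the printed log-determinants. The sketch below assumes the textbook Box's M statistic with its chi-square approximation; whether discrimintools computes it exactly this way internally is an assumption. The inputs are the rounded values shown in the summary, so the results agree with the reported figures only up to rounding.

```python
# Values copied from the summary output above.
p = 8  # number of variables
n = {"KIRSCH": 17, "MIRAB": 15, "POIRE": 20}  # class frequencies
logdet = {"KIRSCH": 49.0021, "MIRAB": 48.9038, "POIRE": 54.6744}  # ln|S_k|
logdet_pooled = 58.3267  # ln|S_pooled|

k = len(n)            # number of classes
N = sum(n.values())   # total sample size
df_within = N - k     # within-class degrees of freedom (49)

# Box's M: (N - k) ln|S_pooled| - sum_k (n_k - 1) ln|S_k|
M = df_within * logdet_pooled - sum((n[c] - 1) * logdet[c] for c in n)

# Scaling factor of the chi-square approximation.
c1 = ((2 * p**2 + 3 * p - 1) / (6 * (p + 1) * (k - 1))) * (
    sum(1 / (n[c] - 1) for c in n) - 1 / df_within
)
chi2 = (1 - c1) * M

# Degrees of freedom of the chi-square approximation.
df = p * (p + 1) * (k - 1) // 2

print(f"Box's M = {M:.2f}, chi-square = {chi2:.2f}, df = {df}")
```

With these inputs the statistic comes out at roughly 350.51 and the chi-square value at roughly 269.08 on 72 degrees of freedom, in line with the Bartlett Value, Chi Sq. Value and Num DF columns.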
Evaluation of prediction on testing dataset#
Testing data#
[9]:
#testing data
DTest = load_alcools("test")
DTest.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 TYPE 50 non-null object
1 MEOH 50 non-null int64
2 ACET 50 non-null int64
3 BU1 50 non-null float64
4 BU2 50 non-null float64
5 ISOP 50 non-null int64
6 MEPR 50 non-null int64
7 PRO1 50 non-null int64
8 ACAL 50 non-null float64
dtypes: float64(3), int64(5), object(1)
memory usage: 3.6+ KB
[10]:
#split into X and y
yTest, XTest = DTest["TYPE"], DTest.drop(columns=["TYPE"])
eval_test = clf.eval_predict(XTest,yTest,verbose=True)
Observation Profile:
Read Used
Number of Observations 50 50
Number of Observations Classified into TYPE:
prediction KIRSCH MIRAB POIRE Total
TYPE
KIRSCH 14 0 0 14
MIRAB 0 12 5 17
POIRE 0 2 17 19
Total 14 14 22 50
Percent Classified into TYPE:
prediction KIRSCH MIRAB POIRE Total
TYPE
KIRSCH 100.000000 0.000000 0.000000 100.0
MIRAB 0.000000 70.588235 29.411765 100.0
POIRE 0.000000 10.526316 89.473684 100.0
Total 28.000000 28.000000 44.000000 100.0
Priors 0.326923 0.288462 0.384615 NaN
Error Count Estimates for TYPE:
KIRSCH MIRAB POIRE Total
Rate 0.000000 0.294118 0.105263 0.125327
Priors 0.326923 0.288462 0.384615 NaN
Classification Report for TYPE:
precision recall f1-score support
KIRSCH 1.000000 1.000000 1.000000 14.00
MIRAB 0.857143 0.705882 0.774194 17.00
POIRE 0.772727 0.894737 0.829268 19.00
accuracy 0.860000 0.860000 0.860000 0.86
macro avg 0.876623 0.866873 0.867821 50.00
weighted avg 0.865065 0.860000 0.858348 50.00
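The total error rate of 0.125327 is not the same as 1 - accuracy (0.14). The figures are consistent with a total that weights each class error rate by its prior rather than by its share of the test sample. The sketch below recomputes the main quantities from the test confusion matrix under that assumption; all variable names are illustrative and the code uses no discrimintools function.

```python
labels = ["KIRSCH", "MIRAB", "POIRE"]

# Test confusion matrix: rows are true classes, columns are predictions.
confusion = [
    [14, 0, 0],
    [0, 12, 5],
    [0, 2, 17],
]

# Priors proportional to the training class frequencies (17, 15, 20).
train_counts = [17, 15, 20]
priors = [c / sum(train_counts) for c in train_counts]

row_totals = [sum(row) for row in confusion]        # true class sizes
col_totals = [sum(col) for col in zip(*confusion)]  # predicted class sizes
n_classes = len(labels)

# Per-class error rate: share of each true class that is misclassified.
error_rates = [1 - confusion[i][i] / row_totals[i] for i in range(n_classes)]

# Total error rate weighted by the priors (assumed definition).
total_error = sum(p * e for p, e in zip(priors, error_rates))

# Plain accuracy: correctly classified observations over all observations.
accuracy = sum(confusion[i][i] for i in range(n_classes)) / sum(row_totals)

# Precision, recall and F1 per class.
precision = [confusion[i][i] / col_totals[i] for i in range(n_classes)]
recall = [confusion[i][i] / row_totals[i] for i in range(n_classes)]
f1 = [2 * p * r / (p + r) for p, r in zip(precision, recall)]

print(f"prior-weighted error rate: {total_error:.6f}")
print(f"accuracy: {accuracy:.2f}")
for name, p, r, f in zip(labels, precision, recall, f1):
    print(f"{name}: precision={p:.6f} recall={r:.6f} f1={f:.6f}")
```

All seven misclassified test observations are confusions between MIRAB and POIRE, while KIRSCH is classified without error.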