GFALDA PCADA - alcools dataset#
[1]:
#disable warnings
from warnings import simplefilter, filterwarnings
simplefilter(action='ignore', category=FutureWarning)
filterwarnings("ignore")
alcools dataset#
[ ]:
#alcools dataset
from discrimintools.datasets import load_alcools
D = load_alcools()
D.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52 entries, 0 to 51
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 TYPE 52 non-null object
1 MEOH 52 non-null float64
2 ACET 52 non-null float64
3 BU1 52 non-null float64
4 BU2 52 non-null float64
5 ISOP 52 non-null int64
6 MEPR 52 non-null float64
7 PRO1 52 non-null float64
8 ACAL 52 non-null float64
dtypes: float64(7), int64(1), object(1)
memory usage: 3.8+ KB
[3]:
#split into X and y
y, X = D["TYPE"], D.drop(columns=["TYPE"])
Instantiation and training#
[4]:
#instantiation and training
from discrimintools import GFALDA
clf = GFALDA(n_components=2)
fit function#
[5]:
#fit function
clf.fit(X,y)
[5]:
GFALDA()
Parameters
| n_components | 2 |
| priors | None |
| classes | False |
Canonical coefficients#
[6]:
#canonical coefficients
cancoef = clf.cancoef_
cancoef._fields
[6]:
('standardized', 'raw', 'projection')
Standardized canonical coefficients#
[7]:
#standardized canonical coefficients
print(cancoef.standardized)
Can1 Can2
MEOH 0.520673 0.045841
ACET 0.060796 0.347537
BU1 0.456099 -0.026945
BU2 -0.029491 0.576889
ISOP 0.459790 0.009933
MEPR 0.482844 0.087627
PRO1 -0.195016 0.645653
ACAL 0.183659 0.344884
Projection canonical coefficients#
[8]:
#projection canonical coefficients
print(cancoef.projection)
Can1 Can2
MEOH 0.065084 0.005730
ACET 0.007599 0.043442
BU1 0.057012 -0.003368
BU2 -0.003686 0.072111
ISOP 0.057474 0.001242
MEPR 0.060356 0.010953
PRO1 -0.024377 0.080707
ACAL 0.022957 0.043110
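The standardized and projection canonical coefficients printed above look proportional. The sketch below checks this on the printed values only: it copies both tables into NumPy arrays and computes the element-wise ratio. That the ratio comes out close to 8, the number of predictors, is an observation from this output and not a documented property of `GFALDA`.

```python
import numpy as np

# Printed standardized canonical coefficients (rows: MEOH ... ACAL; columns: Can1, Can2).
standardized = np.array([
    [0.520673, 0.045841],
    [0.060796, 0.347537],
    [0.456099, -0.026945],
    [-0.029491, 0.576889],
    [0.459790, 0.009933],
    [0.482844, 0.087627],
    [-0.195016, 0.645653],
    [0.183659, 0.344884],
])

# Printed projection canonical coefficients, same row and column order.
projection = np.array([
    [0.065084, 0.005730],
    [0.007599, 0.043442],
    [0.057012, -0.003368],
    [-0.003686, 0.072111],
    [0.057474, 0.001242],
    [0.060356, 0.010953],
    [-0.024377, 0.080707],
    [0.022957, 0.043110],
])

n_features = standardized.shape[0]  # 8 predictors in the alcools data

# Element-wise ratio; every entry is close to n_features up to print rounding.
ratio = standardized / projection
print(np.round(ratio, 2))
```

On a fitted model the same check could be run directly on `clf.cancoef_.standardized` and `clf.cancoef_.projection`.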
Raw canonical coefficients#
[9]:
#raw canonical coefficients
print(cancoef.raw)
Can1 Can2
Constant -3.927165 -2.192304
MEOH 0.001422 0.000125
ACET 0.000503 0.002873
BU1 0.041763 -0.002467
BU2 -0.000547 0.010691
ISOP 0.009610 0.000208
MEPR 0.026395 0.004790
PRO1 -0.000314 0.001041
ACAL 0.023025 0.043237
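Raw canonical coefficients are normally applied to unscaled measurements. The sketch below assumes the usual convention, score = constant + x · coefficients, which the notebook does not state explicitly. The helper `canonical_scores` and the observation `x_new` are hypothetical and only illustrate the arithmetic.

```python
import numpy as np

features = ["MEOH", "ACET", "BU1", "BU2", "ISOP", "MEPR", "PRO1", "ACAL"]

# Printed raw canonical coefficients (columns: Can1, Can2).
constant = np.array([-3.927165, -2.192304])
raw = np.array([
    [0.001422, 0.000125],
    [0.000503, 0.002873],
    [0.041763, -0.002467],
    [-0.000547, 0.010691],
    [0.009610, 0.000208],
    [0.026395, 0.004790],
    [-0.000314, 0.001041],
    [0.023025, 0.043237],
])


def canonical_scores(x):
    """Hypothetical helper: constant + x @ raw (assumed convention)."""
    x = np.asarray(x, dtype=float)
    return constant + x @ raw


# Made-up measurements, in the order given by `features`; not taken from the dataset.
x_new = np.array([500.0, 200.0, 10.0, 50.0, 100.0, 30.0, 500.0, 10.0])
print(dict(zip(["Can1", "Can2"], canonical_scores(x_new))))
```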
Coefficients#
[10]:
#coefficients
coef = clf.coef_
coef._fields
[10]:
('standardized', 'raw', 'projection')
Standardized coefficients#
[11]:
#standardized coefficients
print(coef.standardized)
KIRSCH MIRAB POIRE
Constant -2.199936 -1.311627 -1.560770
MEOH -0.658592 0.081985 0.498314
ACET -0.038280 -0.071141 0.085894
BU1 -0.584486 0.087645 0.431080
BU2 0.102704 -0.141331 0.018700
ISOP -0.585029 0.079604 0.437572
MEPR -0.605651 0.065386 0.465763
PRO1 0.321480 -0.187052 -0.132969
ACAL -0.195208 -0.048617 0.202390
Projection coefficients#
[12]:
#projection coefficients
print(coef.projection)
KIRSCH MIRAB POIRE
Constant -2.199936 -1.311627 -1.560770
MEOH -0.082324 0.010248 0.062289
ACET -0.004785 -0.008893 0.010737
BU1 -0.073061 0.010956 0.053885
BU2 0.012838 -0.017666 0.002337
ISOP -0.073129 0.009950 0.054696
MEPR -0.075706 0.008173 0.058220
PRO1 0.040185 -0.023381 -0.016621
ACAL -0.024401 -0.006077 0.025299
Raw coefficients#
[13]:
#raw coefficients
print(coef.raw)
KIRSCH MIRAB POIRE
Constant 2.559072 -1.494437 -5.468820
MEOH -0.001798 0.000224 0.001361
ACET -0.000316 -0.000588 0.000710
BU1 -0.053519 0.008025 0.039472
BU2 0.001903 -0.002619 0.000347
ISOP -0.012228 0.001664 0.009146
MEPR -0.033108 0.003574 0.025461
PRO1 0.000518 -0.000302 -0.000214
ACAL -0.024473 -0.006095 0.025373
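The raw coefficients above have one column per class, which is the layout of linear classification functions. A minimal sketch of how such functions are typically used, assuming each class score is constant + x · coefficients and the predicted class is the one with the largest score. The helper `classify` and the observation `x_new` are hypothetical; `clf.predict` remains the reference.

```python
import numpy as np

classes = np.array(["KIRSCH", "MIRAB", "POIRE"])

# Printed raw coefficients (rows: MEOH ... ACAL; columns: KIRSCH, MIRAB, POIRE).
constant = np.array([2.559072, -1.494437, -5.468820])
raw = np.array([
    [-0.001798, 0.000224, 0.001361],
    [-0.000316, -0.000588, 0.000710],
    [-0.053519, 0.008025, 0.039472],
    [0.001903, -0.002619, 0.000347],
    [-0.012228, 0.001664, 0.009146],
    [-0.033108, 0.003574, 0.025461],
    [0.000518, -0.000302, -0.000214],
    [-0.024473, -0.006095, 0.025373],
])


def classify(X):
    """Hypothetical helper: label of the largest linear score for each row of X."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    scores = constant + X @ raw          # shape (n_samples, n_classes)
    return classes[np.argmax(scores, axis=1)]


# Made-up measurements in the column order MEOH ... ACAL; not taken from the dataset.
x_new = np.array([500.0, 200.0, 10.0, 50.0, 100.0, 30.0, 500.0, 10.0])
print(classify(x_new))
```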
Summary#
[14]:
#summary
from discrimintools import summaryGFALDA
summaryGFALDA(clf,detailed=True)
General Factor Analysis Linear Discriminant Analysis - Results
Class Level Information:
Frequency Proportion Prior Probability
KIRSCH 17 0.3269 0.3269
MIRAB 15 0.2885 0.2885
POIRE 20 0.3846 0.3846
Importance of components:
Eigenvalue Difference Proportion (%) Cumulative (%)
Can1 2.7988 1.0799 34.9848 34.9848
Can2 1.7188 0.3154 21.4856 56.4703
Raw Canonical Coefficients:
Can1 Can2
Constant -3.9272 -2.1923
MEOH 0.0014 0.0001
ACET 0.0005 0.0029
BU1 0.0418 -0.0025
BU2 -0.0005 0.0107
ISOP 0.0096 0.0002
MEPR 0.0264 0.0048
PRO1 -0.0003 0.0010
ACAL 0.0230 0.0432
Projection functions coefficients:
Can1 Can2
MEOH 0.0651 0.0057
ACET 0.0076 0.0434
BU1 0.0570 -0.0034
BU2 -0.0037 0.0721
ISOP 0.0575 0.0012
MEPR 0.0604 0.0110
PRO1 -0.0244 0.0807
ACAL 0.0230 0.0431
Multivariate Analysis of Variance (MANOVA) Summary:
Statistic Value p-value
0 Wilks' Lambda 0.4279 NaN
1 Bartlett -- C(4) 41.1720 0.0
2 Rao -- F(4,96) 12.6901 0.0
LDA Classification functions & Statistical Evaluation:
KIRSCH MIRAB POIRE Wilks' Lambda Partial R-Square F Value Num DF Den DF Pr>F
Constant -2.1999 -1.3116 -1.5608 NaN NaN NaN NaN NaN NaN
Can1 -1.2748 0.1782 0.9499 0.9608 0.4454 29.8899 2.0 48.0 0.0000
Can2 0.1129 -0.2359 0.0810 0.4455 0.9604 0.9908 2.0 48.0 0.3787
Classification Summary for Calibration Data:
Observation Profile:
Read Used
Number of Observations 52 52
Number of Observations Classified into TYPE:
prediction KIRSCH MIRAB POIRE Total
TYPE
KIRSCH 17 0 0 17
MIRAB 1 9 5 15
POIRE 2 2 16 20
Total 20 11 21 52
Percent Classified into TYPE:
prediction KIRSCH MIRAB POIRE Total
TYPE
KIRSCH 100.0000 0.0000 0.0000 100.0
MIRAB 6.6667 60.0000 33.3333 100.0
POIRE 10.0000 10.0000 80.0000 100.0
Total 38.4615 21.1538 40.3846 100.0
Priors 0.3269 0.2885 0.3846 NaN
Error Count Estimates for TYPE:
KIRSCH MIRAB POIRE Total
Rate 0.0000 0.4000 0.2000 0.1923
Priors 0.3269 0.2885 0.3846 NaN
Classification Report for TYPE:
precision recall f1-score support
KIRSCH 0.8500 1.0000 0.9189 17.0000
MIRAB 0.8182 0.6000 0.6923 15.0000
POIRE 0.7619 0.8000 0.7805 20.0000
accuracy 0.8077 0.8077 0.8077 0.8077
macro avg 0.8100 0.8000 0.7972 52.0000
weighted avg 0.8069 0.8077 0.8003 52.0000
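The summary reports LDA classification functions on Can1 and Can2, while `coef.projection` reports per-variable coefficients. The printed numbers suggest that the latter are the projection canonical coefficients multiplied by the Can1/Can2 classification weights. The sketch below checks that relationship on the printed values; it is an inference from this output, not a statement about the library internals.

```python
import numpy as np

# Printed projection canonical coefficients (rows: MEOH ... ACAL; columns: Can1, Can2).
can_projection = np.array([
    [0.065084, 0.005730],
    [0.007599, 0.043442],
    [0.057012, -0.003368],
    [-0.003686, 0.072111],
    [0.057474, 0.001242],
    [0.060356, 0.010953],
    [-0.024377, 0.080707],
    [0.022957, 0.043110],
])

# LDA classification weights from the summary (rows: Can1, Can2; columns: KIRSCH, MIRAB, POIRE).
lda_weights = np.array([
    [-1.2748, 0.1782, 0.9499],
    [0.1129, -0.2359, 0.0810],
])

# Printed projection coefficients without the constant row, same class order.
coef_projection = np.array([
    [-0.082324, 0.010248, 0.062289],
    [-0.004785, -0.008893, 0.010737],
    [-0.073061, 0.010956, 0.053885],
    [0.012838, -0.017666, 0.002337],
    [-0.073129, 0.009950, 0.054696],
    [-0.075706, 0.008173, 0.058220],
    [0.040185, -0.023381, -0.016621],
    [-0.024401, -0.006077, 0.025299],
])

# Compose the two steps: variables -> components -> class scores.
composed = can_projection @ lda_weights
max_gap = np.abs(composed - coef_projection).max()
print(f"largest absolute difference: {max_gap:.6f}")
```

The remaining differences are of the size expected from rounding the weights to four decimals.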
Evaluation of prediction on testing dataset#
[15]:
#testing data
DTest = load_alcools("test")
DTest.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 TYPE 50 non-null object
1 MEOH 50 non-null int64
2 ACET 50 non-null int64
3 BU1 50 non-null float64
4 BU2 50 non-null float64
5 ISOP 50 non-null int64
6 MEPR 50 non-null int64
7 PRO1 50 non-null int64
8 ACAL 50 non-null float64
dtypes: float64(3), int64(5), object(1)
memory usage: 3.6+ KB
[16]:
#split into X and y
yTest, XTest = DTest["TYPE"], DTest.drop(columns=["TYPE"])
#evaluation on testing data
evl_test = clf.eval_predict(XTest,yTest,verbose=True)
Observation Profile:
Read Used
Number of Observations 50 50
Number of Observations Classified into TYPE:
prediction KIRSCH MIRAB POIRE Total
TYPE
KIRSCH 11 2 1 14
MIRAB 2 10 5 17
POIRE 2 3 14 19
Total 15 15 20 50
Percent Classified into TYPE:
prediction KIRSCH MIRAB POIRE Total
TYPE
KIRSCH 78.571429 14.285714 7.142857 100.0
MIRAB 11.764706 58.823529 29.411765 100.0
POIRE 10.526316 15.789474 73.684211 100.0
Total 30.000000 30.000000 40.000000 100.0
Priors 0.326923 0.288462 0.384615 NaN
Error Count Estimates for TYPE:
KIRSCH MIRAB POIRE Total
Rate 0.214286 0.411765 0.263158 0.290048
Priors 0.326923 0.288462 0.384615 NaN
Classification Report for TYPE:
precision recall f1-score support
KIRSCH 0.733333 0.785714 0.758621 14.0
MIRAB 0.666667 0.588235 0.625000 17.0
POIRE 0.700000 0.736842 0.717949 19.0
accuracy 0.700000 0.700000 0.700000 0.7
macro avg 0.700000 0.703597 0.700523 50.0
weighted avg 0.698000 0.700000 0.697734 50.0
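The total error rate above (0.290048) is not simply 1 minus the accuracy (0.30). The printed values are consistent with a prior-weighted sum of the per-class error rates, where the priors equal the training proportions 17/52, 15/52 and 20/52. The sketch below recomputes the figures from the test confusion matrix under that reading, which is inferred from the output.

```python
import numpy as np

classes = ["KIRSCH", "MIRAB", "POIRE"]

# Test confusion matrix from the output (rows: true TYPE, columns: prediction).
cm = np.array([
    [11, 2, 1],
    [2, 10, 5],
    [2, 3, 14],
])

# Priors as printed; they match the training class proportions.
priors = np.array([17, 15, 20]) / 52

recall = np.diag(cm) / cm.sum(axis=1)      # per-class hit rate
precision = np.diag(cm) / cm.sum(axis=0)   # per-class precision
error_rate = 1.0 - recall                  # per-class error rate

# Prior-weighted total error, as in the "Error Count Estimates" table.
total_error = float(priors @ error_rate)   # about 0.290048

# Plain accuracy, as in the classification report.
accuracy = np.trace(cm) / cm.sum()         # 35 / 50 = 0.70

print(dict(zip(classes, np.round(error_rate, 6))))
print(round(total_error, 6), accuracy)
```

The two summaries differ because the test set has class proportions 14/50, 17/50 and 19/50, which are not the priors used for weighting.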