PLSDA - breast dataset
[1]:
#disable warnings
from warnings import simplefilter, filterwarnings
simplefilter(action='ignore', category=FutureWarning)
filterwarnings("ignore")
Breast dataset
[2]:
#load the breast dataset
from discrimintools.datasets import load_dataset
D = load_dataset("breast")
D.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 699 entries, 0 to 698
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ucellsize 699 non-null int64
1 normnucl 699 non-null int64
2 mitoses 699 non-null int64
3 Class 699 non-null object
dtypes: int64(3), object(1)
memory usage: 22.0+ KB
[3]:
#split into X and y
y, X = D["Class"], D.drop(columns=["Class"])
Instantiation and training
[4]:
#instantiation and training
from discrimintools import PLSDA
clf = PLSDA()
clf.fit(X,y)
[4]:
PLSDA()
Parameters
| n_components | 2 |
| scale | True |
| classes | None |
| max_iter | 500 |
| tol | 1e-10 |
| var_select | False |
| threshold | 1.0 |
| warn_message | True |
Coefficients
[5]:
#coefficients
print(clf.coef_)
negative positive
Constant 1.080102 -0.080102
ucellsize -0.085323 0.085323
normnucl -0.053251 0.053251
mitoses -0.003001 0.003001
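The printed coefficients define one linear classification function per class. The NumPy sketch below evaluates them by hand under two assumptions that the output does not state explicitly. First, the coefficients apply to the raw (unscaled) predictors. This is consistent with the constants 1.0801 and -0.0801 differing from the class proportions 0.6552 and 0.3448. Second, an observation is assigned to the class with the larger score. The two observations used here are hypothetical.

```python
import numpy as np

# Classification-function coefficients copied from the clf.coef_ output above.
# Assumption: they apply to the raw (unscaled) predictor values.
classes = np.array(["negative", "positive"])
constant = np.array([1.080102, -0.080102])
# Rows: ucellsize, normnucl, mitoses; columns: negative, positive
slopes = np.array([
    [-0.085323, 0.085323],
    [-0.053251, 0.053251],
    [-0.003001, 0.003001],
])

def class_scores(X):
    """Return one linear score per class: constant + X @ slopes."""
    return constant + np.asarray(X, dtype=float) @ slopes

# Two hypothetical observations (ucellsize, normnucl, mitoses)
X_new = np.array([[1, 1, 1],
                  [10, 10, 1]])
scores = class_scores(X_new)

# Assumption: the predicted class is the one with the largest score
predicted = classes[scores.argmax(axis=1)]
print(scores.round(4))
print(predicted)
```

The first observation scores about (0.9385, 0.0615) and is assigned to negative. The second scores about (-0.3086, 1.3086) and is assigned to positive. Each row of scores sums to 1 because the two constants sum to 1 and the slopes of the two classes are opposite.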
Summary
[6]:
#summary
from discrimintools import summaryPLSDA
summaryPLSDA(clf,detailed=True)
Partial Least Squares Discriminant Analysis - Results
Class Level Information:
Frequency Proportion Prior Probability
negative 458 0.6552 0.6552
positive 241 0.3448 0.3448
Importance of PLS components:
Proportion (%) Cumulative (%)
Can1 69.1520 69.1520
Can2 20.1981 89.3501
Classification functions coefficients:
negative positive VIP
Constant 1.0801 -0.0801 NaN
ucellsize -0.0853 0.0853 1.2038
normnucl -0.0533 0.0533 1.0346
mitoses -0.0030 0.0030 0.6933
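The summary does not define the VIP (variable importance in projection) column. A commonly used definition is given below, but it is assumed here and has not been confirmed from the discrimintools source. The symbols are:

- $p$: the number of predictors.
- $A$: the number of components.
- $w_a$: the weight vector of component $a$.
- $SS_a$: the part of the response variance explained by component $a$.

```latex
\mathrm{VIP}_j = \sqrt{\frac{p \sum_{a=1}^{A} SS_a \left( w_{ja} / \lVert w_a \rVert \right)^2}{\sum_{a=1}^{A} SS_a}},
\qquad
\sum_{j=1}^{p} \mathrm{VIP}_j^2 = p .
```

The reported values satisfy this identity: $1.2038^2 + 1.0346^2 + 0.6933^2 \approx 1.4491 + 1.0704 + 0.4807 = 3.0002 \approx 3 = p$, which is consistent with the definition. A common rule of thumb treats VIP > 1 as influential. Under that rule, ucellsize and normnucl are flagged, but mitoses is not.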
Classification Summary for Calibration Data:
Observation Profile:
Read Used
Number of Observations 699 699
Number of Observations Classified into Class:
prediction negative positive Total
Class
negative 448 10 458
positive 56 185 241
Total 504 195 699
Percent Classified into Class:
prediction negative positive Total
Class
negative 97.8166 2.1834 100.0
positive 23.2365 76.7635 100.0
Total 72.1030 27.8970 100.0
Priors 0.6552 0.3448 NaN
Error Count Estimates for Class:
negative positive Total
Rate 0.0218 0.2324 0.0944
Priors 0.6552 0.3448 NaN
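Both tables above can be recomputed from the confusion-matrix counts, as in this minimal NumPy sketch:

- The row percentages divide each row by its class total.
- The "Total" row divides the predicted counts by 699.
- The overall error is taken as the prior-weighted average of the per-class error rates. This is an assumption about how the total is formed. Because the priors equal the class proportions, it reduces to the 66 misclassified observations out of 699.

```python
import numpy as np

# Confusion matrix from the summary: rows = true class, columns = predicted class
# (order: negative, positive)
cm = np.array([[448, 10],
               [56, 185]])

class_totals = cm.sum(axis=1)                    # true counts per class: [458, 241]
row_pct = 100 * cm / class_totals[:, None]       # "Percent Classified into Class"
total_pct = 100 * cm.sum(axis=0) / cm.sum()      # "Total" row of that table

error_rate = 1 - np.diag(cm) / class_totals      # per-class error rates
priors = class_totals / cm.sum()                 # priors taken as class proportions
total_error = priors @ error_rate                # prior-weighted total error rate

print(row_pct.round(4))
print(total_pct.round(4))
print(error_rate.round(4), round(float(total_error), 4))
```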
Classification Report for Class:
precision recall f1-score support
negative 0.8889 0.9782 0.9314 458.0000
positive 0.9487 0.7676 0.8486 241.0000
accuracy 0.9056 0.9056 0.9056 0.9056
macro avg 0.9188 0.8729 0.8900 699.0000
weighted avg 0.9095 0.9056 0.9029 699.0000
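The classification report follows from the same counts:

- Precision divides the diagonal by the predicted (column) totals.
- Recall divides the diagonal by the true (row) totals.
- F1 is the harmonic mean of precision and recall.
- The macro average weights the two classes equally.
- The weighted average weights each class by its support.

The NumPy sketch below reproduces the printed values.

```python
import numpy as np

labels = ["negative", "positive"]
# Confusion matrix from the summary: rows = true class, columns = predicted class
cm = np.array([[448, 10],
               [56, 185]])

tp = np.diag(cm)
support = cm.sum(axis=1)        # true counts per class: [458, 241]
predicted = cm.sum(axis=0)      # predicted counts per class: [504, 195]

precision = tp / predicted
recall = tp / support
f1 = 2 * precision * recall / (precision + recall)
accuracy = tp.sum() / cm.sum()

# Macro average: unweighted mean over classes
macro = np.array([precision.mean(), recall.mean(), f1.mean()])
# Weighted average: mean over classes weighted by support
weighted = np.array([np.average(m, weights=support) for m in (precision, recall, f1)])

for name, p, r, f, s in zip(labels, precision, recall, f1, support):
    print(f"{name:>8} {p:.4f} {r:.4f} {f:.4f} {s}")
print(f"accuracy {accuracy:.4f}")
print("macro avg", macro.round(4))
print("weighted avg", weighted.round(4))
```

The weighted-average recall always equals the accuracy, which is why both read 0.9056 in the report.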
Plotting
[7]:
#plotting
from discrimintools import fviz_plsr
Graph of individuals
[8]:
#graph of individuals
p = fviz_plsr(clf,element="ind",repel=False)
p.show()
Graph of variables
[9]:
#graph of variables
p = fviz_plsr(clf,element="var",repel=True)
p.show()
Distance between barycenters
[10]:
#distance between barycenters
p = fviz_plsr(clf,element="dist",repel=False,y_lim=(-0.2,0.15))
p.show()