PLSDA - breast dataset#

[1]:
#disable warnings
from warnings import simplefilter, filterwarnings
simplefilter(action='ignore', category=FutureWarning)
filterwarnings("ignore")

breast dataset#

[2]:
#breast dataset
from discrimintools.datasets import load_dataset
D = load_dataset("breast")
D.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 699 entries, 0 to 698
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   ucellsize  699 non-null    int64
 1   normnucl   699 non-null    int64
 2   mitoses    699 non-null    int64
 3   Class      699 non-null    object
dtypes: int64(3), object(1)
memory usage: 22.0+ KB
[3]:
#split into X and y
y, X = D["Class"], D.drop(columns=["Class"])

Instantiation and training#

[4]:
#instantiation and training
from discrimintools import PLSDA
clf = PLSDA()
clf.fit(X,y)
[4]:
PLSDA()

Coefficients#

[5]:
#coefficients
print(clf.coef_)
           negative  positive
Constant   1.080102 -0.080102
ucellsize -0.085323  0.085323
normnucl  -0.053251  0.053251
mitoses   -0.003001  0.003001
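
Each column of the table can be read as a linear classification function for one class. The sketch below is a minimal illustration, assuming that a class score is the constant plus the coefficient-weighted sum of the features and that the predicted class is the one with the largest score. That rule is an assumption made here for illustration, not something the output above confirms, and the observation `x` is hypothetical. The coefficients are copied from the printed table rather than read from `clf`.

```python
import numpy as np

# Coefficients copied from the printed clf.coef_ table
# (rows: Constant, ucellsize, normnucl, mitoses; columns: negative, positive).
coef = np.array([
    [ 1.080102, -0.080102],
    [-0.085323,  0.085323],
    [-0.053251,  0.053251],
    [-0.003001,  0.003001],
])
classes = ["negative", "positive"]

# Hypothetical observation (not taken from the dataset):
# ucellsize, normnucl, mitoses.
x = np.array([8.0, 7.0, 1.0])

# Assumed scoring rule: score_k = constant_k + sum_j coef_jk * x_j.
scores = coef[0] + x @ coef[1:]

# Assumed decision rule: pick the class with the largest score.
predicted = classes[int(np.argmax(scores))]

for name, score in zip(classes, scores):
    print(f"{name}: {score:.4f}")
print("predicted:", predicted)
```

Because the two constants sum to 1 and every other row sums to 0, the two scores in this sketch always sum to 1, so comparing them is equivalent to checking whether the positive score exceeds 0.5.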

Summary#

[6]:
#summary
from discrimintools import summaryPLSDA
summaryPLSDA(clf,detailed=True)
                     Partial Least Squares Discriminant Analysis - Results

Class Level Information:
          Frequency  Proportion  Prior Probability
negative        458      0.6552             0.6552
positive        241      0.3448             0.3448

Importance of PLS components:
      Proportion (%)  Cumulative (%)
Can1         69.1520         69.1520
Can2         20.1981         89.3501

Classification functions coefficients:
           negative  positive     VIP
Constant     1.0801   -0.0801     NaN
ucellsize   -0.0853    0.0853  1.2038
normnucl    -0.0533    0.0533  1.0346
mitoses     -0.0030    0.0030  0.6933

Classification Summary for Calibration Data:

Observation Profile:
                        Read  Used
Number of Observations   699   699

Number of Observations Classified into Class:
prediction  negative  positive  Total
Class
negative         448        10    458
positive          56       185    241
Total            504       195    699

Percent Classified into Class:
prediction  negative  positive  Total
Class
negative     97.8166    2.1834  100.0
positive     23.2365   76.7635  100.0
Total        72.1030   27.8970  100.0
Priors        0.6552    0.3448    NaN

Error Count Estimates for Class:
        negative  positive   Total
Rate      0.0218    0.2324  0.0944
Priors    0.6552    0.3448     NaN
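
The error rates can be checked by hand from the confusion counts. The sketch below is a minimal recomputation, assuming the per-class rate is the share of each true class that is misclassified and the total is the prior-weighted sum of those rates. The counts are copied from the table above, and the variable names are illustrative.

```python
import numpy as np

# Confusion counts copied from the summary
# (rows: true class, columns: predicted class; order: negative, positive).
counts = np.array([
    [448, 10],
    [56, 185],
])

row_totals = counts.sum(axis=1)          # 458, 241
priors = row_totals / row_totals.sum()   # empirical class proportions

# Share of each true class that lands in the wrong column.
per_class_error = 1 - np.diag(counts) / row_totals

# Prior-weighted total error (assumed definition of the "Total" column).
total_error = float(priors @ per_class_error)

print(np.round(priors, 4))
print(np.round(per_class_error, 4))
print(round(total_error, 4))
```

With priors equal to the observed class proportions, the weighted total coincides with the plain misclassification rate, 66 errors out of 699 observations.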

Classification Report for Class:
              precision  recall  f1-score   support
negative         0.8889  0.9782    0.9314  458.0000
positive         0.9487  0.7676    0.8486  241.0000
accuracy         0.9056  0.9056    0.9056    0.9056
macro avg        0.9188  0.8729    0.8900  699.0000
weighted avg     0.9095  0.9056    0.9029  699.0000
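
The report can likewise be reproduced from the confusion counts alone. The sketch below is a minimal illustration using the standard definitions of precision, recall and F1. It does not call `discrimintools`, and the variable names are illustrative.

```python
import numpy as np

# Confusion counts copied from the summary
# (rows: true class, columns: predicted class; order: negative, positive).
counts = np.array([
    [448, 10],
    [56, 185],
])

correct = np.diag(counts)
support = counts.sum(axis=1)          # observations per true class
predicted_totals = counts.sum(axis=0) # observations per predicted class

precision = correct / predicted_totals
recall = correct / support
f1 = 2 * precision * recall / (precision + recall)
accuracy = correct.sum() / counts.sum()

# Unweighted and support-weighted averages of the per-class F1 scores.
macro_f1 = f1.mean()
weighted_f1 = float(np.average(f1, weights=support))

for name, p, r, f in zip(["negative", "positive"], precision, recall, f1):
    print(f"{name}: precision={p:.4f} recall={r:.4f} f1={f:.4f}")
print(f"accuracy={accuracy:.4f} macro_f1={macro_f1:.4f} weighted_f1={weighted_f1:.4f}")
```

Note that the `accuracy` row of the report repeats a single value in every column, including `support`, so the 0.9056 shown there is the accuracy itself rather than a count.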

Plotting#

[7]:
#plotting
from discrimintools import fviz_plsr

Graph of individuals#

[8]:
#graph of individuals
p = fviz_plsr(clf,element="ind",repel=False)
p.show()
../../_images/source_examples_18_plsda_breast_14_0.png

Graph of variables#

[9]:
#graph of variables
p = fviz_plsr(clf,element="var",repel=True)
p.show()
../../_images/source_examples_18_plsda_breast_16_0.png

Distance between barycenters#

[10]:
#distance between barycenters
p = fviz_plsr(clf,element="dist",repel=False,y_lim=(-0.2,0.15))
p.show()
../../_images/source_examples_18_plsda_breast_18_0.png