PLSLDA - breast dataset#

[1]:
#disable warnings
from warnings import simplefilter, filterwarnings
simplefilter(action='ignore', category=FutureWarning)
filterwarnings("ignore")

breast dataset#

[2]:
#alcools dataset
from discrimintools.datasets import load_dataset
D = load_dataset("breast")
D.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 699 entries, 0 to 698
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   ucellsize  699 non-null    int64
 1   normnucl   699 non-null    int64
 2   mitoses    699 non-null    int64
 3   Class      699 non-null    object
dtypes: int64(3), object(1)
memory usage: 22.0+ KB
[3]:
#split into X and y
y, X = D["Class"], D.drop(columns=["Class"])

Instanciation and training#

[4]:
#instanciation and training
from discrimintools import PLSLDA
clf = PLSLDA()
clf.fit(X,y)
[4]:
PLSLDA()
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.

Canonical coefficients#

[5]:
#canonical coefficients
cancoef = clf.cancoef_
cancoef._fields
[5]:
('standardized', 'raw')

Standardized canonical coefficients#

[6]:
#standardized canonical coefficients
print(cancoef.standardized)
               Can1      Can2
ucellsize  0.702554 -0.605737
normnucl   0.611795  0.025466
mitoses    0.363490  0.803929

Raw canonical coefficients#

[7]:
#raw canonical coefficients
print(cancoef.raw)
               Can1      Can2
Constant  -1.632918 -0.146717
ucellsize  0.230235 -0.198507
normnucl   0.200350  0.008340
mitoses    0.211938  0.468742

Coefficients#

[8]:
#coefficients
coef = clf.coef_
coef._fields
[8]:
('standardized', 'raw')

Standardized coefficients#

[9]:
#standardized coefficients
print(coef.standardized)
           negative  positive
Constant  -1.022831 -3.231980
ucellsize -1.302482  2.475256
normnucl  -0.813468  1.545927
mitoses   -0.025749  0.048934

Raw coefficients#

[10]:
#raw coefficients
print(coef.raw)
           negative  positive
Constant   1.102686 -7.271345
ucellsize -0.426839  0.811171
normnucl  -0.266394  0.506258
mitoses   -0.015013  0.028531

Summary#

[11]:
#summary
from discrimintools import summaryPLSLDA
summaryPLSLDA(clf,detailed=True)
                     Partial Least Squares Linear Discriminant Analysis - Results

Class Level Information:
          Frequency  Proportion  Prior Probability
negative        458      0.6552             0.6552
positive        241      0.3448             0.3448

Importance of PLS components:
      Proportion (%)  Cumulative (%)
Can1         69.1520         69.1520
Can2         20.1981         89.3501

Raw Canonical and Classification Functions Coefficients:
             Can1    Can2  negative  positive
Constant  -1.6329 -0.1467    1.1027   -7.2713
ucellsize  0.2302 -0.1985   -0.4268    0.8112
normnucl   0.2003  0.0083   -0.2664    0.5063
mitoses    0.2119  0.4687   -0.0150    0.0285

Multivariate Analysis of Variance (MANOVA) Summary:
          Statistic     Value  p-value
0     Wilks' Lambda    0.3042      NaN
1  Bartlett -- C(2)  828.2718      0.0
2   Rao -- F(2,696)  795.9566      0.0

LDA Classification functions & Statistical Evaluation:
          negative  positive  Wilks' Lambda  Partial R-Square    F Value  \
Constant   -1.0228   -3.2320            NaN               NaN        NaN
Can1       -1.3538    2.5728         0.9666            0.3147  1515.4447
Can2        0.5801   -1.1024         0.3376            0.9010    76.4684

          Num DF  Den DF  Pr>F
Constant     NaN     NaN   NaN
Can1         1.0   696.0   0.0
Can2         1.0   696.0   0.0

Classification Summary for Calibration Data:

Observation Profile:
                        Read  Used
Number of Observations   699   699

Number of Observations Classified into Class:
prediction  negative  positive  Total
Class
negative         447        11    458
positive          55       186    241
Total            502       197    699

Percent Classified into Class:
prediction  negative  positive  Total
Class
negative     97.5983    2.4017  100.0
positive     22.8216   77.1784  100.0
Total        71.8169   28.1831  100.0
Priors        0.6552    0.3448    NaN

Error Count Estimates for Class:
        negative  positive   Total
Rate      0.0240    0.2282  0.0944
Priors    0.6552    0.3448     NaN

Classification Report for Class:
              precision  recall  f1-score   support
negative         0.8904  0.9760    0.9312  458.0000
positive         0.9442  0.7718    0.8493  241.0000
accuracy         0.9056  0.9056    0.9056    0.9056
macro avg        0.9173  0.8739    0.8903  699.0000
weighted avg     0.9090  0.9056    0.9030  699.0000

Plotting#

[12]:
#plotting
from discrimintools import fviz_plsr

Graph of individuals#

[13]:
#graph of individuals
p = fviz_plsr(clf,element="ind",repel=False)
p.show()
../../_images/source_examples_19_plslda_breast_24_0.png

Graph of variables#

[14]:
#graph of variables
p = fviz_plsr(clf,element="var",repel=True)
p.show()
../../_images/source_examples_19_plslda_breast_26_0.png

Distance between barycenter#

[15]:
#distance between barycenter
p = fviz_plsr(clf,element="dist",repel=False,y_lim=(-0.2,0.15))
p.show()
../../_images/source_examples_19_plslda_breast_28_0.png