
verticapy.machine_learning.vertica.tree.DecisionTreeClassifier.classification_report#

DecisionTreeClassifier.classification_report(metrics: None | str | list[Literal['aic', 'bic', 'accuracy', 'acc', 'balanced_accuracy', 'ba', 'auc', 'roc_auc', 'prc_auc', 'best_cutoff', 'best_threshold', 'false_discovery_rate', 'fdr', 'false_omission_rate', 'for', 'false_negative_rate', 'fnr', 'false_positive_rate', 'fpr', 'recall', 'tpr', 'precision', 'ppv', 'specificity', 'tnr', 'negative_predictive_value', 'npv', 'negative_likelihood_ratio', 'lr-', 'positive_likelihood_ratio', 'lr+', 'diagnostic_odds_ratio', 'dor', 'log_loss', 'logloss', 'f1', 'f1_score', 'mcc', 'bm', 'informedness', 'mk', 'markedness', 'ts', 'csi', 'critical_success_index', 'fowlkes_mallows_index', 'fm', 'prevalence_threshold', 'pm', 'confusion_matrix', 'classification_report']] = None, cutoff: int | float | Decimal | None = None, labels: None | str | list[str] = None, nbins: int = 10000) → float | TableSample#

Computes a classification report using multiple model evaluation metrics (AUC, accuracy, F1, and so on). For multiclass classification, each category is treated as the positive class in turn during the computation.

Parameters#

metrics: str | list, optional

List of the metrics used to compute the final report.

  • accuracy:

    Accuracy.

    \[Accuracy = \frac{TP + TN}{TP + TN + FP + FN}\]
  • aic:

    Akaike’s Information Criterion

    \[AIC = 2k - 2\ln(\hat{L})\]
  • auc:

    Area Under the Curve (ROC).

    \[AUC = \int_{0}^{1} TPR(FPR) \, dFPR\]
  • ba:

    Balanced Accuracy.

    \[BA = \frac{TPR + TNR}{2}\]
  • best_cutoff:

    The cutoff that optimizes the ROC curve prediction (the best threshold).

  • bic:

    Bayesian Information Criterion

    \[BIC = -2\ln(\hat{L}) + k \ln(n)\]
  • bm:

    Informedness

    \[BM = TPR + TNR - 1\]
  • csi:

    Critical Success Index

    \[index = \frac{TP}{TP + FN + FP}\]
  • f1:

    F1 Score

    \[F_1 Score = 2 \times \frac{Precision \times Recall}{Precision + Recall}\]
  • fdr:

    False Discovery Rate

    \[FDR = 1 - PPV\]
  • fm:

    Fowlkes-Mallows index

    \[FM = \sqrt{PPV * TPR}\]
  • fnr:

    False Negative Rate

    \[FNR = \frac{FN}{FN + TP}\]
  • for:

    False Omission Rate

    \[FOR = 1 - NPV\]
  • fpr:

    False Positive Rate

    \[FPR = \frac{FP}{FP + TN}\]
  • logloss:

    Log Loss.

    \[Loss = -\frac{1}{N} \sum_{i=1}^{N} \left( y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right)\]
  • lr+:

    Positive Likelihood Ratio.

    \[LR+ = \frac{TPR}{FPR}\]
  • lr-:

    Negative Likelihood Ratio.

    \[LR- = \frac{FNR}{TNR}\]
  • dor:

    Diagnostic Odds Ratio.

    \[DOR = \frac{TP \times TN}{FP \times FN}\]
  • mcc:

    Matthews Correlation Coefficient.

    \[MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}\]
  • mk:

    Markedness

    \[MK = PPV + NPV - 1\]
  • npv:

    Negative Predictive Value

    \[NPV = \frac{TN}{TN + FN}\]
  • prc_auc:

    Area Under the Curve (PRC)

    \[AUC = \int_{0}^{1} Precision(Recall) \, dRecall\]
  • precision:

    Precision

    \[Precision = \frac{TP}{TP + FP}\]
  • pt:

    Prevalence Threshold.

    \[threshold = \frac{\sqrt{FPR}}{\sqrt{TPR} + \sqrt{FPR}}\]
  • recall:

    Recall.

    \[Recall = \frac{TP}{TP + FN}\]
  • specificity:

    Specificity.

    \[Specificity = \frac{TN}{TN + FP}\]
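Most of the metrics above are simple functions of the four confusion-matrix counts. The following is a minimal pure-Python sketch (independent of verticapy, with a hypothetical helper name) showing how a subset of them follows from TP, TN, FP, and FN:

```python
from math import sqrt

def binary_metrics(tp, tn, fp, fn):
    """Compute a subset of the report's metrics from raw confusion counts."""
    tpr = tp / (tp + fn)              # recall / sensitivity
    tnr = tn / (tn + fp)              # specificity
    ppv = tp / (tp + fp)              # precision
    npv = tn / (tn + fn)              # negative predictive value
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "ba": (tpr + tnr) / 2,
        "precision": ppv,
        "recall": tpr,
        "f1": 2 * ppv * tpr / (ppv + tpr),
        "mcc": (tp * tn - fp * fn)
        / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        "bm": tpr + tnr - 1,          # informedness
        "mk": ppv + npv - 1,          # markedness
        "csi": tp / (tp + fn + fp),   # critical success index
    }

m = binary_metrics(tp=40, tn=45, fp=5, fn=10)
print(round(m["accuracy"], 3))  # 0.85
```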
cutoff: PythonNumber, optional

Probability cutoff above which the tested category is accepted as a prediction. For multiclass classification, each tested category in turn becomes the positive class, and the remaining categories are merged into the negative class. If empty, the regular cutoff (1 / number_of_classes) is used.
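A minimal sketch of this one-vs-rest decision, assuming a hypothetical helper that receives per-class probabilities as a plain dict (not the Vertica implementation):

```python
def predict_with_cutoff(probs, cutoff=None):
    """probs maps each class label to its predicted probability."""
    if cutoff is None:
        cutoff = 1.0 / len(probs)  # the regular cutoff: 1 / number_of_classes
    # Keep the classes whose probability clears the threshold,
    # then pick the most probable among them.
    accepted = {c: p for c, p in probs.items() if p >= cutoff}
    return max(accepted, key=accepted.get) if accepted else None

probs = {"Iris-setosa": 0.2, "Iris-versicolor": 0.5, "Iris-virginica": 0.3}
print(predict_with_cutoff(probs))       # default cutoff is 1/3
print(predict_with_cutoff(probs, 0.6))  # no class clears 0.6
```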

labels: str | list, optional

List of the different labels to be used during the computation.

nbins: int, optional

[Used to compute ROC AUC, PRC AUC and the best cutoff] An integer value that determines the number of decision boundaries. Decision boundaries are set at equally spaced intervals between 0 and 1, inclusive. Greater values for nbins give more precise estimations of the metrics, but can potentially decrease performance. The maximum value is 999,999. If negative, the maximum value is used.
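To illustrate what nbins controls, here is a hedged pure-Python sketch (not the Vertica implementation): each of the nbins + 1 equally spaced cutoffs in [0, 1] yields one (FPR, TPR) point, and the ROC AUC is integrated over those points, so more cutoffs give a finer estimate at a higher cost:

```python
def roc_auc(y_true, y_score, nbins=10000):
    """Trapezoidal ROC AUC estimated over nbins + 1 equally spaced cutoffs."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = []
    for i in range(nbins + 1):
        t = i / nbins  # decision boundary in [0, 1]
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= t)
        points.append((fp / neg, tp / pos))
    points.sort()
    # Trapezoidal integration over the (FPR, TPR) curve.
    return sum(
        (x2 - x1) * (y1 + y2) / 2
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )

y = [1, 1, 0, 0]
s = [0.9, 0.8, 0.3, 0.1]  # perfectly separable scores
print(roc_auc(y, s, nbins=100))  # 1.0
```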

Returns#

TableSample

report.

Examples#

For this example, we will use the Iris dataset.

import verticapy.datasets as vpd

data = vpd.load_iris()

train, test = data.train_test_split(test_size = 0.2)
(Dataset preview, abridged: the returned vDataFrame has the columns SepalLengthCm, SepalWidthCm, PetalLengthCm, PetalWidthCm — all Numeric(7) — and Species — Varchar(30).)

Rows: 1-100 | Columns: 5

Let’s import the model:

from verticapy.machine_learning.vertica import NearestCentroid

Then we can create the model:

model = NearestCentroid(p = 2)

We can now fit the model:

model.fit(
    train,
    [
        "SepalLengthCm",
        "SepalWidthCm",
        "PetalLengthCm",
        "PetalWidthCm",
    ],
    "Species",
    test,
)

We can get all the classification metrics using the classification_report:

model.classification_report()
|              | Iris-setosa         | Iris-versicolor     | Iris-virginica     | avg_macro           | avg_weighted        | avg_micro           |
|--------------|---------------------|---------------------|--------------------|---------------------|---------------------|---------------------|
| auc          | 1.0                 | 0.9880952380952377  | 0.8631239935587762 | 0.9504064105513379  | 0.9351322751322751  | [null]              |
| prc_auc      | 1.0                 | 0.9440972222222223  | 0.8872212333368823 | 0.9437728185197015  | 0.9391773228905214  | [null]              |
| accuracy     | 0.8                 | 0.52                | 0.72               | 0.68                | 0.7184              | 0.68                |
| log_loss     | 0.218126858489136   | 0.205467671752679   | 0.258999416794593  | 0.22753131567880266 | 0.2349027654318131  | [null]              |
| precision    | 1.0                 | 0.25                | 1.0                | 0.75                | 0.88                | 0.52                |
| recall       | 0.47368421052631576 | 1.0                 | 0.391304347826087  | 0.6216628527841342  | 0.52                | 0.52                |
| f1_score     | 0.6428571428571429  | 0.4                 | 0.5625             | 0.5351190476190476  | 0.5670357142857143  | 0.52                |
| mcc          | 0.5984575809960173  | 0.32732683535398854 | 0.507630011973488  | 0.47780480944116466 | 0.5132959799429293  | 0.28                |
| informedness | 0.4736842105263157  | 0.4285714285714286  | 0.3913043478260869 | 0.4311866623079437  | 0.42857142857142855 | 0.28                |
| markedness   | 0.7560975609756098  | 0.25                | 0.6585365853658538 | 0.5548780487804879  | 0.6302439024390245  | 0.28                |
| csi          | 0.47368421052631576 | 0.25                | 0.391304347826087  | 0.3716628527841342  | 0.4                 | 0.35135135135135137 |

Rows: 1-11 | Columns: 7

Important

For this example, a specific model is utilized, and it may not correspond exactly to the model you are working with. To see a comprehensive example specific to your class of interest, please refer to that particular class.