verticapy.machine_learning.vertica.naive_bayes.NaiveBayes.report

NaiveBayes.report(metrics: None | str | list[Literal['aic', 'bic', 'accuracy', 'acc', 'balanced_accuracy', 'ba', 'auc', 'roc_auc', 'prc_auc', 'best_cutoff', 'best_threshold', 'false_discovery_rate', 'fdr', 'false_omission_rate', 'for', 'false_negative_rate', 'fnr', 'false_positive_rate', 'fpr', 'recall', 'tpr', 'precision', 'ppv', 'specificity', 'tnr', 'negative_predictive_value', 'npv', 'negative_likelihood_ratio', 'lr-', 'positive_likelihood_ratio', 'lr+', 'diagnostic_odds_ratio', 'dor', 'log_loss', 'logloss', 'f1', 'f1_score', 'mcc', 'bm', 'informedness', 'mk', 'markedness', 'ts', 'csi', 'critical_success_index', 'fowlkes_mallows_index', 'fm', 'prevalence_threshold', 'pm', 'confusion_matrix', 'classification_report']] = None, cutoff: Annotated[int | float | Decimal, 'Python Numbers'] | None = None, labels: None | str | list[str] = None, nbins: int = 10000) → float | TableSample

Computes a classification report using multiple model evaluation metrics (auc, accuracy, f1…). For multiclass classification, it treats each category as the positive class in turn, switching to the next one during the computation.

Parameters

metrics: list, optional

List of the metrics used to compute the final report (see the example after this list).

  • accuracy:

    Accuracy.

    \[Accuracy = \frac{TP + TN}{TP + TN + FP + FN}\]
  • aic:

    Akaike’s Information Criterion.

    \[AIC = 2k - 2\ln(\hat{L})\]
  • auc:

    Area Under the Curve (ROC).

    \[AUC = \int_{0}^{1} TPR(FPR) \, dFPR\]
  • ba:

    Balanced Accuracy.

    \[BA = \frac{TPR + TNR}{2}\]
  • best_cutoff:

    Cutoff that optimizes the ROC curve prediction.

  • bic:

    Bayesian Information Criterion.

    \[BIC = -2\ln(\hat{L}) + k \ln(n)\]
  • bm:

    Informedness.

    \[BM = TPR + TNR - 1\]
  • csi:

    Critical Success Index.

    \[CSI = \frac{TP}{TP + FN + FP}\]
  • f1:

    F1 Score.

    \[F_1 = 2 \times \frac{Precision \times Recall}{Precision + Recall}\]
  • fdr:

    False Discovery Rate.

    \[FDR = 1 - PPV\]
  • fm:

    Fowlkes-Mallows index.

    \[FM = \sqrt{PPV \times TPR}\]
  • fnr:

    False Negative Rate.

    \[FNR = \frac{FN}{FN + TP}\]
  • for:

    False Omission Rate.

    \[FOR = 1 - NPV\]
  • fpr:

    False Positive Rate.

    \[FPR = \frac{FP}{FP + TN}\]
  • logloss:

    Log Loss.

    \[Loss = -\frac{1}{N} \sum_{i=1}^{N} \left( y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right)\]
  • lr+:

    Positive Likelihood Ratio.

    \[LR+ = \frac{TPR}{FPR}\]
  • lr-:

    Negative Likelihood Ratio.

    \[LR- = \frac{FNR}{TNR}\]
  • dor:

    Diagnostic Odds Ratio.

    \[DOR = \frac{TP \times TN}{FP \times FN}\]
  • mcc:

    Matthews Correlation Coefficient.

    \[MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}\]
  • mk:

    Markedness.

    \[MK = PPV + NPV - 1\]
  • npv:

    Negative Predictive Value.

    \[NPV = \frac{TN}{TN + FN}\]
  • prc_auc:

    Area Under the Curve (PRC).

    \[AUC = \int_{0}^{1} Precision(Recall) \, dRecall\]
  • precision:

    Precision.

    \[Precision = \frac{TP}{TP + FP}\]
  • pt:

    Prevalence Threshold.

    \[threshold = \frac{\sqrt{FPR}}{\sqrt{TPR} + \sqrt{FPR}}\]
  • recall:

    Recall.

    \[Recall = \frac{TP}{TP + FN}\]
  • specificity:

    Specificity.

    \[Specificity = \frac{TN}{TN + FP}\]
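
For example, to compute only a subset of these metrics (a minimal sketch, assuming model is an already fitted VerticaPy classifier):

model.report(metrics=["auc", "accuracy", "f1"])
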
cutoff: PythonNumber, optional

Cutoff at which the tested category is accepted as a prediction. For multiclass classification, each tested category becomes the positive class and the other categories are merged into the negative class. The cutoff represents the classification threshold. If it is empty, the default cutoff (1 / number of classes) is used.
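
For instance, with three classes the default cutoff is 1 / 3. Overriding it might look like this (a sketch, assuming model is already fitted; the value 0.5 is only illustrative):

model.report(cutoff=0.5)
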

labels: str | list, optional

List of the class labels used during the computation.
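
For example, to restrict the report to a single class (a sketch, assuming model was fitted on the Iris data used below):

model.report(labels=["Iris-virginica"])
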

nbins: int, optional

[Used to compute ROC AUC, PRC AUC and the best cutoff] An integer value that determines the number of decision boundaries. Decision boundaries are set at equally spaced intervals between 0 and 1, inclusive. Greater values for nbins give more precise estimations of the metrics, but can potentially decrease performance. The maximum value is 999,999. If negative, the maximum value is used.
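
For example, to trade some estimation precision for speed when computing AUC (a sketch, assuming model is already fitted; 1000 is only an illustrative value):

model.report(metrics=["auc"], nbins=1000)
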

Returns

TableSample

The classification report.

Examples

For this example, we will use the Iris dataset.

import verticapy.datasets as vpd

data = vpd.load_iris()

train, test = data.train_test_split(test_size = 0.2)
(Output: the iris vDataFrame, with columns SepalLengthCm, SepalWidthCm, PetalLengthCm and PetalWidthCm, all Numeric(7), and Species, Varchar(30). Rows: 1-100 | Columns: 5)

Let’s import the model:

from verticapy.machine_learning.vertica import NearestCentroid

Then we can create the model:

model = NearestCentroid(p = 2)

We can now fit the model:

model.fit(
    train,
    [
        "SepalLengthCm",
        "SepalWidthCm",
        "PetalLengthCm",
        "PetalWidthCm",
    ],
    "Species",
    test,
)

We can get all the classification metrics using the classification_report method:

model.classification_report()
metric       | Iris-setosa         | Iris-versicolor     | Iris-virginica      | avg_macro           | avg_weighted        | avg_micro
auc          | 0.9999999999999998  | 0.9705882352941176  | 0.9032258064516125  | 0.9579380139152432  | 0.9647231326548212  | [null]
prc_auc      | 1.0                 | 0.9096893384393384  | 0.8824512948351957  | 0.9307135444248447  | 0.9447445503920665  | [null]
accuracy     | 0.7272727272727273  | 0.6136363636363636  | 0.8863636363636364  | 0.7424242424242423  | 0.7484504132231407  | 0.7424242424242424
log_loss     | 0.248763847620539   | 0.199288511814164   | 0.200955678574442   | 0.216336012669715   | 0.22339431226456147 | [null]
precision    | 1.0                 | 0.37037037037037035 | 1.0                 | 0.7901234567901234  | 0.8569023569023568  | 0.6136363636363636
recall       | 0.42857142857142855 | 1.0                 | 0.6153846153846154  | 0.6813186813186812  | 0.6136363636363636  | 0.6136363636363636
f1_score     | 0.6                 | 0.5405405405405406  | 0.761904761904762   | 0.6341484341484341  | 0.6343219843219843  | 0.6136363636363636
mcc          | 0.5306907320287632  | 0.43033148291193524 | 0.7279522854655585  | 0.562991500135419   | 0.5661636343812645  | 0.42045454545454547
informedness | 0.4285714285714286  | 0.5                 | 0.6153846153846154  | 0.5146520146520147  | 0.5                 | 0.4204545454545454
markedness   | 0.657142857142857   | 0.37037037037037024 | 0.8611111111111112  | 0.6295414462081128  | 0.6522306397306395  | 0.4204545454545454
csi          | 0.42857142857142855 | 0.37037037037037035 | 0.6153846153846154  | 0.47144213810880475 | 0.47053872053872053 | 0.4426229508196721
Rows: 1-11 | Columns: 7
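
The same method can also produce a focused report. For example (a sketch reusing the fitted model above; the metric and label choices are only illustrative):

model.classification_report(
    metrics=["auc", "mcc"],
    labels=["Iris-setosa", "Iris-virginica"],
)
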

Important

For this example, a specific model is utilized, and it may not correspond exactly to the model you are working with. To see a comprehensive example specific to your class of interest, please refer to that particular class.