verticapy.machine_learning.vertica.tree.DecisionTreeClassifier.score

DecisionTreeClassifier.score(metric: Literal['aic', 'bic', 'accuracy', 'acc', 'balanced_accuracy', 'ba', 'auc', 'roc_auc', 'prc_auc', 'best_cutoff', 'best_threshold', 'false_discovery_rate', 'fdr', 'false_omission_rate', 'for', 'false_negative_rate', 'fnr', 'false_positive_rate', 'fpr', 'recall', 'tpr', 'precision', 'ppv', 'specificity', 'tnr', 'negative_predictive_value', 'npv', 'negative_likelihood_ratio', 'lr-', 'positive_likelihood_ratio', 'lr+', 'diagnostic_odds_ratio', 'dor', 'log_loss', 'logloss', 'f1', 'f1_score', 'mcc', 'bm', 'informedness', 'mk', 'markedness', 'ts', 'csi', 'critical_success_index', 'fowlkes_mallows_index', 'fm', 'prevalence_threshold', 'pm', 'confusion_matrix', 'classification_report'] = 'accuracy', average: Literal[None, 'binary', 'micro', 'macro', 'scores', 'weighted'] | None = None, pos_label: Annotated[bool | float | str | timedelta | datetime, 'Python Scalar'] | None = None, cutoff: Annotated[int | float | Decimal, 'Python Numbers'] = 0.5, nbins: int = 10000) float | list[float]

Computes the model score.

Parameters

metric: str, optional

The metric used to compute the score.

  • accuracy:

    Accuracy.

    \[Accuracy = \frac{TP + TN}{TP + TN + FP + FN}\]
  • aic:

    Akaike’s Information Criterion

    \[AIC = 2k - 2\ln(\hat{L})\]
  • auc:

    Area Under the Curve (ROC).

    \[AUC = \int_{0}^{1} TPR(FPR) \, dFPR\]
  • ba:

    Balanced Accuracy.

    \[BA = \frac{TPR + TNR}{2}\]
  • best_cutoff:

    The cutoff that optimizes the ROC curve prediction.

  • bic:

    Bayesian Information Criterion

    \[BIC = -2\ln(\hat{L}) + k \ln(n)\]
  • bm:

    Informedness

    \[BM = TPR + TNR - 1\]
  • csi:

    Critical Success Index

    \[index = \frac{TP}{TP + FN + FP}\]
  • f1:

    F1 Score

    \[F_1 Score = 2 \times \frac{Precision \times Recall}{Precision + Recall}\]
  • fdr:

    False Discovery Rate

    \[FDR = 1 - PPV\]
  • fm:

    Fowlkes-Mallows index

    \[FM = \sqrt{PPV * TPR}\]
  • fnr:

    False Negative Rate

    \[FNR = \frac{FN}{FN + TP}\]
  • for:

    False Omission Rate

    \[FOR = 1 - NPV\]
  • fpr:

    False Positive Rate

    \[FPR = \frac{FP}{FP + TN}\]
  • logloss:

    Log Loss.

    \[Loss = -\frac{1}{N} \sum_{i=1}^{N} \left( y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right)\]
  • lr+:

    Positive Likelihood Ratio.

    \[LR+ = \frac{TPR}{FPR}\]
  • lr-:

    Negative Likelihood Ratio.

    \[LR- = \frac{FNR}{TNR}\]
  • dor:

    Diagnostic Odds Ratio.

    \[DOR = \frac{TP \times TN}{FP \times FN}\]
  • mcc:

    Matthews Correlation Coefficient.

    \[MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}\]
  • mk:

    Markedness

    \[MK = PPV + NPV - 1\]
  • npv:

    Negative Predictive Value

    \[NPV = \frac{TN}{TN + FN}\]
  • prc_auc:

    Area Under the Curve (PRC)

    \[AUC = \int_{0}^{1} Precision(Recall) \, dRecall\]
  • precision:

    Precision

    \[Precision = \frac{TP}{TP + FP}\]
  • pt:

    Prevalence Threshold.

    \[threshold = \frac{\sqrt{FPR}}{\sqrt{TPR} + \sqrt{FPR}}\]
  • recall:

    Recall.

    \[Recall = \frac{TP}{TP + FN}\]
  • specificity:

    Specificity.

    \[Specificity = \frac{TN}{TN + FP}\]
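Most of the metrics above derive from the four entries of the binary confusion matrix. The following plain-Python sketch (illustrative only; VerticaPy computes these metrics in-database, not with this code) mirrors a few of the formulas:

```python
# Illustrative only: a few of the metrics above, computed from the
# entries of a binary confusion matrix (TP, TN, FP, FN).
from math import sqrt

def binary_metrics(tp, tn, fp, fn):
    tpr = tp / (tp + fn)          # recall / sensitivity
    tnr = tn / (tn + fp)          # specificity
    ppv = tp / (tp + fp)          # precision
    npv = tn / (tn + fn)          # negative predictive value
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "ba": (tpr + tnr) / 2,                       # balanced accuracy
        "f1": 2 * ppv * tpr / (ppv + tpr),
        "bm": tpr + tnr - 1,                         # informedness
        "mk": ppv + npv - 1,                         # markedness
        "mcc": (tp * tn - fp * fn)
               / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }

# Hypothetical counts, just to exercise the formulas.
print(binary_metrics(tp=40, tn=45, fp=5, fn=10))
```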
average: str, optional

The method used to compute the final score for multiclass-classification.

  • binary:

    considers one of the classes as positive and uses the binary confusion matrix to compute the score.

  • micro:

    computes the score globally, counting positive and negative values across all classes.

  • macro:

    average of the score of each class.

  • scores:

    scores for all the classes.

  • weighted:

    weighted average of the score of each class.

If empty, the result will depend on the input metric. Whenever it is possible, the exact score is computed. Otherwise, the behaviour is similar to the ‘scores’ option.
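The averaging strategies above can be sketched in plain Python for a one-vs-rest precision computation. The class names and counts below are hypothetical; VerticaPy performs this aggregation in-database:

```python
# Illustrative only: 'micro', 'macro', and 'weighted' averaging of
# per-class precision. Per-class (tp, fp, support) counts from a
# hypothetical 3-class problem.
per_class = {
    "a": (40, 10, 50),
    "b": (20, 5, 30),
    "c": (15, 5, 20),
}

def precision(tp, fp):
    return tp / (tp + fp)

# macro: unweighted mean of per-class scores.
macro = sum(precision(tp, fp) for tp, fp, _ in per_class.values()) / len(per_class)

# weighted: mean of per-class scores, weighted by class support.
total = sum(s for _, _, s in per_class.values())
weighted = sum(precision(tp, fp) * s for tp, fp, s in per_class.values()) / total

# micro: pool the counts globally, then compute the score once.
tp_sum = sum(tp for tp, _, _ in per_class.values())
fp_sum = sum(fp for _, fp, _ in per_class.values())
micro = precision(tp_sum, fp_sum)

print(macro, weighted, micro)
```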

pos_label: PythonScalar, optional

Label to consider as positive. All the other classes will be merged and considered as negative for multiclass classification.

cutoff: PythonNumber, optional

Cutoff for which the tested category is accepted as a prediction.
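In a binary setting, the cutoff acts as a probability threshold: the tested category is predicted only when its probability reaches the cutoff. A minimal sketch (illustrative only, with hypothetical probabilities):

```python
# Illustrative only: accept the tested class (label 1) when its
# predicted probability reaches the cutoff.
def apply_cutoff(probs, cutoff=0.5):
    return [1 if p >= cutoff else 0 for p in probs]

print(apply_cutoff([0.2, 0.5, 0.8], cutoff=0.5))  # -> [0, 1, 1]
```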

nbins: int, optional

[Only when method is set to auc|prc_auc|best_cutoff] An integer value that determines the number of decision boundaries. Decision boundaries are set at equally spaced intervals between 0 and 1, inclusive. Greater values for nbins give more precise estimations of the AUC, but can potentially decrease performance. The maximum value is 999,999. If negative, the maximum value is used.
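The role of nbins can be sketched in plain Python: equally spaced cutoffs on [0, 1] turn predicted probabilities into (FPR, TPR) points, and the area under that curve approximates the AUC. The data below is hypothetical, and this is not VerticaPy's in-database implementation:

```python
# Illustrative only: approximate ROC AUC using nbins equally spaced
# decision boundaries between 0 and 1, inclusive. Larger nbins gives
# finer thresholds and a more precise estimate, at extra cost.
def approx_auc(y_true, y_prob, nbins=10):
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = []
    for i in range(nbins + 1):
        t = i / nbins  # equally spaced cutoff
        tp = sum(1 for y, p in zip(y_true, y_prob) if p >= t and y == 1)
        fp = sum(1 for y, p in zip(y_true, y_prob) if p >= t and y == 0)
        points.append((fp / neg, tp / pos))  # (FPR, TPR)
    points.sort()
    # Trapezoidal rule over the (FPR, TPR) points.
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

print(approx_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8], nbins=100))
```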

Returns

float | list[float]

The computed score, or a list of scores (one per class) when per-class results are requested.

Examples

For this example, we will use the Iris dataset.

import verticapy.datasets as vpd

data = vpd.load_iris()

train, test = data.train_test_split(test_size = 0.2)
(Dataset preview omitted: an interactive table of 5 columns — SepalLengthCm, SepalWidthCm, PetalLengthCm, PetalWidthCm, all Numeric(7), and Species, Varchar(30) — showing rows 1-100 of the Iris data.)

Let’s import the model:

from verticapy.machine_learning.vertica import NearestCentroid

Then we can create the model:

model = NearestCentroid(p = 2)

We can now fit the model:

model.fit(
    train,
    [
        "SepalLengthCm",
        "SepalWidthCm",
        "PetalLengthCm",
        "PetalWidthCm",
    ],
    "Species",
    test,
)

We can get the score:

model.score()
0.8181818181818182

To get the score of a particular class:

model.score(pos_label="Iris-setosa")
0.7272727272727273

Important

For this example, a specific model is utilized, and it may not correspond exactly to the model you are working with. To see a comprehensive example specific to your class of interest, please refer to that particular class.