verticapy.machine_learning.metrics.roc_auc_score#

verticapy.machine_learning.metrics.roc_auc_score(y_true: str, y_score: str | list | ndarray, input_relation: str | vDataFrame, average: Literal[None, 'binary', 'micro', 'macro', 'scores', 'weighted'] = None, labels: list | ndarray | None = None, pos_label: bool | float | str | timedelta | datetime | None = None, nbins: int = 10000) → float | list[float]#

Computes the ROC AUC (Area Under the ROC Curve).

Parameters#

y_true: str

Response column.

y_score: str | ArrayLike

When ‘pos_label’ and ‘labels’ are not defined, this should be a list of probabilities (SQL expressions or column names), one per class, in the same order as the labels. Otherwise, ‘pos_label’ identifies the main class and ‘y_score’ is its probability.

input_relation: SQLRelation

Relation to use for scoring. This relation can be a view, table, or a customized relation (if an alias is used at the end of the relation). For example: (SELECT … FROM …) x

average: str, optional

The method used to compute the final score for multiclass classification.

  • binary:

    considers one of the classes as positive and uses the binary confusion matrix to compute the score.

  • micro:

    computes the score globally by considering positive and negative values across all classes.

  • macro:

    average of the score of each class.

  • scores:

    scores for all the classes.

  • weighted:

    weighted average of the score of each class.

  • None:

    accuracy.

If empty, the behaviour is similar to the ‘scores’ option.

labels: ArrayLike, optional

List of the response column categories.

pos_label: PythonScalar, optional

To compute the metric, one of the response column classes must be the positive class. The parameter ‘pos_label’ represents this class.

nbins: int, optional

An integer value that determines the number of decision boundaries. Decision boundaries are set at equally spaced intervals between 0 and 1, inclusive. Greater values for nbins give more precise estimations of the AUC, but can potentially decrease performance. The maximum value is 999,999. If negative, the maximum value is used.

Returns#

float | list[float]

score.

Examples#

We should first import verticapy.

import verticapy as vp

Let’s create a small dataset that has:

  • true value

  • probability of the true value

Important

This classification metric does not use the predicted value. Instead, it measures the performance of a classification model by evaluating the likelihood of the true labels given the predicted probabilities.

data = vp.vDataFrame(
    {
        "y_true": [1, 1, 0, 0, 1],
        "y_prob": [0.5, 0.9, 0.2, 0.5, 0.6],
    },
)

Next, we import the metric:

from verticapy.machine_learning.metrics import roc_auc_score

Now we can conveniently calculate the score:

roc_auc_score(
    y_true  = "y_true",
    y_score = "y_prob",
    input_relation = data,
)

Out[4]: 0.9166666666666667
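
To see where this value comes from, here is a minimal pure-Python cross-check (not VerticaPy code, shown only for illustration) using the pairwise interpretation of the AUC: the probability that a randomly chosen positive observation gets a higher score than a randomly chosen negative one, with ties counted as one half.

y_true = [1, 1, 0, 0, 1]
y_prob = [0.5, 0.9, 0.2, 0.5, 0.6]

pos = [p for t, p in zip(y_true, y_prob) if t == 1]
neg = [p for t, p in zip(y_true, y_prob) if t == 0]

# Pairs where the positive score beats the negative score; ties count as 0.5.
concordant = sum(
    1.0 if p > n else 0.5 if p == n else 0.0
    for p in pos
    for n in neg
)

concordant / (len(pos) * len(neg))  # ≈ 0.9167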

Note

For multi-class classification, we can select the averaging method from the following options: binary, micro, macro, scores, and weighted.
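
As a purely illustrative sketch of such a multiclass call (the relation, column names, and labels below are hypothetical), ‘y_score’ would list one probability column per class in the same order as ‘labels’, and ‘average’ would select how the per-class scores are combined:

roc_auc_score(
    y_true  = "y_true",                        # hypothetical multiclass response column
    y_score = ["prob_a", "prob_b", "prob_c"],  # one probability column per class, same order as labels
    input_relation = data_multi,               # hypothetical vDataFrame or relation
    labels  = ["a", "b", "c"],
    average = "macro",                         # average of the per-class scores
)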

It is also possible to directly compute the score from the vDataFrame:

data.score(
    y_true  = "y_true",
    y_score = "y_prob",
    metric  = "roc_auc",
)

Out[5]: 0.9166666666666667

Note

VerticaPy uses simple SQL queries to compute various metrics. You can use the set_option() function with the sql_on parameter to enable SQL generation and examine the generated queries.
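
For example, a short sketch based on this note (the exact queries printed depend on your VerticaPy version and connection):

vp.set_option("sql_on", True)   # display the SQL generated by VerticaPy

roc_auc_score(
    y_true  = "y_true",
    y_score = "y_prob",
    input_relation = data,
)

vp.set_option("sql_on", False)  # turn SQL display back off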

See also

vDataFrame.score(): Computes the input ML metric.