
verticapy.machine_learning.metrics.best_cutoff

verticapy.machine_learning.metrics.best_cutoff(y_true: str, y_score: str | list | ndarray, input_relation: str | vDataFrame, average: Literal[None, 'binary', 'micro', 'macro', 'scores', 'weighted'] = None, labels: list | ndarray | None = None, pos_label: bool | float | str | timedelta | datetime | None = None, nbins: int = 10000) → float | list[float]

Computes the ROC best cutoff (the optimal decision threshold).

Parameters

y_true: str

Response column.

y_score: str | ArrayLike

Prediction.

input_relation: SQLRelation

Relation to use for scoring. This relation can be a view, a table, or a customized relation (if an alias is used at the end of the relation). For example: (SELECT … FROM …) x. A sketch of this usage follows the parameter list.

average: str, optional

The method used to compute the final score for multiclass classification.

  • binary:

    considers one of the classes as positive and uses the binary confusion matrix to compute the score.

  • micro:

    computes the score globally by counting positive and negative values across all classes.

  • macro:

    unweighted average of the score of each class.

  • scores:

    returns the scores for all the classes.

  • weighted:

    weighted average of the score of each class.

  • None:

    accuracy.

If empty, the behaviour is similar to the ‘scores’ option.

labels: ArrayLike, optional

List of the response column categories.

pos_label: PythonScalar, optional

To compute the metric, one of the response column classes must be the positive class. The parameter ‘pos_label’ represents this class.

nbins: int, optional

An integer value that determines the number of decision boundaries. Decision boundaries are set at equally spaced intervals between 0 and 1, inclusive. Greater values for nbins give a more precise estimation of the cutoff, but can potentially decrease performance. The maximum value is 999,999. If negative, the maximum value is used.
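
As noted for input_relation, a customized relation can be passed directly as a SQL string, as long as it ends with an alias. A minimal sketch, assuming a hypothetical table my_table that contains the columns y_true and y_pred:

import verticapy as vp
from verticapy.machine_learning.metrics import best_cutoff

# "my_table" is a hypothetical table; any relation exposing the two
# columns below works the same way.
custom_relation = "(SELECT y_true, y_pred FROM my_table) x"

best_cutoff(
    y_true  = "y_true",
    y_score = "y_pred",
    input_relation = custom_relation,
)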

Returns

float | list of floats

Best cutoff score, or a list of scores (one per class) when the ‘scores’ averaging option applies.

Examples

We should first import verticapy.

import verticapy as vp

Let’s create a small dataset that has:

  • true value

  • predicted value

data = vp.vDataFrame(
    {
        "y_true": [1, 1, 0, 0, 1],
        "y_pred": [1, 1, 1, 0, 1],
    },
)

Next, we import the metric:

from verticapy.machine_learning.metrics import best_cutoff

Now we can conveniently calculate the score:

best_cutoff(
    y_true  = "y_true",
    y_score = "y_pred",
    input_relation = data,
)

Out[4]: 0.999
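
The nbins parameter controls how many candidate decision boundaries are evaluated between 0 and 1; lowering it trades precision for speed. A minimal sketch reusing the same dataset (the value 1000 is purely illustrative):

best_cutoff(
    y_true  = "y_true",
    y_score = "y_pred",
    input_relation = data,
    nbins   = 1000,
)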

Note

For multiclass classification, we can select the averaging method from the following options: binary, micro, macro, scores, and weighted.
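
For instance, with average = "binary" one class is treated as positive through pos_label and scored against its probability column. A hypothetical sketch (the dataset, column names, and values below are illustrative assumptions, not library output):

# Hypothetical three-class dataset with a probability score for class "a".
data_multi = vp.vDataFrame(
    {
        "y_true":   ["a", "b", "c", "a", "b"],
        "y_prob_a": [0.9, 0.2, 0.1, 0.7, 0.4],
    },
)

best_cutoff(
    y_true    = "y_true",
    y_score   = "y_prob_a",
    input_relation = data_multi,
    average   = "binary",
    pos_label = "a",
)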

It is also possible to compute the score directly from the vDataFrame:

data.score(
    y_true  = "y_true",
    y_score = "y_pred",
    metric  = "best_cutoff",
)

Out[5]: 0.999

Note

VerticaPy uses simple SQL queries to compute various metrics. You can use the set_option() function with the sql_on parameter to enable SQL generation and examine the generated queries.
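
For example, a minimal sketch that turns SQL generation on, recomputes the score above, and turns it off again (assuming an active connection):

vp.set_option("sql_on", True)

best_cutoff(
    y_true  = "y_true",
    y_score = "y_pred",
    input_relation = data,
)

vp.set_option("sql_on", False)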

See also

vDataFrame.score() : Computes the input ML metric.