verticapy.machine_learning.metrics.roc_auc_score
verticapy.machine_learning.metrics.roc_auc_score(y_true: str, y_score: str | list | ndarray, input_relation: str | vDataFrame, average: Literal[None, 'binary', 'micro', 'macro', 'scores', 'weighted'] = None, labels: list | ndarray | None = None, pos_label: bool | float | str | timedelta | datetime | None = None, nbins: int = 10000) → float | list[float]
Computes the ROC AUC (Area Under Curve).
Parameters
- y_true: str
Response column.
- y_score: str | ArrayLike
When ‘pos_label’ and ‘labels’ are not defined, it should be a list of probabilities represented by SQL code for the different classes in the same order as the labels. Otherwise, ‘pos_label’ represents the main class, and ‘y_score’ represents its probability.
- input_relation: SQLRelation
Relation to use for scoring. This relation can be a view, table, or a customized relation (if an alias is used at the end of the relation). For example: (SELECT … FROM …) x
- average: str, optional
The method used to compute the final score for multiclass-classification.
- binary:
considers one of the classes as positive and uses the binary confusion matrix to compute the score.
- micro:
computes the score globally by considering the positive and negative values of all classes together.
- macro:
unweighted average of the score of each class.
- scores:
returns the individual score of each class.
- weighted:
weighted average of the score of each class.
- None:
accuracy.
If empty, the behaviour is similar to the ‘scores’ option.
- labels: ArrayLike, optional
List of the response column categories.
- pos_label: PythonScalar, optional
To compute the metric, one of the response column classes must be the positive class. The parameter ‘pos_label’ represents this class.
- nbins: int, optional
An integer value that determines the number of decision boundaries. Decision boundaries are set at equally spaced intervals between 0 and 1, inclusive. Greater values for nbins give more precise estimations of the AUC, but can potentially decrease performance. The maximum value is 999,999. If negative, the maximum value is used.
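To make the role of nbins concrete, here is a hypothetical plain-Python sketch (not VerticaPy's SQL implementation) that approximates the ROC AUC using equally spaced decision boundaries between 0 and 1, inclusive, and the trapezoidal rule over the resulting ROC points. The function name approx_auc is made up for illustration.

```python
def approx_auc(y_true, y_score, nbins):
    """Approximate the ROC AUC using `nbins` equally spaced thresholds.

    Hypothetical sketch: thresholds are set at equally spaced intervals
    between 0 and 1, inclusive, as described for the `nbins` parameter.
    """
    thresholds = [i / nbins for i in range(nbins + 1)]
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    points = []
    for t in thresholds:
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= t)
        points.append((fp / neg, tp / pos))  # (FPR, TPR) at this boundary
    points.sort()
    # Trapezoidal rule over the approximated ROC curve.
    return sum(
        (x2 - x1) * (y1 + y2) / 2
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )

auc = approx_auc([1, 1, 0, 0, 1], [0.5, 0.9, 0.2, 0.5, 0.6], nbins=10000)
```

A coarser grid of thresholds can only distinguish fewer operating points on the ROC curve, which is why larger nbins values yield a more precise estimate at a higher computational cost.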
Returns
- float | list[float]
score.
Examples
We should first import verticapy.
import verticapy as vp
Let’s create a small dataset that has:
- the true value
- the probability of the true value
Important
This classification metric does not use the predicted value. Instead, it measures the performance of a classification model by evaluating the likelihood of the true labels given the predicted probabilities.
data = vp.vDataFrame(
    {
        "y_true": [1, 1, 0, 0, 1],
        "y_prob": [0.5, 0.9, 0.2, 0.5, 0.6],
    },
)
Next, we import the metric:
from verticapy.machine_learning.metrics import roc_auc_score
Now we can conveniently calculate the score:
roc_auc_score(
    y_true = "y_true",
    y_score = "y_prob",
    input_relation = data,
)
Out[4]: 0.9166666666666667
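As a sanity check, the score above can be reproduced in plain Python (outside VerticaPy) with the pairwise formulation of the binary ROC AUC: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, counting ties as 1/2. The helper name pairwise_auc is made up for illustration.

```python
def pairwise_auc(y_true, y_score):
    """Binary ROC AUC via pairwise concordance (ties count as 1/2)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

auc = pairwise_auc([1, 1, 0, 0, 1], [0.5, 0.9, 0.2, 0.5, 0.6])
```

For this dataset, 5.5 of the 6 positive/negative pairs are correctly ordered (one pair is tied at 0.5), giving 5.5 / 6 ≈ 0.9167, which matches the score returned above.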
Note
For multi-class classification, we can select the average method for averaging from the following options: binary, micro, macro, scores, and weighted.
It is also possible to directly compute the score from the vDataFrame:
data.score(
    y_true = "y_true",
    y_score = "y_prob",
    metric = "roc_auc",
)
Out[5]: 0.9166666666666667
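To illustrate what the multiclass averaging options compute, here is a hypothetical plain-Python sketch (not VerticaPy's SQL implementation) of one-vs-rest AUC averaging. The data, labels, and helper names are made up; macro is the unweighted mean of the per-class scores, weighted weights each class by its support, and micro pools the one-vs-rest labels and probabilities of all classes before computing a single binary AUC.

```python
def binary_auc(y_true, y_score):
    """Binary ROC AUC via pairwise concordance (ties count as 1/2)."""
    pos = [s for y, s in zip(y_true, y_score) if y]
    neg = [s for y, s in zip(y_true, y_score) if not y]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

labels = ["a", "b", "c"]
y_true = ["a", "b", "c", "a", "b"]
# One probability list per class, in the same order as `labels`.
y_scores = [
    [0.7, 0.4, 0.1, 0.2, 0.3],  # P(class = "a")
    [0.2, 0.6, 0.2, 0.3, 0.5],  # P(class = "b")
    [0.1, 0.2, 0.7, 0.1, 0.2],  # P(class = "c")
]

# 'scores': one one-vs-rest AUC per class.
per_class = [
    binary_auc([y == c for y in y_true], s)
    for c, s in zip(labels, y_scores)
]
# 'macro': unweighted mean of the per-class scores.
macro = sum(per_class) / len(per_class)
# 'weighted': mean of the per-class scores, weighted by class support.
weights = [sum(y == c for y in y_true) / len(y_true) for c in labels]
weighted = sum(w * a for w, a in zip(weights, per_class))
# 'micro': pool all one-vs-rest labels and probabilities, then one AUC.
micro = binary_auc(
    [y == c for c in labels for y in y_true],
    [s for scores in y_scores for s in scores],
)
```

This mirrors the common one-vs-rest formulation of multiclass ROC AUC; the exact SQL computation VerticaPy performs may differ in implementation details.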
Note
VerticaPy uses simple SQL queries to compute various metrics. You can use the set_option() function with the sql_on parameter to enable SQL generation and examine the generated queries.
See also
vDataFrame.score() : Computes the input ML metric.