verticapy.machine_learning.metrics.best_cutoff#
- verticapy.machine_learning.metrics.best_cutoff(y_true: str, y_score: str | list | ndarray, input_relation: str | vDataFrame, average: Literal[None, 'binary', 'micro', 'macro', 'scores', 'weighted'] = None, labels: list | ndarray | None = None, pos_label: bool | float | str | timedelta | datetime | None = None, nbins: int = 10000) → float | list[float]#
Computes the ROC best cutoff (threshold), i.e., the decision threshold that best separates the positive and negative classes.
Parameters#
- y_true: str
Response column.
- y_score: str | ArrayLike
Prediction.
- input_relation: SQLRelation
Relation to use for scoring. This relation can be a view, table, or a customized relation (if an alias is used at the end of the relation). For example: (SELECT … FROM …) x
- average: str, optional
The method used to compute the final score for multiclass-classification.
- binary:
considers one of the classes as positive and uses the binary confusion matrix to compute the score.
- micro:
computes the metric globally by counting the total positive and negative values across all classes.
- macro:
average of the score of each class.
- scores:
returns the scores for all the classes.
- weighted:
weighted average of the score of each class.
- None:
accuracy.
If empty, the behaviour is similar to the ‘scores’ option.
- labels: ArrayLike, optional
List of the response column categories.
- pos_label: PythonScalar, optional
To compute the metric, one of the response column classes must be the positive class. The parameter ‘pos_label’ represents this class.
- nbins: int, optional
An integer value that determines the number of decision boundaries. Decision boundaries are set at equally spaced intervals between 0 and 1, inclusive. Greater values for nbins give more precise estimations of the AUC, but can potentially decrease performance. The maximum value is 999,999. If negative, the maximum value is used.
Returns#
- float
score.
Examples#
We should first import verticapy.
import verticapy as vp
Let’s create a small dataset that has:
true value
predicted value
data = vp.vDataFrame(
    {
        "y_true": [1, 1, 0, 0, 1],
        "y_pred": [1, 1, 1, 0, 1],
    },
)
Next, we import the metric:
from verticapy.machine_learning.metrics import best_cutoff
Now we can conveniently calculate the score:
best_cutoff(
    y_true = "y_true",
    y_score = "y_pred",
    input_relation = data,
)
Out[4]: 0.999
Note
For multi-class classification, we can select the average method for averaging from the following options: binary, micro, macro, scores, and weighted.
It is also possible to directly compute the score from the vDataFrame:
data.score(
    y_true = "y_true",
    y_score = "y_pred",
    metric = "best_cutoff",
)
Out[5]: 0.999
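To illustrate the difference between the micro and macro options mentioned in the note above, the following hedged sketch contrasts the two averaging strategies using recall as the per-class metric. These are the standard textbook definitions of micro and macro averaging, shown for intuition only; they are not VerticaPy's internal code, and macro_micro_recall is an invented helper name.

```python
from collections import Counter

def macro_micro_recall(y_true, y_pred):
    classes = sorted(set(y_true))
    tp, fn = Counter(), Counter()
    for yt, yp in zip(y_true, y_pred):
        if yt == yp:
            tp[yt] += 1
        else:
            fn[yt] += 1
    # macro: compute recall per class, then take the unweighted mean
    # (tp[c] + fn[c] is the count of class c in y_true, so it is nonzero)
    macro = sum(tp[c] / (tp[c] + fn[c]) for c in classes) / len(classes)
    # micro: pool all counts globally before dividing
    micro = sum(tp.values()) / (sum(tp.values()) + sum(fn.values()))
    return macro, micro

print(macro_micro_recall(["a", "a", "b", "b", "b", "c"],
                         ["a", "b", "b", "b", "c", "c"]))
```

Micro averaging weights every observation equally, so frequent classes dominate; macro averaging weights every class equally, so rare classes count as much as common ones.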
Note
VerticaPy uses simple SQL queries to compute various metrics. You can use the set_option() function with the sql_on parameter to enable SQL generation and examine the generated queries.
See also
vDataFrame.score() : Computes the input ML metric.