ROC
Returns a table that displays the points on a receiver operating characteristic (ROC) curve. The ROC function shows how the accuracy of a classification model changes as you raise the discrimination threshold for the model. You cannot pass any inputs to the OVER() clause.
Important: Before using a machine learning function, be aware that ongoing transactions might be committed.
Syntax
ROC ( target, probabilities [USING PARAMETERS [num_bins=nBins] ]) OVER()
Arguments
target |
The column in the input table that contains the true values of the response variable. Must be an integer. |
probabilities |
The column in the input table that contains the predicted probability that the observation is of class 1. Must be a float. |
Parameters
num_bins=nBins |
(Optional) Groups rows together, based on the probability column, for faster processing. This parameter determines the number of decision boundaries to consider: it partitions the interval from 0 to 1 into nBins equally spaced points and evaluates the table at each point. Greater values result in more precise approximations of the AUC. Must be an integer. Default Value: 100 |
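To illustrate what num_bins controls, the following Python sketch (an assumption about the general technique, not Vertica's internal implementation) partitions [0, 1] into equally spaced decision boundaries and computes the false positive rate and true positive rate at each one:

```python
def roc_points(target, probabilities, num_bins=100):
    """Approximate the points on an ROC curve.

    target: list of true 0/1 labels.
    probabilities: predicted probability of class 1 for each row.
    Returns a list of (decision_boundary, false_positive_rate,
    true_positive_rate) tuples, one per boundary.
    """
    pos = sum(target)               # number of true class-1 rows
    neg = len(target) - pos         # number of true class-0 rows
    points = []
    for i in range(num_bins + 1):
        boundary = i / num_bins     # equally spaced thresholds in [0, 1]
        # Classify as 1 when the predicted probability exceeds the boundary.
        pred = [1 if p > boundary else 0 for p in probabilities]
        tp = sum(1 for y, yhat in zip(target, pred) if y == 1 and yhat == 1)
        fp = sum(1 for y, yhat in zip(target, pred) if y == 0 and yhat == 1)
        points.append((boundary,
                       fp / neg if neg else 0.0,
                       tp / pos if pos else 0.0))
    return points
```

With num_bins=2, the function evaluates three boundaries (0, 0.5, and 1), which matches the three-row output in the example below; raising num_bins yields a finer-grained curve.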
Examples
This example shows how you can execute the ROC function on an input table named mtcars. The response variable appears in the column obs, while the predicted probabilities appear in the column prob.
=> SELECT ROC(obs, prob USING PARAMETERS num_bins=2) OVER() FROM (SELECT am AS obs, PREDICT_LOGISTIC_REG (mpg, cyl, disp, hp, drat, wt, qsec, vs, gear, carb USING PARAMETERS model_name='logisticRegModel', type='probability') AS prob FROM mtcars) AS prediction_output;
 decision_boundary | false_positive_rate | true_positive_rate |                   comment
-------------------+---------------------+--------------------+---------------------------------------------
                 0 |                   1 |                  1 |
               0.5 |                   0 |                  1 |
                 1 |                   0 |                  0 | Of 32 rows, 32 were used and 0 were ignored
(3 rows)
The first column, decision_boundary, indicates the cut-off point for whether to classify a response as 0 or 1. In each row, if prob is greater than decision_boundary, the response is classified as 1; if prob is less than decision_boundary, the response is classified as 0.
The second column, false_positive_rate, shows the rate of false positives (class 0 observations classified as 1) at the corresponding decision_boundary.
The third column, true_positive_rate, shows the rate of true positives: the fraction of class 1 observations that were correctly classified as 1.