ROC

Returns a table that displays the points on a receiver operating characteristic (ROC) curve. The ROC function shows how the true positive rate and the false positive rate of a classification model change as you raise the discrimination threshold for the model.

You cannot pass any inputs to the OVER() clause.

Important: Before using a machine learning function, be aware that it might commit any ongoing transaction.

Syntax

ROC ( target, probabilities
        [USING PARAMETERS [num_bins=nBins] ])
     OVER()

Arguments

target

The column in the input table containing the true values of the response variable. Must be an integer.

probabilities

The column in the input table containing the predicted probability that the observation belongs to class 1. Must be a float.

Parameters

num_bins=nBins

(Optional) Groups rows together, based upon the probability column, for faster processing. This parameter determines the number of decision boundaries to consider: it partitions the interval from 0 to 1 into nBins equally spaced points, and the function evaluates the table at each of those points. Greater values result in more precise approximations of the AUC. Must be an integer.

Default Value: 100
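For example, to sample the curve at a finer resolution, you can raise num_bins. The following is a hypothetical sketch; it assumes a table named test_data with an integer response column obs and a float probability column prob:

=> SELECT ROC(obs, prob USING PARAMETERS num_bins=1000) OVER()
     FROM test_data;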

Examples

This example shows how you can execute the ROC function on an input table named mtcars. The response variable appears in the column obs, while the predicted probabilities appear in the column prob.

=> SELECT ROC(obs, prob USING PARAMETERS num_bins=2) OVER()
     FROM (SELECT am AS obs,
                  PREDICT_LOGISTIC_REG(mpg, cyl, disp, hp, drat, wt,
                                       qsec, vs, gear, carb
                                       USING PARAMETERS model_name='logisticRegModel',
                                                        type='probability') AS prob
           FROM mtcars) AS prediction_output;
 decision_boundary | false_positive_rate | true_positive_rate |                   comment
-------------------+---------------------+--------------------+---------------------------------------------
                 0 |                   1 |                  1 |
               0.5 |                   0 |                  1 |
                 1 |                   0 |                  0 | Of 32 rows, 32 were used and 0 were ignored
(3 rows)

The first column, decision_boundary, indicates the cut-off point for classifying a response as 0 or 1. In each row, if prob is greater than decision_boundary, the response is classified as 1; if prob is less than decision_boundary, the response is classified as 0.

The second column, false_positive_rate, shows the false positive rate at the corresponding decision_boundary: the fraction of class-0 observations that were incorrectly classified as 1.

The third column, true_positive_rate, shows the true positive rate at the corresponding decision_boundary: the fraction of class-1 observations that were correctly classified as 1.
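As a sanity check, you can reproduce these rates at a single decision boundary with ordinary aggregates. The following sketch is not part of the ROC function; it reuses the prediction subquery from the example above and evaluates a boundary of 0.5:

=> SELECT SUM(CASE WHEN obs = 0 AND prob > 0.5 THEN 1 ELSE 0 END)::FLOAT
          / NULLIF(SUM(CASE WHEN obs = 0 THEN 1 ELSE 0 END), 0)
              AS false_positive_rate,  -- class-0 rows classified as 1
          SUM(CASE WHEN obs = 1 AND prob > 0.5 THEN 1 ELSE 0 END)::FLOAT
          / NULLIF(SUM(CASE WHEN obs = 1 THEN 1 ELSE 0 END), 0)
              AS true_positive_rate    -- class-1 rows classified as 1
     FROM (SELECT am AS obs,
                  PREDICT_LOGISTIC_REG(mpg, cyl, disp, hp, drat, wt,
                                       qsec, vs, gear, carb
                                       USING PARAMETERS model_name='logisticRegModel',
                                                        type='probability') AS prob
           FROM mtcars) AS prediction_output;

If the model's probabilities separate the classes the way the ROC output above indicates, this query should return the false_positive_rate and true_positive_rate shown in the decision_boundary = 0.5 row.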