Vertica Analytics Platform Version 9.2.x Documentation

ROC

Returns a table that displays the points on a receiver operating characteristic curve. The ROC function tells you the accuracy of a classification model as you raise the discrimination threshold for the model.

Syntax

ROC ( targets, probabilities 
    [ USING PARAMETERS [num_bins=num‑bins]
                       [, AUC=output]
                       [, main_class=class‑name ] ) ] )
    OVER()

Arguments

targets

An input column that contains the true values of the response variable, one of the following data types: INTEGER, BOOLEAN, or CHAR/VARCHAR. Depending on the column data type, the function processes column data as follows:

  • INTEGER: Uses the input column as containing the true value of the response variable.
  • BOOLEAN: Resolves Yes to 1, 0 to No.
  • CHAR/VARCHAR: Resolves the value specified by parameter main_class to 1, all other values to 0.

If the input column is of data type INTEGER or BOOLEAN, the function ignores parameter main_class.

probabilities

A FLOAT input column that contains the predicted probability of response being the main class, set to 1 if targets is of type INTEGER.

Parameter Settings

Parameter name Set to…
num_bins

An integer value that determines the number of decision boundaries. Decision boundaries are set at equally spaced intervals between 0 and 1, inclusive. The function computes the table at each num‑bin + 1 point.

Default: 100

AUC

A Boolean value that specifies whether to output the area under the curve (AUC) value.

Default: True

main_class

Used only if target is of type CHAR/VARCHAR, specifies the class to associate with the probabilities argument.

Examples

Execute ROC on input table mtcars. Observed class labels are in column obs, predicted class labels are in column prob:

=> SELECT ROC(obs::int, prob::float USING PARAMETERS num_bins=5, AUC = True) OVER() 
	FROM (SELECT am AS obs,
                    PREDICT_LOGISTIC_REG (mpg, cyl, disp, drat, wt,
                                          qsec, vs, gear, carb
                                          USING PARAMETERS model_name='myLogisticRegModel',
                                                           type='probability') AS prob
             FROM mtcars) AS prediction_output;
 decision_boundary | false_positive_rate | true_positive_rate | AUC |comment
-------------------+---------------------+--------------------+-----+-----------------------------------
0                  |                   1 |                  1 |     |
0.5                |                   0 |                  1 |     |
1                  |                   0 |                  0 |   1 | Of 32 rows,32 were used and 0 were ignoreded
(3 rows)

The function returns a table with the following results:

  • decision_boundary indicates the cut-off point for whether to classify a response as 0 or 1. In each row, if prob is equal to or greater than decision_boundary, the response is classified as 1. If prob is less than decision_boundary, the response is classified as 0.
  • false_positive_rate shows the percentage of false positives (when 0 is classified as 1) in the corresponding decision_boundary.
  • true_positive_rate shows the percentage of rows that were classified as 1 and also belong to class 1.