Vertica Analytics Platform Version 9.2.x Documentation
Returns a table that displays the points on a receiver operating characteristic curve. The
ROC function tells you the accuracy of a classification model as you raise the discrimination threshold for the model.
ROC ( targets, probabilities [ USING PARAMETERS [num_bins=num‑bins] [, AUC=output] [, main_class=class‑name ] ) ] ) OVER()
An input column that contains the true values of the response variable, one of the following data types: INTEGER, BOOLEAN, or CHAR/VARCHAR. Depending on the column data type, the function processes column data as follows:
If the input column is of data type INTEGER or BOOLEAN, the function ignores parameter
A FLOAT input column that contains the predicted probability of response being the main class, set to 1 if targets is of type INTEGER.
|Parameter name||Set to…|
An integer value that determines the number of decision boundaries. Decision boundaries are set at equally spaced intervals between 0 and 1, inclusive. The function computes the table at each num‑bin + 1 point.
A Boolean value that specifies whether to output the area under the curve (AUC) value.
Used only if target is of type CHAR/VARCHAR, specifies the class to associate with the probabilities argument.
ROC on input table
mtcars. Observed class labels are in column
obs, predicted class labels are in column
=> SELECT ROC(obs::int, prob::float USING PARAMETERS num_bins=5, AUC = True) OVER() FROM (SELECT am AS obs, PREDICT_LOGISTIC_REG (mpg, cyl, disp, drat, wt, qsec, vs, gear, carb USING PARAMETERS model_name='myLogisticRegModel', type='probability') AS prob FROM mtcars) AS prediction_output;
decision_boundary | false_positive_rate | true_positive_rate | AUC |comment -------------------+---------------------+--------------------+-----+----------------------------------- 0 | 1 | 1 | | 0.5 | 0 | 1 | | 1 | 0 | 0 | 1 | Of 32 rows,32 were used and 0 were ignoreded (3 rows)
The function returns a table with the following results:
decision_boundaryindicates the cut-off point for whether to classify a response as 0 or 1. In each row, if
probis equal to or greater than
decision_boundary, the response is classified as 1. If
probis less than
decision_boundary, the response is classified as 0.
false_positive_rateshows the percentage of false positives (when 0 is classified as 1) in the corresponding
true_positive_rateshows the percentage of rows that were classified as 1 and also belong to class 1.
Was this topic helpful?