Vertica Analytics Platform Version 9.2.x Documentation

PRC

Returns a table that displays the points on a receiver precision recall (PR) curve.

Syntax

PRC ( targets, probabilities
       [ USING PARAMETERS [num_bins=num‑bins]
                          [, f1_score=return‑score ]
                          [, main_class=class‑name ] )
       OVER()

Arguments

targets

An input column that contains the true values of the response variable, one of the following data types: INTEGER, BOOLEAN, or CHAR/VARCHAR. Depending on the column data type, the function processes column data as follows:

  • INTEGER: Uses the input column as containing the true value of the response variable.
  • BOOLEAN: Resolves Yes to 1, 0 to No.
  • CHAR/VARCHAR: Resolves the value specified by parameter main_class to 1, all other values to 0.

If the input column is of data type INTEGER or BOOLEAN, the function ignores parameter main_class.

probabilities

A FLOAT input column that contains the predicted probability of response being the main class, set to 1 if targets is of type INTEGER.

Parameter Settings

Parameter name Set to…
num_bins

An integer value that determines the number of decision boundaries. Decision boundaries are set at equally spaced intervals between 0 and 1, inclusive. The function computes the table at each num‑bin + 1 point.

Default: 100

f1_score

A Boolean that specifies whether to return a column that contains the f1 score—the harmonic average of the precision and recall measures, where an F1 score reaches its best value at 1 (perfect precision and recall) and worst at 0.

Default: false

main_class

Used only if target is of type CHAR/VARCHAR, specifies the class to associate with the probabilities argument.

Examples

Execute the PRC function on an input table named mtcars. The response variables appear in the column obs, while the prediction variables appear in column pred.

=> SELECT PRC(obs::int, prob::float USING PARAMETERS num_bins=2, f1_score=true) OVER() 
	FROM (SELECT am AS obs,
                    PREDICT_LOGISTIC_REG (mpg, cyl, disp, drat, wt, qsec, vs, gear, carb
                          USING PARAMETERS model_name='myLogisticRegModel',
                                           type='probability') AS prob
             FROM mtcars) AS prediction_output;
decision_boundary | recall | precision |     f1_score      |     comment
------------------+--------+-----------+-------------------+--------------------------------------------
0                 |      1 |   0.40625 | 0.577777777777778 |
0.5               |      1 |         1 |                 1 | Of 32 rows, 32 were used and 0 were ignored
(2 rows)

The first column, decision_boundary, indicates the cut-off point for whether to classify a response as 0 or 1. For example, in each row, if the probability is equal to or greater than decision_boundary, the response is classified as 1. If the probability is less than decision_boundary, the response is classified as 0.