Vertica Analytics Platform Version 9.2.x Documentation
Returns a table that displays the points on a precision-recall (PR) curve.
PRC ( targets, probabilities [ USING PARAMETERS [num_bins=num‑bins] [, f1_score=return‑score ] [, main_class=class‑name ] ] ) OVER()
targets: An input column that contains the true values of the response variable, of one of the following data types: INTEGER, BOOLEAN, or CHAR/VARCHAR. Depending on the column data type, the function processes column data as follows:
If the input column is of data type INTEGER or BOOLEAN, the function ignores the main_class parameter.
probabilities: A FLOAT input column that contains the predicted probability that the response is the main class. The main class is set to 1 if targets is of type INTEGER.
|Parameter name|Set to…|
|---|---|
|num_bins|An integer value that determines the number of decision boundaries. Decision boundaries are set at equally spaced intervals between 0 and 1, inclusive. The function computes the table at each of the num‑bins + 1 points.|
|f1_score|A Boolean that specifies whether to return a column that contains the F1 score, the harmonic mean of the precision and recall measures. An F1 score reaches its best value at 1 (perfect precision and recall) and its worst at 0.|
|main_class|Used only if targets is of type CHAR/VARCHAR. Specifies the class to associate with the probabilities argument.|
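The parameter descriptions above can be illustrated with a minimal Python sketch. It is not Vertica code, and the helper names (decision_boundaries, f1_score, to_binary) are hypothetical. It only mirrors what the text says: num_bins yields num_bins + 1 equally spaced cut-offs, the F1 score is the harmonic mean of precision and recall, and main_class identifies which label counts as the positive class. Returning 0.0 when both precision and recall are 0 is an assumption, not documented behavior.

```python
def decision_boundaries(num_bins):
    """Equally spaced cut-offs from 0 to 1 inclusive: num_bins + 1 values."""
    if num_bins < 1:
        raise ValueError("num_bins must be a positive integer")
    return [i / num_bins for i in range(num_bins + 1)]


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Assumption: treat the undefined 0/0 case as the worst score.
        return 0.0
    return 2 * precision * recall / (precision + recall)


def to_binary(targets, main_class):
    """Map string labels to 1 (main class) or 0 (any other class)."""
    return [1 if t == main_class else 0 for t in targets]


print(decision_boundaries(2))                    # cut-offs for num_bins=2
print(f1_score(0.40625, 1.0))                    # precision and recall from the example query
print(to_binary(["yes", "no", "yes"], "yes"))    # hypothetical labels
```

With precision 0.40625 and recall 1, the harmonic mean is roughly 0.5778, which agrees with the f1_score value shown for decision boundary 0 in the example output.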
Execute the PRC function on an input table named mtcars. The response variables appear in the column obs, while the prediction variables appear in the column prob:
=> SELECT PRC(obs::int, prob::float USING PARAMETERS num_bins=2, f1_score=true) OVER()
   FROM (SELECT am AS obs,
                PREDICT_LOGISTIC_REG (mpg, cyl, disp, drat, wt, qsec, vs, gear, carb
                                      USING PARAMETERS model_name='myLogisticRegModel', type='probability') AS prob
         FROM mtcars) AS prediction_output;
 decision_boundary | recall | precision |     f1_score      |                   comment
-------------------+--------+-----------+-------------------+---------------------------------------------
                 0 |      1 |   0.40625 | 0.577777777777778 |
               0.5 |      1 |         1 |                 1 | Of 32 rows, 32 were used and 0 were ignored
(2 rows)
The first column, decision_boundary, indicates the cut-off point for whether to classify a response as 0 or 1. For example, in each row, if the probability is equal to or greater than decision_boundary, the response is classified as 1. If the probability is less than decision_boundary, the response is classified as 0.
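The classification rule above can be sketched in Python to show how one row of the output table might be derived. This is an illustration of the standard precision and recall definitions, not Vertica's implementation. The function name pr_point and the sample data are hypothetical, and returning None for an undefined ratio is an assumption.

```python
def pr_point(targets, probabilities, decision_boundary):
    """Return (recall, precision) at one decision boundary.

    targets holds 0/1 true values. A row is classified as 1 when its
    probability is greater than or equal to the boundary, otherwise as 0.
    """
    predicted = [1 if p >= decision_boundary else 0 for p in probabilities]
    tp = sum(1 for t, y in zip(targets, predicted) if t == 1 and y == 1)
    fp = sum(1 for t, y in zip(targets, predicted) if t == 0 and y == 1)
    fn = sum(1 for t, y in zip(targets, predicted) if t == 1 and y == 0)
    # Assumption: report None when a ratio is undefined
    # (no actual positives for recall, no predicted positives for precision).
    recall = tp / (tp + fn) if (tp + fn) else None
    precision = tp / (tp + fp) if (tp + fp) else None
    return recall, precision


# Hypothetical data: four rows with true classes and predicted probabilities.
targets = [1, 0, 1, 0]
probabilities = [0.9, 0.2, 0.6, 0.7]
for boundary in (0.0, 0.5, 1.0):
    print(boundary, pr_point(targets, probabilities, boundary))
```

At a boundary of 0 every row is classified as 1, so recall is 1 and precision equals the share of rows whose true class is 1. That matches the pattern in the first row of the example output.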