PRC
Returns a table that displays the points on a precision-recall (PR) curve.
Syntax
PRC ( targets, probabilities [ USING PARAMETERS [num_bins=num-bins] [, f1_score=return-score] [, main_class=class-name] ] ) OVER()
Arguments
Argument | Description |
---|---|
targets | An input column that contains the true values of the response variable, one of the following data types: INTEGER, BOOLEAN, or CHAR/VARCHAR. How the function processes the column data depends on its type; if the input column is of data type INTEGER or BOOLEAN, the function ignores parameter main_class. |
probabilities | A FLOAT input column that contains the predicted probability of the response being the main class, set to 1 if targets is of type INTEGER. |
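As a minimal sketch of the expected inputs, assuming a hypothetical table validation_scores with a 0/1 column label and a probability column score, PRC needs only the true labels and the predicted probabilities:

=> SELECT PRC(label::INT, score::FLOAT) OVER()
   FROM validation_scores;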
Parameter Settings
Parameter name | Set to… |
---|---|
num_bins | An integer value that determines the number of decision boundaries. Decision boundaries are set at equally spaced intervals between 0 and 1, inclusive. The function computes the table at each of num_bins + 1 points. Default: 100 |
f1_score | A Boolean that specifies whether to return a column that contains the F1 score, the harmonic average of the precision and recall measures, which reaches its best value at 1 (perfect precision and recall) and its worst at 0. Default: false |
main_class | Used only if targets is of type CHAR/VARCHAR; specifies the class to associate with the probabilities argument. |
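With num_bins=4, for example, the function evaluates num_bins + 1 = 5 decision boundaries: 0, 0.25, 0.5, 0.75, and 1. At each boundary, the F1 score is the harmonic mean 2 × precision × recall / (precision + recall). The following sketch is a hypothetical call (the table scored_results and columns outcome and prob are assumptions, not part of this reference); it shows a CHAR/VARCHAR targets column, which requires main_class so the function knows which label the probabilities refer to:

=> SELECT PRC(outcome, prob USING PARAMETERS main_class='yes', num_bins=4, f1_score=true) OVER()
   FROM scored_results;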
Examples
Execute the PRC function on an input table named mtcars. The observed responses appear in column obs, and the predicted probabilities appear in column prob.
=> SELECT PRC(obs::int, prob::float USING PARAMETERS num_bins=2, f1_score=true) OVER()
   FROM (SELECT am AS obs,
                PREDICT_LOGISTIC_REG(mpg, cyl, disp, drat, wt, qsec, vs, gear, carb
                    USING PARAMETERS model_name='myLogisticRegModel', type='probability') AS prob
         FROM mtcars) AS prediction_output;

 decision_boundary | recall | precision |     f1_score      |                   comment
-------------------+--------+-----------+-------------------+----------------------------------------------
                 0 |      1 |   0.40625 | 0.577777777777778 |
               0.5 |      1 |         1 |                 1 | Of 32 rows, 32 were used and 0 were ignored
(2 rows)
The first column, decision_boundary, indicates the cut-off point for whether to classify a response as 0 or 1. In each row, if the probability is equal to or greater than decision_boundary, the response is classified as 1; if the probability is less than decision_boundary, the response is classified as 0.
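To make that classification rule concrete, the following sketch (reusing the same mtcars table and myLogisticRegModel model as the example above; it is a manual cross-check, not part of the PRC function) computes precision and recall at the 0.5 decision boundary by counting rows with prob greater than or equal to 0.5 as class 1:

=> SELECT SUM(CASE WHEN prob >= 0.5 AND obs = 1 THEN 1 ELSE 0 END)::FLOAT
          / NULLIF(SUM(CASE WHEN prob >= 0.5 THEN 1 ELSE 0 END), 0) AS precision_at_0_5,
          SUM(CASE WHEN prob >= 0.5 AND obs = 1 THEN 1 ELSE 0 END)::FLOAT
          / NULLIF(SUM(CASE WHEN obs = 1 THEN 1 ELSE 0 END), 0)     AS recall_at_0_5
   FROM (SELECT am AS obs,
                PREDICT_LOGISTIC_REG(mpg, cyl, disp, drat, wt, qsec, vs, gear, carb
                    USING PARAMETERS model_name='myLogisticRegModel', type='probability') AS prob
         FROM mtcars) AS prediction_output;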