PREDICT_RF_CLASSIFIER_CLASSES

Applies a random forest model to an input table or view and returns the class probabilities. The predicted class is selected based only on the popular vote of the decision trees in the forest. Therefore, in special cases, the calculated probability of the predicted class might not be the highest.
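
To make the distinction between the vote and the probability concrete, the following query is a minimal sketch on made-up data. It assumes that each tree votes for its most probable class and that a class probability is the average of the per-tree probabilities. This page does not specify the calculation, so both are assumptions made for illustration. The per_tree rows, the class labels A and B, and all column names are hypothetical.

```sql
-- Hypothetical per-tree class probabilities for one input row (3 trees, 2 classes).
-- Assumption: a tree votes for the class it considers most probable
-- (with two classes, that is the class whose probability exceeds 0.5),
-- and a class probability is the average over all trees.
SELECT class_label,
       SUM(CASE WHEN prob > 0.5 THEN 1 ELSE 0 END) AS votes,
       AVG(prob) AS avg_probability
FROM (
    SELECT 1 AS tree, 'A' AS class_label, 0.55 AS prob
    UNION ALL SELECT 1, 'B', 0.45
    UNION ALL SELECT 2, 'A', 0.55
    UNION ALL SELECT 2, 'B', 0.45
    UNION ALL SELECT 3, 'A', 0.00
    UNION ALL SELECT 3, 'B', 1.00
) AS per_tree
GROUP BY class_label
ORDER BY class_label;
```

Under these assumptions, class A wins the vote two trees to one, while its averaged probability is about 0.37 against about 0.63 for class B. That is the kind of special case in which the predicted class does not carry the highest probability.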

Important: Before using a machine learning function, be aware that all the ongoing transactions might be committed.

Syntax

PREDICT_RF_CLASSIFIER_CLASSES ( col1, col2, ..., coln
                       USING PARAMETERS model_name = 'name_of_model'
                                        [, key_columns = 'key_columns']
                                        [, exclude_columns = 'col1, col2, ..., coln']
                                        [, classes = 'class1, class2, ..., classn']
                                        [, match_by_pos = 'method'] )
              OVER ( [window-partition-clause] )

Arguments

col1, col2, ..., coln

The columns to use from the input table or view.

Parameters

model_name = 'name_of_model'

The name of the random forest model. Model names are case-insensitive.

key_columns = 'key_columns'

(Optional) A comma-separated list of column names from the input table or view that identify each row of the output.

exclude_columns = 'col1, col2, ..., coln'

(Optional) A comma-separated list of input columns that you want to exclude. This parameter is useful when you specify the input columns with the wildcard (*).

classes = 'class1, class2, ..., classn'

(Optional) A comma-separated list of class labels in the model. For each listed class, the function returns the probability, as predicted by the classifier, that the row belongs to that class. The values are case sensitive.

match_by_pos = 'method'

(Optional) Valid Values:

  • false (default): Input columns are matched to features in the model by name.

  • true: Input columns are matched to features in the model by their position in the list of input columns.

Return

Return data type: One VARCHAR column and multiple FLOAT columns

The predicted column, of type VARCHAR, contains the class label with the highest vote (popular vote). The first FLOAT column is named probability and contains the probability for the class reported in the predicted column. The other FLOAT columns contain the probability of each class specified in the classes input parameter.

Key columns: Columns with the same values and data types as the matching input columns specified in the key_columns parameter.

Examples

This example shows how you can use the PREDICT_RF_CLASSIFIER_CLASSES function.

=> SELECT PREDICT_RF_CLASSIFIER_CLASSES(Sepal_Length, Sepal_Width, Petal_Length, Petal_Width
                               USING PARAMETERS model_name='myRFModel') OVER () FROM iris;
predicted  |    probability
-----------+-------------------
setosa     |                 1
setosa     |              0.99
setosa     |                 1
setosa     |                 1
setosa     |                 1
setosa     |              0.97
setosa     |                 1
setosa     |                 1
setosa     |                 1
setosa     |                 1
setosa     |              0.99
.
.
.
(150 rows)

This example shows how you can use the PREDICT_RF_CLASSIFIER_CLASSES function with the match_by_pos parameter:

=> SELECT PREDICT_RF_CLASSIFIER_CLASSES(Sepal_Length, Sepal_Width, Petal_Length, Petal_Width
                          USING PARAMETERS model_name='myRFModel', match_by_pos='true') OVER () FROM iris;
predicted  |    probability
-----------+-------------------
setosa     |                 1
setosa     |                 1
setosa     |                 1
setosa     |                 1
setosa     |                 1
setosa     |                 1
setosa     |                 1
setosa     |                 1
setosa     |                 1
setosa     |                 1
setosa     |                 1
.
.
.
(150 rows)
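
The key_columns and classes parameters can be combined with the same model, as in the following sketch. It assumes that the iris table has an id column (hypothetical, not shown in the examples above) and that setosa and versicolor are class labels in myRFModel. The id column is also listed in exclude_columns on the assumption that it should identify rows without being used as a predictor. Output is omitted because it depends on the trained model.

```sql
-- Sketch only: the id column is hypothetical, and the class labels
-- are assumed to exist in myRFModel (labels are case sensitive).
SELECT PREDICT_RF_CLASSIFIER_CLASSES(id, Sepal_Length, Sepal_Width, Petal_Length, Petal_Width
           USING PARAMETERS model_name='myRFModel',
                            key_columns='id',
                            exclude_columns='id',
                            classes='setosa,versicolor') OVER () FROM iris;
```

Per the Return section, the result should contain the key column, the predicted and probability columns, and one FLOAT column for each class listed in classes.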

 

See Also