RF_CLASSIFIER
Trains a random forest model for classification on an input table or view.
Important: Before using a machine learning function, be aware that any ongoing transactions might be committed.
Syntax
RF_CLASSIFIER ( 'model_name', 'input_relation', 'response_column', 'predictor_col1, predictor_col2, ..., predictor_coln'
        [ USING PARAMETERS [exclude_columns='col1, col2, ..., coln',]
                           [ntree=value,]
                           [mtry=value,]
                           [sampling_size=value,]
                           [max_depth=value,]
                           [max_breadth=value,]
                           [min_leaf_size=value,]
                           [min_info_gain=value,]
                           [nbins=value] ] )
Arguments
model_name |
The name of the model stored as a result of the training. Model names are case insensitive. |
input_relation |
The table or view that contains the training samples. |
response_column |
The name of the column in input_relation that contains the response (the class label). This column must be of data type CHAR or VARCHAR. |
predictor_columns |
A comma-separated list of the columns in input_relation to use as predictors. CHAR, VARCHAR, and BOOLEAN columns are treated as categorical data types. All other data types are treated as numeric data types. |
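The data type rules above can be illustrated with a short sketch. It assumes a hypothetical table named loans whose label column, defaulted, is stored as an INTEGER, so a view casts it to VARCHAR before training. The table, view, column, and model names are all assumptions made for illustration.

```sql
-- Hypothetical source table: loans(defaulted INTEGER, income FLOAT,
--                                  age INTEGER, has_mortgage BOOLEAN)
-- The response column must be CHAR or VARCHAR, so cast the label in a view.
CREATE VIEW loans_train AS
    SELECT defaulted::VARCHAR AS defaulted_label,
           income,
           age,
           has_mortgage
    FROM loans;

-- income and age are treated as numeric predictors;
-- has_mortgage (BOOLEAN) is treated as categorical.
SELECT RF_CLASSIFIER ('loan_rf', 'loans_train', 'defaulted_label',
                      'income, age, has_mortgage');
```

Because input_relation can be a view, the cast does not require changing the underlying table.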
Parameters
exclude_columns='col1, col2, ..., coln' |
(Optional) The columns from predictor_columns to exclude from processing. |
ntree=value |
(Optional) A positive integer number that indicates the number of trees in the forest. Default Value: 20 Valid Range: (0 to 1000] |
mtry=value |
(Optional) A positive integer number that indicates the number of features to be considered at the split of a tree node. Default Value: When no value is specified for mtry, the default is the square root of the total number of predictors. Valid Range: A positive integer number, smaller than or equal to the number of predictors. |
sampling_size=value |
(Optional) A number that indicates what portion of the input data set is randomly picked for training each tree. Default Value: 0.632 Valid Range: (0.0 to 1.0] |
max_depth=value |
(Optional) A positive integer number that specifies the maximum depth for growing each tree. Default Value: 5 Valid Range: [1 to 100] |
max_breadth=value |
(Optional) A positive integer number that specifies the maximum number of leaf nodes a tree in the forest can have. Default Value: 32 Valid Range: [1 to 1e9] |
min_leaf_size=value |
(Optional) A positive integer number that specifies the minimum number of samples each branch must have after splitting a node. A split that leaves fewer samples in either branch is discarded. Default Value: 1 Valid Range: [1 to 1e6] |
min_info_gain=value |
(Optional) A non-negative number. Any split with information gain less than this threshold will be discarded. Default Value: 0.0 Valid Range: [0.0 to 1.0) |
nbins=value |
(Optional) A positive integer number that indicates the number of bins to use for continuous features. Default Value: 32 Valid Range: [2 to 1000] |
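As a sketch of how several of these parameters combine in one call, the statement below reuses the iris table from the Examples section. The model name myRFModel_tuned and the specific values are illustrative assumptions, chosen only to fall inside the ranges listed above. They are not tuning recommendations.

```sql
-- Illustrative values only; each one lies inside its documented valid range.
SELECT RF_CLASSIFIER ('myRFModel_tuned', 'iris', 'Species',
                      'Sepal_Length, Sepal_Width, Petal_Length, Petal_Width'
                      USING PARAMETERS ntree=50,           -- (0 to 1000]
                                       mtry=2,             -- <= 4 predictors
                                       max_depth=8,        -- [1 to 100]
                                       min_leaf_size=5,    -- [1 to 1e6]
                                       min_info_gain=0.01, -- [0.0 to 1.0)
                                       nbins=64);          -- [2 to 1000]
```

Parameters left out of the call, such as sampling_size and max_breadth here, keep their default values.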
Privileges
To use RF_CLASSIFIER, you must either be a superuser or have CREATE privileges for the schema of the output view and SELECT privileges for the input table or view. There are no privileges needed on the function itself.
See GRANT (Schema) and GRANT (Table).
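For a non-superuser, the two privileges described above might be granted as in the following sketch. The user name ml_user and the schema name public are assumptions made for illustration. The iris table is the one used in the Examples section.

```sql
-- Run as a superuser or as the owner of the schema and table.
-- ml_user and public are hypothetical names.
GRANT CREATE ON SCHEMA public TO ml_user;  -- schema that receives the output
GRANT SELECT ON iris TO ml_user;           -- input table or view
```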
Examples
This example trains a random forest model named myRFModel on the iris table, using 100 trees and a sampling size of 0.3:
=> SELECT RF_CLASSIFIER ('myRFModel', 'iris', 'Species', 'Sepal_Length, Sepal_Width, Petal_Length, Petal_Width' USING PARAMETERS ntree=100, sampling_size=0.3);
                  RF_CLASSIFIER
--------------------------------------------------
 The random forest is trained
(1 row)