RF_CLASSIFIER

Trains a random forest model for classification on an input relation.

Syntax

RF_CLASSIFIER ( 'model‑name', 'input‑relation', 'response‑column', 'predictor‑columns'
             [ USING PARAMETERS [exclude_columns='excluded‑columns']
                                [, ntree=num‑trees]
                                [, mtry=num‑features]
                                [, sampling_size=sampling‑size]
                                [, max_depth=depth]
                                [, max_breadth=breadth]
                                [, min_leaf_size=leaf_size]
                                [, min_info_gain=threshold]
                                [, nbins=num‑bins] ] )

Arguments

model‑name	Identifies the model stored as a result of the training, where model‑name conforms to conventions described in Identifiers. It must also be unique among all names of sequences, tables, projections, views, and models within the same schema.
input‑relation	The table or view that contains the training samples. If the input relation is defined in Hive, use `SYNC_WITH_HCATALOG_SCHEMA` to sync the `hcatalog` schema, and then run the machine learning function.
response‑column	An input column of type numeric, CHAR/VARCHAR, or BOOLEAN that represents the dependent variable. Vertica automatically casts numeric response column values to VARCHAR.
predictor‑columns	Comma-separated list of columns in the input relation that represent independent variables for the model, or asterisk (*) to select all columns. If you select all columns, the argument list for parameter `exclude_columns` must include response‑column, and any columns that are invalid as predictor columns. All predictor columns must be of type numeric, CHAR/VARCHAR, or BOOLEAN; otherwise the model is invalid. BOOLEAN column values are converted to FLOAT values before training: 0 for false, 1 for true.
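The column rules above can be checked on the client side before training. The following Python sketch is a hypothetical pre-flight helper and is not part of Vertica. It expands `*`, applies an exclusion list, requires the response column to be excluded when `*` is used, and rejects predictors whose types are not numeric, CHAR/VARCHAR, or BOOLEAN. It also shows the documented BOOLEAN-to-FLOAT conversion. The function names, the simplified type labels, and the assumed column types of the `iris` table are all illustrative assumptions.

```python
# Hypothetical pre-flight check mirroring the documented RF_CLASSIFIER
# column rules. This is not a Vertica API; type labels are simplified.
ALLOWED_TYPES = {"numeric", "char", "varchar", "boolean"}


def resolve_predictors(table_columns, predictor_arg, response, exclude=()):
    """Return the predictor columns that would be used for training.

    table_columns: dict of column name -> simplified type label.
    predictor_arg: comma-separated column names, or "*" for all columns.
    """
    excluded = set(exclude)
    if predictor_arg.strip() == "*":
        # With "*", exclude_columns must list the response column.
        if response not in excluded:
            raise ValueError(f"exclude_columns must include {response!r}")
        chosen = [c for c in table_columns if c not in excluded]
    else:
        names = [c.strip() for c in predictor_arg.split(",")]
        chosen = [c for c in names if c not in excluded]
    # Every remaining predictor must be numeric, CHAR/VARCHAR, or BOOLEAN.
    bad = [c for c in chosen if table_columns[c] not in ALLOWED_TYPES]
    if bad:
        raise ValueError(f"unsupported predictor types: {bad}")
    return chosen


def boolean_to_float(value):
    """BOOLEAN predictor values become FLOAT: 0.0 for false, 1.0 for true."""
    return 1.0 if value else 0.0


# Assumed column types for the iris table used in the example.
iris_columns = {
    "Sepal_Length": "numeric",
    "Sepal_Width": "numeric",
    "Petal_Length": "numeric",
    "Petal_Width": "numeric",
    "Species": "varchar",
}
print(resolve_predictors(iris_columns, "*", "Species", exclude=["Species"]))
# ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width']
```

In this sketch, calling `resolve_predictors(iris_columns, "*", "Species")` without an exclusion list raises an error. That matches the documented requirement that `exclude_columns` name the response column whenever all columns are selected.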

Parameter Settings

Parameter name	Set to…
`exclude_columns`	Comma-separated list of column names from the input relation to exclude from processing.
`ntree`	The number of trees in the forest, an integer between 0 and 1000, inclusive. Default: 20
`mtry`	The number of randomly chosen features from which to pick the best feature to split on a given tree node, an integer `≤ number‑predictors`. Default: Square root of the total number of predictors
`sampling_size`	The portion of the input data set that is randomly picked for training each tree, a FLOAT between 0.0 and 1.0, inclusive. Default: 0.632
`max_depth`	The maximum depth for growing each tree, an integer between 1 and 100, inclusive. Default: 5
`max_breadth`	The maximum number of leaf nodes a tree in the forest can have, an integer between 1 and 1e9, inclusive. Default: 32
`min_leaf_size`	The minimum number of samples each branch must have after splitting a node, an integer between 1 and 1e6, inclusive. A split that leaves fewer samples in either branch is discarded. Default: 1
`min_info_gain`	The minimum threshold for including a split, a FLOAT between 0.0 and 1.0, inclusive. A split with information gain less than this threshold is discarded. Default: 0.0
`nbins`	The number of bins to use for continuous features, an integer between 2 and 1000, inclusive. Default: 32
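The defaults and ranges in the table can be collected into one client-side structure. The following Python sketch is hypothetical and does not reproduce Vertica's internal implementation. It validates settings against the documented inclusive ranges, derives a default `mtry`, draws a per-tree row sample, and applies the `min_leaf_size` and `min_info_gain` rules to a candidate split. Several details are assumptions and are flagged in comments: the square-root default rounds down, `mtry` must be at least 1, and `sampling_size` selects distinct rows (no replacement).

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RFParams:
    """Hypothetical client-side mirror of the RF_CLASSIFIER parameter table."""
    ntree: int = 20
    mtry: Optional[int] = None  # None stands for the square-root default
    sampling_size: float = 0.632
    max_depth: int = 5
    max_breadth: int = 32
    min_leaf_size: int = 1
    min_info_gain: float = 0.0
    nbins: int = 32

    def validate(self, n_predictors: int) -> None:
        # Inclusive ranges taken from the table; mtry >= 1 is an assumption.
        checks = {
            "ntree": 0 <= self.ntree <= 1000,
            "mtry": self.mtry is None or 1 <= self.mtry <= n_predictors,
            "sampling_size": 0.0 <= self.sampling_size <= 1.0,
            "max_depth": 1 <= self.max_depth <= 100,
            "max_breadth": 1 <= self.max_breadth <= 10**9,
            "min_leaf_size": 1 <= self.min_leaf_size <= 10**6,
            "min_info_gain": 0.0 <= self.min_info_gain <= 1.0,
            "nbins": 2 <= self.nbins <= 1000,
        }
        failed = [name for name, ok in checks.items() if not ok]
        if failed:
            raise ValueError(f"out-of-range parameters: {failed}")

    def effective_mtry(self, n_predictors: int) -> int:
        # Assumption: the square-root default rounds down, but never below 1.
        if self.mtry is not None:
            return self.mtry
        return max(1, math.isqrt(n_predictors))

    def sample_rows(self, n_rows: int, rng: random.Random) -> List[int]:
        # Assumption: each tree draws round(sampling_size * n_rows) distinct rows.
        k = round(self.sampling_size * n_rows)
        return sorted(rng.sample(range(n_rows), k))

    def accept_split(self, left: int, right: int, info_gain: float) -> bool:
        # Keep a split only if both branches hold at least min_leaf_size
        # samples and the information gain is not below min_info_gain.
        return (min(left, right) >= self.min_leaf_size
                and info_gain >= self.min_info_gain)


# Settings from the example: ntree=100, sampling_size=0.3, four predictors.
params = RFParams(ntree=100, sampling_size=0.3)
params.validate(n_predictors=4)
print(params.effective_mtry(4))                           # 2
print(len(params.sample_rows(1000, random.Random(42))))   # 300
```

Keeping every default in one dataclass makes it easy to compare a call's settings with the table before running a long training job. The sampling and split helpers only illustrate what the parameters control.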

Model Attributes

Attribute	Description
`data`	Data for the function, including: `predictorNames`: The names of the predictors, in the same order they were specified for training the model. `predictorTypes`: The types of the predictors, in the same order as their names in `predictorNames`.
`ntree`	Number of trees in the model.
`skippedRows`	Number of rows in `input_relation` that were skipped because they contained an invalid value.
`processedRows`	Total number of rows in `input_relation` minus `skippedRows`.
`callStr`	Values of all input arguments specified at the time the function was called.
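The `skippedRows` and `processedRows` attributes are related by simple arithmetic. The following Python sketch shows that relationship on a few made-up rows. Its validity rule is an assumption for illustration only: a row counts as invalid if any column used for training is `None` (SQL NULL). Vertica may skip rows for other reasons too, and `count_rows` is a hypothetical helper, not a Vertica function.

```python
from typing import Iterable, Mapping, Sequence, Tuple


def count_rows(rows: Iterable[Mapping[str, object]],
               columns: Sequence[str]) -> Tuple[int, int]:
    """Return (skippedRows, processedRows) for the given training rows.

    Assumption: a row is invalid when any used column is None (SQL NULL).
    Vertica may reject rows for other reasons as well.
    """
    total = 0
    skipped = 0
    for row in rows:
        total += 1
        if any(row.get(col) is None for col in columns):
            skipped += 1
    # processedRows = total rows in the input relation - skippedRows
    return skipped, total - skipped


# Made-up rows for illustration only.
sample = [
    {"Petal_Length": 1.4, "Species": "setosa"},
    {"Petal_Length": None, "Species": "setosa"},
    {"Petal_Length": 4.7, "Species": None},
    {"Petal_Length": 5.1, "Species": "virginica"},
]
print(count_rows(sample, ["Petal_Length", "Species"]))  # (2, 2)
```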

Examples

=> SELECT RF_CLASSIFIER ('myRFModel', 'iris', 'Species', 'Sepal_Length, Sepal_Width, Petal_Length, Petal_Width'
   USING PARAMETERS ntree=100, sampling_size=0.3);

RF_CLASSIFIER
--------------------------------------------------
Finished training
(1 row)

Version 9.2.x Documentation