RandomForestRegressor

In [ ]:
RandomForestRegressor(name: str,
                      cursor = None,
                      n_estimators: int = 10,
                      max_features = "auto",
                      max_leaf_nodes: int = 1e9, 
                      sample: float = 0.632,
                      max_depth: int = 5,
                      min_samples_leaf: int = 1,
                      min_info_gain: float = 0.0,
                      nbins: int = 32)

Creates a RandomForestRegressor object that uses the Vertica Random Forest function. Random forest is an ensemble learning method for regression: it builds many decision trees at training time and outputs the mean of their predictions.
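The averaging step can be illustrated with a minimal pure-Python sketch. It is not how Vertica implements the forest; the helper `forest_predict`, the three toy "trees", and the `alcohol` feature are all made up for illustration.

```python
from statistics import mean


def forest_predict(trees, row):
    """Return the forest prediction for one row: the mean of the tree predictions."""
    return mean(tree(row) for tree in trees)


# Three hypothetical "trees", each reduced to a single split on one feature.
trees = [
    lambda row: 5.0 if row["alcohol"] < 10.0 else 6.0,
    lambda row: 5.5 if row["alcohol"] < 11.0 else 6.5,
    lambda row: 5.0 if row["alcohol"] < 12.0 else 7.0,
]

# The individual trees answer 6.0, 6.5 and 5.0; the forest returns their mean.
prediction = forest_predict(trees, {"alcohol": 11.5})
```

Averaging many trees trained on different samples tends to smooth out the errors of any single tree, which is the reason for building a forest rather than one deep tree.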

Parameters

| Name | Type | Optional | Description |
| --- | --- | --- | --- |
| name | str | No | Name of the model to be stored in the database. |
| cursor | DBcursor | Yes | Vertica DB cursor. |
| n_estimators | int | Yes | The number of trees in the forest, an integer between 0 and 1000, inclusive. |
| max_features | str / int | Yes | The number of randomly chosen features from which to pick the best feature to split on at a given tree node. It can be an integer or one of the following two methods: auto (square root of the total number of predictors) or max (number of predictors). |
| max_leaf_nodes | int | Yes | The maximum number of leaf nodes a tree in the forest can have, an integer between 1 and 1e9, inclusive. |
| sample | float | Yes | The portion of the input data set that is randomly picked for training each tree, a float between 0.0 and 1.0, inclusive. |
| max_depth | int | Yes | The maximum depth for growing each tree, an integer between 1 and 100, inclusive. |
| min_samples_leaf | int | Yes | The minimum number of samples each branch must have after splitting a node, an integer between 1 and 1e6, inclusive. A split that leaves fewer samples in a branch is discarded. |
| min_info_gain | float | Yes | The minimum threshold for including a split, a float between 0.0 and 1.0, inclusive. A split with information gain less than this threshold is discarded. |
| nbins | int | Yes | The number of bins to use for continuous features, an integer between 2 and 1000, inclusive. |
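The documented ranges and the two `max_features` methods can be checked before a model is created. The sketch below is plain Python and not part of the library: `PARAM_RANGES`, `out_of_range`, and `resolve_max_features` are hypothetical helpers, and rounding the square root down for "auto" is an assumption, since the description does not say how it is rounded.

```python
import math

# Inclusive ranges copied from the parameter descriptions.
PARAM_RANGES = {
    "n_estimators": (0, 1000),
    "max_leaf_nodes": (1, 1e9),
    "sample": (0.0, 1.0),
    "max_depth": (1, 100),
    "min_samples_leaf": (1, 1e6),
    "min_info_gain": (0.0, 1.0),
    "nbins": (2, 1000),
}


def out_of_range(params):
    """Return the names of the parameters whose values violate the documented ranges."""
    return [
        name
        for name, (low, high) in PARAM_RANGES.items()
        if name in params and not low <= params[name] <= high
    ]


def resolve_max_features(max_features, n_predictors):
    """Translate max_features into a number of features.

    Assumption: "auto" rounds the square root down and never returns
    fewer than one feature.
    """
    if max_features == "auto":
        return max(1, math.isqrt(n_predictors))
    if max_features == "max":
        return n_predictors
    return int(max_features)


valid = out_of_range({"n_estimators": 20, "max_depth": 3, "nbins": 32})
invalid = out_of_range({"max_depth": 0, "sample": 1.5})
auto_features = resolve_max_features("auto", 11)
```

Here `valid` is an empty list, `invalid` names `sample` and `max_depth`, and `auto_features` is 3 under the rounding assumption stated above.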

Attributes

Once the object is created, every parameter becomes an attribute. Fitting the model creates the following additional attributes:

| Name | Type | Description |
| --- | --- | --- |
| input_relation | str | Training relation. |
| X | list | List of the predictors. |
| y | str | Response column. |
| test_relation | str | Relation to use to test the model. All model methods are abstractions that simplify the process. The testing relation will be used by the methods to evaluate the model. If empty, the training relation will be used instead. This attribute can be changed at any time. |
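The fallback rule for test_relation can be stated in one line. The function below is a hypothetical illustration of that rule, not library code, and the relation names are invented.

```python
def evaluation_relation(input_relation, test_relation=""):
    """Return the relation used for evaluation.

    An empty test_relation falls back to the training relation.
    """
    return test_relation if test_relation else input_relation


# Hypothetical relation names, used only to show both cases.
fallback = evaluation_relation("public.winequality")
explicit = evaluation_relation("public.winequality", "public.winequality_test")
```

Evaluating on the training relation usually gives optimistic metrics, so setting test_relation to held-out data is generally the safer choice.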

Methods

| Name | Description |
| --- | --- |
| deploySQL | Returns the SQL code needed to deploy the model. |
| drop | Drops the model from the Vertica DB. |
| export_graphviz | Converts the input tree to graphviz. |
| features_importance | Computes the model's feature importance using the Gini index. |
| fit | Trains the model. |
| get_attr | Returns the model attribute. |
| get_params | Returns the model parameters. |
| get_tree | Returns a tablesample with all the input tree information. |
| plot | Draws the model. |
| plot_tree | Draws the input tree. The anytree module must be installed on the machine. |
| predict | Predicts using the input relation. |
| regression_report | Computes a regression report using multiple metrics to evaluate the model (r2, mse, max error...). |
| score | Computes the model score. |
| set_cursor | Sets a new DB cursor. |
| set_params | Sets the parameters of the model. |
| shapExplainer | Creates a shapExplainer for the model. |
| to_sklearn | Converts the Vertica model to an sklearn model. |
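A typical session chains fit and regression_report. The helper below is a sketch of that sequence, written against any object that exposes those two methods so it can be read without a database connection. It assumes that fit takes the training relation, the predictors, the response column, and an optional test relation (mirroring the attributes listed above) and that regression_report takes no arguments; neither signature is spelled out on this page, and `fit_and_report` is a hypothetical name.

```python
def fit_and_report(model, input_relation, X, y, test_relation=""):
    """Train the model, then return its regression report.

    Assumptions: model.fit accepts (input_relation, X, y, test_relation)
    and model.regression_report needs no arguments.
    """
    model.fit(input_relation, X, y, test_relation)
    return model.regression_report()
```

With a model object such as the one created in the Example section, a call could look like `fit_and_report(model, "public.winequality", ["alcohol", "pH"], "quality")`, where the relation and column names are hypothetical.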

Example

In [3]:
from verticapy.learn.ensemble import RandomForestRegressor

# Create the model object; nothing is trained until fit is called.
model = RandomForestRegressor(name = "public.rf_winequality",
                              n_estimators = 20,
                              max_features = "auto",
                              max_leaf_nodes = 32,
                              sample = 0.7,
                              max_depth = 3,
                              min_samples_leaf = 5,
                              min_info_gain = 0.0,
                              nbins = 32)
# Show the model's representation.
display(model)
<RandomForestRegressor>