RandomForestRegressor
In [ ]:
RandomForestRegressor(name: str,
                      cursor = None,
                      n_estimators: int = 10,
                      max_features = "auto",
                      max_leaf_nodes: int = 1e9,
                      sample: float = 0.632,
                      max_depth: int = 5,
                      min_samples_leaf: int = 1,
                      min_info_gain: float = 0.0,
                      nbins: int = 32)
Creates a RandomForestRegressor object using the Vertica Random Forest function on the data. It is an ensemble learning method for regression that builds a multitude of decision trees at training time and outputs the mean of the individual trees' predictions.
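The averaging step can be illustrated with a minimal pure-Python sketch. It is not how Vertica implements the function: the three toy "trees" and the names `tree_a`, `tree_b`, `tree_c` and `forest_predict` are hypothetical and exist only for illustration.

```python
from statistics import mean

# Hypothetical depth-1 "trees": each maps one feature value to a prediction.
def tree_a(x: float) -> float:
    return 5.0 if x < 10.0 else 7.0

def tree_b(x: float) -> float:
    return 4.0 if x < 12.0 else 8.0

def tree_c(x: float) -> float:
    return 6.0 if x < 9.0 else 6.5

def forest_predict(trees, x: float) -> float:
    """Return the mean of the individual tree predictions for x."""
    return mean(tree(x) for tree in trees)

trees = [tree_a, tree_b, tree_c]
prediction = forest_predict(trees, 11.0)  # averages 7.0, 4.0 and 6.5
print(prediction)
```

Each tree on its own is a coarse predictor; averaging several of them smooths out their individual errors, which is the idea behind the forest.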
Parameters
Name  Type  Optional  Description 

name  str  ❌  Name of the model to be stored in the database. 
cursor  DBcursor  ✓  Vertica DB cursor. 
n_estimators  int  ✓  The number of trees in the forest, an integer between 0 and 1000, inclusive. 
max_features  int / str  ✓  The number of randomly chosen features from which to pick the best feature to split on a given tree node. It can be an integer or one of the two following methods.

max_leaf_nodes  int  ✓  The maximum number of leaf nodes a tree in the forest can have, an integer between 1 and 1e9, inclusive. 
sample  float  ✓  The portion of the input data set that is randomly picked for training each tree, a float between 0.0 and 1.0, inclusive. 
max_depth  int  ✓  The maximum depth for growing each tree, an integer between 1 and 100, inclusive. 
min_samples_leaf  int  ✓  The minimum number of samples each branch must have after splitting a node, an integer between 1 and 1e6, inclusive. A split that causes fewer remaining samples is discarded. 
min_info_gain  float  ✓  The minimum threshold for including a split, a float between 0.0 and 1.0, inclusive. A split with information gain less than this threshold is discarded. 
nbins  int  ✓  The number of bins to use for continuous features, an integer between 2 and 1000, inclusive. 
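As a quick sanity check before creating a model, the numeric ranges listed in the table can be encoded in a small helper. This is only a sketch: `PARAM_RANGES` and `out_of_range` are hypothetical names, not part of VerticaPy, and Vertica may validate the parameters differently. `max_features` is left out because it also accepts string values.

```python
# Inclusive bounds copied from the parameter table.
PARAM_RANGES = {
    "n_estimators": (0, 1000),
    "max_leaf_nodes": (1, 1e9),
    "sample": (0.0, 1.0),
    "max_depth": (1, 100),
    "min_samples_leaf": (1, 1e6),
    "min_info_gain": (0.0, 1.0),
    "nbins": (2, 1000),
}

def out_of_range(params: dict) -> list:
    """Return the names of parameters whose values fall outside the documented ranges."""
    bad = []
    for name, value in params.items():
        if name in PARAM_RANGES:
            low, high = PARAM_RANGES[name]
            if not low <= value <= high:
                bad.append(name)
    return bad

# Default values taken from the constructor signature.
defaults = {
    "n_estimators": 10,
    "max_leaf_nodes": 1e9,
    "sample": 0.632,
    "max_depth": 5,
    "min_samples_leaf": 1,
    "min_info_gain": 0.0,
    "nbins": 32,
}
print(out_of_range(defaults))
print(out_of_range({"max_depth": 0, "nbins": 32, "sample": 1.5}))
```

The helper only reports offending names; whether to raise an error or clamp the value is left to the caller.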
Attributes
Once the object is created, all the parameters become attributes. Fitting the model creates the following additional attributes:
Name  Type  Description 

input_relation  str  Training relation. 
X  list  List of the predictors. 
y  str  Response column. 
test_relation  str  Relation used to test the model. The model's evaluation methods are convenience abstractions that use this relation to evaluate the model. If it is empty, the training relation is used instead. This attribute can be changed at any time. 
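The fallback described for test_relation amounts to a one-line rule. The sketch below only restates that rule; `evaluation_relation` and the relation names are hypothetical and are not part of VerticaPy.

```python
def evaluation_relation(input_relation: str, test_relation: str = "") -> str:
    """Pick the relation to evaluate on: the test relation if set, else the training one."""
    return test_relation if test_relation else input_relation

# Hypothetical relation names, for illustration only.
print(evaluation_relation("public.train"))
print(evaluation_relation("public.train", "public.test"))
```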
Methods
Name  Description 

deploySQL  Returns the SQL code needed to deploy the model. 
drop  Drops the model from the Vertica DB. 
export_graphviz  Converts the input tree to graphviz. 
features_importance  Computes the model's feature importance using the Gini index. 
fit  Trains the model. 
get_attr  Returns the model attribute. 
get_params  Returns the model parameters. 
get_tree  Returns a tablesample with all the input tree information. 
plot  Draws the model. 
plot_tree  Draws the input tree. The anytree module must be installed on the machine. 
predict  Predicts using the input relation. 
regression_report  Computes a regression report using multiple metrics to evaluate the model (R2, MSE, max error, etc.). 
score  Computes the model score. 
set_cursor  Sets a new DB cursor. 
set_params  Sets the parameters of the model. 
shapExplainer  Creates a shapExplainer for the model. 
to_sklearn  Converts the Vertica model to an sklearn model. 
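For readers unfamiliar with the metrics named for regression_report, here is a minimal pure-Python sketch of their standard definitions (R2, MSE and max error). It is not VerticaPy's implementation, which runs its computations in the database; `basic_regression_metrics` and the sample values are hypothetical.

```python
def basic_regression_metrics(y_true, y_pred):
    """Compute R2, MSE and max error from two equal-length lists of numbers."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)            # residual sum of squares
    mse = ss_res / n                                  # mean squared error
    max_error = max(abs(r) for r in residuals)        # largest absolute residual
    y_mean = sum(y_true) / n
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                        # undefined if y_true is constant
    return {"r2": r2, "mse": mse, "max_error": max_error}

# Made-up observed and predicted values, for illustration only.
metrics = basic_regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 10.0])
print(metrics)
```

An R2 close to 1 together with a small MSE indicates a good fit, while the max error exposes the single worst prediction that the averages can hide.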
Example
In [3]:
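from verticapy.learn.ensemble import RandomForestRegressor

# Create the model object; no cursor is passed, so the default (None) is used.
model = RandomForestRegressor(name = "public.rf_winequality",  # model name stored in the database
                              n_estimators = 20,               # number of trees in the forest
                              max_features = "auto",           # features considered at each split
                              max_leaf_nodes = 32,             # maximum leaf nodes per tree
                              sample = 0.7,                    # portion of the data used for each tree
                              max_depth = 3,                   # maximum depth of each tree
                              min_samples_leaf = 5,            # minimum samples per branch after a split
                              min_info_gain = 0.0,             # minimum information gain to keep a split
                              nbins = 32)                      # bins used for continuous features
display(model)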