RandomForestRegressor
RandomForestRegressor(name: str,
                      cursor = None,
                      n_estimators: int = 10,
                      max_features = "auto",
                      max_leaf_nodes: int = 1e9,
                      sample: float = 0.632,
                      max_depth: int = 5,
                      min_samples_leaf: int = 1,
                      min_info_gain: float = 0.0,
                      nbins: int = 32)
Creates a RandomForestRegressor object using the Vertica Random Forest function on the data. Random forest is an ensemble learning method for regression that constructs a multitude of decision trees at training time and outputs the mean of the individual trees' predictions.
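
The averaging step described above can be illustrated with a small, self-contained sketch. It does not use Vertica or VerticaPy: the "trees" are hypothetical stand-in functions, and `forest_predict` is a name introduced here only for illustration.

```python
from statistics import mean
from typing import Callable, Sequence

# Hypothetical stand-ins for fitted decision trees: each one maps a
# feature vector to a numeric prediction. Real trees are built by Vertica.
Tree = Callable[[Sequence[float]], float]


def forest_predict(trees: Sequence[Tree], x: Sequence[float]) -> float:
    """Return the mean of the individual tree predictions for one row."""
    if not trees:
        raise ValueError("the forest must contain at least one tree")
    return mean(tree(x) for tree in trees)


# Three toy "trees" with hard-coded splits (assumed values, not learned).
trees = [
    lambda x: 5.0 if x[0] > 10.0 else 6.0,
    lambda x: 5.5 if x[1] > 0.5 else 6.5,
    lambda x: 6.0,
]

prediction = forest_predict(trees, [11.0, 0.3])
print(prediction)
```

Each toy tree votes independently and the forest returns the arithmetic mean of the votes, which is the aggregation rule a regression forest uses in place of the majority vote of a classifier.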
Parameters
Name | Type | Optional | Description |
---|---|---|---|
name | str | ❌ | Name of the model to be stored in the database. |
cursor | DBcursor | ✓ | Vertica DB cursor. |
n_estimators | int | ✓ | The number of trees in the forest, an integer between 0 and 1000, inclusive. |
max_features | str / int | ✓ | The number of randomly chosen features from which to pick the best feature to split on a given tree node. It can be an integer or the name of one of two selection methods; the default is "auto". |
max_leaf_nodes | int | ✓ | The maximum number of leaf nodes a tree in the forest can have, an integer between 1 and 1e9, inclusive. |
sample | float | ✓ | The portion of the input data set that is randomly picked for training each tree, a float between 0.0 and 1.0, inclusive. |
max_depth | int | ✓ | The maximum depth for growing each tree, an integer between 1 and 100, inclusive. |
min_samples_leaf | int | ✓ | The minimum number of samples each branch must have after splitting a node, an integer between 1 and 1e6, inclusive. A split that leaves fewer samples in a branch is discarded. |
min_info_gain | float | ✓ | The minimum threshold for including a split, a float between 0.0 and 1.0, inclusive. A split with information gain less than this threshold is discarded. |
nbins | int | ✓ | The number of bins to use for continuous features, an integer between 2 and 1000, inclusive. |
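
The documented ranges can be checked on the client side before a model is created. The following is a minimal sketch, assuming the inclusive bounds listed in the table above; `PARAM_RANGES` and `check_params` are hypothetical helpers, not part of VerticaPy.

```python
# Inclusive ranges copied from the parameter table above.
PARAM_RANGES = {
    "n_estimators": (0, 1000),
    "max_leaf_nodes": (1, 1e9),
    "sample": (0.0, 1.0),
    "max_depth": (1, 100),
    "min_samples_leaf": (1, 1e6),
    "min_info_gain": (0.0, 1.0),
    "nbins": (2, 1000),
}


def check_params(params: dict) -> list:
    """Return a list of range violations (empty when all values are valid)."""
    problems = []
    for name, value in params.items():
        if name not in PARAM_RANGES:
            # name, cursor and max_features have no numeric range documented.
            continue
        low, high = PARAM_RANGES[name]
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside [{low}, {high}]")
    return problems


valid = check_params({"n_estimators": 20, "max_depth": 3, "nbins": 32})
invalid = check_params({"max_depth": 500, "sample": 0.7})
print(valid)
print(invalid)
```

Returning a list of messages rather than raising on the first problem lets a caller report every out-of-range value at once.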
Attributes
Once the object is created, all the parameters become attributes. Fitting the model creates the following additional attributes:
Name | Type | Description |
---|---|---|
input_relation | str | Training relation. |
X | list | List of the predictors. |
y | str | Response column. |
test_relation | str | Relation used to test the model. The model methods are abstractions that simplify the process: they use the testing relation to evaluate the model. If it is empty, the training relation is used instead. This attribute can be changed at any time. |
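
The fallback rule for test_relation can be written out as a short sketch. `resolve_test_relation` and the relation names are hypothetical and only mirror the behaviour described in the table; they are not VerticaPy functions.

```python
def resolve_test_relation(input_relation: str, test_relation: str = "") -> str:
    """Pick the relation used for evaluation.

    An empty test_relation falls back to the training relation,
    as described for the test_relation attribute.
    """
    return test_relation if test_relation else input_relation


# Hypothetical relation names, used only for illustration.
only_training = resolve_test_relation("public.winequality")
with_test = resolve_test_relation("public.winequality", "public.winequality_test")
print(only_training)
print(with_test)
```

When evaluation falls back to the training relation, the reported metrics describe the fit on the training data rather than performance on unseen rows.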
Methods
Name | Description |
---|---|
deploySQL | Returns the SQL code needed to deploy the model. |
drop | Drops the model from the Vertica DB. |
export_graphviz | Converts the input tree to Graphviz format. |
features_importance | Computes the importance of the model's features using the Gini index. |
fit | Trains the model. |
get_attr | Returns the model attribute. |
get_params | Returns the model parameters. |
get_tree | Returns a tablesample with all the input tree information. |
plot | Draws the model. |
plot_tree | Draws the input tree. The anytree module must be installed on the machine. |
predict | Predicts using the input relation. |
regression_report | Computes a regression report using multiple metrics to evaluate the model (r2, mse, max error...). |
score | Computes the model score. |
set_cursor | Sets a new DB cursor. |
set_params | Sets the parameters of the model. |
shapExplainer | Creates a shapExplainer for the model. |
to_sklearn | Converts the Vertica model to an sklearn model. |
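
Three of the metrics named for regression_report (r2, mse and max error) can be computed by hand to show what they measure. This is a plain-Python sketch using the standard definitions and made-up values; it is not the code VerticaPy runs, and `regression_metrics` is a hypothetical name.

```python
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute r2, mse and max error from observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)            # residual sum of squares
    y_mean = sum(y_true) / n
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)   # total sum of squares
    return {
        # r2 is undefined when the response is constant (ss_tot == 0).
        "r2": 1.0 - ss_res / ss_tot if ss_tot else float("nan"),
        "mse": ss_res / n,
        "max_error": max(abs(r) for r in residuals),
    }


# Made-up observed and predicted values, used only for illustration.
metrics = regression_metrics([3.0, 5.0, 7.0], [2.5, 5.0, 8.0])
print(metrics)
```

An r2 close to 1 means the predictions explain most of the variance in the response, mse penalises large residuals quadratically, and max error reports the single worst prediction.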
Example
from verticapy.learn.ensemble import RandomForestRegressor
model = RandomForestRegressor(name = "public.rf_winequality",
                              n_estimators = 20,
                              max_features = "auto",
                              max_leaf_nodes = 32,
                              sample = 0.7,
                              max_depth = 3,
                              min_samples_leaf = 5,
                              min_info_gain = 0.0,
                              nbins = 32)
display(model)