XGBoostRegressor

In [ ]:
XGBoostRegressor(name: str,
                 cursor=None,
                 max_ntree: int = 10,
                 max_depth: int = 5,
                 nbins: int = 32,
                 objective: str = 'squarederror',
                 split_proposal_method: str = 'global',
                 tol: float = 0.001,
                 learning_rate: float = 0.1,
                 min_split_loss: float = 0,
                 weight_reg: float = 0,
                 sample: float = 1,)

Creates an XGBoostRegressor object by using the Vertica XGBoost algorithm on the data.

Parameters

Name Type Description
name
str
Name of the model to be stored in the database.
cursor
DBcursor
Vertica DB cursor.
max_ntree
int
Maximum number of trees that will be created.
max_depth
int
Maximum depth of each tree, an integer between 1 and 100, inclusive.
nbins
int
Number of bins to use when finding splits in each column. More bins lead to a longer runtime but finer-grained, and possibly better, splits.
objective
str
The objective/loss function that will be used to iteratively improve the model.
split_proposal_method
str
Approximate splitting strategy. Can be 'global' or 'local' (not yet supported).
tol
float
Approximation error of the quantile summary structures used in the approximate split-finding method.
learning_rate
float
Weight applied to each tree's prediction. Reducing each tree's impact allows later trees to contribute and keeps earlier trees from dominating the improvements.
min_split_loss
float
Each split must improve the model's objective function value by at least this much in order to not be pruned. A value of 0 turns this parameter off (trees are still pruned based on positive/negative objective function values).
weight_reg
float
Regularization term applied to the weights of the leaves in the regression tree. The higher the value, the sparser/smoother the weights, which often helps prevent overfitting.
sample
float
Fraction of rows to use in training per iteration.
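To see how the main boosting parameters interact, the following is a minimal pure-Python sketch (not the Vertica implementation) of gradient boosting with the 'squarederror' objective: each round fits a "tree" to the residuals (here simplified to a depth-0 tree, i.e. a single constant), and learning_rate shrinks that tree's contribution before it is added to the running prediction.

```python
# Minimal gradient-boosting sketch for the 'squarederror' objective.
# NOT the Vertica implementation: each "tree" is a depth-0 tree
# (a single constant fitted to the residuals) to keep the loop readable.

def boost(y, max_ntree=10, learning_rate=0.1):
    prediction = [0.0] * len(y)
    for _ in range(max_ntree):
        # Negative gradient of squared error = residuals.
        residuals = [yi - pi for yi, pi in zip(y, prediction)]
        # The "tree" fit: best constant for squared error is the mean residual.
        tree_output = sum(residuals) / len(residuals)
        # learning_rate shrinks each tree's contribution, leaving room
        # for later trees to keep improving the fit.
        prediction = [pi + learning_rate * tree_output for pi in prediction]
    return prediction

y = [3.0, 5.0, 7.0]
pred = boost(y, max_ntree=200, learning_rate=0.1)
# With constant trees, the predictions converge toward the mean of y (5.0).
```

A smaller learning_rate needs a larger max_ntree to reach the same fit, which is why the two parameters are usually tuned together.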

Attributes

After object creation, all the parameters become attributes. Fitting the model creates the following additional attributes:

Name Type Description
input_relation
str
Training relation.
X
list
List of the predictors.
y
str
Response column.
test_relation
str
Relation used to test the model. The model's evaluation methods use this relation; if it is empty, the training relation is used instead. This attribute can be changed at any time.

Methods

Name Description
deploySQL Returns the SQL code needed to deploy the model.
drop Drops the model from the Vertica DB.
export_graphviz Converts the input tree to graphviz.
features_importance Computes the model's feature importance using the Gini index.
fit Trains the model.
get_attr Returns the model attribute.
get_params Returns the model parameters.
get_tree Returns a tablesample with all the input tree information.
plot Draws the Model.
plot_tree Draws the input tree. The anytree module must be installed on the machine.
predict Predicts using the input relation.
regression_report Computes a regression report using multiple metrics to evaluate the model (r2, mse, max error...).
score Computes the model score.
set_cursor Sets a new DB cursor.
set_params Sets the parameters of the model.
shapExplainer Creates a shapExplainer for the model.
to_sklearn Converts the Vertica model to an sklearn model.

Example

In [10]:
from verticapy.learn.ensemble import XGBoostRegressor
model = XGBoostRegressor(name = "public.xgb_titanic",
                         max_ntree = 20,
                         max_depth = 15,)
display(model)
<XGBoostRegressor>
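The example above only creates the model object. A typical next step is to train and evaluate it with the fit and regression_report methods listed above; the following continuation is a sketch that assumes a live Vertica connection and a public.titanic training relation, and the predictor/response column names are illustrative assumptions, not part of the example above.

```python
# Illustrative continuation -- requires a working Vertica connection.
# The relation and column names below are assumptions for the sketch.
model.fit("public.titanic",           # training relation
          ["age", "fare", "pclass"],  # predictor columns (hypothetical)
          "survived")                 # response column (hypothetical)
model.regression_report()             # evaluates on test_relation (or the
                                      # training relation if it is empty)
```

Because test_relation is empty here, regression_report scores the model on the training relation itself.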