XGBoostClassifier

In [ ]:
XGBoostClassifier(name: str,
                  cursor=None,
                  max_ntree: int = 10,
                  max_depth: int = 5,
                  nbins: int = 32,
                  objective: str = 'squarederror',
                  split_proposal_method: str = 'global',
                  tol: float = 0.001,
                  learning_rate: float = 0.1,
                  min_split_loss: float = 0,
                  weight_reg: float = 0,
                  sample: float = 1)

Creates an XGBoostClassifier object by using the Vertica XGBoost algorithm on the data.

Parameters

Name Type Description
name
str
Name of the model to be stored in the database.
cursor
DBcursor
Vertica DB cursor.
max_ntree
int
Maximum number of trees that will be created.
max_depth
int
The maximum depth for growing each tree, an integer between 1 and 100, inclusive.
nbins
int
Number of bins to use for finding splits in each column. More bins lead to a longer runtime but finer-grained, and possibly better, splits.
objective
str
The objective/loss function that will be used to iteratively improve the model.
split_proposal_method
str
Approximate splitting strategy. Can be 'global' or 'local' (not yet supported).
tol
float
Approximation error of the quantile summary structures used in the approximate split-finding method.
learning_rate
float
Weight applied to each tree's prediction. This reduces each tree's impact, allowing later trees to contribute and keeping earlier trees from dominating the improvements.
min_split_loss
float
Each split must improve the objective function value of the model by at least this much in order to not be pruned. A value of 0 is the same as turning off this parameter (trees will still be pruned based on positive/negative objective function values).
weight_reg
float
Regularization term applied to the weights of the leaves in the regression tree. The higher this value, the sparser/smoother the weights will be, which often helps prevent overfitting.
sample
float
Fraction of rows to use in training per iteration.
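To make the roles of max_ntree and learning_rate concrete, here is a minimal plain-Python sketch of gradient boosting (an illustration only, not Vertica's implementation): each round fits the residuals of the current ensemble, and the new tree's output is shrunk by learning_rate before being added. Each "tree" is simplified to a constant predictor (the mean residual).

```python
# Conceptual sketch of boosting with shrinkage; each tree is reduced
# to a constant predictor so the learning_rate effect is easy to see.

def boost(y, max_ntree=10, learning_rate=0.1):
    """Return the ensemble prediction for a list of targets y."""
    prediction = [0.0] * len(y)
    for _ in range(max_ntree):
        # Residuals of the current ensemble.
        residuals = [t - p for t, p in zip(y, prediction)]
        # A real tree would fit these residuals; we use their mean.
        tree_output = sum(residuals) / len(residuals)
        # Shrink the tree's contribution by learning_rate.
        prediction = [p + learning_rate * tree_output for p in prediction]
    return prediction

preds = boost([4.0, 4.0, 4.0], max_ntree=50, learning_rate=0.1)
# With enough trees, the shrunken updates converge toward the target.
```

A smaller learning_rate needs a larger max_ntree to reach the same fit, which is why the two parameters are usually tuned together.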

Attributes

After object creation, all the parameters become attributes. Fitting the model creates additional attributes:

Name Type Description
classes_
list
List of all the response classes.
input_relation
str
Training relation.
X
list
List of the predictors.
y
str
Response column.
test_relation
str
Relation to use to test the model. All model methods are abstractions that simplify the process. The methods use the testing relation to evaluate the model. If empty, the training relation is used instead. This attribute can be changed at any time.

Methods

Name Description
classification_report Computes a classification report using multiple metrics to evaluate the model (AUC, accuracy, PRC AUC, F1...). In the case of multiclass classification, it considers each category as positive and switches to the next one during the computation.
confusion_matrix Computes the model confusion matrix.
cutoff_curve Draws the model's Cutoff curve.
deploySQL Returns the SQL code needed to deploy the model.
drop Drops the model from the Vertica DB.
export_graphviz Converts the input tree to graphviz.
features_importance Computes the model features importance using the Gini Index.
fit Trains the model.
get_attr Returns the model attribute.
get_params Returns the model Parameters.
get_tree Returns a tablesample with all the input tree information.
lift_chart Draws the model Lift Chart.
plot_tree Draws the input tree. The module anytree must be installed on the machine.
prc_curve Draws the model PRC curve.
predict Predicts using the input relation.
roc_curve Draws the model ROC curve.
score Computes the model score.
set_cursor Sets a new DB cursor.
set_params Sets the parameters of the model.
shapExplainer Creates a shapExplainer for the model.
to_sklearn Converts the Vertica model to an sklearn model.
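The trees returned by get_tree are built with a histogram-style split search controlled by the nbins parameter: candidate split points are taken from quantiles of each column rather than from every distinct value, trading split granularity for speed. A rough plain-Python sketch of that binning idea (an illustration only, not Vertica's actual algorithm):

```python
# Conceptual sketch of quantile-based candidate split selection,
# the idea behind the nbins parameter.

def candidate_splits(column, nbins=32):
    """Return up to nbins - 1 quantile-based candidate split values."""
    values = sorted(column)
    n = len(values)
    splits = []
    for i in range(1, nbins):
        # Index of the i-th of nbins quantiles.
        idx = min(n - 1, (i * n) // nbins)
        splits.append(values[idx])
    # Deduplicate while preserving order.
    seen, unique = set(), []
    for s in splits:
        if s not in seen:
            seen.add(s)
            unique.append(s)
    return unique

splits = candidate_splits(list(range(100)), nbins=4)
# → [25, 50, 75]
```

A larger nbins considers more candidate thresholds per column, which lengthens the runtime but may find better splits.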

Example

In [9]:
from verticapy.learn.ensemble import XGBoostClassifier
model = XGBoostClassifier(name = "public.xgb_titanic",
                          max_ntree = 20,
                          max_depth = 15)
display(model)
<XGBoostClassifier>