XGBoostClassifier

XGBoostClassifier(name: str,
                  max_ntree: int = 10,
                  max_depth: int = 5,
                  nbins: int = 32,
                  split_proposal_method: str = "global",
                  tol: float = 0.001,
                  learning_rate: float = 0.1,
                  min_split_loss: float = 0.0,
                  weight_reg: float = 0.0,
                  sample: float = 1.0,
                  col_sample_by_tree: float = 1.0,
                  col_sample_by_node: float = 1.0,)

Creates an XGBoostClassifier object using the Vertica XGB_CLASSIFIER algorithm.

Parameters

Name Type Description
name
str
Name of the model to be stored in the database.
max_ntree
int
Maximum number of trees that will be created.
max_depth
int
Maximum depth of each tree.
nbins
int
Number of bins to use for finding splits in each column. More bins lead to a longer runtime but finer-grained, possibly better splits.
split_proposal_method
str
Approximate splitting strategy. Can be 'global' or 'local' ('local' is not yet supported).
tol
float
Approximation error of the quantile summary structures used in the approximate split-finding method.
learning_rate
float
Weight applied to each tree's prediction. A smaller value reduces each tree's impact, which leaves room for later trees to contribute and keeps earlier trees from 'hogging' all the improvements.
min_split_loss
float
Minimum amount by which a split must improve the model's objective function value; splits that improve it by less are pruned. A value of 0 turns this parameter off (trees are still pruned based on positive/negative objective function values).
weight_reg
float
Regularization term that is applied to the weights of the leaves in the regression tree. The higher this value is, the more sparse/smooth the weights will be, which often helps prevent overfitting.
sample
float
Fraction of rows to use in training per iteration.
col_sample_by_tree
float
Float in the range (0,1] that specifies the fraction of columns (features), chosen at random, to use when building each tree.
col_sample_by_node
float
Float in the range (0,1] that specifies the fraction of columns (features), chosen at random, to use when evaluating each split.
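
The interaction between learning_rate, min_split_loss and weight_reg can be easier to follow with numbers. The sketch below uses the standard XGBoost formulas for leaf weights and split gain, assuming that weight_reg plays the role of the L2 term (lambda) and min_split_loss the role of the per-split penalty (gamma). It is an illustration only, not Vertica's internal implementation, and the function names are hypothetical.

```python
# Illustrative only: textbook XGBoost formulas, not Vertica's internal code.
# Assumption: weight_reg ~ lambda (L2 on leaf weights), min_split_loss ~ gamma.


def leaf_weight(grad_sum: float, hess_sum: float, weight_reg: float) -> float:
    """Optimal leaf weight given the summed gradients and hessians of its rows."""
    return -grad_sum / (hess_sum + weight_reg)


def split_gain(g_left: float, h_left: float,
               g_right: float, h_right: float,
               weight_reg: float, min_split_loss: float) -> float:
    """Objective improvement of a split, minus the per-split penalty."""
    def score(g: float, h: float) -> float:
        return g * g / (h + weight_reg)

    parent = score(g_left + g_right, h_left + h_right)
    children = score(g_left, h_left) + score(g_right, h_right)
    return 0.5 * (children - parent) - min_split_loss


def shrunk_update(prediction: float, tree_output: float,
                  learning_rate: float) -> float:
    """Add one tree's output to the running prediction, scaled by learning_rate."""
    return prediction + learning_rate * tree_output


# Hypothetical gradient/hessian sums for the two children of a candidate split.
stats = dict(g_left=-4.0, h_left=2.0, g_right=3.0, h_right=2.0)

gain_plain = split_gain(**stats, weight_reg=0.0, min_split_loss=0.0)
gain_regularized = split_gain(**stats, weight_reg=2.0, min_split_loss=0.0)
gain_penalized = split_gain(**stats, weight_reg=0.0, min_split_loss=7.0)

# A split is kept only if its gain is positive.
keep_penalized_split = gain_penalized > 0
```

In this toy case, raising weight_reg shrinks both the gain and the leaf weights, and a large enough min_split_loss makes the gain negative so the split would be pruned.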

Attributes

After the object is created, all the parameters become attributes. Fitting the model creates the following additional attributes:

Name Type Description
classes_
list
List of all the response classes.
input_relation
str
Training relation.
X
list
List of the predictors.
y
str
Response column.
test_relation
str
Relation used to test the model. The model's evaluation methods use this relation; if it is empty, they use the training relation instead. This attribute can be changed at any time.
prior_
float / list
The value(s) used to score the model. For binary classification, this is the average of the response column. Otherwise, this is the list of the probabilities of each class.
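
As a rough illustration of what prior_ holds, the sketch below computes the two cases described above from a plain Python list standing in for the response column. The helper names and the class ordering in the multiclass case are assumptions made for the example, not part of the VerticaPy API.

```python
from collections import Counter


def binary_prior(y):
    """Average of a 0/1 response column."""
    return sum(y) / len(y)


def multiclass_prior(y):
    """Probability of each class, listed in sorted class order (assumed order)."""
    counts = Counter(y)
    total = len(y)
    return [counts[label] / total for label in sorted(counts)]


# Hypothetical response columns.
binary_response = [1, 0, 0, 1]
multiclass_response = ["a", "b", "a", "c"]

binary_value = binary_prior(binary_response)
multiclass_values = multiclass_prior(multiclass_response)
```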

Methods

Name Description
classification_report / report Computes a classification report using multiple metrics to evaluate the model (AUC, accuracy, PRC AUC, F1...). For multiclass classification, each category is treated in turn as the positive class.
confusion_matrix Computes the model confusion matrix.
contour Draws the model's contour plot.
cutoff_curve Draws the model's cutoff curve.
deploySQL Returns the SQL code needed to deploy the model.
drop Drops the model from the Vertica DB.
features_importance Computes the model's feature importance using the Gini Index.
fit Trains the model.
get_attr Returns the model attribute.
get_params Returns the model's parameters.
get_tree Returns a tablesample with all the input tree information.
lift_chart Draws the model's lift chart.
plot_tree Draws the input tree (requires the graphviz module).
prc_curve Draws the model PRC curve.
predict Predicts using the input relation.
predict_proba Returns the model's probabilities using the input relation.
roc_curve Draws the model ROC curve.
score Computes the model score.
set_params Sets the parameters of the model.
to_graphviz Converts the input tree to a Graphviz tree.
to_json Creates a Python XGBoost JSON file that can be imported into the Python XGBoost API.
to_memmodel Converts a specified Vertica model to a memModel model.
to_python Returns the Python code needed to deploy the model without using built-in Vertica functions.
to_sql Returns the SQL code needed to deploy the model without using Vertica built-in functions.
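
The multiclass behavior of classification_report, where each category is considered positive in turn, can be sketched in plain Python as a one-vs-rest loop. This is a minimal illustration limited to precision and recall, using hypothetical label lists. It is not the code VerticaPy runs, which computes its metrics in the database.

```python
def one_vs_rest_report(y_true, y_pred):
    """Precision and recall per class, treating each class in turn as positive."""
    report = {}
    for positive in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == positive and p == positive for t, p in pairs)
        fp = sum(t != positive and p == positive for t, p in pairs)
        fn = sum(t == positive and p != positive for t, p in pairs)
        report[positive] = {
            # Guard against division by zero when a class is never predicted.
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
        }
    return report


# Hypothetical true and predicted labels.
y_true = ["a", "a", "b", "c"]
y_pred = ["a", "b", "b", "c"]
report = one_vs_rest_report(y_true, y_pred)
```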

Example

from verticapy.learn.ensemble import XGBoostClassifier

# Create the model object; training only happens when fit() is called.
model = XGBoostClassifier(name = "public.xgb_titanic",
                          max_ntree = 20,
                          max_depth = 15)
display(model)
<XGBoostClassifier>