XGBoostClassifier
In [ ]:
XGBoostClassifier(name: str,
                  max_ntree: int = 10,
                  max_depth: int = 5,
                  nbins: int = 32,
                  split_proposal_method: str = "global",
                  tol: float = 0.001,
                  learning_rate: float = 0.1,
                  min_split_loss: float = 0.0,
                  weight_reg: float = 0.0,
                  sample: float = 1.0,
                  col_sample_by_tree: float = 1.0,
                  col_sample_by_node: float = 1.0)
Creates an XGBoostClassifier object using the Vertica XGB_CLASSIFIER algorithm.
Parameters
| Name | Type | Optional | Description |
|---|---|---|---|
| name | str | ❌ | Name of the model to be stored in the database. |
| max_ntree | int | ✓ | Maximum number of trees to create. |
| max_depth | int | ✓ | Maximum depth of each tree. |
| nbins | int | ✓ | Number of bins to use for finding splits in each column. More bins lead to a longer runtime but finer-grained and possibly better splits. |
| split_proposal_method | str | ✓ | Approximate splitting strategy. Can be 'global' or 'local' (not yet supported). |
| tol | float | ✓ | Approximation error of the quantile summary structures used in the approximate split-finding method. |
| learning_rate | float | ✓ | Weight applied to each tree's prediction. Reducing each tree's impact lets later trees contribute and keeps earlier trees from 'hogging' all the improvements. |
| min_split_loss | float | ✓ | Each split must improve the model's objective function value by at least this much to avoid being pruned. A value of 0 is equivalent to turning this parameter off (trees are still pruned based on positive/negative objective function values). |
| weight_reg | float | ✓ | Regularization term applied to the weights of the leaves in the regression tree. The higher this value, the more sparse/smooth the weights, which often helps prevent overfitting. |
| sample | float | ✓ | Fraction of rows to use in training per iteration. |
| col_sample_by_tree | float | ✓ | Float in the range (0,1] that specifies the fraction of columns (features), chosen at random, to use when building each tree. |
| col_sample_by_node | float | ✓ | Float in the range (0,1] that specifies the fraction of columns (features), chosen at random, to use when evaluating each split. |
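To make the roles of learning_rate and the column-sampling fractions more concrete, here is a minimal pure-Python sketch of generic gradient boosting behaviour. It is an illustration under stated assumptions, not Vertica's internal implementation: the function names `boosted_probability` and `sample_columns`, the `base_score` argument, and the rounding rule for the number of sampled columns are all hypothetical.

```python
import math
import random


def boosted_probability(tree_outputs, learning_rate=0.1, base_score=0.0):
    """Combine per-tree raw outputs into a probability.

    Each tree's output is scaled by learning_rate before being summed,
    so a smaller learning_rate shrinks every tree's contribution.
    (Generic boosting sketch; not Vertica's exact scoring code.)
    """
    raw = base_score + sum(learning_rate * out for out in tree_outputs)
    return 1.0 / (1.0 + math.exp(-raw))  # logistic link for binary classification


def sample_columns(columns, fraction, rng):
    """Pick a random subset of columns; fraction must lie in (0, 1].

    The rounding rule (round, with at least one column kept) is an
    assumption made for this sketch.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, round(fraction * len(columns)))
    return rng.sample(columns, k)


if __name__ == "__main__":
    outputs = [1.0, 1.0, 1.0]
    # The same trees move the prediction less when learning_rate is smaller.
    print(boosted_probability(outputs, learning_rate=0.1))
    print(boosted_probability(outputs, learning_rate=1.0))
    # Half of the columns are considered, chosen at random.
    print(sample_columns(["age", "fare", "sex", "pclass"], 0.5, random.Random(0)))
```

With a small learning_rate, more trees (a larger max_ntree) are typically needed to reach the same fit, which is why the two parameters are usually tuned together.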
Attributes
After the object is created, all the parameters become attributes. Fitting the model also creates the following extra attributes:
| Name | Type | Description |
|---|---|---|
| classes_ | list | List of all the response classes. |
| input_relation | str | Training relation. |
| X | list | List of the predictors. |
| y | str | Response column. |
| test_relation | str | Relation used to test the model. The model methods are abstractions that simplify evaluation: they use this relation to evaluate the model. If it is empty, the training relation is used instead. This attribute can be changed at any time. |
| prior_ | float / list | The value(s) used to score the model. For binary classification, this is the average of the response column. Otherwise, this is the list of the probabilities of each class. |
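The description of prior_ can be illustrated with a short pure-Python sketch. It assumes a binary response encoded as 0/1 and, for the multiclass case, that probabilities are listed in sorted class order; the helper name `compute_prior` is hypothetical and is not part of the library.

```python
from collections import Counter


def compute_prior(responses):
    """Return the prior as described for the prior_ attribute.

    Binary response (values in {0, 1}): the average of the response.
    Otherwise: the list of empirical class probabilities, in sorted
    class order (the ordering is an assumption of this sketch).
    """
    if not responses:
        raise ValueError("responses must not be empty")
    n = len(responses)
    classes = sorted(set(responses))
    if set(classes) <= {0, 1}:
        return sum(responses) / n
    counts = Counter(responses)
    return [counts[c] / n for c in classes]


if __name__ == "__main__":
    print(compute_prior([0, 1, 1, 1]))          # binary case: a single float
    print(compute_prior(["a", "b", "b", "c"]))  # multiclass case: one value per class
```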
Methods
| Name | Description |
|---|---|
| classification_report / report | Computes a classification report using multiple metrics to evaluate the model (AUC, accuracy, PRC AUC, F1...). For multiclass classification, each category is treated in turn as the positive class during the computation. |
| confusion_matrix | Computes the model confusion matrix. |
| contour | Draws the model's contour plot. |
| cutoff_curve | Draws the model's cutoff curve. |
| deploySQL | Returns the SQL code needed to deploy the model. |
| drop | Drops the model from the Vertica database. |
| features_importance | Computes the model's feature importance using the Gini Index. |
| fit | Trains the model. |
| get_attr | Returns a model attribute. |
| get_params | Returns the model's parameters. |
| get_tree | Returns a tablesample with all the input tree information. |
| lift_chart | Draws the model's lift chart. |
| plot_tree | Draws the input tree (requires the graphviz module). |
| prc_curve | Draws the model's PRC curve. |
| predict | Predicts using the input relation. |
| predict_proba | Returns the model's probabilities using the input relation. |
| roc_curve | Draws the model's ROC curve. |
| score | Computes the model score. |
| set_params | Sets the parameters of the model. |
| to_graphviz | Converts the input tree to a Graphviz tree. |
| to_json | Creates a Python XGBoost JSON file that can be imported into the Python XGBoost API. |
| to_memmodel | Converts a specified Vertica model to a memModel model. |
| to_python | Returns the Python code needed to deploy the model without using built-in Vertica functions. |
| to_sql | Returns the SQL code needed to deploy the model without using Vertica built-in functions. |
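The multiclass behaviour described for classification_report, where each category is considered positive in turn, is a one-vs-rest evaluation. The sketch below shows that idea in plain Python for precision and recall only. It is an illustration, not the library's implementation, and the helper name `one_vs_rest_report` is hypothetical.

```python
def one_vs_rest_report(y_true, y_pred):
    """Compute per-class precision and recall, one class at a time.

    For each class, that class is treated as "positive" and every
    other class as "negative", then the usual binary counts are taken.
    """
    report = {}
    for positive in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == positive and p == positive)
        fp = sum(1 for t, p in pairs if t != positive and p == positive)
        fn = sum(1 for t, p in pairs if t == positive and p != positive)
        report[positive] = {
            # Guard against division by zero when a class is never predicted.
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
        }
    return report


if __name__ == "__main__":
    truth = ["a", "a", "b", "c"]
    predicted = ["a", "b", "b", "c"]
    for label, metrics in one_vs_rest_report(truth, predicted).items():
        print(label, metrics)
```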
Example
In [9]:
from verticapy.learn.ensemble import XGBoostClassifier

# Create the model object; max_ntree and max_depth override the defaults (10 and 5).
model = XGBoostClassifier(name = "public.xgb_titanic",
                          max_ntree = 20,
                          max_depth = 15)
display(model)
