vDataFrame.chaid¶
In [ ]:
vDataFrame.chaid(response: str,
                 columns: list,
                 nbins: int = 4,
                 method: str = "same_width",
                 RFmodel_params: dict = {})
Returns a CHAID (Chi-square Automatic Interaction Detector) tree. CHAID is a decision tree technique that selects splits with chi-square significance tests whose p-values are adjusted for multiple comparisons (Bonferroni correction).
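The adjusted significance testing mentioned above can be illustrated with a minimal pure-Python sketch. It is not VerticaPy's implementation: the helper names (`chi_square_statistic`, `p_value_df1`, `bonferroni`) and the counts in the example table are hypothetical, the p-value formula only covers a 2x2 table (one degree of freedom), and the Bonferroni multiplier is simplified to the number of tests performed.

```python
import math

def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

def p_value_df1(stat):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom (2x2 table only)."""
    return math.erfc(math.sqrt(stat / 2.0))

def bonferroni(p_value, n_tests):
    """Bonferroni adjustment: multiply by the number of tests and cap at 1."""
    return min(1.0, p_value * n_tests)

# Hypothetical counts: rows are the two categories of a predictor,
# columns are the two classes of the response.
table = [[30, 10],
         [10, 30]]
stat = chi_square_statistic(table)
raw_p = p_value_df1(stat)
adjusted_p = bonferroni(raw_p, n_tests=3)  # assume 3 candidate predictors were tested
print(stat, raw_p, adjusted_p)
```

In a CHAID-style procedure, the predictor with the smallest adjusted p-value would be the candidate for the next split, provided that value is below the chosen significance threshold.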
Parameters¶
Name | Type | Optional | Description |
---|---|---|---|
response | str | ❌ | Categorical response vColumn. |
columns | list | ❌ | List of the vColumn names. The maximum number of categories for each categorical column is 16. Categorical columns with a higher cardinality are discarded. |
nbins | int | ✓ | Number of bins used to discretize the numerical features; must be an integer in the range [2,16]. |
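method | str | ✓ | The method used to discretize the numerical vColumns: "same_width" (the default) computes bins of equal width; "smart" uses a random forest model on the response column to find the best splits. |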
RFmodel_params | dict | ✓ | Dictionary of the parameters of the random forest model used to compute the best splits when 'method' is 'smart'. If the response column is numerical (but not of type int or bool), this function trains and uses a random forest regressor. Otherwise, this function trains a random forest classifier. For example, to train a random forest with 20 trees and a maximum depth of 10, use: {"n_estimators": 20, "max_depth": 10}. |
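To make the role of 'nbins' concrete, here is a small pure-Python sketch of equal-width binning, which is what the "same_width" method name suggests. It is an illustration only, not VerticaPy's code: the helper names `same_width_edges` and `assign_bin` and the sample values are hypothetical.

```python
def same_width_edges(values, nbins=4):
    """Return nbins + 1 edges that split [min, max] into intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / nbins
    return [lo + i * width for i in range(nbins + 1)]

def assign_bin(value, edges):
    """Return the index of the bin containing value; the last bin includes the maximum."""
    nbins = len(edges) - 1
    for i in range(nbins):
        if value < edges[i + 1]:
            return i
    return nbins - 1

# Hypothetical numerical feature values.
values = [0, 10, 20, 40]
edges = same_width_edges(values, nbins=4)
bins = [assign_bin(v, edges) for v in values]
print(edges)
print(bins)
```

Each numerical feature is reduced to at most 'nbins' categories in this way, which keeps it within the same 16-category limit that applies to categorical columns.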
Returns¶
memModel : An independent model containing the result. For more information, see learn.memmodel.
Example¶
In [4]:
from verticapy.datasets import load_titanic
titanic = load_titanic()
display(titanic)
In [ ]:
tree = titanic.chaid(response = "survived",
                     columns = ["sex", "pclass"])
tree.plot_tree()
See Also¶
vDataFrame.hexbin | Draws the Hexbin plot of two vColumns based on an aggregation. |
vDataFrame.iv_woe | Computes the Information Value (IV) table. |
vDataFrame.pivot | Returns the pivot of the vDataFrame using the input aggregation. |
vDataFrame.pivot_table | Draws the pivot table of one or two columns based on an aggregation. |
vDataFrame.pivot_table_chi2 | Returns the chi-square term using the pivot table of the response vColumn against the input vColumns. |