
verticapy.machine_learning.memmodel.tree.BinaryTreeAnomaly
class verticapy.machine_learning.memmodel.tree.BinaryTreeAnomaly(children_left: Annotated[list | ndarray, 'Array Like Structure'], children_right: Annotated[list | ndarray, 'Array Like Structure'], feature: Annotated[list | ndarray, 'Array Like Structure'], threshold: Annotated[list | ndarray, 'Array Like Structure'], value: Annotated[list | ndarray, 'Array Like Structure'], psy: int = 1)
InMemoryModel implementation of binary trees for anomaly detection.

Parameters
- children_left: ArrayLike
  A list of node IDs, where children_left[i] is the node ID of the left child of node i.
- children_right: ArrayLike
  A list of node IDs, where children_right[i] is the node ID of the right child of node i.
- feature: ArrayLike
  A list of feature indices, where feature[i] is the index of the feature to split on for the internal node i.
- threshold: ArrayLike
  A list of thresholds, where threshold[i] is the threshold for the internal node i.
- value: ArrayLike
  Contains the value of each node. For BinaryTreeAnomaly, internal nodes carry no value (None in the example below), and each leaf is a list of two elements; see the Note in the Examples section.
- psy: int, optional
  Sampling size used to compute the final score. Defaults to 1.
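To make the parallel-array encoding concrete, here is a minimal sketch that walks the example tree used later on this page from the root to a leaf. It assumes that node 0 is the root, that a node whose children are None is a leaf, that a string threshold sends equal values to the left child, and that a numeric threshold sends values below it to the left child (matching the `<` in the predict_sql() output; the to_graphviz() labels read `<=`, so the boundary case is an assumption). find_leaf is a hypothetical helper, not part of VerticaPy.

```python
from typing import Any, List, Optional

# Parallel arrays from the example on this page: node i's children are
# children_left[i] and children_right[i]; leaves have None for both.
children_left: List[Optional[int]] = [1, 3, None, None, None]
children_right: List[Optional[int]] = [2, 4, None, None, None]
feature: List[Optional[int]] = [0, 1, None, None, None]
threshold: List[Any] = ["female", 30, None, None, None]


def find_leaf(row: List[Any]) -> int:
    """Hypothetical helper: return the ID of the leaf reached by `row`."""
    node = 0  # assumption: the root is node 0
    while children_left[node] is not None:
        x = row[feature[node]]
        thr = threshold[node]
        if isinstance(thr, str):
            # Categorical split: equal values go to the left child.
            go_left = x == thr
        else:
            # Numeric split: assumed "x < threshold" goes to the left child.
            go_left = x < thr
        node = children_left[node] if go_left else children_right[node]
    return node


print([find_leaf(r) for r in [["male", 100], ["female", 20], ["female", 50]]])
# [2, 3, 4]
```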
Attributes
Attributes are identical to the input parameters, followed by an underscore (‘_’).
Examples
Initialization
Import the required module.
from verticapy.machine_learning.memmodel.tree import BinaryTreeAnomaly
A BinaryTreeAnomaly model is defined by the IDs of each node's left and right children and by the feature and threshold used to split each internal node. The values at the leaf nodes are also required. Let's create a BinaryTreeAnomaly model:

from verticapy.machine_learning.memmodel.tree import BinaryTreeAnomaly

# Different Attributes
children_left = [1, 3, None, None, None]
children_right = [2, 4, None, None, None]
feature = [0, 1, None, None, None]
threshold = ["female", 30, None, None, None]
value = [None, None, [2, 10], [3, 4], [7, 8]]

# Building the Model
model_bta = BinaryTreeAnomaly(
    children_left = children_left,
    children_right = children_right,
    feature = feature,
    threshold = threshold,
    value = value,
    psy = 100,
)
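The psy argument enters the score through a normalization term. Below is a minimal sketch, assuming BinaryTreeAnomaly follows the standard Isolation Forest normalization, in which a leaf with path length h and n training rows scores (h + c(n)) / c(psy), where c(n) = 2H(n-1) - 2(n-1)/n and H(i) is approximated by ln(i) + γ. This formula is not stated on this page; it is an assumption that, with psy = 100, reproduces the first predict() value printed in this example. The helper names are hypothetical. The point it illustrates is that the same leaf gets a different score under a different psy.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def average_path_length(n: int) -> float:
    """Standard Isolation Forest correction c(n) (assumed, not VerticaPy API)."""
    if n > 2:
        return 2 * (math.log(n - 1) + EULER_GAMMA) - 2 * (n - 1) / n
    if n == 2:
        return 1.0
    return 0.0


def normalized_path_length(depth: int, rows: int, psy: int) -> float:
    """Hypothetical helper: (h + c(n)) / c(psy) for a single leaf."""
    return (depth + average_path_length(rows)) / average_path_length(psy)


# Leaf [2, 10] from the example (path length 2, 10 training rows),
# normalized with two different sampling sizes.
for psy in (100, 256):
    print(psy, normalized_path_length(2, 10, psy))
```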
Important
The parameter psy corresponds to the sampling size used to compute the final score. It is required for this computation, and an incorrect value leads to incorrect scores.

Note
For BinaryTreeAnomaly, each leaf in value is a list of two elements. As the to_graphviz() output in this example shows, the first element is the path length of the leaf (leaf_path_length) and the second is the number of training rows that reached it (training_row_count).

Create a dataset.
data = [["male", 100], ["female", 20], ["female", 50]]
Making In-Memory Predictions
Use the predict() method to make predictions.

model_bta.predict(data)
Out[10]: array([0.68728112, 0.58001754, 1.23092129])
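The three numbers returned by predict() can be reproduced outside VerticaPy. The sketch below assumes that each leaf value is [path_length, training_row_count], as the to_graphviz() labels suggest, and that the prediction is the Isolation Forest normalized path length (h + c(n)) / c(psy). Both are inferences from the outputs on this page, not documented behavior, and the helper names are hypothetical. Under the same assumptions, the anomaly_score shown in the to_graphviz() output corresponds to 2 ** (-prediction).

```python
import math
from typing import Any, List

EULER_GAMMA = 0.5772156649

# Same attributes as the model built in this example.
children_left = [1, 3, None, None, None]
children_right = [2, 4, None, None, None]
feature = [0, 1, None, None, None]
threshold = ["female", 30, None, None, None]
value = [None, None, [2, 10], [3, 4], [7, 8]]
psy = 100


def c(n: float) -> float:
    """Average path length correction used by Isolation Forest (assumed)."""
    if n > 2:
        return 2 * (math.log(n - 1) + EULER_GAMMA) - 2 * (n - 1) / n
    return 1.0 if n == 2 else 0.0


def predict_row(row: List[Any]) -> float:
    """Hypothetical re-implementation of the per-row prediction."""
    node = 0
    while children_left[node] is not None:
        x, thr = row[feature[node]], threshold[node]
        go_left = (x == thr) if isinstance(thr, str) else (x < thr)
        node = children_left[node] if go_left else children_right[node]
    path_length, row_count = value[node]
    # Normalized path length: (h + c(n)) / c(psy).
    return (path_length + c(row_count)) / c(psy)


data = [["male", 100], ["female", 20], ["female", 50]]
scores = [predict_row(r) for r in data]
print([round(s, 8) for s in scores])  # [0.68728112, 0.58001754, 1.23092129]
# Assumed relation to the anomaly_score labels drawn by to_graphviz():
print([round(2 ** -s, 4) for s in scores])  # [0.621, 0.669, 0.426]
```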
Deploy SQL Code
Let’s use the following column names:
cnames = ["sex", "fare"]
Use the predict_sql() method to get the SQL code needed to deploy the model using its attributes.

model_bta.predict_sql(cnames)
Out[12]: "(CASE WHEN sex = 'female' THEN (CASE WHEN fare < 30 THEN 0.5800175392069298 ELSE 1.2309212867903394 END) ELSE 0.6872811212546747 END)"
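The generated SQL is a nested CASE WHEN expression that mirrors the tree. The following minimal sketch, built around a hypothetical tree_to_sql helper, rebuilds that string from the model's arrays and the leaf scores printed by predict_sql(). It assumes string thresholds become equality tests and numeric thresholds become `<` tests, and it does not escape quotes inside category values, so it illustrates the shape of the output rather than replacing predict_sql().

```python
from typing import Any, Dict, List, Optional


def tree_to_sql(
    node: int,
    children_left: List[Optional[int]],
    children_right: List[Optional[int]],
    feature: List[Optional[int]],
    threshold: List[Any],
    leaf_scores: Dict[int, float],
    cnames: List[str],
) -> str:
    """Hypothetical helper: emit a nested CASE WHEN expression for a subtree."""
    if children_left[node] is None:
        return str(leaf_scores[node])
    col, thr = cnames[feature[node]], threshold[node]
    # Categorical thresholds become equality tests, numeric ones "<" tests.
    cond = f"{col} = '{thr}'" if isinstance(thr, str) else f"{col} < {thr}"
    args = (children_left, children_right, feature, threshold, leaf_scores, cnames)
    left = tree_to_sql(children_left[node], *args)
    right = tree_to_sql(children_right[node], *args)
    return f"(CASE WHEN {cond} THEN {left} ELSE {right} END)"


# Leaf scores copied from the predict_sql() output in this section.
leaf_scores = {2: 0.6872811212546747, 3: 0.5800175392069298, 4: 1.2309212867903394}
sql = tree_to_sql(
    0,
    [1, 3, None, None, None],
    [2, 4, None, None, None],
    [0, 1, None, None, None],
    ["female", 30, None, None, None],
    leaf_scores,
    ["sex", "fare"],
)
print(sql)
```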
Hint
This object can be pickled and used in any in-memory environment, just like scikit-learn models.
Drawing Tree
Use the to_graphviz() method to generate code for a Graphviz tree.

model_bta.to_graphviz()
Out[13]: 'digraph Tree {\ngraph [bgcolor="#FFFFFFDD"];\n0 [label="X0", shape="box", style="filled", fillcolor="#FFFFFFDD", fontcolor="#000000", color="#000000"]\n0 -> 1 [label="= female", color="#000000", fontcolor="#000000"]\n0 -> 2 [label="!= female", color="#000000", fontcolor="#000000"]\n1 [label="X1", shape="box", style="filled", fillcolor="#FFFFFFDD", fontcolor="#000000", color="#000000"]\n1 -> 3 [label="<= 30", color="#000000", fontcolor="#000000"]\n1 -> 4 [label="> 30", color="#000000", fontcolor="#000000"]\n2 [label=<<table border="0" cellspacing="0"> <tr><td port="port1" border="1" bgcolor="#FFFFFFDD"><FONT color="#000000"><b>leaf</b></FONT></td></tr><tr><td port="port0" border="1" align="left">leaf_path_length: 2 </td></tr><tr><td port="port1" border="1" align="left"> training_row_count: 10 </td></tr><tr><td port="port2" border="1" align="left" bgcolor="#ffc1c1"><FONT color="#111111"> anomaly_score: 0.6210231175271977 </FONT> </td></tr></table>>, fillcolor="#FFFFFFDD", fontcolor="#000000", shape="none", color="#000000"]\n3 [label=<<table border="0" cellspacing="0"> <tr><td port="port1" border="1" bgcolor="#FFFFFFDD"><FONT color="#000000"><b>leaf</b></FONT></td></tr><tr><td port="port0" border="1" align="left">leaf_path_length: 3 </td></tr><tr><td port="port1" border="1" align="left"> training_row_count: 4 </td></tr><tr><td port="port2" border="1" align="left" bgcolor="#ffa8a8"><FONT color="#111111"> anomaly_score: 0.6689556446813831 </FONT> </td></tr></table>>, fillcolor="#FFFFFFDD", fontcolor="#000000", shape="none", color="#000000"]\n4 [label=<<table border="0" cellspacing="0"> <tr><td port="port1" border="1" bgcolor="#FFFFFFDD"><FONT color="#000000"><b>leaf</b></FONT></td></tr><tr><td port="port0" border="1" align="left">leaf_path_length: 7 </td></tr><tr><td port="port1" border="1" align="left"> training_row_count: 8 </td></tr><tr><td port="port2" border="1" align="left" bgcolor="#ffffff"><FONT color="#111111"> anomaly_score: 0.42604529186600343 </FONT> </td></tr></table>>, fillcolor="#FFFFFFDD", fontcolor="#000000", shape="none", color="#000000"]\n}'
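The DOT text returned by to_graphviz() follows a simple pattern: one box per split node labeled with the feature index (X0, X1), two labeled edges per split, and one node per leaf. The following sketch, built around a hypothetical tree_to_dot helper, emits a stripped-down version of that structure. The edge labels copy the ones in the to_graphviz() output, and all styling (colors, and the HTML-like leaf tables with path length, row count and anomaly score) is deliberately omitted.

```python
from typing import Any, List, Optional


def tree_to_dot(
    children_left: List[Optional[int]],
    children_right: List[Optional[int]],
    feature: List[Optional[int]],
    threshold: List[Any],
) -> str:
    """Hypothetical helper: a bare-bones Graphviz DOT description of the tree."""
    lines = ["digraph Tree {"]
    for i, left in enumerate(children_left):
        if left is None:
            # Leaf node: no split, just a plain label.
            lines.append(f'{i} [label="leaf {i}", shape="none"]')
            continue
        thr = threshold[i]
        if isinstance(thr, str):
            left_label, right_label = f"= {thr}", f"!= {thr}"
        else:
            left_label, right_label = f"<= {thr}", f"> {thr}"
        lines.append(f'{i} [label="X{feature[i]}", shape="box"]')
        lines.append(f'{i} -> {left} [label="{left_label}"]')
        lines.append(f'{i} -> {children_right[i]} [label="{right_label}"]')
    lines.append("}")
    return "\n".join(lines)


dot = tree_to_dot(
    [1, 3, None, None, None],
    [2, 4, None, None, None],
    [0, 1, None, None, None],
    ["female", 30, None, None, None],
)
print(dot)
```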
Use the plot_tree() method to draw the input tree.

model_bta.plot_tree()
Important
plot_tree() requires the Graphviz module.

Note
The above example is a very basic one. For more detailed examples and customization options, see :ref:`chart_gallery.tree`_
__init__(children_left: Annotated[list | ndarray, 'Array Like Structure'], children_right: Annotated[list | ndarray, 'Array Like Structure'], feature: Annotated[list | ndarray, 'Array Like Structure'], threshold: Annotated[list | ndarray, 'Array Like Structure'], value: Annotated[list | ndarray, 'Array Like Structure'], psy: int = 1) → None
Methods
- __init__(children_left, children_right, ...): Returns the model attributes.
- plot_tree([pic_path]): Draws the input tree.
- predict(X): Predicts using the BinaryTree model.
- Returns the model probabilities.
- Returns the SQL code needed to deploy the model probabilities.
- predict_sql(X): Returns the SQL code needed to deploy the model.
- set_attributes(**kwargs): Sets the model attributes.
- to_graphviz([feature_names, classes_color, ...]): Returns the code for a Graphviz tree.
Attributes
Must be overridden in child class