
verticapy.machine_learning.memmodel.tree.BinaryTreeAnomaly

class verticapy.machine_learning.memmodel.tree.BinaryTreeAnomaly(children_left: Annotated[list | ndarray, 'Array Like Structure'], children_right: Annotated[list | ndarray, 'Array Like Structure'], feature: Annotated[list | ndarray, 'Array Like Structure'], threshold: Annotated[list | ndarray, 'Array Like Structure'], value: Annotated[list | ndarray, 'Array Like Structure'], psy: int = 1)

InMemoryModel implementation of binary trees for anomaly detection.

Parameters

children_left: ArrayLike

A list of node IDs, where children_left[i] is the node id of the left child of node i.

children_right: ArrayLike

A list of node IDs, where children_right[i] is the node id of the right child of node i.

feature: ArrayLike

A list of features, where feature[i] is the feature to split on for the internal node i.

threshold: ArrayLike

A list of thresholds, where threshold[i] is the threshold for the internal node i.

value: ArrayLike

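Contains the constant prediction value of each node. If used for classification and return_proba is set to True, each element of the list must be a sublist with the probabilities of each class. For BinaryTreeAnomaly, each leaf value is a list of two elements, and internal nodes can be None, as in the example in the Examples section.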

psy: int, optional

Sampling size used to compute the final score.

Attributes

Each attribute has the same name as the corresponding input parameter, with a trailing underscore (‘_’).

Examples

Initialization

Import the required module.

from verticapy.machine_learning.memmodel.tree import BinaryTreeAnomaly

A BinaryTreeAnomaly model is defined by the IDs of each node's left and right children, plus the feature and threshold used to split each internal node. The values at the leaf nodes are also required. Let’s create a BinaryTreeAnomaly model:

from verticapy.machine_learning.memmodel.tree import BinaryTreeAnomaly

# Different Attributes
children_left = [1, 3, None, None, None]

children_right = [2, 4, None, None, None]

feature = [0, 1, None, None, None]

threshold = ["female", 30, None, None, None]

value = [None, None, [2, 10], [3, 4], [7, 8]]

# Building the Model
model_bta = BinaryTreeAnomaly(
    children_left = children_left,
    children_right = children_right,
    feature = feature,
    threshold = threshold,
    value = value,
    psy = 100,
)

Important

The parameter psy is the sampling size used to compute the final score. An incorrect value leads to incorrect scores.

Note

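For BinaryTreeAnomaly, each leaf entry of the parameter value is a list of two elements. In the to_graphviz() output later in this example, these two elements are displayed as leaf_path_length and training_row_count; for instance, the leaf [2, 10] is shown with a path length of 2 and a row count of 10.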

Create a dataset.

data = [["male", 100], ["female", 20], ["female", 50]]
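To see how the arrays route each row to a leaf, the traversal can be sketched in plain Python. This is an illustration only and does not call VerticaPy: the helper name find_leaf is hypothetical, and the split rule (equality for string thresholds, strict "less than" for numeric ones) is an assumption taken from the SQL that predict_sql() generates in this example.

```python
# Pure-Python sketch of the tree traversal; not VerticaPy's implementation.
children_left = [1, 3, None, None, None]
children_right = [2, 4, None, None, None]
feature = [0, 1, None, None, None]
threshold = ["female", 30, None, None, None]


def find_leaf(row, node=0):
    """Follow the splits from the root and return the ID of the leaf reached."""
    while children_left[node] is not None:  # internal nodes have children
        x, t = row[feature[node]], threshold[node]
        # Assumed rule: equality for string thresholds, "<" for numeric ones.
        go_left = (x == t) if isinstance(t, str) else (x < t)
        node = children_left[node] if go_left else children_right[node]
    return node


data = [["male", 100], ["female", 20], ["female", 50]]
leaves = [find_leaf(row) for row in data]
print(leaves)  # [2, 3, 4]
```

Each row ends in a different leaf, so the three predictions in the next section correspond to the leaf values [2, 10], [3, 4], and [7, 8] respectively.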

Making In-Memory Predictions

Use the predict() method to make predictions.

model_bta.predict(data)
Out[10]: array([0.68728112, 0.58001754, 1.23092129])
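The three values above can be reproduced by hand. The sketch below is inferred from the numbers in this example and from the usual isolation forest normalization, not taken from VerticaPy's source. It assumes each leaf value is [path length, row count]; the function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def average_path_length(n):
    """Isolation forest normalization term c(n) for a sample of n points."""
    if n > 2:
        return 2 * (math.log(n - 1) + EULER_GAMMA) - 2 * (n - 1) / n
    return 1.0 if n == 2 else 0.0


def normalized_path_length(leaf_value, psy):
    """(path length + c(row count)) / c(psy), assuming leaf_value = [length, count]."""
    path_length, row_count = leaf_value
    return (path_length + average_path_length(row_count)) / average_path_length(psy)


leaf_values = {2: [2, 10], 3: [3, 4], 4: [7, 8]}  # leaves of the example tree
psy = 100

scores = {leaf: normalized_path_length(v, psy) for leaf, v in leaf_values.items()}
anomaly = {leaf: 2 ** -s for leaf, s in scores.items()}

for leaf in leaf_values:
    print(leaf, round(scores[leaf], 6), round(anomaly[leaf], 6))
# 2 0.687281 0.621023
# 3 0.580018 0.668956
# 4 1.230921 0.426045
```

Under these assumptions, the first column of results matches the predict() output, and 2 raised to the negative of each result matches the anomaly_score values in the to_graphviz() output in the Drawing Tree section. This also illustrates why psy matters: it sets the denominator shared by every score.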

Deploy SQL Code

Let’s use the following column names:

cnames = ["sex", "fare"]

Use the predict_sql() method to get the SQL code needed to deploy the model using its attributes.

model_bta.predict_sql(cnames)
Out[12]: "(CASE WHEN sex = 'female' THEN (CASE WHEN fare < 30 THEN 0.5800175392069298 ELSE 1.2309212867903394 END) ELSE 0.6872811212546747 END)"
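The nested CASE expression mirrors the tree structure: each internal node becomes a CASE WHEN, and each leaf becomes a constant. A minimal sketch of that mapping follows. It is not VerticaPy's generator; case_expression is a hypothetical helper, the leaf scores are copied from the predict() output above, and string values are inserted without any SQL escaping.

```python
# Pure-Python sketch: build a nested CASE expression from the tree arrays.
children_left = [1, 3, None, None, None]
children_right = [2, 4, None, None, None]
feature = [0, 1, None, None, None]
threshold = ["female", 30, None, None, None]
# Leaf scores copied from the predict() output; internal nodes carry no score.
leaf_score = [None, None, 0.6872811212546747, 0.5800175392069298, 1.2309212867903394]


def case_expression(cnames, node=0):
    """Return the SQL expression for the subtree rooted at `node`."""
    if children_left[node] is None:  # leaf: emit its score as a constant
        return str(leaf_score[node])
    column, t = cnames[feature[node]], threshold[node]
    # Assumed rule: equality for string thresholds, "<" for numeric ones.
    # No SQL escaping is done here; this is for illustration only.
    condition = f"{column} = '{t}'" if isinstance(t, str) else f"{column} < {t}"
    left = case_expression(cnames, children_left[node])
    right = case_expression(cnames, children_right[node])
    return f"(CASE WHEN {condition} THEN {left} ELSE {right} END)"


sql = case_expression(["sex", "fare"])
print(sql)
```

The resulting string can be placed in the SELECT list of any query over a table that has the named columns.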

Hint

This object can be pickled and used in any in-memory environment, just like scikit-learn models.

Drawing Tree

Use the to_graphviz() method to generate code for a Graphviz tree.

model_bta.to_graphviz()
Out[13]: 'digraph Tree {\ngraph [bgcolor="#FFFFFFDD"];\n0 [label="X0", shape="box", style="filled", fillcolor="#FFFFFFDD", fontcolor="#000000", color="#000000"]\n0 -> 1 [label="= female", color="#000000", fontcolor="#000000"]\n0 -> 2 [label="!= female", color="#000000", fontcolor="#000000"]\n1 [label="X1", shape="box", style="filled", fillcolor="#FFFFFFDD", fontcolor="#000000", color="#000000"]\n1 -> 3 [label="<= 30", color="#000000", fontcolor="#000000"]\n1 -> 4 [label="> 30", color="#000000", fontcolor="#000000"]\n2 [label=<<table border="0" cellspacing="0"> <tr><td port="port1" border="1" bgcolor="#FFFFFFDD"><FONT color="#000000"><b>leaf</b></FONT></td></tr><tr><td port="port0" border="1" align="left">leaf_path_length: 2 </td></tr><tr><td port="port1" border="1" align="left"> training_row_count: 10 </td></tr><tr><td port="port2" border="1" align="left" bgcolor="#ffc1c1"><FONT color="#111111"> anomaly_score: 0.6210231175271977 </FONT> </td></tr></table>>, fillcolor="#FFFFFFDD", fontcolor="#000000", shape="none", color="#000000"]\n3 [label=<<table border="0" cellspacing="0"> <tr><td port="port1" border="1" bgcolor="#FFFFFFDD"><FONT color="#000000"><b>leaf</b></FONT></td></tr><tr><td port="port0" border="1" align="left">leaf_path_length: 3 </td></tr><tr><td port="port1" border="1" align="left"> training_row_count: 4 </td></tr><tr><td port="port2" border="1" align="left" bgcolor="#ffa8a8"><FONT color="#111111"> anomaly_score: 0.6689556446813831 </FONT> </td></tr></table>>, fillcolor="#FFFFFFDD", fontcolor="#000000", shape="none", color="#000000"]\n4 [label=<<table border="0" cellspacing="0"> <tr><td port="port1" border="1" bgcolor="#FFFFFFDD"><FONT color="#000000"><b>leaf</b></FONT></td></tr><tr><td port="port0" border="1" align="left">leaf_path_length: 7 </td></tr><tr><td port="port1" border="1" align="left"> training_row_count: 8 </td></tr><tr><td port="port2" border="1" align="left" bgcolor="#ffffff"><FONT color="#111111"> anomaly_score: 0.42604529186600343 </FONT> </td></tr></table>>, fillcolor="#FFFFFFDD", fontcolor="#000000", shape="none", color="#000000"]\n}'

Use the plot_tree() method to draw the input tree.

model_bta.plot_tree()
[Image: tree drawn by plot_tree()]

Important

plot_tree() requires the Graphviz module.
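When the Graphviz Python module is not available, one option is to save the string returned by to_graphviz() to a file and render it with the Graphviz command-line tool. The sketch below uses a shortened stand-in for that string and a temporary directory; the file name tree.dot is arbitrary.

```python
import tempfile
from pathlib import Path

# Stand-in for the string returned by model_bta.to_graphviz(), shortened here.
dot_source = (
    "digraph Tree {\n"
    '0 [label="X0", shape="box"]\n'
    '0 -> 1 [label="= female"]\n'
    '0 -> 2 [label="!= female"]\n'
    "}"
)

# Write the DOT source to a file in a temporary directory.
dot_path = Path(tempfile.mkdtemp()) / "tree.dot"
dot_path.write_text(dot_source, encoding="utf-8")

print(dot_path.read_text(encoding="utf-8").splitlines()[0])  # digraph Tree {

# With Graphviz installed, the file can then be rendered from a shell:
#   dot -Tpng tree.dot -o tree.png
```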

Note

The above example is a very basic one. For more detailed examples and customization options, see the tree chart gallery (chart_gallery.tree).

__init__(children_left: Annotated[list | ndarray, 'Array Like Structure'], children_right: Annotated[list | ndarray, 'Array Like Structure'], feature: Annotated[list | ndarray, 'Array Like Structure'], threshold: Annotated[list | ndarray, 'Array Like Structure'], value: Annotated[list | ndarray, 'Array Like Structure'], psy: int = 1) -> None

Methods

__init__(children_left, children_right, ...)

get_attributes()

Returns the model attributes.

plot_tree([pic_path])

Draws the input tree.

predict(X)

Predicts using the BinaryTreeAnomaly model.

predict_proba(X)

Returns the model probabilities.

predict_proba_sql(X)

Returns the SQL code needed to deploy the model probabilities.

predict_sql(X)

Returns the SQL code needed to deploy the model.

set_attributes(**kwargs)

Sets the model attributes.

to_graphviz([feature_names, classes_color, ...])

Returns the code for a Graphviz tree.

Attributes

object_type

Must be overridden in child class