memModel

In [ ]:
memModel(model_type: str,
         attributes: dict)

Creates platform-independent machine learning models that you can export as SQL or Python code for deployment in other environments.

Parameters

Name Type Optional Description
model_type
str
The model type, one of the following: 'BinaryTreeClassifier', 'BinaryTreeRegressor', 'BisectingKMeans', 'CHAID', 'KMeans', 'LinearSVC', 'LinearSVR', 'LinearRegression', 'LogisticRegression', 'NaiveBayes', 'NearestCentroid', 'Normalizer', 'OneHotEncoder', 'PCA', 'RandomForestClassifier', 'RandomForestRegressor', 'SVD', 'XGBoostClassifier', 'XGBoostRegressor'.
attributes
dict
Dictionary which includes all the model's attributes.
  • For BisectingKMeans:
    {"clusters": List of the model's cluster centers.
    "left_child": List of the model's left children IDs.
    "right_child": List of the model's right children IDs.
    "p": The p corresponding to one of the p-distances.}
  • For BinaryTreeClassifier, BinaryTreeRegressor:
    {"children_left": A list of node IDs, where children_left[i] is the node ID of the left child of node i.
    "children_right": A list of node IDs, where children_right[i] is the node ID of the right child of node i.
    "feature": A list of features, where feature[i] is the feature to split on, for the internal node i.
    "threshold": A list of thresholds, where threshold[i] is the threshold for internal node i.
    "value": Contains the constant prediction value of each node.
    "classes": [Only for Classifier] The classes for the binary tree model.}
  • For CHAID:
    {"tree": CHAID tree. This tree can be generated using the vDataFrame.chaid method.
    "classes": The classes for the CHAID model.}
  • For KMeans:
    {"clusters": List of the model's cluster centers.
    "p": The p corresponding to one of the p-distances.}
  • For LinearSVC, LinearSVR, LinearRegression, LogisticRegression:
    {"coefficients": List of the model's coefficients.
    "intercept": Intercept or constant value.}
  • For NaiveBayes:
    {"classes": The classes for the naive bayes model.
    "prior": The model probabilities of each class.
    "attributes": List of the model's attributes. Each feature is represented by a dictionary, the contents of which differ for each distribution type.
    • For 'gaussian': Key 'type' must have the value 'gaussian'. Each of the model's classes must include a dictionary with two keys:
      sigma_sq: The variance (square of the standard deviation).
      mu: Average.
      Example: {'type': 'gaussian', 'C': {'mu': 63.9878308300395, 'sigma_sq': 7281.87598377196}, 'Q': {'mu': 13.0217386792453, 'sigma_sq': 211.626862330204}, 'S': {'mu': 27.6928120412844, 'sigma_sq': 1428.57067393938}}
    • For 'multinomial': Key 'type' must have the value 'multinomial'. Each of the model's classes must be represented by a key with its probability as the value.
      Example: {'type': 'multinomial', 'C': 0.771666666666667, 'Q': 0.910714285714286, 'S': 0.878216123499142}
    • For 'bernoulli': Key 'type' must have the value 'bernoulli'. Each of the model's classes must be represented by a key with its probability as the value.
      Example: {'type': 'bernoulli', 'C': 0.537254901960784, 'Q': 0.277777777777778, 'S': 0.324942791762014}
    • For 'categorical': Key 'type' must have the value 'categorical'. Each of the model's classes must include a dictionary with all the feature categories.
      Example: {'type': 'categorical', 'C': {'female': 0.407843137254902, 'male': 0.592156862745098}, 'Q': {'female': 0.416666666666667, 'male': 0.583333333333333}, 'S': {'female': 0.311212814645309, 'male': 0.688787185354691}}}
    }
  • For NearestCentroid:
    {"clusters": List of the model's cluster centers.
    "p": The p corresponding to one of the p-distances.
    "classes": Represents the classes of the nearest centroids.}
  • For Normalizer:
    {"values": List of tuples of the model's attributes. The required tuple depends on the specified method: 'zscore': (mean, std); 'robust_zscore': (median, mad); 'minmax': (min, max).
    "method": The model's category, one of the following: 'zscore', 'robust_zscore', or 'minmax'.}
  • For OneHotEncoder:
    {"categories": List of the different feature categories.
    "drop_first": Boolean, whether the first category should be dropped.
    "column_naming": Appends categorical levels to column names according to the specified method. It can be set to 'indices' or 'values'.}
  • For PCA:
    {"principal_components": Matrix of the principal components.
    "mean": List of the averages of the input predictors.}
  • For RandomForestClassifier, RandomForestRegressor, XGBoostClassifier, XGBoostRegressor:
    {"trees": List of memModels of type 'BinaryTreeRegressor' or 'BinaryTreeClassifier'.
    "learning_rate": [Only for XGBoostClassifier and XGBoostRegressor] Learning rate.
    "mean": [Only for XGBoostRegressor] Average of the response column.
    "logodds": [Only for XGBoostClassifier] List of the logodds of the response classes.}
  • For SVD:
    {"vectors": Matrix of the right singular vectors.
    "values": List of the singular values.}
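
As an illustration of how the 'gaussian' NaiveBayes attributes are consumed (a plain-Python sketch using the example values above, not VerticaPy's internals), each per-class entry supplies the mean mu and variance sigma_sq of a normal density:

```python
import math

# Illustrative attributes for one feature, following the 'gaussian' schema above.
feature = {
    "type": "gaussian",
    "C": {"mu": 63.9878308300395, "sigma_sq": 7281.87598377196},
}

def gaussian_likelihood(x: float, mu: float, sigma_sq: float) -> float:
    """Normal density N(mu, sigma_sq) evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma_sq)) / math.sqrt(2 * math.pi * sigma_sq)

# Likelihood of observing x = 60.0 for class 'C'.
params = feature["C"]
lik = gaussian_likelihood(60.0, params["mu"], params["sigma_sq"])
```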

Attributes

After the object is created, the following parameters become attributes of the model.

Name Type Description
attributes_
dict
Model's attributes.
model_type_
str
Model's type.
represent_
str
Model's summary.

Methods

Name Description
get_attributes Returns model's attributes.
plot_tree Draws the input tree. Requires the graphviz module.
predict Predicts using the model's attributes.
predict_proba Predicts probabilities using the model's attributes.
predict_proba_sql Returns the SQL code needed to deploy the probabilities model.
predict_sql Returns the SQL code needed to deploy the model.
rotate Performs an Oblimin (Varimax, Quartimax) rotation on the model's PCA matrix.
set_attributes Sets new attributes for the model.
to_graphviz Returns the code for a Graphviz tree.
transform Transforms the data using the model's attributes.
transform_sql Returns the SQL code needed to deploy the model.

Example

Building a memModel

memModels are defined entirely by their attributes. For example, the following attributes define a linear regression model:

In [3]:
coefficients = [0.5, 1.2]
intercept = 2.0

To build a linear regression model from its memModel, specify the model type and its attributes:

In [5]:
from verticapy.learn.memmodel import memModel

model = memModel(model_type = "LinearRegression",
                 attributes = {"coefficients": coefficients,
                               "intercept": intercept})
model
Out[5]:
<LinearRegression>

coefficients = [0.5 1.2]

intercept = 2.0

Making predictions

The predict method returns the model's predictions on a given set of data:

In [6]:
model.predict([[1.0,  0.3], 
               [2.0, -0.6]])
Out[6]:
array([2.86, 2.28])

The predict_sql method generates the SQL code for deploying the model in Vertica:

In [7]:
model.predict_sql(["x_1", "x_2"])
Out[7]:
'2.0 + 0.5 * x_1 + 1.2 * x_2'
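
As a sanity check (plain Python, no database), evaluating the generated expression on the same rows reproduces the in-memory predictions:

```python
# Evaluate the deployed expression '2.0 + 0.5 * x_1 + 1.2 * x_2'
# on the rows passed to predict above.
rows = [(1.0, 0.3), (2.0, -0.6)]
preds = [2.0 + 0.5 * x_1 + 1.2 * x_2 for x_1, x_2 in rows]
```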

Computing class probabilities

Multiclass classifiers calculate the probabilities for each class. The BinaryTreeClassifier is one such classifier:

In [8]:
model = memModel("BinaryTreeClassifier", {"children_left": [1, 3, None, None, None], 
                                          "children_right": [2, 4, None, None, None],
                                          "feature": [0, 1, None, None, None],
                                          "threshold": ['female', 30, None, None, None],
                                          "value": [None, None, [0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]],
                                          "classes": ['a', 'b', 'c',]})
model.predict_proba([['female', 500.0],
                     ['male', 300.0],
                     ['female', 6.0],])
Out[8]:
array([[0.2, 0.2, 0.6],
       [0.8, 0.1, 0.1],
       [0.1, 0.8, 0.1]])
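
The traversal these attributes encode can be sketched in plain Python (illustrative only, not VerticaPy's implementation): string thresholds split on equality, numeric thresholds on '<', and a leaf is any node whose value is not None:

```python
# Tree attributes from the example above.
children_left = [1, 3, None, None, None]
children_right = [2, 4, None, None, None]
feature = [0, 1, None, None, None]
threshold = ["female", 30, None, None, None]
value = [None, None, [0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]

def predict_proba_row(row):
    """Descend from the root to a leaf and return its class probabilities."""
    node = 0
    while value[node] is None:  # internal node: keep descending
        t = threshold[node]
        if isinstance(t, str):
            go_left = row[feature[node]] == t
        else:
            go_left = row[feature[node]] < t
        node = children_left[node] if go_left else children_right[node]
    return value[node]
```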

The predict_proba_sql method generates the SQL code for deploying the BinaryTreeClassifier model:

In [9]:
model.predict_proba_sql(["sex", "fare"])
Out[9]:
["(CASE WHEN sex = 'female' THEN (CASE WHEN fare < '30' THEN 0.1 ELSE 0.2 END) ELSE 0.8 END)",
 "(CASE WHEN sex = 'female' THEN (CASE WHEN fare < '30' THEN 0.8 ELSE 0.2 END) ELSE 0.1 END)",
 "(CASE WHEN sex = 'female' THEN (CASE WHEN fare < '30' THEN 0.1 ELSE 0.6 END) ELSE 0.1 END)"]

Preprocessing new data

Transformer models provide the 'transform' method for preprocessing new data in memory:

In [10]:
model = memModel("Normalizer", {"values": [(0.4, 0.5), (0.3, 0.2),],
                                "method": "minmax"})
model.transform([[1.0,  0.3], 
                 [2.0, -0.6]])
Out[10]:
array([[ 6., -0.],
       [16.,  9.]])

To deploy a transformer model with SQL, use 'transform_sql':

In [11]:
model.transform_sql(["x_1", "x_2"])
Out[11]:
['(x_1 - 0.4) / 0.09999999999999998', '(x_2 - 0.3) / -0.09999999999999998']
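
With method 'minmax', each tuple in "values" is (min, max) and the transform computes (x - min) / (max - min); a plain-Python check of the numbers above (a sketch, not the library's code):

```python
# (min, max) per column, matching the Normalizer example above.
values = [(0.4, 0.5), (0.3, 0.2)]
rows = [[1.0, 0.3], [2.0, -0.6]]

# Min-max scaling: (x - min) / (max - min), column by column.
transformed = [
    [(x - mn) / (mx - mn) for x, (mn, mx) in zip(row, values)]
    for row in rows
]
```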

Special methods

PCA memModels have the special 'rotate' method:

In [25]:
model = memModel("PCA", {"principal_components": [[0.4, 0.5], [0.3, 0.2],],
                         "mean": [0.1, 0.3]})
model.rotate()
model.get_attributes()
Out[25]:
{'mean': array([0.1, 0.3]),
 'principal_components': array([[0.07739604, 0.63561769],
        [0.15004968, 0.32784919]])}
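
As a sketch of how "principal_components" and "mean" are used at transform time (assuming the conventional projection, with one component per row of the matrix; the library's exact orientation may differ), the input is centered by the mean and then projected onto the components:

```python
# Attributes from the PCA example above.
principal_components = [[0.4, 0.5], [0.3, 0.2]]
mean = [0.1, 0.3]

def pca_transform(row):
    """Center by 'mean', then take the dot product with each component."""
    centered = [x - m for x, m in zip(row, mean)]
    return [sum(c * x for c, x in zip(comp, centered))
            for comp in principal_components]

result = pca_transform([1.0, 0.5])
```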

Trees have special methods 'to_graphviz' and 'plot_tree':

In [20]:
model = memModel("BinaryTreeClassifier", {"children_left": [1, 3, None, None, None], 
                                          "children_right": [2, 4, None, None, None],
                                          "feature": [0, 1, None, None, None],
                                          "threshold": ['female', 30, None, None, None],
                                          "value": [None, None, [0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]],
                                          "classes": ['a', 'b', 'c',]})
model.to_graphviz()
Out[20]:
'digraph Tree{\n0 [label="X0", shape="box", style="filled"]\n0 -> 1 [label="= female"]\n0 -> 2 [label="!= female"]\n1 [label="X1", shape="box", style="filled"]\n1 -> 3 [label="<= 30"]\n1 -> 4 [label="> 30"]\n2 [label=<<table border="0" cellspacing="0"> <tr><td port="port1" border="1" bgcolor="#87cefa"><b> prediction: a </b></td></tr><tr><td port="port0" border="1" align="left"> prob(a): 0.8 </td></tr><tr><td port="port1" border="1" align="left"> prob(b): 0.1 </td></tr><tr><td port="port2" border="1" align="left"> prob(c): 0.1 </td></tr></table>>, shape="none"]\n3 [label=<<table border="0" cellspacing="0"> <tr><td port="port1" border="1" bgcolor="#efc5b5"><b> prediction: b </b></td></tr><tr><td port="port0" border="1" align="left"> prob(a): 0.1 </td></tr><tr><td port="port1" border="1" align="left"> prob(b): 0.8 </td></tr><tr><td port="port2" border="1" align="left"> prob(c): 0.1 </td></tr></table>>, shape="none"]\n4 [label=<<table border="0" cellspacing="0"> <tr><td port="port1" border="1" bgcolor="#d4ede3"><b> prediction: c </b></td></tr><tr><td port="port0" border="1" align="left"> prob(a): 0.2 </td></tr><tr><td port="port1" border="1" align="left"> prob(b): 0.2 </td></tr><tr><td port="port2" border="1" align="left"> prob(c): 0.6 </td></tr></table>>, shape="none"]\n}'
In [ ]:
model.plot_tree()