
verticapy.machine_learning.memmodel.naive_bayes.NaiveBayes#

class verticapy.machine_learning.memmodel.naive_bayes.NaiveBayes(attributes: list[dict], prior: list | ndarray, classes: list | ndarray)#

InMemoryModel implementation of the NaiveBayes algorithm.

Parameters#

attributes: list

List of the model’s attributes. Each feature must be represented by a dictionary, which differs based on the distribution.

  • For ‘gaussian’:

    Key “type” must have ‘gaussian’ as value. Each of the model’s classes must include a dictionary with two keys:

    sigma_sq:

Variance (the square of the standard deviation).

    mu:

    Average.

    Example:

    {
        'type': 'gaussian',
        'C': {
            'mu': 63.9878308300395,
            'sigma_sq': 7281.87598377196
        },
        'Q': {
            'mu': 13.0217386792453,
            'sigma_sq': 211.626862330204
        },
        'S': {
            'mu': 27.6928120412844,
            'sigma_sq': 1428.57067393938,
        },
    }
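Since sigma_sq is the variance, the class-conditional likelihood under the ‘gaussian’ distribution is the usual normal density. A minimal sketch of how such a likelihood could be evaluated (the helper name is ours, not part of VerticaPy):

```python
import math

def gaussian_likelihood(x: float, mu: float, sigma_sq: float) -> float:
    # Normal density with mean `mu` and variance `sigma_sq`.
    return math.exp(-((x - mu) ** 2) / (2 * sigma_sq)) / math.sqrt(
        2 * math.pi * sigma_sq
    )

# Likelihood of age = 40 under class 'S' from the example above.
lik = gaussian_likelihood(40.0, 27.6928120412844, 1428.57067393938)
```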
    
  • For ‘multinomial’:

    Key “type” must have ‘multinomial’ as value. Each of the model’s classes must be represented by a key with its probability as the value.

    Example:

    {
        'type': 'multinomial',
        'C': 0.771666666666667,
        'Q': 0.910714285714286,
        'S': 0.878216123499142,
    }
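The deployed SQL (see predict_sql() below) uses this probability as POWER(p, count): the class-conditional likelihood of an observed count x is taken as p**x. A minimal sketch (the helper name is ours):

```python
def multinomial_likelihood(count: int, p: float) -> float:
    # Contribution of a count-valued feature: p raised to the observed
    # count, matching the POWER(p, pclass) term in the deployed SQL.
    return p ** count

# Likelihood of pclass = 3 under class 'S' from the example above.
lik = multinomial_likelihood(3, 0.878216123499142)
```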
    
  • For ‘bernoulli’:

    Key “type” must have ‘bernoulli’ as value. Each of the model’s classes must be represented by a key with its probability as the value.

    Example:

    {
        'type': 'bernoulli',
        'C': 0.537254901960784,
        'Q': 0.277777777777778,
        'S': 0.324942791762014,
    }
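For a boolean feature, the likelihood is p when the value is True and 1 - p otherwise, which is the CASE WHEN ... THEN p ELSE 1 - p END term in the deployed SQL. A minimal sketch (the helper name is ours):

```python
def bernoulli_likelihood(value: bool, p: float) -> float:
    # p if the boolean feature is True, otherwise 1 - p.
    return p if value else 1.0 - p

surv_lik = bernoulli_likelihood(True, 0.878216123499142)
not_surv_lik = bernoulli_likelihood(False, 0.878216123499142)
```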
    
  • For ‘categorical’:

    Key “type” must have ‘categorical’ as value. Each of the model’s classes must include a dictionary with all the feature categories.

    Example:

    {
        'type': 'categorical',
        'C': {
            'female': 0.407843137254902,
            'male': 0.592156862745098
        },
        'Q': {
            'female': 0.416666666666667,
            'male': 0.583333333333333,
        },
        'S': {
            'female': 0.311212814645309,
            'male': 0.688787185354691,
        },
    }
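Here the likelihood is simply the probability stored for the observed category, mirroring the DECODE(sex, ...) lookup in the deployed SQL. A minimal sketch (the helper name is ours):

```python
def categorical_likelihood(category: str, probs: dict[str, float]) -> float:
    # Look up the probability of the observed category for a given class.
    return probs[category]

# Likelihood of sex = 'female' under class 'C' from the example above.
lik = categorical_likelihood(
    "female", {"female": 0.407843137254902, "male": 0.592156862745098}
)
```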
    
prior: ArrayLike

The model’s classes probabilities.

classes: ArrayLike

The model’s classes.

Note

memmodel() objects are defined entirely by their attributes. For example, a NaiveBayes model is fully defined by its prior probabilities, its classes, and the input feature attributes specific to each distribution type.

Attributes#

Attributes are identical to the input parameters, followed by an underscore (‘_’).

Examples#

Initialization

Import the required module.

from verticapy.machine_learning.memmodel.naive_bayes import NaiveBayes

Here we use the attributes of a model trained on the well-known titanic dataset.

It tries to predict the port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton), using age (continuous), pclass (discrete), survived (boolean) and sex (categorical) as input features.

Let’s define attributes representing each input feature:

attributes = [
    {
        "type": "gaussian",
        "C": {"mu": 63.9878308300395, "sigma_sq": 7281.87598377196},
        "Q": {"mu": 13.0217386792453, "sigma_sq": 211.626862330204},
        "S": {"mu": 27.6928120412844, "sigma_sq": 1428.57067393938},
    },
    {
        "type": "multinomial",
        "C": 0.771666666666667,
        "Q": 0.910714285714286,
        "S": 0.878216123499142,
    },
    {
        "type": "bernoulli",
        "C": 0.771666666666667,
        "Q": 0.910714285714286,
        "S": 0.878216123499142,
    },
    {
        "type": "categorical",
        "C": {
            "female": 0.407843137254902,
            "male": 0.592156862745098,
        },
        "Q": {
            "female": 0.416666666666667,
            "male": 0.583333333333333,
        },
        "S": {
            "female": 0.406666666666667,
            "male": 0.593333333333333,
        },
    },
]

We also need to provide class names and their prior probabilities.

prior = [0.8, 0.1, 0.1]

classes = ["C", "Q", "S"]

Let’s create a NaiveBayes model.

model_nb = NaiveBayes(attributes, prior, classes)

Create a dataset.

data = [
    [40.0, 1, True, "male"],
    [60.0, 3, True, "male"],
    [15.0, 2, False, "female"],
]

Making In-Memory Predictions

Use the predict() method to make predictions.

model_nb.predict(data)
Out[7]: array(['C', 'C', 'Q'], dtype='<U1')

Use the predict_proba() method to compute the predicted probabilities for each class.

model_nb.predict_proba(data)
Out[8]: 
array([[0.64564673, 0.12105224, 0.23330103],
       [0.74783083, 0.00570541, 0.24646376],
       [0.34471925, 0.49592024, 0.15936051]])
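Under the hood, predict_proba() multiplies the per-feature class-conditional likelihoods by the priors and normalizes the result (Bayes’ rule). A pure-Python sketch, using the attributes defined above (variable names are ours), reproduces the first row:

```python
import math

# Per-class parameters for the first data row [40.0, 1, True, "male"],
# taken from the attributes defined above.
mu = {"C": 63.9878308300395, "Q": 13.0217386792453, "S": 27.6928120412844}
sigma_sq = {"C": 7281.87598377196, "Q": 211.626862330204, "S": 1428.57067393938}
p_mult = {"C": 0.771666666666667, "Q": 0.910714285714286, "S": 0.878216123499142}
p_bern = p_mult  # this example reuses the same values for the bernoulli feature
p_male = {"C": 0.592156862745098, "Q": 0.583333333333333, "S": 0.593333333333333}
prior = {"C": 0.8, "Q": 0.1, "S": 0.1}

age, pclass, survived = 40.0, 1, True
scores = {}
for c in ("C", "Q", "S"):
    gauss = math.exp(-((age - mu[c]) ** 2) / (2 * sigma_sq[c])) / math.sqrt(
        2 * math.pi * sigma_sq[c]
    )
    bern = p_bern[c] if survived else 1.0 - p_bern[c]
    scores[c] = gauss * p_mult[c] ** pclass * bern * p_male[c] * prior[c]

# Normalize the joint scores to obtain posterior probabilities.
total = sum(scores.values())
probs = {c: s / total for c, s in scores.items()}
# probs is approximately {'C': 0.6456, 'Q': 0.1211, 'S': 0.2333},
# matching the first row of predict_proba above.
```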

Deploy SQL Code

Let’s use the following column names:

cnames = ["age", "pclass", "survived", "sex"]

Use the predict_sql() method to get the SQL code needed to deploy the model using its attributes.

model_nb.predict_sql(cnames)
Out[10]: "CASE WHEN age IS NULL OR pclass IS NULL OR survived IS NULL OR sex IS NULL THEN NULL WHEN 0.010555023401917874 * EXP(- POWER(age - 27.6928120412844, 2) / 2857.14134787876) * POWER(0.878216123499142, pclass) * (CASE WHEN survived THEN 0.878216123499142 ELSE 0.12178387650085798 END) * DECODE(sex, 'female', 0.406666666666667, 'male', 0.593333333333333) * 0.1 >= 0.004675073323276673 * EXP(- POWER(age - 63.9878308300395, 2) / 14563.75196754392) * POWER(0.771666666666667, pclass) * (CASE WHEN survived THEN 0.771666666666667 ELSE 0.22833333333333306 END) * DECODE(sex, 'female', 0.407843137254902, 'male', 0.592156862745098) * 0.8 AND 0.010555023401917874 * EXP(- POWER(age - 27.6928120412844, 2) / 2857.14134787876) * POWER(0.878216123499142, pclass) * (CASE WHEN survived THEN 0.878216123499142 ELSE 0.12178387650085798 END) * DECODE(sex, 'female', 0.406666666666667, 'male', 0.593333333333333) * 0.1 >= 0.027423612860412977 * EXP(- POWER(age - 13.0217386792453, 2) / 423.253724660408) * POWER(0.910714285714286, pclass) * (CASE WHEN survived THEN 0.910714285714286 ELSE 0.08928571428571397 END) * DECODE(sex, 'female', 0.416666666666667, 'male', 0.583333333333333) * 0.1 THEN 'S' WHEN 0.027423612860412977 * EXP(- POWER(age - 13.0217386792453, 2) / 423.253724660408) * POWER(0.910714285714286, pclass) * (CASE WHEN survived THEN 0.910714285714286 ELSE 0.08928571428571397 END) * DECODE(sex, 'female', 0.416666666666667, 'male', 0.583333333333333) * 0.1 >= 0.004675073323276673 * EXP(- POWER(age - 63.9878308300395, 2) / 14563.75196754392) * POWER(0.771666666666667, pclass) * (CASE WHEN survived THEN 0.771666666666667 ELSE 0.22833333333333306 END) * DECODE(sex, 'female', 0.407843137254902, 'male', 0.592156862745098) * 0.8 THEN 'Q' ELSE 'C' END"

Use the predict_proba_sql() method to get the SQL code needed to deploy the model and compute the predicted probabilities.

model_nb.predict_proba_sql(cnames)
Out[11]: 
["(0.004675073323276673 * EXP(- POWER(age - 63.9878308300395, 2) / 14563.75196754392) * POWER(0.771666666666667, pclass) * (CASE WHEN survived THEN 0.771666666666667 ELSE 0.22833333333333306 END) * DECODE(sex, 'female', 0.407843137254902, 'male', 0.592156862745098) * 0.8) / (0.004675073323276673 * EXP(- POWER(age - 63.9878308300395, 2) / 14563.75196754392) * POWER(0.771666666666667, pclass) * (CASE WHEN survived THEN 0.771666666666667 ELSE 0.22833333333333306 END) * DECODE(sex, 'female', 0.407843137254902, 'male', 0.592156862745098) * 0.8 + 0.027423612860412977 * EXP(- POWER(age - 13.0217386792453, 2) / 423.253724660408) * POWER(0.910714285714286, pclass) * (CASE WHEN survived THEN 0.910714285714286 ELSE 0.08928571428571397 END) * DECODE(sex, 'female', 0.416666666666667, 'male', 0.583333333333333) * 0.1 + 0.010555023401917874 * EXP(- POWER(age - 27.6928120412844, 2) / 2857.14134787876) * POWER(0.878216123499142, pclass) * (CASE WHEN survived THEN 0.878216123499142 ELSE 0.12178387650085798 END) * DECODE(sex, 'female', 0.406666666666667, 'male', 0.593333333333333) * 0.1)",
 "(0.027423612860412977 * EXP(- POWER(age - 13.0217386792453, 2) / 423.253724660408) * POWER(0.910714285714286, pclass) * (CASE WHEN survived THEN 0.910714285714286 ELSE 0.08928571428571397 END) * DECODE(sex, 'female', 0.416666666666667, 'male', 0.583333333333333) * 0.1) / (0.004675073323276673 * EXP(- POWER(age - 63.9878308300395, 2) / 14563.75196754392) * POWER(0.771666666666667, pclass) * (CASE WHEN survived THEN 0.771666666666667 ELSE 0.22833333333333306 END) * DECODE(sex, 'female', 0.407843137254902, 'male', 0.592156862745098) * 0.8 + 0.027423612860412977 * EXP(- POWER(age - 13.0217386792453, 2) / 423.253724660408) * POWER(0.910714285714286, pclass) * (CASE WHEN survived THEN 0.910714285714286 ELSE 0.08928571428571397 END) * DECODE(sex, 'female', 0.416666666666667, 'male', 0.583333333333333) * 0.1 + 0.010555023401917874 * EXP(- POWER(age - 27.6928120412844, 2) / 2857.14134787876) * POWER(0.878216123499142, pclass) * (CASE WHEN survived THEN 0.878216123499142 ELSE 0.12178387650085798 END) * DECODE(sex, 'female', 0.406666666666667, 'male', 0.593333333333333) * 0.1)",
 "(0.010555023401917874 * EXP(- POWER(age - 27.6928120412844, 2) / 2857.14134787876) * POWER(0.878216123499142, pclass) * (CASE WHEN survived THEN 0.878216123499142 ELSE 0.12178387650085798 END) * DECODE(sex, 'female', 0.406666666666667, 'male', 0.593333333333333) * 0.1) / (0.004675073323276673 * EXP(- POWER(age - 63.9878308300395, 2) / 14563.75196754392) * POWER(0.771666666666667, pclass) * (CASE WHEN survived THEN 0.771666666666667 ELSE 0.22833333333333306 END) * DECODE(sex, 'female', 0.407843137254902, 'male', 0.592156862745098) * 0.8 + 0.027423612860412977 * EXP(- POWER(age - 13.0217386792453, 2) / 423.253724660408) * POWER(0.910714285714286, pclass) * (CASE WHEN survived THEN 0.910714285714286 ELSE 0.08928571428571397 END) * DECODE(sex, 'female', 0.416666666666667, 'male', 0.583333333333333) * 0.1 + 0.010555023401917874 * EXP(- POWER(age - 27.6928120412844, 2) / 2857.14134787876) * POWER(0.878216123499142, pclass) * (CASE WHEN survived THEN 0.878216123499142 ELSE 0.12178387650085798 END) * DECODE(sex, 'female', 0.406666666666667, 'male', 0.593333333333333) * 0.1)"]

Hint

This object can be pickled and used in any in-memory environment, just like scikit-learn models.

__init__(attributes: list[dict], prior: list | ndarray, classes: list | ndarray) None#

Methods

__init__(attributes, prior, classes)

get_attributes()

Returns the model attributes.

predict(X)

Predicts using the input matrix.

predict_proba(X)

Computes the model's probabilities using the input matrix.

predict_proba_sql(X)

Returns the SQL code needed to deploy the model probabilities using its attributes.

predict_sql(X)

Returns the SQL code needed to deploy the model.

set_attributes(**kwargs)

Sets the model attributes.

Attributes

object_type

Must be overridden in child class