verticapy.machine_learning.memmodel.naive_bayes.NaiveBayes#
- class verticapy.machine_learning.memmodel.naive_bayes.NaiveBayes(attributes: list[dict], prior: list | ndarray, classes: list | ndarray)#
InMemoryModel implementation of the NaiveBayes algorithm.

Parameters#
- attributes: list
List of the model’s attributes. Each feature must be represented by a dictionary, whose format differs based on the distribution.
- For ‘gaussian’:
Key “type” must have ‘gaussian’ as value. Each of the model’s classes must include a dictionary with two keys:
- sigma_sq:
Variance (the square of the standard deviation).
- mu:
Mean (average).
Example:
{
    'type': 'gaussian',
    'C': {'mu': 63.9878308300395, 'sigma_sq': 7281.87598377196},
    'Q': {'mu': 13.0217386792453, 'sigma_sq': 211.626862330204},
    'S': {'mu': 27.6928120412844, 'sigma_sq': 1428.57067393938},
}
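The gaussian entry contributes a normal density term per class. As a minimal illustrative sketch (the function name is hypothetical, not part of the VerticaPy API), the term is computed with sigma_sq as the variance, which matches the `2 * sigma_sq` divisor visible in the SQL generated further below:

```python
import math

def gaussian_term(x: float, mu: float, sigma_sq: float) -> float:
    """Normal density N(mu, sigma_sq) at x; sigma_sq is the variance."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * sigma_sq)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma_sq))

# At x = mu the density peaks at 1 / sqrt(2 * pi * sigma_sq).
peak = gaussian_term(63.9878308300395, 63.9878308300395, 7281.87598377196)
```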
- For ‘multinomial’:
Key “type” must have ‘multinomial’ as value. Each of the model’s classes must be represented by a key with its probability as the value.
Example:
{
    'type': 'multinomial',
    'C': 0.771666666666667,
    'Q': 0.910714285714286,
    'S': 0.878216123499142,
}
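The multinomial entry raises the stored class probability to the power of the observed count, as in the POWER(p, pclass) factor of the SQL generated further below. A minimal sketch (the function name is hypothetical):

```python
def multinomial_term(p: float, count: float) -> float:
    """Stored class probability raised to the observed count."""
    return p ** count

# e.g. class 'C' with an observed count of 2
value = multinomial_term(0.771666666666667, 2)
```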
- For ‘bernoulli’:
Key “type” must have ‘bernoulli’ as value. Each of the model’s classes must be represented by a key with its probability as the value.
Example:
{
    'type': 'bernoulli',
    'C': 0.537254901960784,
    'Q': 0.277777777777778,
    'S': 0.324942791762014,
}
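The bernoulli entry uses the stored probability when the boolean feature is true and its complement otherwise, matching the CASE WHEN survived THEN p ELSE 1 - p END factor in the SQL generated further below. A minimal sketch (the function name is hypothetical):

```python
def bernoulli_term(p: float, x: bool) -> float:
    """p if the boolean feature is true, 1 - p otherwise."""
    return p if x else 1.0 - p

true_term = bernoulli_term(0.537254901960784, True)
false_term = bernoulli_term(0.537254901960784, False)
```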
- For ‘categorical’:
Key “type” must have ‘categorical’ as value. Each of the model’s classes must include a dictionary with all the feature categories.
Example:
{
    'type': 'categorical',
    'C': {'female': 0.407843137254902, 'male': 0.592156862745098},
    'Q': {'female': 0.416666666666667, 'male': 0.583333333333333},
    'S': {'female': 0.311212814645309, 'male': 0.688787185354691},
}
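The categorical entry is a plain lookup of the observed category in the per-class dictionary, as in the DECODE(sex, ...) factor of the SQL generated further below. A minimal sketch (the function name is hypothetical):

```python
def categorical_term(probs: dict, category: str) -> float:
    """Probability stored for the observed category."""
    return probs[category]

value = categorical_term(
    {"female": 0.407843137254902, "male": 0.592156862745098},
    "male",
)
```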
- prior: ArrayLike
The model’s class probabilities.
- classes: ArrayLike
The model’s classes.
Note
A memmodel() is defined entirely by its attributes. For example, the prior probabilities, the classes, and the input feature attributes specific to the type of distribution define a NaiveBayes model.

Attributes#
Attributes are identical to the input parameters, followed by an underscore (‘_’).
Examples#
Initialization
Import the required module.
from verticapy.machine_learning.memmodel.naive_bayes import NaiveBayes
Here, we use the attributes of a model trained on the well-known titanic dataset.
The model predicts the port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton) using age (continuous), pclass (discrete), survived (boolean), and sex (categorical) as input features.
- Let’s define attributes representing each input feature:
attributes = [
    {
        "type": "gaussian",
        "C": {"mu": 63.9878308300395, "sigma_sq": 7281.87598377196},
        "Q": {"mu": 13.0217386792453, "sigma_sq": 211.626862330204},
        "S": {"mu": 27.6928120412844, "sigma_sq": 1428.57067393938},
    },
    {
        "type": "multinomial",
        "C": 0.771666666666667,
        "Q": 0.910714285714286,
        "S": 0.878216123499142,
    },
    {
        "type": "bernoulli",
        "C": 0.771666666666667,
        "Q": 0.910714285714286,
        "S": 0.878216123499142,
    },
    {
        "type": "categorical",
        "C": {"female": 0.407843137254902, "male": 0.592156862745098},
        "Q": {"female": 0.416666666666667, "male": 0.583333333333333},
        "S": {"female": 0.406666666666667, "male": 0.593333333333333},
    },
]
We also need to provide class names and their prior probabilities.
prior = [0.8, 0.1, 0.1]
classes = ["C", "Q", "S"]
Let’s create a NaiveBayes model.
model_nb = NaiveBayes(attributes, prior, classes)
Create a dataset.
data = [
    [40.0, 1, True, "male"],
    [60.0, 3, True, "male"],
    [15.0, 2, False, "female"],
]
Making In-Memory Predictions
Use the predict() method to make predictions.
model_nb.predict(data)
Out[7]: array(['C', 'C', 'Q'], dtype='<U1')
Use the predict_proba() method to compute the predicted probability of each class.
model_nb.predict_proba(data)
Out[8]: array([[0.64564673, 0.12105224, 0.23330103],
       [0.74783083, 0.00570541, 0.24646376],
       [0.34471925, 0.49592024, 0.15936051]])
Deploy SQL Code
Let’s use the following column names:
cnames = ["age", "pclass", "survived", "sex"]
Use the predict_sql() method to get the SQL code needed to deploy the model using its attributes.
model_nb.predict_sql(cnames)
Out[10]: "CASE WHEN age IS NULL OR pclass IS NULL OR survived IS NULL OR sex IS NULL THEN NULL WHEN 0.010555023401917874 * EXP(- POWER(age - 27.6928120412844, 2) / 2857.14134787876) * POWER(0.878216123499142, pclass) * (CASE WHEN survived THEN 0.878216123499142 ELSE 0.12178387650085798 END) * DECODE(sex, 'female', 0.406666666666667, 'male', 0.593333333333333) * 0.1 >= 0.004675073323276673 * EXP(- POWER(age - 63.9878308300395, 2) / 14563.75196754392) * POWER(0.771666666666667, pclass) * (CASE WHEN survived THEN 0.771666666666667 ELSE 0.22833333333333306 END) * DECODE(sex, 'female', 0.407843137254902, 'male', 0.592156862745098) * 0.8 AND 0.010555023401917874 * EXP(- POWER(age - 27.6928120412844, 2) / 2857.14134787876) * POWER(0.878216123499142, pclass) * (CASE WHEN survived THEN 0.878216123499142 ELSE 0.12178387650085798 END) * DECODE(sex, 'female', 0.406666666666667, 'male', 0.593333333333333) * 0.1 >= 0.027423612860412977 * EXP(- POWER(age - 13.0217386792453, 2) / 423.253724660408) * POWER(0.910714285714286, pclass) * (CASE WHEN survived THEN 0.910714285714286 ELSE 0.08928571428571397 END) * DECODE(sex, 'female', 0.416666666666667, 'male', 0.583333333333333) * 0.1 THEN 'S' WHEN 0.027423612860412977 * EXP(- POWER(age - 13.0217386792453, 2) / 423.253724660408) * POWER(0.910714285714286, pclass) * (CASE WHEN survived THEN 0.910714285714286 ELSE 0.08928571428571397 END) * DECODE(sex, 'female', 0.416666666666667, 'male', 0.583333333333333) * 0.1 >= 0.004675073323276673 * EXP(- POWER(age - 63.9878308300395, 2) / 14563.75196754392) * POWER(0.771666666666667, pclass) * (CASE WHEN survived THEN 0.771666666666667 ELSE 0.22833333333333306 END) * DECODE(sex, 'female', 0.407843137254902, 'male', 0.592156862745098) * 0.8 THEN 'Q' ELSE 'C' END"
Use the predict_proba_sql() method to get the SQL code needed to deploy the model that computes predicted probabilities.
model_nb.predict_proba_sql(cnames)
Out[11]: ["(0.004675073323276673 * EXP(- POWER(age - 63.9878308300395, 2) / 14563.75196754392) * POWER(0.771666666666667, pclass) * (CASE WHEN survived THEN 0.771666666666667 ELSE 0.22833333333333306 END) * DECODE(sex, 'female', 0.407843137254902, 'male', 0.592156862745098) * 0.8) / (0.004675073323276673 * EXP(- POWER(age - 63.9878308300395, 2) / 14563.75196754392) * POWER(0.771666666666667, pclass) * (CASE WHEN survived THEN 0.771666666666667 ELSE 0.22833333333333306 END) * DECODE(sex, 'female', 0.407843137254902, 'male', 0.592156862745098) * 0.8 + 0.027423612860412977 * EXP(- POWER(age - 13.0217386792453, 2) / 423.253724660408) * POWER(0.910714285714286, pclass) * (CASE WHEN survived THEN 0.910714285714286 ELSE 0.08928571428571397 END) * DECODE(sex, 'female', 0.416666666666667, 'male', 0.583333333333333) * 0.1 + 0.010555023401917874 * EXP(- POWER(age - 27.6928120412844, 2) / 2857.14134787876) * POWER(0.878216123499142, pclass) * (CASE WHEN survived THEN 0.878216123499142 ELSE 0.12178387650085798 END) * DECODE(sex, 'female', 0.406666666666667, 'male', 0.593333333333333) * 0.1)",
 "(0.027423612860412977 * EXP(- POWER(age - 13.0217386792453, 2) / 423.253724660408) * POWER(0.910714285714286, pclass) * (CASE WHEN survived THEN 0.910714285714286 ELSE 0.08928571428571397 END) * DECODE(sex, 'female', 0.416666666666667, 'male', 0.583333333333333) * 0.1) / (0.004675073323276673 * EXP(- POWER(age - 63.9878308300395, 2) / 14563.75196754392) * POWER(0.771666666666667, pclass) * (CASE WHEN survived THEN 0.771666666666667 ELSE 0.22833333333333306 END) * DECODE(sex, 'female', 0.407843137254902, 'male', 0.592156862745098) * 0.8 + 0.027423612860412977 * EXP(- POWER(age - 13.0217386792453, 2) / 423.253724660408) * POWER(0.910714285714286, pclass) * (CASE WHEN survived THEN 0.910714285714286 ELSE 0.08928571428571397 END) * DECODE(sex, 'female', 0.416666666666667, 'male', 0.583333333333333) * 0.1 + 0.010555023401917874 * EXP(- POWER(age - 27.6928120412844, 2) / 2857.14134787876) * POWER(0.878216123499142, pclass) * (CASE WHEN survived THEN 0.878216123499142 ELSE 0.12178387650085798 END) * DECODE(sex, 'female', 0.406666666666667, 'male', 0.593333333333333) * 0.1)",
 "(0.010555023401917874 * EXP(- POWER(age - 27.6928120412844, 2) / 2857.14134787876) * POWER(0.878216123499142, pclass) * (CASE WHEN survived THEN 0.878216123499142 ELSE 0.12178387650085798 END) * DECODE(sex, 'female', 0.406666666666667, 'male', 0.593333333333333) * 0.1) / (0.004675073323276673 * EXP(- POWER(age - 63.9878308300395, 2) / 14563.75196754392) * POWER(0.771666666666667, pclass) * (CASE WHEN survived THEN 0.771666666666667 ELSE 0.22833333333333306 END) * DECODE(sex, 'female', 0.407843137254902, 'male', 0.592156862745098) * 0.8 + 0.027423612860412977 * EXP(- POWER(age - 13.0217386792453, 2) / 423.253724660408) * POWER(0.910714285714286, pclass) * (CASE WHEN survived THEN 0.910714285714286 ELSE 0.08928571428571397 END) * DECODE(sex, 'female', 0.416666666666667, 'male', 0.583333333333333) * 0.1 + 0.010555023401917874 * EXP(- POWER(age - 27.6928120412844, 2) / 2857.14134787876) * POWER(0.878216123499142, pclass) * (CASE WHEN survived THEN 0.878216123499142 ELSE 0.12178387650085798 END) * DECODE(sex, 'female', 0.406666666666667, 'male', 0.593333333333333) * 0.1)"]
Hint
This object can be pickled and used in any in-memory environment, just like scikit-learn models.
- __init__(attributes: list[dict], prior: list | ndarray, classes: list | ndarray) → None#
Methods

- __init__(attributes, prior, classes)
- get_attributes(): Returns the model attributes.
- predict(X): Predicts using the input matrix.
- predict_proba(X): Computes the model’s probabilities using the input matrix.
- predict_proba_sql(X): Returns the SQL code needed to deploy the model probabilities using its attributes.
- predict_sql(X): Returns the SQL code needed to deploy the model.
- set_attributes(**kwargs): Sets the model attributes.
Attributes
Must be overridden in child class