NAIVE_BAYES
Executes the Naive Bayes algorithm on an input relation. The result is a Naive Bayes model.
Columns are treated according to data type:
- FLOAT: Values are assumed to follow a Gaussian distribution.
- INTEGER: Values are assumed to follow a multinomial distribution.
- CHAR/VARCHAR: Values are assumed to follow a categorical distribution. String values stored in these columns must not exceed 128 characters.
- BOOLEAN: Values are treated as categorical with two values.
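Because the treatment follows each column's data type, and the input relation can be a view, one way to influence how a column is modeled is to cast it in a view before training. The following is a minimal sketch under that assumption; the table `sensor_readings`, the view `sensor_training`, and all column names are hypothetical and do not come from this page.

```sql
-- Hypothetical source table: sensor_readings(status, error_count, temperature)
-- status      VARCHAR  -> class label (response column)
-- error_count INTEGER  -> would be treated as multinomial
-- temperature FLOAT    -> treated as Gaussian
CREATE VIEW sensor_training AS
SELECT status,
       error_count::FLOAT AS error_count,  -- cast so the column is typed FLOAT
       temperature
FROM sensor_readings;

-- Train on the view instead of the base table (model name is hypothetical).
SELECT NAIVE_BAYES('sensor_nb_model', 'sensor_training', 'status',
                   'error_count, temperature');
```

Whether a Gaussian or multinomial treatment suits a count-valued column depends on the data, so the cast is an illustration of the mechanism rather than a recommendation.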
Syntax
NAIVE_BAYES ( 'model‑name', 'input‑relation', 'response‑column', 'predictor‑columns' [ USING PARAMETERS [exclude_columns='excluded‑columns'] [, alpha=alpha-value] ] )
Arguments
model‑name |
Identifies the model to create, where model‑name conforms to conventions described in Identifiers. It must also be unique among all names of sequences, tables, projections, views, and models within the same schema. |
input‑relation |
The table or view that contains the training data for building the model. If the input relation is defined in Hive, use |
response‑column |
Name of the input column that represents the dependent variable, or outcome. This column must contain discrete values that represent class labels. The response column must be of type numeric, CHAR/VARCHAR, or BOOLEAN; otherwise the model is invalid. Vertica automatically casts numeric response column values to VARCHAR. |
predictor‑columns |
Comma-separated list of columns in the input relation that represent independent variables for the model, or asterisk (*) to select all columns. If you select all columns, the argument list for parameter exclude_columns must include the response column and any columns that are invalid as predictor columns. All predictor columns must be of type numeric, CHAR/VARCHAR, or BOOLEAN; otherwise the model is invalid. BOOLEAN column values are converted to FLOAT values before training: 0 for false, 1 for true. |
Parameter Settings
Parameter name | Set to… |
---|---|
exclude_columns | Comma-separated list of columns from predictor‑columns to exclude from processing. |
alpha | A FLOAT that specifies use of Laplace smoothing if the event model is categorical, multinomial, or Bernoulli. Default: 1.0 |
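The effect of alpha can be illustrated with the standard textbook form of Laplace (additive) smoothing for a categorical predictor. This is a sketch of the general technique; this page does not state the exact estimator that Vertica applies, and the symbols below are notation introduced here for illustration.

```latex
\hat{P}(x_j = v \mid c) \;=\; \frac{N_{c,j,v} + \alpha}{N_c + \alpha K_j}
```

Here N_{c,j,v} is the number of training rows of class c in which predictor j takes value v, N_c is the number of training rows of class c, and K_j is the number of distinct values of predictor j. For example, with N_c = 10, N_{c,j,v} = 0, K_j = 2, and alpha = 1, the estimate is 1/12 (about 0.083) instead of 0, so a value never seen with a class does not force that class's overall probability to zero.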
Model Attributes
Attribute | Description |
---|---|
colsInfo | The information from the response and predictor columns used in training. |
alpha | The smoothing parameter value. |
prior | The percentage of each class among all training samples. |
nRowsTotal | The number of samples accepted for training from the data set. |
nRowsRejected | The number of samples rejected for training. |
callStr | The SQL statement used to replicate the training. |
Gaussian | The Gaussian model conditioned on the class indicated by the class_name. |
Multinomial | The multinomial model conditioned on the class indicated by the class_name. |
Bernoulli | The Bernoulli model conditioned on the class indicated by the class_name. |
Categorical | The categorical model conditioned on the class indicated by the class_name. |
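The attributes listed above can be inspected after training with Vertica's model-inspection functions. The following sketch assumes the model `naive_house84_model` from the Examples section already exists; output is omitted because it depends on the training data.

```sql
-- Print a summary of the whole model.
SELECT GET_MODEL_SUMMARY(USING PARAMETERS model_name='naive_house84_model');

-- Return a single attribute, here the class priors.
SELECT GET_MODEL_ATTRIBUTE(USING PARAMETERS model_name='naive_house84_model',
                           attr_name='prior');
```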
Privileges
Superuser, or SELECT privileges on the input relation.
Examples
=> SELECT NAIVE_BAYES('naive_house84_model', 'house84_train', 'party', '*'
                      USING PARAMETERS exclude_columns='party, id');
                   NAIVE_BAYES
--------------------------------------------------
 Finished. Accepted Rows: 324  Rejected Rows: 0
(1 row)
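A trained model is typically applied with PREDICT_NAIVE_BAYES. The sketch below also shows how a non-default alpha might be passed at training time. The model name `naive_house84_alpha_model`, the table `house84_test`, and the predictor columns `vote1`, `vote2`, and `vote3` are assumptions made for illustration; only `house84_train` and `party` appear in the example above.

```sql
-- Train on an explicit predictor list with lighter smoothing than the default 1.0.
-- The predictor column names are assumed, not taken from this page.
SELECT NAIVE_BAYES('naive_house84_alpha_model', 'house84_train', 'party',
                   'vote1, vote2, vote3'
                   USING PARAMETERS alpha=0.5);

-- Predict the class label for each row of a hypothetical test table,
-- passing the same predictors that were used for training.
SELECT party,
       PREDICT_NAIVE_BAYES(vote1, vote2, vote3
                           USING PARAMETERS model_name='naive_house84_alpha_model',
                                            type='response') AS predicted_party
FROM house84_test;
```

Comparing `party` with `predicted_party` row by row gives a quick check of how well the model generalizes to data it was not trained on.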