Classifying Data Using Naive Bayes
This Naive Bayes example uses the HouseVotes84 data set to show you how to build a model. With this model, you can predict the party affiliation of a member of the United States Congress based on their voting record. To aid in classifying the data, it has been cleaned: each missed vote has been replaced with the majority vote of the member's party. For example, suppose a Democrat had a missing value for vote1 and the majority of Democrats voted in favor. The cleaned data then records a vote in favor for every Democrat whose vote1 value was missing.
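The cleaning rule described above can be sketched in a few lines of Python. This is only an illustration of the majority-vote replacement, not the script that produced the sample data; the function name `impute_by_party_majority`, the dictionary row layout, and the sample rows are all assumptions made for this example.

```python
from collections import Counter


def impute_by_party_majority(rows, vote_columns):
    """Return copies of rows with missing votes (None) replaced by the
    majority vote of the member's own party for that column."""
    # Tally the recorded votes per (party, column), ignoring missing values.
    tallies = {}
    for row in rows:
        for col in vote_columns:
            if row[col] is not None:
                tallies.setdefault((row["party"], col), Counter())[row[col]] += 1

    cleaned = []
    for row in rows:
        fixed = dict(row)  # copy, so the input rows stay untouched
        for col in vote_columns:
            if fixed[col] is None:
                # Raises KeyError if the party has no recorded vote at all
                # for this column; this sketch does not handle that case.
                fixed[col] = tallies[(row["party"], col)].most_common(1)[0][0]
        cleaned.append(fixed)
    return cleaned


# Hypothetical sample rows (not taken from HouseVotes84); None marks a missed vote.
sample = [
    {"id": 1, "party": "democrat", "vote1": "y"},
    {"id": 2, "party": "democrat", "vote1": "y"},
    {"id": 3, "party": "democrat", "vote1": "n"},
    {"id": 4, "party": "democrat", "vote1": None},
    {"id": 5, "party": "republican", "vote1": "n"},
    {"id": 6, "party": "republican", "vote1": None},
]
cleaned = impute_by_party_majority(sample, ["vote1"])
```

In this sample, the missing Democrat vote becomes `y` and the missing Republican vote becomes `n`, because those are the majority votes recorded for each party.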
In this example, approximately 75% of the cleaned HouseVotes84 data is randomly selected and copied to a training table. The remaining cleaned HouseVotes84 data is used as a testing table.
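As a rough illustration of such a split, the following Python sketch shuffles a list of row identifiers and cuts it at 75%. How the sample tables were actually produced is not shown in this example; the function name `split_train_test`, the fixed seed, and the 20 placeholder ids are assumptions.

```python
import random


def split_train_test(rows, train_fraction=0.75, seed=0):
    """Shuffle rows and split them into (training, testing) lists."""
    rng = random.Random(seed)   # fixed seed so the split is repeatable
    shuffled = list(rows)       # copy, so the caller's list is not reordered
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical ids standing in for rows of the cleaned data.
ids = list(range(1, 21))
train_ids, test_ids = split_train_test(ids)
```

Every row lands in exactly one of the two lists, so the model is never evaluated on rows it was trained on.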
Before you begin the example, make sure that you have loaded the Machine Learning sample data.
- Create the Naive Bayes model, named naive_house84_model, using the house84_train training data.
- View the summary output of naive_house84_model.
- Create a new table, named predicted_party_naive. Populate this table with the prediction outputs you obtain from the PREDICT_NAIVE_BAYES function on your test data.
- Calculate the accuracy of the model's predictions.
=> SELECT NAIVE_BAYES('naive_house84_model', 'house84_train', 'party', '*'
                      USING PARAMETERS exclude_columns='party, id');
                  NAIVE_BAYES
------------------------------------------------
 Finished. Accepted Rows: 315  Rejected Rows: 0
(1 row)
=> SELECT SUMMARIZE_MODEL('naive_house84_model');
                      SUMMARIZE_MODEL
------------------------------------------------------------

=============================
Classes and Prior Probability
=============================
     |democrat|republican
-----+--------+----------
Prior|0.59816 | 0.40184

===============
List of Columns
===============
      |Column Index|Model Type
------+------------+-----------
party |     0      |   LABEL
vote1 |     1      |Categorical
vote2 |     2      |Categorical
vote3 |     3      |Categorical
vote4 |     4      |Categorical
vote5 |     5      |Categorical
vote6 |     6      |Categorical
vote7 |     7      |Categorical
vote8 |     8      |Categorical
vote9 |     9      |Categorical
vote10|     10     |Categorical
vote11|     11     |Categorical
vote12|     12     |Categorical
vote13|     13     |Categorical
vote14|     14     |Categorical
vote15|     15     |Categorical
vote16|     16     |Categorical

===============================
Columns with Categorical Models
===============================

Conditional Probabilities for Column vote1 (1)
 |democrat|republican
-+--------+----------
n|0.40102 | 0.80451
y|0.59898 | 0.19549

Conditional Probabilities for Column vote2 (2)
 |democrat|republican
-+--------+----------
n|0.44670 | 0.36090
y|0.55330 | 0.63910

.
.
.
(1 row)
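To see how a Naive Bayes classifier turns these numbers into a prediction, the following Python sketch multiplies the prior of each class by the conditional probability of each observed vote and then normalizes the results. It uses only the vote1 and vote2 values shown in the summary, so it is an illustration of the calculation rather than a reproduction of the model, which uses all 16 vote columns. The names `PRIOR`, `COND`, and `posterior` are assumptions made for this example.

```python
# Prior and conditional probabilities copied from the summary output above.
# Only vote1 and vote2 appear there, so only those two columns are used here.
PRIOR = {"democrat": 0.59816, "republican": 0.40184}
COND = {
    "vote1": {
        "n": {"democrat": 0.40102, "republican": 0.80451},
        "y": {"democrat": 0.59898, "republican": 0.19549},
    },
    "vote2": {
        "n": {"democrat": 0.44670, "republican": 0.36090},
        "y": {"democrat": 0.55330, "republican": 0.63910},
    },
}


def posterior(votes):
    """Return the normalized class probabilities for the given votes,
    treating the vote columns as independent given the party."""
    scores = dict(PRIOR)  # start from the prior of each class
    for column, value in votes.items():
        for party in scores:
            scores[party] *= COND[column][value][party]
    total = sum(scores.values())
    return {party: score / total for party, score in scores.items()}


# A hypothetical member who voted "n" on vote1 and "y" on vote2.
probs = posterior({"vote1": "n", "vote2": "y"})
```

For this hypothetical member, the republican class receives the higher probability, mainly because a vote of `n` on vote1 is about twice as likely for Republicans as for Democrats.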
=> CREATE TABLE predicted_party_naive
     AS SELECT party,
          PREDICT_NAIVE_BAYES (vote1, vote2, vote3, vote4, vote5,
                               vote6, vote7, vote8, vote9, vote10,
                               vote11, vote12, vote13, vote14,
                               vote15, vote16
                               USING PARAMETERS model_name = 'naive_house84_model',
                                                type = 'response') AS Predicted_Party
       FROM house84_test;
CREATE TABLE
=> SELECT (Predictions.Num_Correct_Predictions / Count.Total_Count) AS Percent_Accuracy
     FROM ( SELECT COUNT(Predicted_Party) AS Num_Correct_Predictions
              FROM predicted_party_naive
             WHERE party = Predicted_Party
          ) AS Predictions,
          ( SELECT COUNT(party) AS Total_Count
              FROM predicted_party_naive
          ) AS Count;
   Percent_Accuracy
----------------------
 0.933333333333333333
(1 row)
The model predicted the party of the members of Congress in the test data from their voting patterns with approximately 93% accuracy.
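The accuracy figure is the number of rows where the predicted party matches the actual party, divided by the total number of rows. A minimal Python sketch of the same calculation follows; the function name `accuracy` and the five sample pairs are hypothetical and are not taken from predicted_party_naive.

```python
def accuracy(pairs):
    """Return the fraction of (actual, predicted) pairs that match."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("cannot compute accuracy without predictions")
    correct = sum(1 for actual, predicted in pairs if actual == predicted)
    return correct / len(pairs)


# Hypothetical (party, Predicted_Party) pairs for illustration only.
sample_pairs = [
    ("democrat", "democrat"),
    ("democrat", "democrat"),
    ("republican", "republican"),
    ("republican", "democrat"),   # one wrong prediction
    ("democrat", "democrat"),
]
score = accuracy(sample_pairs)
```

Here four of the five hypothetical predictions match, so `score` is 0.8. Like the Percent_Accuracy column in the query, the value is a fraction between 0 and 1 rather than a percentage.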
Viewing the Probability of Each Class
You can also use PREDICT_NAIVE_BAYES_CLASSES to view the probability of each class for every row in the test data.
=> SELECT PREDICT_NAIVE_BAYES_CLASSES (id, vote1, vote2, vote3, vote4, vote5,
                                       vote6, vote7, vote8, vote9, vote10,
                                       vote11, vote12, vote13, vote14,
                                       vote15, vote16
                                       USING PARAMETERS model_name = 'naive_house84_model',
                                                        key_columns = 'id',
                                                        exclude_columns = 'id',
                                                        classes = 'democrat, republican')
        OVER() FROM house84_test;
 id  | Predicted  |    Probability    |       democrat       |      republican
-----+------------+-------------------+----------------------+----------------------
 368 | democrat   |                 1 |                    1 |                    0
 372 | democrat   |                 1 |                    1 |                    0
 374 | democrat   |                 1 |                    1 |                    0
 378 | republican | 0.999999962214987 | 3.77850125111219e-08 |    0.999999962214987
 384 | democrat   |                 1 |                    1 |                    0
 387 | democrat   |                 1 |                    1 |                    0
 406 | republican | 0.999999945980143 | 5.40198564592332e-08 |    0.999999945980143
 419 | democrat   |                 1 |                    1 |                    0
 421 | republican | 0.922808855631005 |   0.0771911443689949 |    0.922808855631005
.
.
.
(109 rows)
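In the rows shown, the Predicted column holds the class with the highest probability and the Probability column repeats that class's value. The Python sketch below expresses that relationship using the values from row 421; the function name `top_class` is an assumption, and the sketch describes what the output appears to show rather than how the function is implemented.

```python
def top_class(class_probabilities):
    """Return (label, probability) for the most probable class."""
    label = max(class_probabilities, key=class_probabilities.get)
    return label, class_probabilities[label]


# Class probabilities copied from the row with id 421 in the output above.
row_421 = {"democrat": 0.0771911443689949, "republican": 0.922808855631005}
predicted, probability = top_class(row_421)
```

For row 421 this selects `republican` with a probability of 0.922808855631005, matching the Predicted and Probability columns of that row.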