Building a Logistic Regression Model

This logistic regression example uses a small data set named mtcars. The example shows you how to build a model to predict the value of am (whether the car has an automatic or a manual transmission). It uses the given values of all the other features in the data set.

In this example, roughly 60% of the data is used as training data to create a model. The remaining 40% of the data is used as testing data against which you can test your logistic regression model.

Before you begin the example, make sure that you have loaded the Machine Learning sample data.

  1. Create the logistic regression model, named logistic_reg_mtcars, using the mtcars_train training data.

    => SELECT LOGISTIC_REG('logistic_reg_mtcars', 'mtcars_train', 'am', 'cyl, wt' 
                             USING PARAMETERS exclude_columns='hp');
            LOGISTIC_REG
    ----------------------------
     Finished in 15 iterations
    
    (1 row)
  2. View the summary output of logistic_reg_mtcars.

    => SELECT GET_MODEL_SUMMARY(USING PARAMETERS model_name='logistic_reg_mtcars');
    --------------------------------------------------------------------------------
    =======
    details
    =======
    predictor|coefficient|  std_err  |z_value |p_value
    ---------+-----------+-----------+--------+--------
    Intercept| 262.39898 |44745.77338| 0.00586| 0.99532
    cyl      | 16.75892  |5987.23236 | 0.00280| 0.99777
    wt       |-119.92116 |17237.03154|-0.00696| 0.99445
    
    ==============
    regularization
    ==============
    type| lambda
    ----+--------
    none| 1.00000
    
    ===========
    call_string
    ===========
    logistic_reg('public.logistic_reg_mtcars', 'mtcars_train', '"am"', 'cyl, wt'
    USING PARAMETERS exclude_columns='hp', optimizer='newton', epsilon=1e-06, 
    max_iterations=100, regularization='none', lambda=1)
    
    
    ===============
    Additional Info
    ===============
    Name              |Value
    ------------------+-----
    iteration_count   | 20
    rejected_row_count|  0
    accepted_row_count| 20
    (1 row)
  3. Create a new table, named mtcars_predict_results. Populate this table with the prediction outputs you obtain from running the PREDICT_LOGISTIC_REG function on your test data. View the results in the mtcars_predict_results table.

    => CREATE TABLE mtcars_predict_results AS
       (SELECT car_model, am, PREDICT_LOGISTIC_REG(cyl, wt
                                                USING PARAMETERS model_name='logistic_reg_mtcars') 
                                                AS Prediction FROM mtcars_test);
    CREATE TABLE
    
    
    => SELECT * FROM mtcars_predict_results;
       car_model    | am | Prediction
    ----------------+----+------------
     AMC Javelin    |  0 |          0
     Hornet 4 Drive |  0 |          0
     Maserati Bora  |  1 |          0
     Merc 280       |  0 |          0
     Merc 450SL     |  0 |          0
     Toyota Corona  |  0 |          1
     Volvo 142E     |  1 |          1
     Camaro Z28     |  0 |          0
     Datsun 710     |  1 |          1
     Honda Civic    |  1 |          1
     Porsche 914-2  |  1 |          1
     Valiant        |  0 |          0
    (12 rows)
  4. Evaluate the accuracy of the PREDICT_LOGISTIC_REG function, using the CONFUSION_MATRIX evaluation function.

    => SELECT CONFUSION_MATRIX(obs::int, pred::int USING PARAMETERS num_classes=2) OVER()
            FROM (SELECT am AS obs, Prediction AS pred FROM mtcars_predict_results) AS prediction_output;
     class | 0 | 1 |                   comment
    -------+---+---+---------------------------------------------
         0 | 6 | 1 |
         1 | 1 | 4 | Of 12 rows, 12 were used and 0 were ignored
    (2 rows)

    In this case, PREDICT_LOGISTIC_REG correctly predicted that four out of five cars with a value of 1 in the am column have a value of 1. Out of the seven cars which had a value of 0 in the am column, six were correctly predicted to have the value 0. One car was incorrectly classified as having the value 1.

See Also