Building a Linear Regression Model

This linear regression example uses a small data set named faithful. The example shows how you can build a model to predict the value of eruptions, given the value of the waiting feature.

Before you begin the example, make sure that you have loaded the Machine Learning sample data.

  1. Create the linear regression model, named linear_reg_faithful, using the faithful_training training data:

    => SELECT LINEAR_REG('linear_reg_faithful', 'faithful_training', 'eruptions', 'waiting'
                          USING PARAMETERS optimizer='BFGS');
            LINEAR_REG
    ---------------------------
     Finished in 6 iterations
    
    (1 row)
    
  2. View the summary output of linear_reg_faithful:

    => SELECT SUMMARIZE_MODEL('linear_reg_faithful');
    -[ RECORD 1 ]---+-----------------------------------------------------------------------------------------------------------
    SUMMARIZE_MODEL | coeff names : {Intercept, waiting}
    coeffecients: {-2.067947819, 0.07875927383}
    std_err:      {0.21063, 0.0029203}
    t_value:      {-9.8178, 26.969}
    p_value:      {4.2157e-18, 2.1094e-61}
    Number of iterations: 6, Number of skipped samples: 0, Number of processed samples: 162
    Call: linearReg(model_name=linear_reg_faithful, input_table=faithful_training, response_column=eruptions, 
    predictor_columns=waiting,
    exclude_columns=, optimizer=bfgs, epsilon=0.0001, max_iterations=50)
  3. Create a new table that contains the response values from running the PREDICT_LINEAR_REG function on your test data. Name this table pred_faithful_results:

    => CREATE TABLE pred_faithful_results AS
           (SELECT id, eruptions, PREDICT_LINEAR_REG(waiting USING PARAMETERS model_name='linear_reg_faithful') 
            AS pred FROM faithful_testing);
    CREATE TABLE
    
  4. View the results in the pred_faithful_results table:

    => SELECT * FROM pred_faithful_results ORDER BY id;
     id  | eruptions |       pred
    -----+-----------+------------------
       4 |     2.283 |  2.8151271587036
       5 |     4.533 | 4.62659045686076
       8 |       3.6 | 4.62659045686076
       9 |      1.95 | 1.94877514654148
      11 |     1.833 | 2.18505296804024
      12 |     3.917 | 4.54783118302784
      14 |      1.75 |  1.6337380512098
      20 |      4.25 | 4.15403481386324
      22 |      1.75 |  1.6337380512098
    .
    .
    .
    (110 rows)
    

Calculating the Mean Squared Error (MSE)

Another way that you can calculate how well your model fits the data is by using the MSE function. MSE returns the average of the squared differences between actual value and predicted values.

=> SELECT MSE (eruptions::float, pred::float) OVER() FROM 
(SELECT eruptions, pred FROM pred_faithful_results) AS prediction_output;
        mse        |                   Comments
-------------------+-----------------------------------------------
 0.252925741352641 | Of 110 rows, 110 were used and 0 were ignored
(1 row)

See Also