Building an SVM for Regression Model

This SVM for regression example uses a small data set named faithful, based on the Old Faithful geyser in Yellowstone National Park. The data set contains values about the waiting time between eruptions and the duration of eruptions of the geyser. The example shows how you can build a model to predict the value of eruptions, given the value of the waiting feature.

Before you begin the example, make sure that you have loaded the Machine Learning sample data.

  1. Create the SVM model, named svm_faithful, using the faithful_training training data:

    => SELECT SVM_REGRESSOR('svm_faithful', 'faithful_training', 'eruptions', 'waiting'
                          USING PARAMETERS error_tolerance=0.1, max_iterations=100);
            SVM_REGRESSOR
    ---------------------------
     Finished in 5 iterations
    
    Accepted Rows: 162   Rejected Rows: 0
    (1 row)
    
  2. View the summary output of svm_faithful:

    => SELECT SUMMARIZE_MODEL('svm_faithful');
    
    ------------------------------------------------------------------
    ===========================
    Predictors and Coefficients
    ===========================
             |Coefficients
    ---------+------------
    Intercept|  -1.59007
    waiting  |   0.07217
    =====
    Notes
    =====
    Call string:
    SELECT svm_regressor('svm_faithful', 'faithful_training', '"eruptions"', 'waiting'
    USING PARAMETERS error_tolerance = 0.1, C=1, max_iterations=100, epsilon=0.001);
    Number of rows accepted: 162
    Number of rows rejected: 0
    Training finished in 5 iterations.
    (1 row)
  3. Create a new table that contains the response values from running the PREDICT_SVM_REGRESSOR function on your test data. Name this table pred_faithful_results:

    => CREATE TABLE pred_faithful AS
           (SELECT id, eruptions, PREDICT_SVM_REGRESSOR(waiting USING PARAMETERS model_name='svm_faithful') 
            AS pred FROM faithful_testing);
    CREATE TABLE
    
  4. View the results in the pred_faithful_results table:

    => SELECT * FROM pred_faithful ORDER BY id;
     id  | eruptions |       pred
    -----+-----------+------------------
       4 |     2.283 | 2.88444568755189
       5 |     4.533 | 4.54434581879796
       8 |       3.6 | 4.54434581879796
       9 |      1.95 | 2.09058040739072
      11 |     1.833 | 2.30708912016195
      12 |     3.917 | 4.47217624787422
      14 |      1.75 | 1.80190212369576
      20 |      4.25 | 4.11132839325551
      22 |      1.75 | 1.80190212369576
    .
    .
    .
    (110 rows)
    

Calculating the Mean Squared Error (MSE)

Another way that you can calculate how well your model fits the data is by using the MSE function. MSE returns the average of the squared differences between actual value and predicted values.

=> SELECT MSE(obs::float, prediction::float) OVER()
   FROM (SELECT eruptions AS obs, pred AS prediction
         FROM pred_faithful) AS prediction_output;
        mse        |                   Comments
-------------------+-----------------------------------------------
 0.254499811834235 | Of 110 rows, 110 were used and 0 were ignored
(1 row)

See Also