Importing and Predicting With PMML Models

As a Vertica user, you can train ML models in other platforms, convert them to standard PMML format, and then import them into Vertica for in-database prediction on data stored in Vertica relations.

Here is an example of how to import a PMML model trained in Spark. The following diagram shows the workflow of the example.

You can use the IMPORT_MODELS function in a simple statement to import the PMML model. The imported model then appears in a system table named V_CATALOG.MODELS which lists the archived ML models in Vertica.

=> -- importing the PMML model trained and generated in Spark
=> SELECT import_models('/data/username/temp/spark_logistic_reg' USING PARAMETERS category='PMML');
 import_models 
---------------
 Success
(1 row)

=> -- Looking at the models table=> SELECT model_name, schema_name, category, model_type, create_time, size FROM models;
     model_name     | schema_name | category |      model_type       |          create_time          | size 
--------------------+-------------+----------+-----------------------+-------------------------------+------
 spark_logistic_reg | public      | PMML     | PMML_REGRESSION_MODEL | 2020-07-28 00:12:29.389709-04 | 5831
(1 row)

You can look at the summary of the model using GET_MODEL_SUMMARY function.

=> \t
Showing only tuples.
=> SELECT get_model_summary(USING PARAMETERS model_name='spark_logistic_reg');
 
=============
function_name
=============
classification

===========
data_fields
===========
 name  |dataType|  optype   
-------+--------+-----------
field_0| double |continuous 
field_1| double |continuous 
field_2| double |continuous 
field_3| double |continuous 
field_4| double |continuous 
field_5| double |continuous 
field_6| double |continuous 
field_7| double |continuous 
field_8| double |continuous 
target | string |categorical


==========
predictors
==========
 name  |exponent|coefficient
-------+--------+-----------
field_0|   1    | -0.23318  
field_1|   1    |  0.73623  
field_2|   1    |  0.29964  
field_3|   1    |  0.12809  
field_4|   1    | -0.66857  
field_5|   1    |  0.51675  
field_6|   1    | -0.41026  
field_7|   1    |  0.30829  
field_8|   1    | -0.17788  


===============
Additional Info
===============
    Name     | Value  
-------------+--------
is_supervised|   1    
  intercept  |-1.20173

You can also retrieve the model's attributes using the GET_MODEL_ATTRIBUTE function.

=> \t
Tuples only is off.
=> -- The list of the attributes of the PMML model
=> SELECT get_model_attribute(USING PARAMETERS model_name='spark_logistic_reg');
   attr_name   |         attr_fields         | #_of_rows 
---------------+-----------------------------+-----------
 is_supervised | is_supervised               |         1
 function_name | function_name               |         1
 data_fields   | name, dataType, optype      |        10
 intercept     | intercept                   |         1
 predictors    | name, exponent, coefficient |         9
(5 rows)

=> -- The coefficients of the PMML model
=> SELECT get_model_attribute(USING PARAMETERS model_name='spark_logistic_reg', attr_name='predictors');
  name   | exponent |    coefficient     
---------+----------+--------------------
 field_0 |        1 |   -0.2331769167607
 field_1 |        1 |  0.736227459496199
 field_2 |        1 |   0.29963728232024
 field_3 |        1 |  0.128085369856188
 field_4 |        1 | -0.668573096260048
 field_5 |        1 |  0.516750679584637
 field_6 |        1 |  -0.41025989394959
 field_7 |        1 |  0.308289533913736
 field_8 |        1 | -0.177878773139411
(9 rows)

You can then use the PREDICT_PMML function to apply the imported model on a relation for in-database prediction. The internal parameters of the model can be matched to the column names of the input relation by their names or their listed position. Direct input values can also be fed to the function as displayed below.

=> -- Using the imported PMML model for scoring direct input values
=> SELECT predict_pmml(1.5,0.5,2,1,0.75,4.2,3.1,0.9,1.1
username(>                     USING PARAMETERS model_name='spark_logistic_reg', match_by_pos=true);
 predict_pmml 
--------------
 1
(1 row)

=> -- Using the imported PMML model for scoring samples in a table
=> SELECT predict_pmml(* USING PARAMETERS model_name='spark_logistic_reg') AS prediction
=>     FROM test_data;
 prediction 
------------
 1
 0
(2 rows)

See Also