Integrating FICO® Blaze Advisor® with Vertica

FICO® Blaze Advisor® decision rules management system allows businesses to operationalize business rules and analytic insight quickly. You can deploy decision rules applications built with Blaze Advisor on a number of platforms based on the purpose of the application and the source of the data processed by the rules. When decision rules must be executed with high data volume, bringing the decision engine close to the source of the data results in better performance. Integrating Blaze Advisor with the Vertica demonstrates the benefits of performing the analytics at the data source.

This document describes the technical details of this integration using a Blaze Advisor example that Is deployed as a Java object and a Vertica function. The Blaze Advisor example includes a deployable analytics library that contains a predictive score model and variable transformation functions generated from the FICO® Model Builder.

Note This example shows just one way that Blaze Advisor can inject analytics into decision services. For example, Blaze Advisor also allows you to import Predictive Model Markup Language (PMML) files and to convert SAS data steps into Blaze Advisor Structured Rule Language (SRL).

Vertica provides a way to create a Java library that can be deployed as a JAR file with associated functions. The Vertica function is a Java class implemented using a Vertica SDK. You can configure a Vertica function to invoke a Blaze Advisor decision rules application using data stored in Vertica. Doing so reduces the latency between data retrieval and decision making. This approach brings data and decision rules closer to each other, prepares data with transformation functions, and executes decision rules on the transformed data for the purpose of rapid operationalization or analytic insight.

Integration Example

The example in this document calculates a bank's exposure for a given loan in the event that the customer defaults on the loan.

Workflow

The following graphic illustrates the high-level workflow for this process:

Use Blaze Advisor Quick Deployer to convert the artifacts for the decision rules model into Java stand-alone deployments for a Blaze Advisor project.
Note The Blaze artifacts are discussed in more detail later in this document.
Create an Eclipse project to implement a Vertica function that will embed the FICO model and related ADB file, server configuration file, and related JAR files to perform the calculations
Create an integrated JAR file that can be deployed on your Vertica cluster.
In Vertica, create a library and the Vertica function to deploy the artifacts across your Vertica cluster.
Invoke the Vertica function with a Vertica SELECT statement to perform the decision rule calculations on the data stored in Vertica. Vertica calls the underlying FICO decision rules model to run on the data residing in the Vertica cluster. You don't need to push the large volume of data from Vertica into analytics tools, and you can utilize the MPP power of Vertica for excellent performance.

Exposure at Default (EAD)

This example uses loan data to determine the extent of a bank's exposure for each loan. If the outstanding amount of a loan is higher than a recent valuation of the property, the score is a negative number. The higher the score for a given loan, the less exposure for the bank.

The sample data has the following fields:

Account number: Unique identifier for each account
Recent index valuation: Value of the purchase
Purchase price: Price paid for the purchase
Balance outstanding: Amount outstanding on the loan

The FICO decision rules model calculates Outst_Bal_Perc_Rec_Idx_Val, the outstanding balance as a percentage of the index valuation. The lower this value, the less probability that the loan will default.

For example, one record may have the following data.

Field	Value
Account number	1
Recent index valuation	334912
Purchase price	300000
Outstanding balance	16617

For this data, the exposure score is 4.961693774.

Performing Scoring Calculations

There are two ways to perform scoring calculations on data stored into Vertica:

Read all the data from Vertica, pass the data to the FICO engine, and calculate the score.
Deploy the FICO models on Vertica and calculate the score in Vertica. This way, you don't have to take the time to move data out of Vertica.

Banks have to perform these calculations on a regular basis. The second approach is more efficient and scales well for very large amounts of data.

This rest of this document explains how to implement the second approach. Read on.

Blaze Advisor Deployment Artifacts

A Blaze Advisor Quick Deployer generates the deployment artifacts for a Blaze Advisor project. One of the important artifacts is the server configuration file. It contains information such as the name of the function that will invoke the model. This function is referred to as the entry point.

Another important artifact is the .adb file. You can serialize your Blaze project into an .adb file.

For this example, FICO provided the server configuration file, the .adb file, and the other required JAR files to Vertica to create the scoring example. You can create the adb and server configuration file by executing the generateADB.bat script that ships with the package you downloaded.

Note To learn more about the server configuration file and how to generate an .adb file, see the FICO Blaze documentation that installs with your software.

Creating the Vertica+Blaze Integration JAR File

This example uses the verticablaze Eclipse project to create a JAR file for integrating with Vertica. The specific files needed for the EAD model are highlighted in yellow in the following graphic:

Vertica function (Java class): java
Invocation of a Blaze Advisor function (Java class): java
Plain Old Java Object (POJO) deployment: java
Server configuration file: server
.adb file (compiled Blaze Advisor project): adb

Writing the Vertica Function

The Vertica function that calls the Blaze Advisor SRL function to do the calculations is a Java class file that you create using the Vertica SDK. The following Vertica scalar function is associated with the Java function.

public class score extends ScalarFunction {
 }

This Vertica function executes with data from a Vertica database, invoking the Blaze SRL function on that data, as shown in the following example. The Vertica function uses the server file and the entry function getEADScore_1 specified in the server configuration file.

public void processBlock(ServerInterface srvInterface, BlockReader argReader, BlockWriter resWriter)
throws UdfException, DestroyInvocation {
       do {
             double a = argReader.getDouble(0);
             double b = argReader.getDouble(1);
             double c = argReader.getDouble(2);
 
             double result = -9999.0D;
             EADModelInExternal req = new EADModelInExternal();
 
             try {
                    req.setOutst_Bal_Perc_Rec_Idx_Val(a);
                    req.setPurchase_Price(b);
                    req.setRecent_Indexed_Valuation(c);
                    result = new VerticaEadPdModelExample().invokeBlazeFunction(req, "ModelExecutionService", "ModelExecutionService_ser.server", "getEADScore_1");
             } catch (NdServiceSessionException e) {
                    throw new UdfException(0, e);
             } catch (NdServiceException e) {
                    throw new UdfException(0, e);
             } catch (NdServerException e) {
                    throw new UdfException(0, e);
             }
              // Write the integer output value.
              Writer.setDouble(result);

             // Advance the output BlocKWriter to the next row.
             resWriter.next();
 
             // Continue processing input rows until there are no more.
             } while (argReader.next());           
}

The required parameter EADModelInExternal is a Plain Old Java Object (POJO) needed for the execution of the Blaze Advisor SRL function. The package structure of this POJO should be the same as for the Blaze Advisor project. This POJO defines the functions as follows:

public class EADModelInExternal {
       private double Outst_Bal_Perc_Rec_Idx_Val;
       private double Purchase_Price;
       private double Recent_Indexed_Valuation;
       private double score;
       public double getOutst_Bal_Perc_Rec_Idx_Val() {
             return Outst_Bal_Perc_Rec_Idx_Val;
       }
       public void setOutst_Bal_Perc_Rec_Idx_Val(double outst_Bal_Perc_Rec_Idx_Val) {
             Outst_Bal_Perc_Rec_Idx_Val = outst_Bal_Perc_Rec_Idx_Val;

       }
       public double getPurchase_Price() {
             return Purchase_Price;
       }
       public void setPurchase_Price(double purchase_Price) {
             Purchase_Price = purchase_Price;
       }
       public double getRecent_Indexed_Valuation() {
             return Recent_Indexed_Valuation;
       }
       public void setRecent_Indexed_Valuation(double recent_Indexed_Valuation) {
             Recent_Indexed_Valuation = recent_Indexed_Valuation;
       }
       public double getScore() {
             return score;
       }
       public void setScore(double score) {
             this.score = score;
       }

Creating the JAR File

The download package contains the script InstallJars.bat. This script creates the integrated JAR file VerticaLoanApplication-trunk.jar, which you can deploy in your Vertica environment.

For this example, the Eclipse project is compiled against Java version 1.7 and Vertica SDK v7.2.2-1. You can compile it to match the versions in your Vertica environment.

Deploying the JAR File and Creating Vertica Functions

To deploy the JAR file as a Java library in Vertica and create the Vertica functions, the high-level steps are as follows.

Note The script that is included with this document performs these steps for you.

Deploy the JAR file as a Java library.

=> CREATE LIBRARY ficoEADPD AS 
   '/home/dbadmin/Builds/scripts/jar/VerticaBlazeApplication-1.0.jar' 
   LANGUAGE 'Java';

Create a function for the Java library.

=> CREATE FUNCTION scoreEAD AS LANGUAGE 'Java' 
   NAME 'com.blaze.vertica.EADModelFactory' LIBRARY ficoEADPD;

Test the function on the Vertica server that invokes the Blaze Advisor SRL function.

=> SELECT account_number, purchase_price, Recent_Indexed_Valuation, 
   Outst_Bal_Perc_Rec_Idx_Val, scoreEAD(Outst_Bal_Perc_Rec_Idx_Val, 
   Purchase_Price, Recent_Indexed_Valuation) AS EAD_ESTIMATE 
   FROM EAD_DATASET;

  account_number | purchase_price | Recent_Indexed_Valuation | Outst_Bal_Perc_Rec_Idx_Val |   EAD_ESTIMATE
 ----------------+----------------+--------------------------+----------------------------+-------------------
               1 |         300000 |                   334912 |          4.961693774000000 | -37645.2852262864
               2 |         140000 |                   154571 |         22.924013590000000 | -78958.9842684706
               3 |         165125 |                   161023 |         17.864888020000000 | -65349.9408090219
               4 |         325000 |                   345408 |         31.875252620000000 | 20110.4389424053
               5 |         350000 |                   376018 |         14.990185220000000 | -8507.77831017182
               6 |         260000 |                   257841 |         62.017901640000000 |   74906.823918807

In the preceding output, the data with the highest and lowest EAD estimates are highlighted in yellow.

Sample Script

The sample database script (dbscript.sh) included with your download performs the following tasks:

Creates the library and required functions in Vertica:

=> DROP LIBRARY IF EXISTS ficoEADPD CASCADE;
DROP LIBRARY
 
=> CREATE LIBRARY ficoEADPD AS '/home/dbadmin/Builds/scripts/jar/VerticaBlazeApplication-1.0.jar' LANGUAGE 'Java';
CREATE LIBRARY
 
=> CREATE FUNCTION scoreEAD AS LANGUAGE 'Java' NAME 'com.blaze.vertica.EADModelFactory' LIBRARY ficoEADPD;
CREATE FUNCTION

Creates the required Vertica tables:

=> DROP TABLE IF EXISTS scoringRunMetrics;
DROP TABLE
 
=> DROP TABLE IF EXISTS scoringInput;
DROP TABLE
 
=> DROP TABLE IF EXISTS scoringResults;
DROP TABLE
 
=> DROP TABLE IF EXISTS EAD_DATASET;
DROP TABLE
 
=> DROP TABLE IF EXISTS FICO_EAD_RESULTS;
DROP TABLE
 
=> CREATE TABLE EAD_DATASET (
    RDAR_ID auto_increment,
    Account_Number int,
    Recent_Indexed_Valuation int,
    Balance_Outstanding int,
    Purchase_Price int,
    Purchase_Valuation_Date int,
    Outst_Bal_Perc_Rec_Idx_Val numeric(31,15),
    Property_Type varchar(50),
    Region varchar(50),
    Current_Months_In_Arrears int,
    PD numeric(31,15),
    PD_LRA numeric(31,15),
    EAD numeric(31,15),
    EAD_LRA numeric(31,15),
    LGD numeric(31,15),
    LGD_LRA numeric(31,15),
    EL numeric(31,15),
    EL_LRA numeric(31,15),
    RWA_LRA numeric(31,15),
    RW_LRA numeric(31,15),
    PD_LRA_Floored numeric(31,15),
    LGD_LRA_Floored numeric(31,15),
    EL_Floored numeric(31,15),
    RWA_Floored numeric(31,15),
    RW_Floored numeric(31,15),
    setID int,
    EAD_estimate int,
    RDAR_JOB_ID int
);
CREATE TABLE
 
=> CREATE TABLE FICO_EAD_RESULTS (
    RDAR_JOB_ID int,
    RDAR_ID int,
    EAD_ESTIMATE numeric(31,16),
    SCORE_TS VARCHAR(44) NOT NULL DEFAULT TIMEOFDAY()--TIMESTAMP 'NOW' --CURRENT_TIMESTAMP
);
CREATE TABLE
 
=> CREATE TABLE scoringRunMetrics (
        id AUTO_INCREMENT
        , runType VARCHAR(20)
        , beginDT DATETIME
        , endDt DATETIME
        , totalRunTime INTERVAL
        , numberOfRecords INT
);
CREATE TABLE
 
=> CREATE TABLE scoringInput (
        id AUTO_INCREMENT
        , percentMaxBalance NUMERIC
        , percentDelinquent NUMERIC
        , expectedScore NUMERIC
);
CREATE TABLE
 
=> CREATE TABLE scoringResults (
        id AUTO_INCREMENT
        , runKey VARCHAR(40)
        , sourceRowID INT
        , percentMaxBalance FLOAT
        , percentDelinquent FLOAT
        , expectedScore NUMERIC
        , actualScore FLOAT
        , score_ts VARCHAR(44) NOT NULL DEFAULT TIMEOFDAY()--TIMESTAMP 'NOW' --CURRENT_TIMESTAMP
);
CREATE TABLE

Loads the test data:

=> COPY EAD_DATASET (
        Account_Number,
        Recent_Indexed_Valuation,
        Balance_Outstanding,
        Purchase_Price,
        Purchase_Valuation_Date,
        Outst_Bal_Perc_Rec_Idx_Val,
        Property_Type,
        Region,
        Current_Months_In_Arrears,
        PD,
        PD_LRA,
        EAD,
        EAD_LRA,
        LGD,
        LGD_LRA,
        EL,
        EL_LRA,
        RWA_LRA,
        RW_LRA,
        PD_LRA_Floored,
        LGD_LRA_Floored,
        EL_Floored,
        RWA_Floored,
        RW_Floored,
        setID,
        EAD_estimate
) FROM LOCAL '/home/dbadmin/Builds/scripts/data/EAD_dataset.csv'
        WITH DELIMITER AS ',' SKIP 1;-- skips header row
 Rows Loaded
-------------
       10000
(1 row)

Invokes the Vertica function:

\set scoreRunInputRecordCount ''''`vsql -U dbadmin -w fic0@123 -t -c 'SELECT COUNT(1) FROM EAD_DATASET;'`''''
--\echo records to score :scoreRunInputRecordCount
 
-- get number of existing records in results table.
\set scoreRunBeginRecordCount ''''`vsql -U dbadmin -w fic0@123 -t -c 'SELECT COUNT(1) FROM FICO_EAD_RESULTS;'`''''
--\echo get count of existing records in the results table
 
-- get BEGIN time of day.
\set scoreRunBeginTime ''''`vsql -U dbadmin -w fic0@123 -t -c 'SELECT TIMEOFDAY();'`''''
SELECT :scoreRunBeginTime AS beginTime;
               beginTime
---------------------------------------
  Fri Jun 03 15:24:18.387493 2016 EDT
(1 row)
 
-- execute EAD scoring SQL
=> INSERT INTO FICO_EAD_RESULTS (RDAR_ID, RDAR_JOB_ID, EAD_ESTIMATE) SELECT RDAR_ID, RDAR_JOB_ID, scoreEAD (Outst_Bal_Perc_Rec_Idx_Val, Purchase_Price, Recent_Indexed_Valuation) AS EAD_ESTIMATE FROM EAD_DATASET;
 OUTPUT
--------
  10000
(1 row)
 
commit;
COMMIT
 
-- get END time of day.
\set scoreRunEndTime ''''`vsql -U dbadmin -w fic0@123 -t -c 'SELECT TIMEOFDAY();'`''''
=> SELECT :scoreRunEndTime AS endTime;
                endTime
---------------------------------------
  Fri Jun 03 15:24:25.867694 2016 EDT
(1 row)

Generates the results:

\set scoreRunEndRecordCount ''''`vsql -U dbadmin -w fic0@123 -t -c 'SELECT COUNT(1) FROM FICO_EAD_RESULTS;'`''''
 
=> INSERT INTO scoringRunMetrics (runType, beginDT, endDt, totalRunTime, numberOfRecords) VALUES ('EAD', :scoreRunBeginTime::DATETIME, :scoreRunEndTime::DATETIME, (:scoreRunEndTime::DATETIME - :scoreRunBeginTime::DATETIME), ((:scoreRunEndRecordCount::INT) - (:scoreRunBeginRecordCount::INT)));
 OUTPUT
--------
      1
(1 row)
 
COMMIT;
COMMIT
... running results on EAD using /home/dbadmin/Builds/scripts/sql-scripts/run-results-EAD.sql
.......................
=> SELECT beginDT, totalRunTime, numberOfRecords FROM scoringRunMetrics WHERE runType='EAD' ORDER BY beginDT DESC;
           beginDT           |  totalRunTime   | numberOfRecords
-----------------------------+-----------------+-----------------
  2016-06-03 15:24:18.387493 | 00:00:07.480201 |           10000
(1 row)
 
=> SELECT numberOfRecords, AVG(totalRunTime) FROM scoringRunMetrics WHERE runType='EAD' GROUP BY numberOfRecords ORDER BY numberOfRecords DESC;
 numberOfRecords |       AVG
-----------------+-----------------
           10000 | 00:00:07.480201
(1 row)

Looking at the numbers in scoringRunMetrics can help you understand the time taken to calculate the scores on your input data set. In this example, Vertica took about 7 seconds to process 10000 records.

For More Information

For More Information About…	… See
FICO Blaze Advisor	http://www.fico.com/en/products/fico-blaze-advisor-decision-rules-management-system
Vertica Community	https://vertica.com/community/
Vertica Documentation	http://vertica.com/docs/latest/HTML/index.htm
Big Data and Analytics Community	https://vertica.com/big-data-analytics-community-content/