|Share this article:|
End-To-End-Analytics with Vertica and Alteryx Designer: Technical Exploration
About this Document
The goal of this document is to create a predictive model in Alteryx Designer and import and run this model in Vertica. Since Alteryx Designer does not support some of Vertica's machine learning functions, the model was exported as PMML and then imported into Vertica. We executed this model in Vertica with test data. There was a significant difference in the processing time when we executed it in Vertica as compared to running the model in Alteryx. Vertica processed the model much faster. This model is tested using Alteryx Designer 2021.1.4.26400 in Windows 2016 with Pre-11 Vertica version.
Alteryx Designer Overview
Alteryx Designer is an analytical tool that enables data preparation, data blending, and analytics using an intuitive user interface. Its drag and drop interface allows you to create predictive models and report and share analytics using repeatable workflows. Alteryx Designer is available for Windows as a 64-bit application and uses ODBC to connect to your Vertica database.
Before You Begin
Following is a flow chart that illustrates the high level tasks that you need to perform to create a predictive model in Alteryx, and import and run the model in Vertica.
- Alteryx Designer is installed on Windows or Linux environment.
- Vertica client is installed and configured for Vertica database connections for source.
- Predictive tool set and Output Model tool is installed in Alteryx Designer.
To install the Predictive Tools set and Output Model tool, see Creating and Exporting a Predictive Model from Alteryx Designer as PMML in this document.
Sample Reference Predictive Workflows in Alteryx
For information about Predictive model designs and tools in Alteryx Designer
- Open Alteryx Designer.
Click Help > Sample Workflows > Predictive tool Samples > Predictive Analytics and select the required option.
For more information, refer to the Predictive Modeling video in the Alteryx Community page.
For information about Predictive Analysis, see the documentation in the Alteryx website.
Following are the steps to create and export the predictive model as PMML:
If you have not downloaded the Predictive tools set, open Alteryx Designer and click Options > Download Predictive Tools.
Download the Output Model tool from the Alteryx Gallery.
You need the Output Model tool to export the predictive model as a PMML file.
- After you download Output Model, it needs to be configured as an Alteryx Macro.
From the Alteryx Designer menu, click Options > User Settings > Edit User Settings.
In the User Settings window, click Macros.
Click the + sign, type the Category Name and browse to the directory where you downloaded the Output Model tool.
Click OK and then click Save.
The Output Model tool appears in the In/Out tools category.
Create an Alteryx workflow to design the predictive model based on your requirements. Click the Browse tool connected to the R port of the Regression tool and use the R-Squared coefficient to measure the model accuracy.
After you are satisfied with the model accuracy, drag and drop the Output Model tool (depicted as R) into the Alteryx Designer canvas and connect the ‘O’ port (Output port) of the Regression Tool (Linear Regression in the following sample workflow) to it.
- In the Output Model - Configuration pane, specify the PMML file name, click Save as a PMML file and specify the PMML file name and location.
Click Run to run the Alteryx workflow. This generates and exports the PMML file to the specified directory.
Loading and Executing the Predictive Model in Vertica
Following are the steps to load and execute the PMML file in Vertica:
Use WinSCP, FileZilla, or any other FTP tool to keep the PMML file in one of the Vertica cluster nodes under the same directory name as that of the .pmml file. Here, the PMML file name is Vertica_Demo.pmml which is in the Vertica_Demo directory.
To import the model into Vertica, run the Import_Models command by logging into the Vertica database:
SELECT IMPORT_MODELS( 'source' USING PARAMETERS new_schema='schema-name' , category='model-category' )
Run the PREDICT_PMML command to score the results. Ensure to have test data available in the Vertica database table to perform scoring.
SELECT PREDICT_PMML(Predictor_Columns USING PARAMETERS model_name='schema.Model_Name') AS predicted_label FROM Table_Data;
The following are two scenarios we tested and compared:
- This guide documents the predictive model we created in Alteryx Designer, tested it with sample data, and then scored it in Alteryx. It was then exported as a PMML file and imported into Vertica.
- We also tested this by creating a predictive model in Vertica itself and scoring this model in Vertica.
There was a significant time difference between the two methods and scoring the model in Vertica took only a few minutes for terabytes of data as compared to scoring the model in Alteryx Designer. Moreover, creating and scoring the model in Vertica, reduces the effort of PMML export/import and manual editing required if the Vertica server version is below 11.
Importing PMMLs to Vertica has the following limitations:
- The PMML files that you are importing in Vertica should be in a directory with the same name as that of the file.
- PMML version 4.4 is supported in Vertica 10.1.1 and later. If you are using an older version of Vertica, you need to manually edit the PMML file to change the version from 4.4 to 4.3.
Only selected PMML tags are supported in Vertica 10.1.1 and earlier. For more information, refer to Supported and Unsupported PMML Features and Attributes.