Using Java UDX in Vertica
IntroductionVertica has a highly extensible UDx framework, which allows external user-defined functions, parsers and data loaders to be installed onto the Vertica server. This means that a routine written in C++, R, Java or Python can be run in-database as a Vertica SQL function.
This blog is based on the existing documentation and examples, which are provided from the Vertica documentation and onGitHub.
• The Java UDX features are documented in the “Extending Vertica” Guide – https://my.vertica.com/docs/latest/HTML/index.htm#Authoring/ExtendingVertica/ExtendingVertica.htm
• Examples are available at https://github.com/vertica/UDx-Examples
• More information about those examples is available also in the Extending Vertica Guide – https://my.vertica.com/docs/latest/HTML/index.htm#Authoring/ExtendingVertica/UDx/UDxExampleCodeGitHub.htm
The purpose of this document is to tie together these available sources.
The document was tested against the latest available version of Vertica (version v9.1.0-1 at the time of writing), running on CentOS 7.4.1708.
Step 1. Download and Install VerticaIf you are unfamiliar with Vertica, or you don’t have the physical hardware on which to run Vertica, test it for free by downloading Community Edition onto a prebuilt Virtual Machine – https://my.vertica.com/download/vertica/community-edition/
The page contains 2 relevant sections:
– Download – download the virtual machine of your choice. This guide was written using the “VMWare VMX file format” download.
– Resources – Follow the instructions contained in the resource links to complete the installation process.
This guide was developed using the Vertica Community Edition 9.1.0 which is preinstalled on the Virtual Machine (vertica_community_edition-9.1.0-0_vmx.zip). In order to use the VM, you will also need to have a suitable VM player (eg https://www.vmware.com/go/downloadplayer )
Step 2. Start the Virtual MachineThe CE VM contains a preinstalled version of Vertica. It already has a running database, called “Vmart”. If you are prompted for a password, it is password
Start the Virtual Machine.
You will have 2 available logins:
• root – this user will be used for system administration. Apart from step 3, this account will not be used.
• dbadmin – This user is responsible for the Vertica administration. It is the main user account we will use herein.
Step 3. Install the Java Development KitThe JDK needs to be installed on the Vertica server machine in order to compile the provided Java functions.
Login as the root user and install the Java Development Kit.
On Centos / Redhat, run
yum install java-devel
Step 4. Download the Java UDX Examples from GitHubDownload the UDX zip from GitHub- https://github.com/vertica/UDx-Examples
Login as user dbadmin
Transfer the downloaded files onto your Vertica server.
Copy the zipfile “UDx-Examples-master.zip” to the Vertica server machine.
Open a Terminal command prompt and unzip the file.
Go to the Java folder
Step 5. Link Vertica to the Java executable areaLogin as user dbadmin
Define the Vertica JavaBinaryForUDx config parameter
eg Find where is the java executable
Make a note of the result. In this case it is /usr/bin/java
Run the Vertica SQL command to change the setting
vsql -w password -c "ALTER DATABASE VMart SET JavaBinaryForUDx = '/usr/bin/java';"
Step 6. Edit the makefile to link to the Java Executable areaLogin as user dbadmin
Edit the makefile and change the setting for the JAVA_HOME variable.
eg whereis java
Make a note of the result. In this case it is “/usr/bin/java”. The root directory will be “/usr”
vi makefileChange the value of JAVA_HOME:
JAVA_HOME ?= $(SOURCE)/../third-party/jdk/jdk1.6.0_45with
JAVA_HOME ?= /usrSave the file and exit the file editor.
Step 7. Compile the Java code samples and create associated Vertica User Defined FunctionsLogin as user dbadmin
In order to compile the Java functions, run the “make” command with the relevant parameter.
Step 8. Test the Java User Defined FunctionsLogin as user dbadmin
Run the Java UDX functions by executing the provided script. The script will:
• Create the necessary tables.
• Populate the tables with sample data.
• Run the example functions.
• Perform necessary housekeeping.
vsql –w password -f JavaFunctions.sql
Step 9. Test the Java User Defined Parser and Filter FunctionsJava functions can be embedded into Vertica’s SQL COPY command, which means that user defined transformations can be included during the data load process.
To run the provided examples, login as user dbadmin
Run the provided script
vsql –w password -f JavaUDLFunctions.sql
Be patient. One of the steps generates a 10m row data set which takes most of the time. The next step loads the dataset into Vertica, which takes only a little time!