Using Java UDX in Vertica

Posted July 9, 2018 by Soniya Shah, Information Developer

Database Server Room
Michael Flower authored this post.

Introduction

Vertica has a highly extensible UDx framework, which allows external user-defined functions, parsers and data loaders to be installed onto the Vertica server. This means that a routine written in C++, R, Java or Python can be run in-database as a Vertica SQL function. This blog is based on the existing documentation and examples, which are provided from the Vertica documentation and onGitHub. • The Java UDX features are documented in the “Extending Vertica” Guide – https://my.vertica.com/docs/latest/HTML/index.htm#Authoring/ExtendingVertica/ExtendingVertica.htm • Examples are available at https://github.com/vertica/UDx-Examples • More information about those examples is available also in the Extending Vertica Guide – https://my.vertica.com/docs/latest/HTML/index.htm#Authoring/ExtendingVertica/UDx/UDxExampleCodeGitHub.htm The purpose of this document is to tie together these available sources. The document was tested against the latest available version of Vertica (version v9.1.0-1 at the time of writing), running on CentOS 7.4.1708.

Step 1. Download and Install Vertica

If you are unfamiliar with Vertica, or you don’t have the physical hardware on which to run Vertica, test it for free by downloading Community Edition onto a prebuilt Virtual Machine – https://my.vertica.com/download/vertica/community-edition/ The page contains 2 relevant sections: – Download – download the virtual machine of your choice. This guide was written using the “VMWare VMX file format” download. – Resources – Follow the instructions contained in the resource links to complete the installation process. This guide was developed using the Vertica Community Edition 9.1.0 which is preinstalled on the Virtual Machine (vertica_community_edition-9.1.0-0_vmx.zip). In order to use the VM, you will also need to have a suitable VM player (eg https://www.vmware.com/go/downloadplayer )

Step 2. Start the Virtual Machine

The CE VM contains a preinstalled version of Vertica. It already has a running database, called “Vmart”. If you are prompted for a password, it is password Start the Virtual Machine. You will have 2 available logins: • root – this user will be used for system administration. Apart from step 3, this account will not be used. • dbadmin – This user is responsible for the Vertica administration. It is the main user account we will use herein.

Step 3. Install the Java Development Kit

The JDK needs to be installed on the Vertica server machine in order to compile the provided Java functions. Login as the root user and install the Java Development Kit. On Centos / Redhat, run yum install java-devel

Step 4. Download the Java UDX Examples from GitHub

Download the UDX zip from GitHub- https://github.com/vertica/UDx-Examples Login as user dbadmin Transfer the downloaded files onto your Vertica server. Copy the zipfile “UDx-Examples-master.zip” to the Vertica server machine. Open a Terminal command prompt and unzip the file. unzip UDx-Examples-master.zip snip Go to the Java folder cd /home/dbadmin/UDx-Examples-master/Java-and-C++

Step 5. Link Vertica to the Java executable area

Login as user dbadmin Define the Vertica JavaBinaryForUDx config parameter eg Find where is the java executable whereis java Make a note of the result. In this case it is /usr/bin/java Run the Vertica SQL command to change the setting vsql -w password -c "ALTER DATABASE VMart SET JavaBinaryForUDx = '/usr/bin/java';"

Step 6. Edit the makefile to link to the Java Executable area

Login as user dbadmin Edit the makefile and change the setting for the JAVA_HOME variable. eg whereis java Make a note of the result. In this case it is “/usr/bin/java”. The root directory will be “/usr” vi makefile Change the value of JAVA_HOME: replace JAVA_HOME ?= $(SOURCE)/../third-party/jdk/jdk1.6.0_45 with JAVA_HOME ?= /usr Save the file and exit the file editor.

Step 7. Compile the Java code samples and create associated Vertica User Defined Functions

Login as user dbadmin In order to compile the Java functions, run the “make” command with the relevant parameter. make JavaFunctions snip

Step 8. Test the Java User Defined Functions

Login as user dbadmin Run the Java UDX functions by executing the provided script. The script will: • Create the necessary tables. • Populate the tables with sample data. • Run the example functions. • Perform necessary housekeeping. vsql –w password -f JavaFunctions.sql snip

Step 9. Test the Java User Defined Parser and Filter Functions

Java functions can be embedded into Vertica’s SQL COPY command, which means that user defined transformations can be included during the data load process. To run the provided examples, login as user dbadmin Run the provided script vsql –w password -f JavaUDLFunctions.sql Be patient. One of the steps generates a 10m row data set which takes most of the time. The next step loads the dataset into Vertica, which takes only a little time!