Loading UDxs

User-defined extensions (UDxs) are contained in libraries. A library can contain multiple UDxs. To add UDxs to Vertica, you must:

  1. Deploy the library (once per library).
  2. Create each UDx (once per UDx).

If you are using UDxs written in Java, you must also set up a Java runtime environment. See Installing Java on Vertica Hosts.

Deploying Libraries

To deploy a library to your Vertica database:

  1. Copy the UDx shared library file (.so), Python file, Java JAR file, or R functions file that contains your function to a node on your Vertica cluster. You do not need to copy it to every node.
  2. Connect to the node where you copied the library (for example, using vsql).
  3. Add your library to the database catalog using the CREATE LIBRARY statement.

    => CREATE LIBRARY libname AS '/path_to_lib/filename'
       LANGUAGE 'language';
    

    libname is the name you want to use to reference the library. path_to_lib/filename is the fully-qualified path to the library or JAR file you copied to the host. language is the implementation language.

    For example, if you created a JAR file named TokenizeStringLib.jar and copied it to the account's home directory, you would use this command to load the library:

    => CREATE LIBRARY tokenizelib AS '/home/dbadmin/TokenizeStringLib.jar'
       LANGUAGE 'Java';
    

You can load any number of libraries into Vertica.

Privileges

Superusers can create, modify, and drop any library. Users with the UDXDEVELOPER role or explicit grants can also act on libraries, as shown in the following table:

Operation Requires
CREATE LIBRARY UDXDEVELOPER
Replace a library (CREATE OR REPLACE LIBRARY)

UDXDEVELOPER and one of:

  • owner of library being replaced
  • DROP privilege on the target library
DROP LIBRARY

UDXDEVELOPER and one of:

  • owner of library being dropped
  • DROP privilege on the target library
ALTER LIBRARY UDXDEVELOPER and owner

Creating UDx Functions

After the library is loaded, define individual UDxs using SQL statements such as CREATE FUNCTION and CREATE SOURCE. These statements assign SQL function names to the extension classes in the library. They add the UDx to the database catalog and remain available after a database restart.

The statement you use depends on the type of UDx you are declaring, as shown in the following table:

UDx Type SQL Statement
Aggregate Function (UDAF) CREATE AGGREGATE FUNCTION
Analytic Function (UDAnF) CREATE ANALYTIC FUNCTION
Scalar Function (UDSF) CREATE FUNCTION (Scalar)
Transform Function (UDTF) CREATE TRANSFORM FUNCTION
Load (UDL): Source CREATE SOURCE
Load (UDL): Filter CREATE FILTER
Load (UDL): Parser CREATE PARSER

If a UDx of the given name already exists, you can replace it or instruct Vertica to not replace it. To replace it, use the OR REPLACE syntax, as in the following example:

=> CREATE OR REPLACE TRANSFORM FUNCTION tokenize
   AS LANGUAGE 'C++' NAME 'TokenFactory' LIBRARY TransformFunctions;
CREATE TRANSFORM FUNCTION

You might want to replace an existing function to change between fenced and unfenced modes.

Alternatively, you can use IF NOT EXISTS to prevent the function from being created again if it already exists. You might want to use this in upgrade or test scripts that require, and therefore load, UDxs. By using IF NOT EXISTS, you preserve the original definition including fenced status. The following example shows this syntax:

--- original creation:
=> CREATE TRANSFORM FUNCTION tokenize
   AS LANGUAGE 'C++' NAME 'TokenFactory' LIBRARY TransformFunctions NOT FENCED;
CREATE TRANSFORM FUNCTION

--- function is not replaced (and is still unfenced):			
=> CREATE TRANSFORM FUNCTION IF NOT EXISTS tokenize
   AS LANGUAGE 'C++' NAME 'TokenFactory' LIBRARY TransformFunctions FENCED;
CREATE TRANSFORM FUNCTION

After you add the UDx to the database, you can use your extension within SQL statements. The database superuser can grant access privileges to the UDx for users. See GRANT (User Defined Extension) for details.

When you call a UDx, Vertica creates an instance of the UDx class on each node in the cluster and provides it with the data it needs to process.