Loading...

User Defined Functions#

User-Defined Functions (UDFs) in VerticaPy offer a powerful mechanism for extending the functionality of the Vertica analytical database by allowing users to define custom functions tailored to their specific needs. These UDFs come in various types, each serving distinct purposes within the database ecosystem.

  • User-Defined Scalar Functions (UDSFs)

  • User-Defined Analytic Functions (UDAF)

  • User-Defined Aggregate Functions (UDAnF)

  • User-Defined Transform Functions (UDTF)

  • User-Defined Load Functions (UDL)

  • User-Defined SQL Functions (UDF)

The table below summarizes the availability of different types of UDx with their language support.

_images/img_udf_8.png
_images/img_udf_1.png

User-Defined Scalar Functions are custom functions that operate on individual values and return a single value. These functions can be used in SELECT statements, WHERE clauses, and various other SQL expressions.

_images/img_udf_2.png

User-Defined Analytic Functions perform calculations across a set of rows related to the current row. They are commonly used in analytical and windowed queries, providing flexibility in data analysis.

_images/img_udf_3.png

User-Defined Aggregate Functions allow you to define custom aggregate operations on groups of rows. These functions are particularly useful in summarizing and aggregating data.

_images/img_udf_4.png

User-Defined Transform Functions are generally used during the loading process to transform or preprocess data before it is loaded into the Vertica database. They are essential for data cleansing and transformation tasks.

_images/img_udf_5.png

User-Defined Load Functions are used to define how Vertica should load data into a table. They allow customization of the loading process, enabling users to handle specific data formats or perform additional processing during data loading.

_images/img_udf_6.png

User-Defined SQL Functions enable the creation of custom SQL functions with specific logic. These functions can be invoked in SQL queries, providing a way to encapsulate complex logic in a modular and reusable manner.


Unfenced Mode#

Vertica also has an Unfenced Mode for advanced users.

_images/img_udf_7.png

In Fenced Mode, stability is emphasized as it employs a separate zygote process. This separation ensures that User-Defined Extension (UDx) crashes do not adversely impact the core Vertica process.

Pro: Protected
Con: Slower

Conversely, in Unfenced Mode, the safety of zygote processes is not guaranteed. However, it excels in performance as data does not need to move back and forth.

Pro: Faster
Con: Unprotected

Let’s have a look at some use-cases to see the difference in speed for both the modes:

Size:

Rows

Columns

31.29M

2559

Query:

SELECT xgb_score(...)
FROM s_table;

Size:

Rows

Columns

100M

2

Query:

SELECT COUNT(*)
FROM
    (
        SELECT polyk(10, x, y)
        OVER (PARTITION BY x) FROM foo
    ) AS P;

Size:

Rows

Columns

100M

2

Query:

SELECT COUNT(*)
FROM foo
WHERE shorterString(a,b) = a;

Example - XGB Prediction#

Let us look at an exmaple of a UDx performing XGB Prediction. The elements of code are explained in the image;

_images/img_udf_9.png

The above example was using python. Let us look at the same function using SQL:

_images/img_udf_10.png

Note

For more information, please refer to the Python SDK doc


Auto-Lambda Generator Function#

Understanding how VerticaPy User-Defined Extensions (UDx) work involves exploring the seamless process of creating and deploying custom functions within Vertica. As an illustrative example, consider the auto-generation of a lambda function.

_images/img_udf_11.png

Once a user creates a custom UDx using a Python function, the workflow becomes remarkably straightforward. The user interacts solely with the Python function, and VerticaPy takes care of the rest. VerticaPy intelligently parses the provided Python function, creates the necessary User-Defined Functions (UDFs), and efficiently installs them across different nodes in the Vertica cluster. This installation process ensures that the custom functionality is distributed across multiple nodes, facilitating load distribution and enhancing scalability. By automating the parsing, creation, and distribution of UDx, VerticaPy streamlines the development and deployment of custom functions, allowing users to focus on the logic and functionality of their Python code without worrying about the intricacies of distributed computing within the Vertica environment

Example#

Let’s delve into an example of a User-Defined Extension (UDx) function, specifically focusing on the math.isClose() function. When a user defines this new function using the create_lib_udf function in VerticaPy, the process becomes exceptionally streamlined. VerticaPy, behind the scenes, automatically generates the corresponding lambda function in Python. Furthermore, it seamlessly installs this function across the different nodes in the Vertica cluster using SQL commands.

_images/img_udf_12.png

The beauty of this approach shines when the math.isClose() function is called. As part of the Vertica cluster, each node independently implements the function. Leveraging the power of distributed computing, the aggregate result is then efficiently calculated across all nodes. This ensures that the math.isClose() function, once defined and installed, seamlessly integrates into the Vertica environment, providing users with a scalable and distributed solution for their mathematical closeness calculations.

For more in-depth detail on the uses of User-Defined Functions please refer to Full Stack.

See also

SDK Documentation: Vertica Python SDK
Notebooks on UDFx: Full Stack
Vertica Documentation on UDx: Python SDK doc