User Defined Functions#
User-Defined Functions (UDFs) in VerticaPy offer a powerful mechanism for extending the functionality of the Vertica analytical database by allowing users to define custom functions tailored to their specific needs. These UDFs come in various types, each serving distinct purposes within the database ecosystem.
User-Defined Scalar Functions (UDSFs)
User-Defined Analytic Functions (UDAF)
User-Defined Aggregate Functions (UDAnF)
User-Defined Transform Functions (UDTF)
User-Defined Load Functions (UDL)
User-Defined SQL Functions (UDF)
The table below summarizes the availability of different types of UDx with their language support.
User-Defined Scalar Functions are custom functions that operate on individual values and return a single value. These functions can be used in SELECT statements, WHERE clauses, and various other SQL expressions.
User-Defined Analytic Functions perform calculations across a set of rows related to the current row. They are commonly used in analytical and windowed queries, providing flexibility in data analysis.
User-Defined Aggregate Functions allow you to define custom aggregate operations on groups of rows. These functions are particularly useful in summarizing and aggregating data.
User-Defined Transform Functions are generally used during the loading process to transform or preprocess data before it is loaded into the Vertica database. They are essential for data cleansing and transformation tasks.
User-Defined Load Functions are used to define how Vertica should load data into a table. They allow customization of the loading process, enabling users to handle specific data formats or perform additional processing during data loading.
User-Defined SQL Functions enable the creation of custom SQL functions with specific logic. These functions can be invoked in SQL queries, providing a way to encapsulate complex logic in a modular and reusable manner.
Unfenced Mode#
Vertica also has an Unfenced Mode for advanced users.
In Fenced Mode, stability is emphasized as it employs a separate zygote process. This separation ensures that User-Defined Extension (UDx) crashes do not adversely impact the core Vertica process.
Conversely, in Unfenced Mode, the safety of zygote processes is not guaranteed. However, it excels in performance as data does not need to move back and forth.
Let’s have a look at some use-cases to see the difference in speed for both the modes:
Size:
Rows |
Columns |
---|---|
31.29M |
2559 |
Query:
SELECT xgb_score(...)
FROM s_table;
Size:
Rows |
Columns |
---|---|
100M |
2 |
Query:
SELECT COUNT(*)
FROM
(
SELECT polyk(10, x, y)
OVER (PARTITION BY x) FROM foo
) AS P;
Size:
Rows |
Columns |
---|---|
100M |
2 |
Query:
SELECT COUNT(*)
FROM foo
WHERE shorterString(a,b) = a;
Example - XGB Prediction#
Let us look at an exmaple of a UDx performing XGB Prediction. The elements of code are explained in the image;
The above example was using python
.
Let us look at the same function using
SQL
:
Note
For more information, please refer to the Python SDK doc
Auto-Lambda Generator Function#
Understanding how VerticaPy User-Defined Extensions (UDx) work involves exploring the seamless process of creating and deploying custom functions within Vertica. As an illustrative example, consider the auto-generation of a lambda function.
Once a user creates a custom UDx using a Python function, the workflow becomes remarkably straightforward. The user interacts solely with the Python function, and VerticaPy takes care of the rest. VerticaPy intelligently parses the provided Python function, creates the necessary User-Defined Functions (UDFs), and efficiently installs them across different nodes in the Vertica cluster. This installation process ensures that the custom functionality is distributed across multiple nodes, facilitating load distribution and enhancing scalability. By automating the parsing, creation, and distribution of UDx, VerticaPy streamlines the development and deployment of custom functions, allowing users to focus on the logic and functionality of their Python code without worrying about the intricacies of distributed computing within the Vertica environment
Example#
Let’s delve into an example of a User-Defined
Extension (UDx) function, specifically focusing
on the math.isClose()
function. When a user
defines this new function using the
create_lib_udf
function in VerticaPy, the
process becomes exceptionally streamlined.
VerticaPy, behind the scenes, automatically
generates the corresponding lambda function in
Python. Furthermore, it seamlessly installs this
function across the different nodes in the
Vertica cluster using SQL commands.
The beauty of this approach shines when the
math.isClose()
function is called. As part
of the Vertica cluster, each node independently
implements the function. Leveraging the power of
distributed computing, the aggregate result is
then efficiently calculated across all nodes.
This ensures that the math.isClose()
function,
once defined and installed, seamlessly integrates
into the Vertica environment, providing users with
a scalable and distributed solution for their
mathematical closeness calculations.
For more in-depth detail on the uses of User-Defined Functions please refer to Full Stack.
See also