What’s New in Vertica 9.1: Python SDK Expands

Posted May 30, 2018 by Soniya Shah, Information Developer

This blog post was authored by Monica Cellio.

Using the Vertica SDK, you can write several types of user-defined extensions (UDxs) to add your own customizations. A previous release added Python support for one type, user-defined scalar functions (UDSFs). In 9.1, we added Python support for user-defined transform functions (UDTFs).

UDTFs operate on table partitions (as specified by the query’s OVER() clause) and return zero or more rows of data. The data they return can be an entirely new table, unrelated to the schema of the input table, with its own ordering and segmentation expressions. UDTFs can only be used in the SELECT list of a query.

The following example tokenizes input strings: for each input string, it returns one row per whitespace-separated token. This is the complete Python code for this UDTF:

```python
import vertica_sdk


class StringTokenizer(vertica_sdk.TransformFunction):
    """
    Transform function which tokenizes its inputs.
    For each input string, each of the whitespace-separated tokens of that
    string is produced as output.
    """

    def processPartition(self, server_interface, input, output):
        # Called once per partition; the reader starts on the partition's first row.
        while True:
            # Emit one output row per token in the current input row.
            for token in input.getString(0).split():
                output.setString(0, token)
                output.next()
            # Advance the reader; stop when the partition has no more rows.
            if not input.next():
                break


class StringTokenizerFactory(vertica_sdk.TransformFunctionFactory):
    def getPrototype(self, server_interface, arg_types, return_type):
        # One VARCHAR argument in, one VARCHAR column out.
        arg_types.addVarchar()
        return_type.addVarchar()

    def getReturnType(self, server_interface, arg_types, return_type):
        # The output column takes the input argument's type and is named "tokens".
        return_type.addColumn(arg_types.getColumnType(0), "tokens")

    def createTransformFunction(self, server_interface):
        return StringTokenizer()
```

For more information about writing UDTFs in Python, see UDTF Python API in Extending Vertica.
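To experiment with the row-cursor loop in processPartition without a Vertica server, the sketch below runs the same tokenizing logic against two hypothetical stand-ins, MockReader and MockWriter. These classes are not part of vertica_sdk; they only imitate the methods the example calls (getString, setString, next), assuming the reader starts on the first row of a non-empty partition and that next() returns False once the partition is exhausted. The hypothetical run_transform helper imitates OVER(PARTITION BY key) by grouping rows on a key and calling the loop once per group.

```python
from itertools import groupby
from typing import Dict, Iterable, List, Tuple


class MockReader:
    """Hypothetical stand-in for the SDK's input reader: a cursor over one partition."""

    def __init__(self, rows: List[Tuple[str, ...]]) -> None:
        if not rows:
            raise ValueError("assumed: a partition holds at least one row")
        self._rows = rows
        self._pos = 0  # the cursor starts on the first row

    def getString(self, col: int) -> str:
        return self._rows[self._pos][col]

    def next(self) -> bool:
        """Advance to the next row; return False once the partition is exhausted."""
        self._pos += 1
        return self._pos < len(self._rows)


class MockWriter:
    """Hypothetical stand-in for the SDK's output writer; collects emitted rows."""

    def __init__(self) -> None:
        self.rows: List[Tuple[str, ...]] = []
        self._current: Dict[int, str] = {}

    def setString(self, col: int, value: str) -> None:
        self._current[col] = value

    def next(self) -> None:
        """Commit the row built so far and start a fresh one."""
        self.rows.append(tuple(self._current[c] for c in sorted(self._current)))
        self._current = {}


def process_partition(reader: MockReader, writer: MockWriter) -> None:
    # Same loop as StringTokenizer.processPartition, without server_interface.
    while True:
        for token in reader.getString(0).split():
            writer.setString(0, token)
            writer.next()
        if not reader.next():
            break


def run_transform(table: Iterable[Tuple[str, str]]) -> List[Tuple[str, ...]]:
    """Group (partition_key, text) rows by key and process each group separately."""
    writer = MockWriter()
    # sorted() is stable, so rows keep their original order within a partition.
    ordered = sorted(table, key=lambda row: row[0])
    for _, group in groupby(ordered, key=lambda row: row[0]):
        rows = [(text,) for _, text in group]
        process_partition(MockReader(rows), writer)
    return writer.rows


if __name__ == "__main__":
    table = [("a", "hello world"), ("b", "one"), ("a", "foo  bar")]
    print(run_transform(table))
```

Running the script prints one single-column row per token, with partition "a" processed before partition "b". An input row whose string has no tokens (for example, an empty string) produces no output rows, which illustrates how a UDTF can return zero or more rows per input.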