Requirements for Custom Stemmers and Tokenizers

Sometimes, you may want specific tokenization or stemming behavior that differs from what Vertica provides. In such cases, you can implement your own custom User-Defined Extension (UDx) to replace the stemmer or tokenizer. For more information about building custom UDxs, see Developing User-Defined Extensions (UDxs).

Before implementing a custom stemmer or tokenizer in Vertica, verify that the UDx meets these requirements.

Note: Custom tokenizers can return multi-column text indices.

Vertica Stemmer Requirements

Comply with these requirements when you create custom stemmers:

Supported Data Input Types:

Supported Data Output Types:

Vertica Tokenizer Requirements

To create custom tokenizers, follow these requirements:

Supported Data Input Types:

Supported Data Output Types: