SET_TOKENIZER_PARAMETER
Configures the tokenizer parameters.
Important: \n, \t, and \r must be entered as Unicode using Vertica notation, for example U&'\000D', or using Vertica escaping notation, for example E'\r'. Otherwise, they are taken literally as two separate characters, for example "\" and "r".
Syntax
SELECT v_txtindex.SET_TOKENIZER_PARAMETER (parameter_name, parameter_value USING PARAMETERS proc_oid='proc_oid')
Parameters
parameter_name |
Name of the parameter to be configured. Use one of the following:
|
parameter_value |
The value of a configuration parameter. If you want to disable minorSeparators or stopWordsCaseInsensitive, then set their values to |
proc_oid |
A unique identifier assigned to a tokenizer when it is created. Users must query the system table vs_procedures to get the proc_oid for a given tokenizer name. See Configuring a Tokenizer for more information. |
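The proc_oid lookup described above might look like the following minimal sketch. The tokenizer name fooTokenizer is hypothetical, and the procedure_name column is assumed to hold the tokenizer name; check the columns of vs_procedures in your own database before relying on it.

```sql
-- Look up the proc_oid for a tokenizer by name.
-- 'fooTokenizer' is a placeholder; substitute your tokenizer's name.
-- Assumes vs_procedures exposes proc_oid and procedure_name columns.
SELECT proc_oid, procedure_name
FROM vs_procedures
WHERE procedure_name = 'fooTokenizer';
```

The returned proc_oid is the value to pass as a quoted string in USING PARAMETERS proc_oid='...'.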
Examples
The following examples show how you can use SET_TOKENIZER_PARAMETER to configure stop words and separators.
Configure the stop words of a tokenizer:
=> SELECT v_txtindex.SET_TOKENIZER_PARAMETER('stopwordsCaseInsensitive', 'devil,TODAY,the,fox' USING PARAMETERS proc_oid='45035996274126984');
SET_TOKENIZER_PARAMETER
-------------------------
t
(1 row)
Configure the major separators of a tokenizer:
=> SELECT v_txtindex.SET_TOKENIZER_PARAMETER('majorSeparators',E'{}()&[]' USING PARAMETERS proc_oid='45035996274126984');
SET_TOKENIZER_PARAMETER
-------------------------
t
(1 row)
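As an illustration of the note about \n, \t, and \r at the top of this page, the following sketch passes those whitespace characters in each of the two notations. It reuses the proc_oid from the examples above. Treating newline, tab, and carriage return as minor separators is an assumption made only for illustration, and each statement is an alternative way to write the same value.

```sql
-- Option 1: Vertica escaping notation (E'...').
-- \n, \t, and \r are interpreted as newline, tab, and carriage return.
SELECT v_txtindex.SET_TOKENIZER_PARAMETER('minorSeparators', E'\n\t\r'
       USING PARAMETERS proc_oid='45035996274126984');

-- Option 2: Vertica Unicode notation (U&'...').
-- 000A = newline, 0009 = tab, 000D = carriage return.
SELECT v_txtindex.SET_TOKENIZER_PARAMETER('minorSeparators', U&'\000A\0009\000D'
       USING PARAMETERS proc_oid='45035996274126984');
```

Writing '\n\t\r' as a plain string literal instead would pass six literal characters (three backslashes and the letters n, t, and r) rather than three whitespace characters.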