REGEXP_SUBSTR

Returns the substring that matches a regular expression within a string. If no matches are found, this function returns NULL. This is different from an empty string, which the function can return if the regular expression matches a zero-length string.

Syntax

REGEXP_SUBSTR( string, pattern [, position [,  occurrence  [, regexp_modifier...  [, captured_subexp ] ] ] ])

Parameters

string

The VARCHAR or LONG VARCHAR string to search for a regular expression pattern match. If string exists in a __raw__ column of a flex or columnar table, cast string to a LONG VARCHAR before searching for pattern.

pattern

The regular expression to find a substring to extract. The syntax of the regular expression is compatible with the Perl 5 regular expression syntax. See the Perl Regular Expressions Documentation for details.

position

[Optional] The number of characters from the start of the string where the function should start searching for matches. By default, the function begins searching for a match at the first (leftmost) character. Setting this parameter to a value greater than 1 begins searching for a match at the nth character you specify.

Default value: 1

occurrence

[Optional] Controls which occurrence of a pattern match in the string to return. By default, the function returns the position of the first matching substring. Use this parameter to find the position of subsequent matching substrings. For example, setting this parameter to 3 returns the position of the third substring that matches the pattern.

Default value: 1

regexp_modifier

[Optional] One or more single-character flags that modify how the regular expression finds matches in string:

b

Treat strings as binary octets, rather than UTF-8 characters.

c

Force the match to be case sensitive (the default).

i

Force the match to be case insensitive.

m

Treat the string to match as multiple lines. Using this modifier, the start of line (^) and end of line ($) regular expression operators match line breaks (\n) within the string. Without the m modifier, the start and end of line operators match only the start and end of the string.

n

Allow the single character regular expression operator (.) to match a newline (\n). Without the n modifier, the . operator matches any character except a newline.

x

Add comments to your regular expressions. Using the x modifier causes the function to ignore all unescaped space characters and comments in the regular expression. Comments start with a hash (#) character and end with a newline (\n). All spaces in the regular expression that you want to be matched in strings must be escaped with a backslash (\) character.

captured_subexp 

[Optional] The captured subexpression whose position to return. By default, the function returns the position of the first character in string that matches the regular expression. If you set this value from 1 – 9, the function returns the subexpression captured by the corresponding set of parentheses in the regular expression. For example, setting this value to 3 returns the substring captured by the third set of parentheses in the regular expression.

Default value: 0

The subexpressions are numbered left to right, based on the appearance of opening parenthesis, so nested regular expressions . For example, in the regular expression \s*(\w+\s+(\w+)), subexpression 1 is the one that captures everything but any leading whitespaces.

Notes

This function operates on UTF-8 strings using the default locale, even if the locale has been set to something else.

If you are porting a regular expression query from an Oracle database, remember that Oracle considers a zero-length string to be equivalent to NULL, while Vertica does not.

Examples

Select the first substring of letters that end with "thy."

=> SELECT REGEXP_SUBSTR('healthy, wealthy, and wise','\w+thy');
 REGEXP_SUBSTR 
---------------
 healthy
(1 row)

Select the first substring of letters that ends with "thy" starting at the second character in the string.

=> SELECT REGEXP_SUBSTR('healthy, wealthy, and wise','\w+thy',2);
 REGEXP_SUBSTR 
---------------
 ealthy
(1 row)

Select the second substring of letters that ends with "thy."

=> SELECT REGEXP_SUBSTR('healthy, wealthy, and wise','\w+thy',1,2);
 REGEXP_SUBSTR 
---------------
 wealthy
(1 row)

Return the contents of the third captured subexpression, which captures the third word in the string.

=> SELECT REGEXP_SUBSTR('one two three', '(\w+)\s+(\w+)\s+(\w+)', 1, 1, '', 3);
 REGEXP_SUBSTR
---------------
 three
(1 row)