REGEXP_SUBSTR
Returns the substring that matches a regular expression within a string. If no matches are found, this function returns NULL. This is different from an empty string, which the function can return if the regular expression matches a zero-length string.
Syntax
REGEXP_SUBSTR( string, pattern [, position [, occurrence [, regexp_modifier... [, captured_subexp ] ] ] ])
Parameters
string
|
The |
||||||||||||
pattern
|
The regular expression to find a substring to extract. The syntax of the regular expression is compatible with the Perl 5 regular expression syntax. See the Perl Regular Expressions Documentation for details. |
||||||||||||
position
|
[Optional] The number of characters from the start of the string where the function should start searching for matches. By default, the function begins searching for a match at the first (leftmost) character. Setting this parameter to a value greater than 1 begins searching for a match at the nth character you specify. Default value: 1 |
||||||||||||
occurrence
|
[Optional] Controls which occurrence of a pattern match in the string to return. By default, the function returns the position of the first matching substring. Use this parameter to find the position of subsequent matching substrings. For example, setting this parameter to 3 returns the position of the third substring that matches the pattern. Default value: 1 |
||||||||||||
regexp_modifier
|
[Optional] One or more single-character flags that modify how the regular expression finds matches in string:
|
||||||||||||
captured_subexp |
[Optional] The captured subexpression whose position to return. By default, the function returns the position of the first character in string that matches the regular expression. If you set this value from 1 – 9, the function returns the subexpression captured by the corresponding set of parentheses in the regular expression. For example, setting this value to 3 returns the substring captured by the third set of parentheses in the regular expression. Default value: 0 The subexpressions are numbered left to right, based on the appearance of opening parenthesis, so nested regular expressions . For example, in the regular expression |
Notes
This function operates on UTF-8 strings using the default locale, even if the locale has been set to something else.
If you are porting a regular expression query from an Oracle database, remember that Oracle considers a zero-length string to be equivalent to NULL, while Vertica does not.
Examples
Select the first substring of letters that end with "thy."
=> SELECT REGEXP_SUBSTR('healthy, wealthy, and wise','\w+thy'); REGEXP_SUBSTR --------------- healthy (1 row)
Select the first substring of letters that ends with "thy" starting at the second character in the string.
=> SELECT REGEXP_SUBSTR('healthy, wealthy, and wise','\w+thy',2); REGEXP_SUBSTR --------------- ealthy (1 row)
Select the second substring of letters that ends with "thy."
=> SELECT REGEXP_SUBSTR('healthy, wealthy, and wise','\w+thy',1,2); REGEXP_SUBSTR --------------- wealthy (1 row)
Return the contents of the third captured subexpression, which captures the third word in the string.
=> SELECT REGEXP_SUBSTR('one two three', '(\w+)\s+(\w+)\s+(\w+)', 1, 1, '', 3); REGEXP_SUBSTR --------------- three (1 row)