MATCH Clause

A SQL extension that lets you screen large amounts of historical data in search of event patterns, the MATCH clause provides subclasses for analytic partitioning and ordering and matches rows from the result table based on a pattern you define.

You specify a pattern as a regular expression, which is composed of event types defined in the DEFINE subclause, where each event corresponds to a row in the input table. Then you can search for the pattern within a sequence of input events. Pattern matching returns the contiguous sequence of rows that conforms to PATTERN subclause. For example, pattern P (A B* C) consist of three event types: A, B, and C. When Vertica finds a match in the input table, the associated pattern instance must be an event of type A followed by 0 or more events of type B, and an event of type C.

Pattern matching is particularly useful for clickstream analysis where you might want to identify users' actions based on their Web browsing behavior (page clicks). A typical online clickstream funnel is:

Company home page -> product home page -> search -> results -> purchase online

Using the above clickstream funnel, you can search for a match on the user's sequence of web clicks and identify that the user:

For examples that use this clickstream model, see Event Series Pattern Matching in Analyzing Data.

Syntax

MATCH ( [ PARTITION BY table_column ] ORDER BY table_column
... DEFINE event_name AS boolean_expr [, ...]
... PATTERN pattern_name AS ( regexp ) 
... [ ROWS MATCH { ALL EVENTS | FIRST EVENT } ] )

Parameters

PARTITION BY

[Optional] Defines the window data scope in which the pattern, defined in the PATTERN subclause, is matched. The partition clause partitions the data by matched patterns defined in the PATTERN subclause. For each partition, data is sorted by the ORDER BY clause. If the partition clause is omitted, the entire data set is considered a single partition.

ORDER BY

Defines the window data scope in which the pattern, defined in the PATTERN subclause, is matched. For each partition, the order clause specifies how the input data is ordered for pattern matching.

Note: The ORDER BY clause is mandatory.

DEFINE

Defines the boolean expressions that make up the event types in the regular expressions. For example:

DEFINE 
 Entry    AS RefURL  NOT ILIKE '%website2.com%' AND PageURL ILIKE
                               '%website2.com%',
 Onsite   AS PageURL ILIKE     '%website2.com%' AND Action='V',

 Purchase AS PageURL ILIKE     '%website2.com%' AND Action='P' 

The DEFINE subclause accepts a maximum of 52 events. See Event Series Pattern Matching in Machine Learning for Predictive Analytics for examples.

event_name

Is the name of the event to evaluate for each row; for example, Entry, Onsite, Purchase.

Note: Event names are case insensitive and follow the same naming conventions as those used for tables and columns.

boolean_expr

Is an expression that returns true or false. boolean_expr can include Boolean Operators and relational (comparison) operators. For example:

Purchase AS PageURL ILIKE '%website2.com%' AND Action = 'P'
PATTERN pattern_name

Is the name of the pattern, which you define in the PATTERN subclause; for example, P is the pattern name defined below:

 PATTERN P AS (...)

A PATTERN is a search pattern that is comprised of a name and a regular expression.

Note: Vertica supports one pattern per query.

regexp

Is a regular expression comprised of event types (defined in the DEFINE subclause), and one or more of the quantifiers below. When Vertica evaluates the MATCH clause, the regular expression identifies the rows that meet the expression criteria.

 *

Match 0 or more times

 *?

Match 0 or more times, not greedily

 +

Match 1 or more times

 +?

Match 1 or more times, not greedily

 ?

Match 0 or 1 time

 ??

Match 0 or 1 time, not greedily

 *+

Match 0 or more times, possessive

 ++

Match 1 or more times, possessive

 ?+

Match 0 or 1 time, possessive

 |

Alternation. Matches expression before or after the vertical bar. Similar to a Boolean or.

ROWS MATCH 

[Optional] Defines how to resolve more than one event evaluating to true for a single row.

  • If you use ROWS MATCH ALL EVENTS, Vertica returns the following run-time error if more than one event evaluates to true for a single row:

    ERROR: pattern events must be mutually exclusive
    HINT:  try using ROWS MATCH FIRST EVENT
  • For ROWS MATCH FIRST EVENT, if more than one event evaluates to true for a single row, Vertica chooses the event defined first in the SQL statement to be the event it uses for the row.

Pattern Semantic Evaluation

For each match, Vertica outputs the rows that contribute to the match. Rows not part of the match (do not satisfy one or more predicates) are not output.

Notes and Restrictions

Examples

For examples, see Event Series Pattern Matching in Machine Learning for Predictive Analytics.