
While walking to the coffee shop yesterday morning, I wondered briefly how frequently I decide to stop and get coffee and how frequently I head straight to work. While I stood in line for an embarrassingly complicated latte, I started to think about how my actions could actually be interpreted (and maybe even predicted) by Big Data analytics. Specifically, I thought about pattern matching.
It works like this: I could record everywhere I go, and enter each event (leave home, get coffee, arrive at work) as a different row in a table. Then, I could examine that table for patterns, or specific series of events. One pattern would be: leave home, arrive at work. Another pattern would be: leave home, get coffee, arrive at work.
Extrapolating from the previous scenario, we can see that the same tactic can be applied to larger amounts of data. Maybe you want to find specific patterns of user behavior in your online game. Maybe you want to identify areas on your credit card bill that show a $20+ charge at one bar followed immediately by a $20+ charge at a different bar. Then again, maybe you dont want to do that. The point is, the applications of Verticas event series pattern matching functionally are endless. In this blog, well introduce you to the pattern matching key features.
Whats an event and whats a pattern?
As the name suggests, event patterns are simply series of events that occur in an order, or pattern that you specify. Vertica evaluates each row in your table, looking for the event you define. When Vertica finds a sequence of rows that conform to your pattern, it outputs the rows that contribute to the match.
As a simple example, lets say you want to find places in your records where event A is following by event B and then event C. With pattern matching, Vertica will return the following shaded rows:

An example with clickstream analysis
Pattern matching is particularly useful for clickstream analysis where you might want to identify users’ actions based on their web browsing behavior. An example of an online clickstream event sequence might be:
- User is on outside website
- User visits your organizations home page
- User navigates to a product page
- User purchases a product online
Here’s the clickstream_log table well use through the tutorial. The column definitions are as follows:
- uid INT: user ID
- ts TIME: timestamp of the action the user took
- refURL VARCHAR(50): URL of the page referencing pageURL. In other words, the page where the previous event took place.
- pageURL VARCHAR(50): URL of the page the user visits
- action CHAR(1): action the user took after visiting the page (E = Enter my site, ‘V’ = View page on my site, ‘P’ = Purchase item from my site, X = Exit my site)
uid | ts | refURL | pageURL | action
----+----------+----------------------------+----------------------------+--------
1 | 12:00:00 | otherSite.com | mySite.com/home | E
1 | 12:01:00 | mySite.com/home | mySite.com/products | V
1 | 12:02:00 | mySite.com/products | mySite.com/products/tshirt | V
1 | 12:03:00 | mySite.com/products/tshirt | mySite.com/buy | P
2 | 12:10:00 | otherSite.com | mySite.com/home | E
2 | 12:11:00 | mySite.com/home | mySite.com/products | V
2 | 12:12:00 | mySite.com/products | mySite.com/home | V
2 | 12:13:00 | mySite.com/home | otherSite.com | X
3 | 12:17:00 | otherSite.com | mySite.com/home | E
3 | 12:18:00 | mySite.com/home | mySite.com/products | V
3 | 12:19:00 | mySite.com/products | mySite.com/products/couch | V
3 | 12:20:00 | mySite.com/products/couch | mySite.com/buy | P
(12 rows)
The MATCH Clause
So what can you do with this data? Lets say you want to find out if your purchase option is easy for a user to use. You might determine this by identifying how many times a user goes straight from a specific products page to the buy page, indicating the user is ready to purchase that specific item.
To find instances where these events occurred in order, use the MATCH clause. The MATCH clause includes two subclauses: DEFINE and PATTERN. True to their name, you use the subclauses to do the following:
- DEFINE: define your events
- PATTERN: specify the pattern of the events
In this scenario, the events involved are the following:
- User viewed a products page
- User clicked the buy button
And the pattern we want them to be in is:
- User viewed a products page, and then immediately purchased the item.

Putting these subclauses together in the MATCH clause, we get:
=> SELECT * FROM clickstream_log
MATCH(PARTITION BY uid ORDER BY ts
DEFINE product AS pageURL iLIKE '%mySite.com/products/%'
AND action='V', purchase AS pageURL ILIKE '%mySite.com%' AND action ='P' PATTERN P AS (product purchase));
uid | ts | refURL | pageURL | action
----+----------+----------------------------+----------------------------+--------
1 | 12:02:00 | mySite.com/products | mySite.com/products/tshirt | V
1 | 12:03:00 | mySite.com/products/tshirt | mySite.com/buy | P
3 | 12:19:00 | mySite.com/products | mySite.com/products/couch | V
3 | 12:20:00 | mySite.com/products/couch | mySite.com/buy | P
(4 rows)
As you can see, the statement returns all instances where a user went straight from viewing a products page to purchasing an item.
Pattern Matching Functions
You can view more information about your patterns using pattern matching functions.
Pattern matching functions include:
- EVENT_NAME(): Returns the name of the event that matched the row.
- MATCH_ID(): Returns a successful pattern match as an INTEGER value. The returned value is the ordinal position of a match within a partition.
- PATTERN_ID():Returns an integer value that is a partition-wide unique identifier for the instance of the pattern that matched.
For example, using the previous MATCH clause:
=> SELECT uid, ts, refURL, pageURL, action, event_name(), pattern_id(), match_id()
FROM clickstream_log
MATCH (PARTITION BY uid ORDER BY ts
DEFINE product AS pageURL iLIKE '%mySite.com/products/%' AND action='V', purchase AS pageURL ILIKE '%mySite.com%' AND action ='P' PATTERN P AS (product purchase));
Youll notice that the event_name column lists only one event per row. But what if, within a single row, more than one event is a match? In the previous example, this can never happen (i.e., a row cant satisfy both the product and purchase predicates). So lets look at an example where this might happen.
Using the ROWS MATCH parameter for multiple TRUE events
In the following clause we define event duplicate1 as an event where the pageURL includes mySite.com. We define event duplicate2 as an event where the refURL contains mySite.com. We want to output the pattern where a row with event duplicate1 is followed by a row with duplicate2.
DEFINE duplicate1 AS pageURL iLIKE '%mySite.com/%', duplicate2 AS refURL iLIKE '%mySite.com/%' PATTERN P AS (duplicate1 duplicate2)
From our clickstream_log table, its easy to see that almost all rows satisfy events duplicate1 and duplicate2. If we leave this clause as-is, Vertica returns the following error:
ERROR: pattern events must be mutually exclusive HINT: try using ROWS MATCH FIRST EVENT
The Vertica MATCH clause has a ROWS MATCH parameter that allows you to solve this problem. Until now, weve been letting the MATCH clause use the default ROWS MATCH parameter: ROWS MATCH ALL EVENTS. To deal with a table that has rows that satisfy more than one event, use the ROWS MATCH FIRST EVENT option to avoid the above error.
With the ROWS MATCH FIRST EVENT keyword, when Vertica comes across a row where both event types exist, it considers the row as being of the event type that is mentioned first in the MATCH clause.
Looking at our table, Vertica would consider the rows marked in green boxes as being of event type duplicate1 (because they all satisfy the first defined event) and the row marked in a hashed orange box as being of event type duplicate2 (it only matches the second event):
=> SELECT * FROM clickstream_log;
Thus, when we specify that we want a pattern where duplicate1 is followed by duplicate2, we get only one instance (where the orange box follows a green box):
=> SELECT uid, ts, refURL, pageURL, action, event_name()
FROM clickstream_log
MATCH ( PARTITION BY uid ORDER BY ts
DEFINE duplicate1 AS pageURL iLIKE '%mySite.com/%', duplicate2 AS refURL iLIKE '%mySite.com/%'
PATTERN P AS (duplicate1 duplicate2)
ROWS MATCH FIRST EVENT);
uid | ts | refURL | pageURL | action | event_name
-----+----------+---------------------+-----------------+--------+------------
2 | 12:12:00 | mySite.com/products | mySite.com/home | V | duplicate1
2 | 12:13:00 | mySite.com/home | otherSite.com | X | duplicate2
(2 rows)
More Examples
With event series pattern matching, the possibilities for data interpretation are endless. Want to see a more complex example? Stay tuned for part two of the Event Series Pattern Matching series.
Learn More
Learn more about event series pattern matching, including restrictions, in the core Vertica documentation.