How to Do Time Series Analysis in Vertica

This blog post was authored by Maurizio Felici. Time series analytics evaluate the values of a given set of variables over time and group those values into a window, based on a time interval for analysis and aggregation. There are a few types of functions that help perform time series analytics:

Event based functions
Time series functions

Event Based Functions

Event based analytic functions allow you to use significant events to define the “borders” of your event window. For example, suppose you have a series of events represented by timestamps:

select * from ts1 order by ts ;
ts
---------------------
2015-05-21 23:08:02
2015-05-21 23:08:11
2015-05-21 23:08:20
2015-05-21 23:08:53
2015-05-21 23:09:03
2015-05-21 23:11:18
2015-05-21 23:16:38
2015-05-21 23:16:49

You can visualize these events by plotting them on a time axis:

Now we want to identify group events consisting of one or more single events. A single event is part of a group event if it happens within a time proximity threshold of 60 seconds:

Each group event is described by the following attributes:

Start timestamp
End timestamp
Group-event size (the number of single events in the group event)

This kind of analysis has several practical applications, including customer interaction analysis and web interaction session definitions. Let’s see how easily we can identify our group events using the Vertica event based analytic functions:

The following graphic breaks down how the previous query worked:

Time Series Functions

Time series are two dimensional objects used to measure variables over time. Each element of a time series contains a variable value and a position in the time axis. Time series analytics is useful when you want to analyze discrete data collected over time, such as stock market trades and performance. Discrete time series occur when variables are measured within a certain sampling interval. Time series are normally represented in two dimension graphs, with an X-axis for time and a Y-axis for the variable value. Time series analysis can be difficult when you have gaps, incomplete series, or a non-uniform time interval. Time series analytics is a useful Vertica SQL extension used to evaluate and group variables over time. It consists of:

SELECT timeseries clause
TS_FIRST_VALUE and TS_LAST_VALUE aggregate functions

The syntax for time series functions is similar to other analytic functions:

The slice time is the time column produced by the TIMESERIES clause. The time interval is the length of the time slice. The window partition clause (PARTITION BY) is option and used to partition data. The window order clause (ORDER BY) is required and used to sort data. This example uses the same list of timestamps used in the previous section and then fills in the gaps:

select * from ts1 ;
ts
---------------------
2015-05-21 23:08:02
2015-05-21 23:08:11
2015-05-21 23:08:20
2015-05-21 23:08:53
2015-05-21 23:09:03
2015-05-21 23:11:18
2015-05-21 23:16:38
2015-05-21 23:16:49

The SELECT…TIMESERIES created one time point for each minute:

select tm from ts1
timeseries tm as '1 minute' over ( order by ts ) ;
tm
---------------------
2015-05-21 23:08:00
2015-05-21 23:09:00
2015-05-21 23:10:00
2015-05-21 23:11:00
2015-05-21 23:12:00
2015-05-21 23:13:00
2015-05-21 23:14:00
2015-05-21 23:15:00
2015-05-21 23:16:00

Now, let’s add a new column:

select * from tseries
order by ts ;
ts          | value
---------------------+-------
2015-05-21 23:08:02 |    27
2015-05-21 23:08:11 |    26
2015-05-21 23:08:20 |    30
2015-05-21 23:08:52 |    28
2015-05-21 23:09:03 |    27
2015-05-21 23:11:18 |    20
2015-05-21 23:16:38 |    13
2015-05-21 23:16:49 |    12

Then, create a time series with both gap filling and interpolation:

select tm, ts_first_value(value) as int_value from tseries
timeseries tm as '1 minute' over ( order by ts ) ;
tm          |       int_value
---------------------+----------------
2015-05-21 23:08:00 |
2015-05-21 23:09:00 |             28
2015-05-21 23:10:00 |             27
2015-05-21 23:11:00 |             27
2015-05-21 23:12:00 |             20
2015-05-21 23:13:00 |             20
2015-05-21 23:14:00 |             20
2015-05-21 23:15:00 |             20
2015-05-21 23:16:00 |             20

Vertica uses two different interpolation methods:

CONSTANT interpolation
LINEAR interpolation

If you do not specify the interpolation method, the default is CONSTANT interpolation. The interpolation method is specified through the TS_FIRST_VALUE and TS_LAST_VALUE functions: TS_FIRST_VALUE ( expression [ IGNORE NULLS ] ... [, { 'CONST' | 'LINEAR' } ] ) TS_LAST_VALUE ( expression [ IGNORE NULLS ] … [, { ‘CONST’ | ‘LINEAR’ } ] ) Choosing the correct interpolation method depends on what you are analyzing. For example, analyzing stock quotes would likely mean using CONSTANT, while analyzing temperatures is better suited for LINEAR. The following example shows what happens when the example is gap filled using linear interpolation:

Time series can have NULL values either on the time axis or on the value axis. Rows with NULL in the time axis are filtered out by Vertica. If you instruct Vertica to IGNORE NULLS, it will use the previous or next LAST/FIRST value in the time slice. LINEAR interpolation requires at least two values in the time slice. If one is NULL, Vertica interpolated values are NULL as well.

Event Series Joins

Event series joins are a Vertica SQL extension for OUTER JOINs. This extension allows you to join tables on time series that do not align precisely. For example, the following shows an OUTER JOIN on the two tables, where non-matching rows are padded with NULL values:

Event series joins will replace “gaps” with interpolated values.

Time Series Pattern Matching

The Vertica event series pattern matching functionality allows you to screen large amounts of historical data in search of event patterns. Event patters are a series of events that occur in an order or “pattern” that you specify. Events are evaluated for each row. When Vertica finds a contiguous sequence of rows that conforms to the pattern you define, it outputs the rows that contribute to the match. Rows not part of the match do not output. For more information, see the following in the Vertica documentation:

About the Author

Soniya Shah
Information Developer

Currently, a first year law student with a background in science and technology. Experienced technical writer, with specializations in software documentation, big data, blog development, and website development. I build user-centered content to communicate complex and technical information more easily.

I used to work for Vertica full time for about 3 years. I still work at Vertica part time while going to law school.

Update: Soniya is now doing her law internship, and no longer working at Vertica. Good luck, Soniya!

Product Overview

Vertica Announces Vertica 12 for Future-Proof Analytics

Harness the Internet of Things (IoT)

Support & Services

Partners

Vertica Inside – Embedded Analytics at Scale

Resources

About Vertica

Stay Informed

How to Do Time Series Analysis in Vertica

Event Based Functions

Time Series Functions

Event Series Joins

Time Series Pattern Matching

About the Author

Search The Blog

Explore Popular Topics

Subscribe For Email Updates

Product Overview

Vertica Announces Vertica 12 for Future-Proof Analytics

Harness the Internet of Things (IoT)

Support & Services

Partners

Vertica Inside – Embedded Analytics at Scale

Resources

About Vertica

Stay Informed

How to Do Time Series Analysis in Vertica

Event Based Functions

Time Series Functions

Event Series Joins

Time Series Pattern Matching

About the Author

Search The Blog

Explore Popular Topics

Subscribe For Email Updates

See More Under the Hood Posts