Vertica Analytics Platform Version 9.2.x Documentation

INTERPOLATE

Used to join two event series together using some ordered attribute, event series joins let you compare values from two series directly, rather than having to normalize the series to the same measurement interval.

Syntax

expression1 INTERPOLATE PREVIOUS VALUE expression2

Parameters

expression1
expression2

A column reference from one the tables specified in the FROM Clause.

The referenced columns are typically a DATE/TIME data type, often TIMESTAMP, inasmuch as you are joining data that represents an event series; however, the referenced columns can be of any type.

PREVIOUS VALUE

Pads the non-preserved side with the previous values from relation when there is no match.

Input rows are sorted in ascending logical order of the join column.

An ORDER BY clause, if used, does not determine the input order but only determines query output order.

Description

  • An event series join is an extension of a regular outer join. Instead of padding the non-preserved side with null values when there is no match, the event series join pads the non-preserved side with the previous values from the table.
  • The difference between expressing a regular outer join and an event series join is the INTERPOLATE predicate, which is used in the ON clause. See the Examples section below Notes and Restrictions. See also Event Series Joins in Analyzing Data.
  • Data is logically partitioned on the table in which it resides, based on other ON clause equality predicates.
  • Interpolated values come from the table that contains the null, not from the other table.
  • Vertica does not guarantee that there will be no null values in the output. If there is no previous value for a mismatched row, that row will be padded with nulls.
  • Event series join requires that both tables be sorted on columns in equality predicates, in any order, followed by the INTERPOLATED column. If data is already sorted in this order, then an explicit sort is avoided, which can improve query performance. For example, given the following tables:

    ask: exchange, stock, ts, pricebid: exchange, 
    stock, ts, price
    

    In the query that follows

    • ask is sorted on exchange, stock (or the reverse), ts
    • bid is sorted on exchange, stock (or the reverse), ts
    SELECT ask.price - bid.price, ask.ts, ask.stock, ask.exchange    
    FROM ask FULL OUTER JOIN bid
       ON ask.stock = bid.stock AND ask.exchange = 
       bid.exchange AND ask.ts INTERPOLATE PREVIOUS 
       VALUE bid.ts;
    

Restrictions

  • Only one INTERPOLATE expression is allowed per join.
  • INTERPOLATE expressions are used only with ANSI SQL-99 syntax (the ON clause), which is already true for full outer joins.
  • INTERPOLATE can be used with equality predicates only.
  • The AND operator is supported but not the OR and NOT operators.
  • Expressions and implicit or explicit casts are not supported, but subqueries are allowed.

Example

The examples that follow use this simple schema.

CREATE TABLE t(x TIME);
CREATE TABLE t1(y TIME);
INSERT INTO t VALUES('12:40:23');
INSERT INTO t VALUES('14:40:25');
INSERT INTO t VALUES('14:45:00');
INSERT INTO t VALUES('14:49:55');
INSERT INTO t1 VALUES('12:40:23');
INSERT INTO t1 VALUES('14:00:00');
COMMIT;

Normal Full Outer Join

=> SELECT * FROM t FULL OUTER JOIN t1 ON t.x = t1.y;

Notice the null rows from the non-preserved table:

    x     |    y
----------+----------
 12:40:23 | 12:40:23
 14:40:25 |
 14:45:00 |
 14:49:55 |
          | 14:00:00
(5 rows)

Full Outer Join with Interpolation

=> SELECT * FROM t FULL OUTER JOIN t1 ON t.x INTERPOLATE 
PREVIOUS VALUE t1.y;

In this case, the rows with no entry point are padded with values from the previous row.

    x     |    y
----------+----------
 12:40:23 | 12:40:23
 12:40:23 | 14:00:00
 14:40:25 | 14:00:00
 14:45:00 | 14:00:00
 14:49:55 | 14:00:00
(5 rows)

Normal Left Outer Join

=> SELECT * FROM t LEFT OUTER JOIN t1 ON t.x = t1.y;

Again, there are nulls in the non-preserved table

    x     |    y
----------+----------
 12:40:23 | 12:40:23
 14:40:25 |
 14:45:00 |
 14:49:55 |
(4 rows)

Left Outer Join with Interpolation

=> SELECT * FROM t LEFT OUTER JOIN t1 ON t.x INTERPOLATE
   PREVIOUS VALUE t1.y;

Nulls padded with interpolated values.

    x     |    y
----------+----------
 12:40:23 | 12:40:23
 14:40:25 | 14:00:00
 14:45:00 | 14:00:00
 14:49:55 | 14:00:00
(4 rows)

Inner Joins

For inner joins, there is no difference between a regular inner join and an event series inner join. Since null values are eliminated from the result set, there is nothing to interpolate.

A regular inner join returns only the single matching row at 12:40:23:

=> SELECT * FROM t INNER JOIN t1 ON t.x = t1.y;
    x     |    y
----------+----------
 12:40:23 | 12:40:23
(1 row)

An event series inner join finds the same single-matching row at 12:40:23:

=> SELECT * FROM t INNER JOIN t1 ON t.x INTERPOLATE 
PREVIOUS VALUE t1.y;
    x     |    y
----------+----------
 12:40:23 | 12:40:23
(1 row)

Semantics

When you write an event series join in place of normal join, values are evaluated as follows (using the schema in the above examples):

  • t is the outer, preserved table
  • t1 is the inner, non-preserved table
  • For each row in outer table t, the ON clause predicates are evaluated for each combination of each row in the inner table t1.
  • If the ON clause predicates evaluate to true for any combination of rows, those combination rows are produced at the output.
  • If the ON clause is false for all combinations, a single output row is produced with the values of the row from t along with the columns of t1 chosen from the row in t1 with the greatest t1.y value such that t1.y < t.x; If no such row is found, pad with nulls.

t LEFT OUTER JOIN t1 is equivalent to t1 RIGHT OUTER JOIN t.

In the case of a full outer join, all values from both tables are preserved.