Range Joins

Vertica provides performance optimizations for <, <=, >, >=, and BETWEEN predicates in join ON clauses. These optimizations are particularly useful when a column from one table is restricted to be in a range specified by two columns of another table.

Key Ranges

Multiple, consecutive key values can map to the same dimension values. Consider, for example, a table of IPv4 addresses and their owners. Because large subnets (ranges) of IP addresses could belong to the same owner, this dimension can be represented as:

=> CREATE TABLE ip_owners(
      ip_start INTEGER,
      ip_end INTEGER,
      owner_id INTEGER);
=> CREATE TABLE clicks(
      ip_owners INTEGER, 
      dest_ip INTEGER);

A query that associates a click stream with its destination can use a join similar to the following, which takes advantage of the range optimization:

=> SELECT owner_id, COUNT(*) FROM clicks JOIN ip_owners
   ON clicks.dest_ip BETWEEN ip_start AND ip_end
   GROUP BY owner_id;

Requirements

Operators <, <=, >, >=, or BETWEEN must appear as top-level conjunctive predicates for range join optimization to be effective, as shown in the following examples:

Notes