Creating Tables and Projections for use with the Routable Query API

For routable queries, the client needs to determine the appropriate node to get the data. The client does this by comparing all of the projections available for the table and determining the best projection to use to find the single node that contains data. You must create a projection segmented by the key column(s) on at least one table to take full advantage of the Routable Query API. Other tables which join to this table must either have an unsegmented projection, or a projection segmented as described below.

Note: Tables must be segmented by hash for Routable Queries. See Hash Segmentation Clause. Other segmentation types are not supported.

Creating Tables for use with Routable Queries

To create a table that can be used with the Routable Query API, segment (by hash) the table on a uniformly distributed column. Typically, you segment on a primary key. For faster lookups, sort the projection on the same columns on which you segmented. For example, to create a table that is well suited to Routable Queries:

CREATE TABLE users (
id INT NOT NULL PRIMARY KEY,
username VARCHAR(32),
email VARCHAR(64),
business_unit VARCHAR(16))
ORDER BY id
SEGMENTED BY HASH(id)
ALL NODES;

This table is segmented based on the id column (and ordered by id to make lookups faster). To build a query for this table using the Routable Query API, you only need to provide a single predicate for the id column which returns a single row when queried.

However, if you were to add multiple columns to the segmentation clause, such as this table:

CREATE TABLE users2 (
	id INT NOT NULL PRIMARY KEY,
	username VARCHAR(32),
	email VARCHAR(64),
	business_unit VARCHAR(16))
ORDER BY id, business_unit
SEGMENTED BY HASH(id, business_unit)
ALL NODES;

Then you would need to provide two predicates when querying the users2 table, since the segmentation clause uses both the id and the business_unit columns. However, if you know both id and business_unit when you perform the queries, then it is beneficial to segment on both columns, as it makes it easier for the client to determine that this projection is the best projection to use to determine the correct node.

Designing Tables for Single-node JOINs

If you plan to use the VerticaRoutableExecutor class and join tables during routable queries, then you must segment all tables being joined by the same segmentation key. Typically this key is a Primary/Foreign key on all the tables being joined. For example, the customer_key may be the primary key in a customers dimension table, and the same key is a foreign key in a sales fact table. Projections for a VerticaRoutableExecutor query using these tables must be segmented by hash on the customer key in each table.

If you want to join with small dimension tables, such as date dimensions, then it may be appropriate to make those tables unsegmented so that the date_dimension data exists on all nodes. It is important to note that when joining unsegmented tables, you still must specify a segmented table in the createRoutableExecutor() call.

Verifying Existing Projections for Tables

If you have existing tables that are already segmented by hash (for example, on an ID column), then you can determine what predicates are needed to query the table by using the select get_table_projections('tableName') command to view the projections associated with the table. The example table displays the following when select get_table_projections('users') is run:

Projection Name: [Segmented] [Seg Cols] [# of Buddies] [Buddy Projections] [Safe] [UptoDate] [Stats]
----------------------------------------------------------------------------------------------------
public.users_b1 [Segmented: Yes] [Seg Cols: "public.users.id"] [K: 1] [public.users_b0] [Safe: Yes] [UptoDate: Yes] [Stats: RowCounts]
public.users_b0 [Segmented: Yes] [Seg Cols: "public.users.id"] [K: 1] [public.users_b1] [Safe: Yes] [UptoDate: Yes] [Stats: RowCounts]

Note that for each projection, only the "public.users.id" column is specified,meaning you need to provide a predicate for this column when you build your query.

If the table was segmented on multiple columns, for example id and business_unit, then you would need to provide both columns as predicates to the routable query.