Hash Segmentation Clause

Specifies how to segment projection data for distribution across some or all cluster nodes. You can specify segmentation for a table and a projection. If a table definition specifies segmentation, Vertica uses it for that table's auto-projections.

It is strongly recommended that you use Vertica's built-in HASH function, which distributes data evenly across the cluster, and facilitates optimal query execution.

Syntax

SEGMENTED BY expression { ALL NODES [ OFFSET offset ] | NODES node[,…] }

Parameters

SEGMENTED BY expression

A general SQL expression. Hash segmentation is the preferred method of segmentation. Vertica recommends using its built-in HASH function, whose arguments resolve to table columns. If you use an expression other than HASH, Vertica issues a warning.

The segmentation expression should specify columns with a large number of unique data values and acceptable skew in their data distribution. In general, primary key columns that meet these criteria are good candidates for hash segmentation.

For details, see Expression Requirements below.

ALL NODES

Automatically distributes data evenly across all nodes when the projection is created. Node ordering is fixed.

OFFSET offset

A zero-based offset that indicates on which node to start segmentation distribution.

This option is not valid for CREATE TABLE and CREATE TEMPORARY TABLE.

If you create a projection for a table with the OFFSET option, be sure to create enough copies of each projection segment to satisfy system K-safety; otherwise, Vertica regards the projection as unsafe and cannot use it to query the table.

You can ensure K-safety compliance when you create projections by combining OFFSET and KSAFE options in the CREATE PROJECTION statement. On executing this statement, Vertica automatically creates the necessary number of projection copies.

NODES node[,…]

Specifies a subset of the nodes in the cluster over which to distribute projection data. You can specify a node only once.

To obtain a list of all cluster nodes, query the system table V_CATALOG.NODES.

Expression Requirements

A segmentation expression must specify table columns as they are defined in the source table. Projection column names are not supported.

The following restrictions apply to segmentation expressions:

  • All leaf expressions must be constants or column references to a column in the CREATE PROJECTION 's SELECT list.
  • The expression must return the same value over the life of the database.
  • Aggregate functions are not allowed.
  • The expression must return non-negative INTEGER values in the range 0 <= x < 263, and values are generally distributed uniformly over that range.

    If the expression produces a value outside the expected range—for example, a negative value—no error occurs, and the row is added to the projection's first segment.

Examples

The following CREATE PROJECTION statement creates projection public.employee_dimension_super. It specifies to include all columns in table public.employee_dimension. The hash segmentation clause invokes the Vertica HASH function to segment projection data on the column employee_key; it also includes the ALL NODES clause, which specifies to distribute projection data evenly across all nodes in the cluster:

=> CREATE PROJECTION public.employee_dimension_super 
    AS SELECT * FROM public.employee_dimension
    ORDER BY employee_key
    SEGMENTED BY hash(employee_key) ALL NODES;