Projection Segmentation
You can define a projection to store its data on the cluster in one of two ways:
- Divided into multiple segments (segmented projections)
- As a single undivided unit (unsegmented projections)
Segmented Projections
You typically create segmented projections for large fact tables. Vertica splits segmented projections into chunks (segments) of similar size and distributes these segments evenly across the cluster. System K-safety determines how many duplicates (buddies) of each segment are created and maintained on different nodes.
You create segmented projections with a CREATE PROJECTION statement that includes a SEGMENTED BY clause.
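The buddy arrangement described above can be sketched in Python. This is a simplified, hypothetical model, not Vertica's actual placement logic: it assumes one segment per node and places each of the K buddy copies on the next node in a ring. The names `place_segments` and `available_segments` are illustrative only.

```python
from collections import defaultdict

Layout = dict[int, list[tuple[int, int]]]  # node -> [(segment, copy), ...]


def place_segments(num_nodes: int, k_safety: int) -> Layout:
    """Hypothetical placement: one segment per node. Copy 0 is the primary;
    copies 1..k_safety (the buddies) go on the following nodes in a ring."""
    if not 0 <= k_safety < num_nodes:
        raise ValueError("k_safety must be at least 0 and less than num_nodes")
    layout: Layout = defaultdict(list)
    for segment in range(num_nodes):
        for copy in range(k_safety + 1):
            # Offsetting each copy by one node keeps all copies of a segment on distinct nodes.
            node = (segment + copy) % num_nodes
            layout[node].append((segment, copy))
    return dict(layout)


def available_segments(layout: Layout, failed_nodes: set[int]) -> set[int]:
    """Segments that still have at least one copy on a surviving node."""
    return {
        segment
        for node, copies in layout.items()
        if node not in failed_nodes
        for segment, _copy in copies
    }


if __name__ == "__main__":
    layout = place_segments(num_nodes=3, k_safety=1)
    for node in sorted(layout):
        print(node, layout[node])
    # With K-safety 1, losing any single node leaves every segment available.
    print(available_segments(layout, {1}))
```

In this model each node stores K + 1 segment copies, and the cluster tolerates the loss of any K nodes because every segment's copies sit on K + 1 distinct nodes.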
Projection segmentation achieves the following goals:
- Ensures high availability and recovery.
- Spreads the query execution workload across multiple nodes.
- Allows each node to be optimized for different query workloads.
Hash Segmentation
Vertica uses hash segmentation to segment large projections. Hash segmentation segments a projection based on a built-in hash function that distributes data evenly across multiple nodes, which balances query execution across the cluster. The columns whose values are hashed should each have a large number of unique values and an acceptable amount of skew in the value distribution. Primary key columns typically meet these criteria, so they are often used as hash function arguments.
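The effect of column choice on distribution can be sketched in Python. This is a hypothetical illustration: SHA-256 stands in for Vertica's built-in hash function, which it does not reproduce, and the `orders` rows and helper names are made up. Hashing a high-cardinality key such as an order ID spreads rows almost evenly across segments, while hashing a column with only a few distinct values leaves some segments empty.

```python
import hashlib
from collections import Counter
from typing import Callable, Iterable


def segment_for(values: tuple, num_segments: int) -> int:
    """Map a row's segmentation column values to a segment number.

    SHA-256 is a stand-in here; it is not Vertica's built-in hash function."""
    digest = hashlib.sha256(repr(values).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_segments


def distribution(rows: Iterable[tuple], key: Callable[[tuple], tuple], num_segments: int) -> list[int]:
    """Count how many rows land in each segment when hashing key(row)."""
    counts = Counter(segment_for(key(row), num_segments) for row in rows)
    return [counts.get(seg, 0) for seg in range(num_segments)]


if __name__ == "__main__":
    regions = ["east", "west", "north"]
    # Hypothetical orders table: (order_id, region)
    orders = [(order_id, regions[order_id % 3]) for order_id in range(10_000)]

    # Many unique values: rows spread roughly evenly over 4 segments.
    print(distribution(orders, key=lambda row: (row[0],), num_segments=4))
    # Only 3 distinct values: at most 3 of the 4 segments receive any rows.
    print(distribution(orders, key=lambda row: (row[1],), num_segments=4))
```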
Unsegmented Projections
In many cases, dimension tables are relatively small, so you do not need to segment them. Accordingly, you should design a K-safe database so that projections for its dimension tables are replicated without segmentation on all cluster nodes. You create unsegmented projections with a CREATE PROJECTION statement that includes the clause UNSEGMENTED ALL NODES. This clause creates identical instances of the projection on all cluster nodes.
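A rough per-node storage comparison shows why replicating only small tables is practical. This back-of-the-envelope sketch assumes perfectly even segmentation and ignores compression and encoding; the function name and the table sizes are hypothetical.

```python
def rows_per_node(total_rows: int, num_nodes: int, k_safety: int, segmented: bool) -> float:
    """Approximate rows stored on each node for one projection and its buddies.

    Segmented: each node holds its own segment plus k_safety buddy segments,
    so roughly total_rows * (k_safety + 1) / num_nodes rows.
    Unsegmented (UNSEGMENTED ALL NODES): every node holds a full copy, so
    k_safety is ignored; a copy on every node already provides redundancy."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    if segmented:
        return total_rows * (k_safety + 1) / num_nodes
    return float(total_rows)


if __name__ == "__main__":
    # Hypothetical 10-node cluster with K-safety 1.
    fact = rows_per_node(1_000_000_000, num_nodes=10, k_safety=1, segmented=True)
    dim = rows_per_node(10_000, num_nodes=10, k_safety=1, segmented=False)
    print(f"fact table: {fact:,.0f} rows per node")
    print(f"dimension table: {dim:,.0f} rows per node")
```

Here the replicated dimension table costs only 10,000 rows per node, while replicating the fact table the same way would put all one billion rows on every node.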