Shards and Subscriptions
In Eon Mode, Vertica stores data communally in a shared data storage location (for example, in S3 when running on AWS). All nodes are capable of accessing all of the data in the communal storage location. In order for nodes to divide the work of processing queries, Vertica must divide the data between them in some way. It breaks the data in communal storage into segments called shards. Each node in your database subscribes to a subset of the shards in the communal storage location. The shards in your communal storage location are similar to a collection of segmented projections in an Enterprise Mode database.
When K-Safety is 1 or higher (high availability), each shard has more than one node subscribing to it. One of the subscribers is responsible for executing queries involving the shard. The other subscribers act as backups. If the main subscriber goes down or is stopped, another subscriber takes its place. See Maintaining Data Integrity and High Availability in an Eon Mode Database for more information.
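The relationship between shards, subscribers, and failover can be illustrated with a small toy model. The sketch below is not Vertica's actual assignment algorithm: the round-robin placement, the function names assign_subscriptions and active_subscriber, and the rule that the first available subscriber in the list takes over are all assumptions made for illustration only.

```python
def assign_subscriptions(shard_count, nodes, k_safety=1):
    """Toy model: give each shard k_safety + 1 subscribers, round-robin.

    Hypothetical helper for illustration; Vertica decides real
    subscriptions internally.
    """
    if k_safety + 1 > len(nodes):
        raise ValueError("need at least k_safety + 1 nodes")
    subscriptions = {}
    for shard in range(shard_count):
        # Subscribers are consecutive nodes, wrapping around the node list.
        subscriptions[shard] = [
            nodes[(shard + i) % len(nodes)] for i in range(k_safety + 1)
        ]
    return subscriptions


def active_subscriber(subscribers, up_nodes):
    """Return the first subscriber that is still up, or None if none are."""
    for node in subscribers:
        if node in up_nodes:
            return node
    return None


if __name__ == "__main__":
    subs = assign_subscriptions(3, ["n1", "n2", "n3"], k_safety=1)
    print(subs)
    # With n1 down, a backup subscriber of shard 0 takes over.
    print(active_subscriber(subs[0], up_nodes={"n2", "n3"}))
```

In this model, a shard whose subscribers are all down has no active subscriber, which corresponds to the situation that high availability is meant to prevent.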
Each shard in the database has a primary subscriber. This subscriber is a primary node that maintains the data in the shard by running Tuple Mover operations on it. See Tuple Mover for more information about these operations. If the primary subscriber node is stopped or goes down, Vertica chooses another primary node that subscribes to the shard as the shard's primary subscriber. If all of the primary nodes that subscribe to a shard go down or are stopped, your database shuts down to maintain data integrity. Any primary node that is the sole subscriber to a shard is a critical node. If that node goes down, then your database shuts down.
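The critical-node rule can be expressed as a short check over a subscription map. This is a minimal sketch under stated assumptions: the data layout (a dictionary from shard to subscriber list) and the names critical_nodes and database_stays_up are hypothetical, and the sketch counts only primary nodes as subscribers that can keep a shard alive, based on the shutdown rule described above.

```python
def critical_nodes(subscriptions, primary_nodes):
    """Return primary nodes that are the only primary subscriber to a shard.

    subscriptions: dict mapping shard id -> list of subscribing node names.
    primary_nodes: set of node names that are primary nodes.
    (Hypothetical data layout, for illustration only.)
    """
    critical = set()
    for subscribers in subscriptions.values():
        primaries = [n for n in subscribers if n in primary_nodes]
        if len(primaries) == 1:
            critical.add(primaries[0])
    return critical


def database_stays_up(subscriptions, primary_nodes, up_nodes):
    """True if every shard still has at least one primary subscriber up."""
    return all(
        any(n in primary_nodes and n in up_nodes for n in subscribers)
        for subscribers in subscriptions.values()
    )


if __name__ == "__main__":
    subs = {0: ["n1", "n2"], 1: ["n2", "n3"], 2: ["n3"]}
    primaries = {"n1", "n2", "n3"}
    print(critical_nodes(subs, primaries))
    print(database_stays_up(subs, primaries, up_nodes={"n1", "n2"}))
```

In the example, n3 is the only subscriber to shard 2, so the model reports it as critical, and losing it leaves shard 2 without a primary subscriber.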
A special type of shard called a replica shard stores metadata for unsegmented projections. Replica shards exist on all nodes.
You define the number of shards when you create your database. For the best performance, choose a shard count that is no greater than 2× the number of nodes. In any case, keep the shard-to-node ratio at or below 3:1. The Management Console (MC) warns you to take all aspects of shard count into consideration. The number of shards should always be a multiple (or an even divisor) of the number of nodes in your database.
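The sizing guidelines above can be turned into a simple pre-flight check. The following is a sketch only: check_shard_count is a hypothetical helper, not a Vertica tool or API, and the warning messages are illustrative wording.

```python
def check_shard_count(shards, nodes):
    """Return a list of warnings for a proposed shard count.

    Hypothetical helper that encodes the guidelines in this section:
    at most 2x the node count for best performance, never more than 3:1,
    and a multiple or even divisor of the node count.
    """
    if shards <= 0 or nodes <= 0:
        raise ValueError("shards and nodes must be positive integers")

    warnings = []
    if shards > 3 * nodes:
        warnings.append("shard-to-node ratio exceeds 3:1")
    elif shards > 2 * nodes:
        warnings.append("shard count exceeds 2x the number of nodes")

    # A multiple of the node count, or a value that divides it evenly.
    if shards % nodes != 0 and nodes % shards != 0:
        warnings.append(
            "shard count is neither a multiple nor an even divisor "
            "of the node count"
        )
    return warnings


if __name__ == "__main__":
    for shards, nodes in [(6, 3), (3, 6), (7, 3), (12, 3)]:
        print(shards, nodes, check_shard_count(shards, nodes))
```

Because the shard count cannot be changed later, it can be useful to run such a check against both the initial node count and the node counts you expect to grow into.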
Once set, you cannot change the number of shards in your database. Therefore, you must choose a shard count that provides good initial performance and also leaves room for future growth.
For efficiency, Vertica transfers metadata about shards directly between database nodes. This peer-to-peer transfer applies only to metadata; the actual data that is stored on each node gets copied from communal storage to the node's depot as needed.