Eon Mode Architecture

Eon Mode separates the computational resources from the communal storage layer of your database. This separation gives you the ability to store your data in a single location (currently, only in S3 on AWS). You can elastically vary the number of nodes connected to that location according to your computational needs. Adjusting the size of your cluster does not interrupt analytic workloads.

The entire Eon Mode cluster, both storage and compute, resides in the cloud. Currently, the only cloud provider that Eon Mode supports is Amazon Web Services (AWS).

Eon Mode is suited to a range of needs and data volumes. Because compute and storage are separate, you can scale them separately.

Communal Storage

Instead of storing data locally, Eon Mode uses a single communal storage location for all data and the catalog (metadata). Communal storage is the database's centralized storage location, shared among the database nodes.

Communal storage has the following properties:

  • Communal storage in the cloud is more resilient and less susceptible to data loss due to storage failures than storage on disk on individual machines.
  • Any data can be read by any node using the same path.
  • Capacity is not limited by disk space on nodes.
  • Because data is stored communally, you can elastically scale your cluster to meet changing demands. If the data were stored locally on the nodes, adding or removing nodes would require moving significant amounts of data between nodes, either off of the nodes being removed or onto the newly added nodes.

Vertica currently supports communal storage in Amazon S3.

Communal storage locations are listed in the STORAGE_LOCATIONS system table with a SHARING_TYPE of COMMUNAL.
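
For example, you can confirm the communal storage path with a query like the following. This is a sketch; it assumes the NODE_NAME, LOCATION_PATH, and SHARING_TYPE columns of STORAGE_LOCATIONS.

    => SELECT node_name, location_path, sharing_type
       FROM storage_locations
       WHERE sharing_type = 'COMMUNAL';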

Within communal storage, data is divided into portions called shards. Nodes subscribe to particular shards, with subscriptions balanced among the nodes. See Shards and Subscriptions.
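
For example, a sketch of checking which shards each node subscribes to, assuming the NODE_SUBSCRIPTIONS system table described in Shards and Subscriptions and its NODE_NAME, SHARD_NAME, and SUBSCRIPTION_STATE columns:

    => SELECT node_name, shard_name, subscription_state
       FROM node_subscriptions
       ORDER BY node_name, shard_name;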

The Depot

One drawback of communal storage is its speed. Accessing data from a shared cloud location is slower than reading it from local disk. Also, the connection to communal storage can become a bottleneck if many nodes are reading data from it at once. To improve data access speed, the nodes in an Eon Mode database maintain a local disk cache of data called the depot. When executing a query, a node first checks whether the data it needs is in the depot. If it is, the node finishes the query using the local copy of the data. If the data is not in the depot, the node fetches the data from communal storage and saves a copy in the depot.

The node stores newly-loaded data in the depot before sending it to communal storage. See Loading Data below for more details.

By default, Vertica sets the maximum size of the depot to be 60% of the total disk space allocated to the filesystem that stores the depot. Vertica limits the size of the depot to a maximum of 80% of the filesystem that contains it. This upper limit ensures there is enough disk space for other uses, such as temporary files Vertica creates during data loads.

In versions of Vertica before 9.2.1, the default size of the depot was 80% of the filesystem's disk space.
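
To check each node's current depot size limit, you can query STORAGE_LOCATIONS again. This sketch assumes depot locations are reported with a LOCATION_USAGE of DEPOT and that a MAX_SIZE column is available:

    => SELECT node_name, location_path, max_size
       FROM storage_locations
       WHERE location_usage = 'DEPOT';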

Each node also stores a local copy of the database catalog.

Loading Data

In Eon Mode, COPY statements usually write to Read Optimized Store (ROS) files in a node's depot to improve performance. The COPY statement segments, sorts, and compresses the data for high optimization. Before the statement commits, Vertica ships the ROS files to communal storage. Eon Mode does not use the Write Optimized Store (WOS).

Because a load is buffered in the depot on the node executing the load, the size of your depot limits the amount of data you can load in a single operation. Unless you perform multiple loads in parallel sessions, you are unlikely to encounter this limit.

If your data loads do overflow the amount of space in your database's depot, you can tell Vertica to bypass the depot and load data directly into communal storage. You enable direct writes to communal storage by setting the UseDepotForWrites configuration parameter to 0. See Eon Mode Parameters for more information. Once you have completed your large data load, switch this parameter back to 1 to re-enable writing to the depot.
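
For example, a minimal sketch of bracketing a large load with this parameter change, using the SET_CONFIG_PARAMETER function (your Vertica version may also support ALTER DATABASE syntax for setting parameters):

    => SELECT SET_CONFIG_PARAMETER('UseDepotForWrites', 0);
    -- ...run the large COPY statement here...
    => SELECT SET_CONFIG_PARAMETER('UseDepotForWrites', 1);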

At load time, files are written to the depot and synchronously sent to communal storage and to all the nodes that subscribe to the shard into which the data is being loaded. Sending data to peers at load time improves performance if a node goes down, because the depots of the peer nodes that take over for the down node are already warm. The file compaction mechanism (mergeout) also puts its output files into the depot and uploads them to communal storage.

The following diagram shows the flow of data during a COPY statement.

Querying Data

Vertica uses a slightly different process to plan queries in Eon Mode to incorporate the sharding mechanism and remote storage. Instead of using a fixed-segmentation scheme to distribute data to each node, Vertica uses the sharding mechanism to segment the data into a specific number of shards, each of which at least one node (and usually more) subscribes to. When the optimizer selects a projection, the layout for the projection is determined by the participating subscriptions for the session. The optimizer generates query plans that are equivalent to those in Enterprise Mode. It selects one of the nodes that subscribe to each shard to participate in query execution.

Vertica first tries to use data in the depot to resolve a query. When the data in the depot cannot resolve the query, Vertica reads from communal storage. You could see an impact on query performance when a substantial number of your queries read from communal storage. If this is the case, consider resizing your depot, or use the depot system tables to get a better idea of what is causing the issue. You can use ALTER_LOCATION_SIZE to change the depot size.
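
For example, a sketch of growing the depot to 80% of its filesystem on a single node. The node name v_mydb_node0001 is a placeholder for one of your own nodes:

    => SELECT ALTER_LOCATION_SIZE('depot', 'v_mydb_node0001', '80%');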

Workload Isolation and Scaling

Eon Mode lets you define subclusters that divide up your nodes to isolate workloads from one another. You can also use subclusters to ensure that scaling down your cluster does not result in Vertica shutting down to maintain data integrity. See Subclusters for more information.
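
For example, a sketch of listing subcluster membership, assuming a SUBCLUSTERS system table with SUBCLUSTER_NAME, NODE_NAME, and IS_PRIMARY columns:

    => SELECT subcluster_name, node_name, is_primary
       FROM subclusters
       ORDER BY subcluster_name, node_name;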