Understanding Vertica Eon Mode
This blog post is updated as of July 2018.
In Eon mode, Vertica takes advantage of the elastic compute and scalable storage that the cloud offers. This lets Vertica benefit from cloud economics: it can scale up and down rapidly in response to changing workloads, and it separates the size of your data from the compute you dedicate to running queries over it. Eon mode is also built to handle failure gracefully, even in volatile cloud environments. We’re committed to helping users get started quickly with Management Console. Currently, Eon mode works on AWS.
Eon Mode versus Enterprise Mode
The traditional mode of Vertica is known as Enterprise mode. Let’s take a closer look at the differences between the two modes.
In Enterprise mode, the nodes in your cluster own the data. If you lose a copy of your data, you can restore it from another node, known as the buddy node. However, if you lose all nodes with a copy, there is data loss. Nodes are responsible for durability within the cluster.
In Eon mode, the primary copy is available on shared storage. The local cache serves as the secondary copy. This means that adding and removing nodes does not redistribute the primary copy. This shared storage model enables elasticity, meaning it is both time- and cost-effective to adapt the cluster resources to fit the usage pattern of the cluster. Node failure and recovery are also simpler in Eon mode. If a node goes down, other nodes are not impacted, because the data remains on shared storage. Node restarts are fast and no recovery is needed.
Using this model, you have flexibility. You can have heavy storage with minimal compute, or small storage with heavy compute. It’s up to you and your workload needs.
While most of the traditional features of Enterprise mode are supported in Eon mode, a few features and operations are not supported, including:
• Backup and restore
• WOS and the moveout operation
• Prejoin projections
• Elastic cluster, including local segmentation, scaling factor, rebalance cluster, and rebalance table
• Standby nodes
• Grouped ROS
Eon Mode Architecture
The optimizer and execution engine that make Vertica great are largely untouched in Eon mode. Under the hood, Vertica uses a storage API to interact with the underlying storage.
You can easily load data from S3. The data is written to the depot, where a peer-to-peer fill step sends the output data file to other nodes. Concurrently, the node uploads the file to shared storage. After these two operations are complete, the transaction can be committed.
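The load sequence described above can be sketched as a small Python simulation. This is only an illustration of the ordering (write to the depot, then peer-to-peer fill and upload at the same time, then commit), not Vertica's implementation. The `Node` class, the `load_file` function, and the use of a plain dictionary as shared storage are all hypothetical names made up for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


class Node:
    """Toy stand-in for a node with a local depot (hypothetical, not a Vertica API)."""

    def __init__(self, name):
        self.name = name
        self.depot = {}


def load_file(writer, peers, shared_storage, file_name, data):
    """Simulate one load; return True when the transaction may commit."""
    # Step 1: the loading node writes the output file to its own depot.
    writer.depot[file_name] = data

    def peer_fill():
        # Step 2a: send the file to the depots of the other nodes.
        for peer in peers:
            peer.depot[file_name] = data
        return True

    def upload():
        # Step 2b: upload the same file to shared storage (a dict in this sketch).
        shared_storage[file_name] = data
        return True

    # Steps 2a and 2b run concurrently; commit only after both have finished.
    with ThreadPoolExecutor(max_workers=2) as pool:
        fill_done = pool.submit(peer_fill)
        upload_done = pool.submit(upload)
        return fill_done.result() and upload_done.result()


node1, node2 = Node("node1"), Node("node2")
shared = {}
committed = load_file(node1, [node2], shared, "file_0001", b"rows")
print(committed, sorted(shared), sorted(node2.depot))
```

The point of the sketch is the commit condition: the function returns only after both the peer fill and the upload have completed.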
In Eon mode, projections are not mapped to nodes directly. Data loaded into projections is instead mapped to shards (also known as segments in Vertica). During data load, nodes store the data if they serve a shard. This flexibility allows node set changes with less overhead.
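To make the indirection concrete, here is a minimal Python sketch of rows being mapped to shards and then stored by whichever nodes serve those shards. It assumes a hash-based mapping, with `zlib.crc32` as a stand-in for whatever hash Vertica actually uses. The function names, node names, and subscription sets are hypothetical.

```python
import zlib
from collections import defaultdict


def shard_for(key, shard_count):
    """Map a row key to a shard index with a stable hash (crc32 is a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


def route_rows(keys, shard_count, subscriptions):
    """Return {node: [keys]}: a node stores a row only if it serves the row's shard."""
    stored = defaultdict(list)
    for key in keys:
        shard = shard_for(key, shard_count)
        for node, shards in subscriptions.items():
            if shard in shards:
                stored[node].append(key)
    return dict(stored)


keys = [f"customer-{i}" for i in range(8)]

# Two nodes serving four shards between them.
two_nodes = route_rows(keys, 4, {"node1": {0, 1}, "node2": {2, 3}})

# Adding a third node changes which nodes serve a shard,
# but not which shard a row belongs to.
three_nodes = route_rows(keys, 4, {"node1": {0}, "node2": {1, 2}, "node3": {3}})
```

In this sketch, changing the node set only edits the subscription table. The row-to-shard mapping in `shard_for` never looks at the nodes, which is why node set changes carry less overhead.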
The catalog in Eon mode is broken into shards. Each node maintains metadata for a subset of shards, based on the mapping of which nodes have subscribed to which shards. These mappings are known as node subscriptions, and they are managed automatically while queries run.
At the time of database creation, you specify how many shards are created. Once this value is defined, it cannot be changed.
There are two types of shards. The segment shards are for segmented tables and are distributed across the cluster. The replica shards are for maintaining metadata for unsegmented tables. The replica shard lives on all nodes. If the node count is the same as the shard count, each node services a shard.
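The two shard types can be illustrated with a short Python sketch that builds a subscription table. Vertica manages subscriptions itself, so the round-robin assignment here is only an assumption made for the example. The function name and the shard labels (`segment0`, `replica`) are hypothetical as well.

```python
def assign_subscriptions(nodes, shard_count):
    """Build {node: set of shards} for a toy cluster.

    The replica shard goes on every node; segment shards are spread
    round-robin (an assumed policy, chosen only for illustration).
    """
    subscriptions = {node: {"replica"} for node in nodes}
    for shard in range(shard_count):
        node = nodes[shard % len(nodes)]
        subscriptions[node].add(f"segment{shard}")
    return subscriptions


# Node count equals shard count: each node services exactly one segment shard.
even = assign_subscriptions(["node1", "node2", "node3"], 3)

# More shards than nodes: each node services several segment shards.
packed = assign_subscriptions(["node1", "node2", "node3"], 6)
```

In both cases every node carries the replica shard, so metadata for unsegmented tables is available on all nodes.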
Why Eon Mode?
There are many reasons you may choose to operate your Vertica database in Eon mode. Let’s take a closer look at two examples.
Suppose you need to maintain large volumes of historical data on shared storage, while running a query workload on only recent data, stored in the local depot. Eon mode is perfect for this. You only need to recover the depot if there is a data loss. Node recovery is also much faster during startup or when you add more nodes. You can keep all your historical data without compromising the query performance on recent data.
Suppose you need to do operational dashboarding, where you scale your cluster based on your workload needs. During a peak workload, you can achieve higher throughput for dashboard queries by adding more nodes to the cluster. During low workload times, you can save resources by scaling down the cluster, which also reduces cost.