Comparing Eon and Enterprise Modes

Vertica databases running in Eon and Enterprise modes store their data differently. Eon Mode databases use communal storage for their data. Enterprise Mode databases store data locally in the filesystem of nodes that make up the database. These different storage methods lead to a number of important differences between the two modes.

Performance

Eon Mode and Enterprise Mode databases have roughly the same performance in the same environment when properly configured.

Usually, an Eon Mode database's performance relies on each node having a local cache of data from the communal store that it uses when processing queries. When this cache (called the depot) contains the data the node needs to process queries, the Eon Mode database's performance is the same as that of an Enterprise Mode database, where each node stores a portion of the database locally. In both cases, the nodes are accessing locally stored data to resolve queries. A depot that has the necessary data to process most queries is called a "warm" depot.

If the depot does not contain the data it needs to resolve a query, the node must retrieve the data from the communal store. In cloud environments, accessing the communal store has a performance penalty because cloud-based object stores such as Amazon's S3 have high latency. If the Eon Mode database has to access the communal store frequently, its query performance will be slower.

You will often see slower query performance when one or more nodes have a "cold" (empty) depot. The depot is cold when you start a new database cluster or subcluster. In this case, the cluster is slower than an Enterprise Mode database would be under the same conditions, because the nodes have to fetch data from communal storage to process queries. As the depot fills with the retrieved data and the nodes make fewer fetches from the communal store, the database's query performance improves.

When you add nodes to an existing Eon Mode database cluster or subcluster, the new nodes do not process queries right away. They first perform a process called depot warming, so that their cold depots do not hurt query performance. Once their depots are warm, they begin taking part in queries.

You can see poor query performance in an Eon Mode database if you make its depot too small. A small depot increases the chance that a query will require data that is not in the depot. That results in nodes having to retrieve data from communal storage more frequently.
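The effect of depot size can be illustrated with a toy simulation. The sketch below models a depot as a simple least-recently-used cache and counts how often a read has to go to communal storage. The class name, the LRU eviction policy, and the uniform random access pattern are all assumptions made for illustration; they are not a description of how Vertica actually manages the depot.

```python
from collections import OrderedDict
import random


class ToyDepot:
    """Hypothetical stand-in for a node's depot: a fixed-size LRU cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.files = OrderedDict()  # file_id -> True, ordered by recency
        self.hits = 0
        self.misses = 0

    def read(self, file_id):
        if file_id in self.files:
            # Data is already local: no trip to communal storage.
            self.files.move_to_end(file_id)
            self.hits += 1
        else:
            # Cold read: the data has to be fetched from communal storage.
            self.misses += 1
            self.files[file_id] = True
            if len(self.files) > self.capacity:
                self.files.popitem(last=False)  # evict least recently used


def hit_rate(capacity, working_set=100, reads=5000, seed=0):
    """Fraction of reads served from the depot for a random workload."""
    rng = random.Random(seed)
    depot = ToyDepot(capacity)
    for _ in range(reads):
        depot.read(rng.randrange(working_set))
    return depot.hits / reads


if __name__ == "__main__":
    for capacity in (10, 50, 100):
        print(capacity, round(hit_rate(capacity), 3))
```

In this model, a depot large enough to hold the whole working set misses only while it is cold, whereas a depot much smaller than the working set keeps evicting data it will need again.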

Some Vertica users who compare the performance of their Eon Mode and Enterprise Mode databases conclude that Eon Mode has much worse performance. In these cases, they are usually comparing a cloud-based Eon Mode database to an on-premise Enterprise Mode database. Here, the performance difference isn't due to the difference between the two database modes. Instead, it is due to the overall performance impact of a shared cloud-based virtual environment compared to on-premise dedicated hardware. An Enterprise Mode database running in the same cloud would have the same performance as the Eon Mode database in most cases.

Installation

Eon Mode databases must have an object store to store their data communally. Enterprise Mode databases do not require any additional storage hardware beyond the hard disks installed in the nodes. Depending on the environment you've chosen for your Vertica database, this requirement may make your installation a bit more complex.

Because Enterprise Mode does not need additional hardware for data storage, it can be a bit simpler to install. An on-premise Eon Mode install needs additional hardware and additional configuration for the object store that provides the communal storage.

Because it does not need additional storage, Enterprise Mode is especially useful for development environments. You can even create a single-node Enterprise Mode database, either on physical hardware or on a virtual machine. You can download a pre-configured single-node Enterprise Mode virtual machine that is ready to run. See Downloading and Starting the Vertica Community Edition Virtual Machine for more information.

Installing an Eon Mode database in a cloud environment is usually simpler than an on-premise install, because the cloud environment provides the object store for you. For example, when you install an Eon Mode database in Amazon's AWS, you just need to create an S3 bucket for the communal data store. You then provide the S3 URL to Vertica when creating the database. There is no need to install and configure a separate data store.

Installing an Enterprise Mode database in the cloud is similar to installing one on premise. The virtual machines you create in the cloud must have enough local storage to store your database's data.

Workload Isolation

You often want to prevent intensive workloads from interfering with other potentially time-sensitive workloads. For example, you may want to isolate ETL workloads from querying workloads. Groups of users that rely on real-time analytics can be isolated from groups that are running batched reports.

Eon Mode databases offer the best workload isolation option. Eon Mode lets you create groups of nodes called subclusters that isolate workloads. A query only runs on the nodes in a single subcluster. It does not affect nodes outside the subcluster. You can assign different groups of users a different subcluster to use.

In an Eon Mode database, subclusters and scalability work hand in hand. You often add, remove, stop, and start entire subclusters of nodes, rather than scaling nodes individually.

Enterprise Mode does not offer subclusters to isolate workloads. You can use features such as resource pools and other settings to give specific queries priority and access to more resources. However, these features do not truly isolate workloads as subclusters do. See Managing Workloads for an explanation of managing workloads using these features.

Scalability

You can scale a Vertica database by adding or removing nodes to meet changing analytic needs. Scalability is usually more important in cloud environments where you are paying by the hour for each node in your database. If your database isn't busy, there is no reason to have underused nodes costing you money. You can reduce the number of nodes in your database during quiet times (weekends and holidays, for example) to save money.
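The savings from scaling down during quiet times come down to simple node-hour arithmetic. The sketch below compares a cluster that always runs at its peak size with one that drops to a smaller size on weekends. The node counts and the five-weekday, two-weekend-day schedule are hypothetical values chosen only for illustration.

```python
def weekly_node_hours(weekday_nodes, weekend_nodes):
    """Node-hours billed in one week: 5 weekdays plus 2 weekend days."""
    return weekday_nodes * 24 * 5 + weekend_nodes * 24 * 2


def savings_fraction(peak_nodes, quiet_nodes):
    """Share of weekly node-hours saved by scaling down on weekends."""
    always_on = weekly_node_hours(peak_nodes, peak_nodes)
    scaled = weekly_node_hours(peak_nodes, quiet_nodes)
    return 1 - scaled / always_on


if __name__ == "__main__":
    # Hypothetical cluster: 12 nodes on weekdays, 6 nodes on weekends.
    print(weekly_node_hours(12, 12), weekly_node_hours(12, 6))
    print(f"{savings_fraction(12, 6):.1%}")
```

Multiplying the node-hours by your provider's hourly rate per node gives the corresponding cost difference.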

Scalability is usually less important for on-premise installations. There are limited additional costs involved in having nodes running when they are not fully in use.

An Enterprise Mode database scales less efficiently than an Eon Mode one. When an Enterprise Mode database scales, it must re-segment (rebalance) its data to be spread among the new number of nodes.

Rebalancing is an expensive operation. When scaling the database up, Vertica must break up files and physically move a percentage of the data from the original nodes to the new nodes. When scaling down, Vertica must move the data off the nodes that are being removed and distribute it among the remaining nodes. The database is not available during rebalancing. This process can take 12, 24, or even 36 hours to complete, depending on the size of the database.

After scaling up an Enterprise Mode database, queries should run faster because each node is responsible for less data and therefore has less work to do to process each query. Scaling down an Enterprise Mode database usually has the opposite effect: queries run slower.

Eon Mode databases scale more efficiently because data storage is separate from the computing resources.

When you scale up an Eon Mode database, the database's data does not need to be resegmented. Instead, the additional nodes subscribe to preexisting segments (called shards) of data in communal storage. When expanding the cluster, Vertica rebalances the shards assigned to each node, rather than physically splitting the data storage and moving it between nodes. The new nodes prepare to process queries by retrieving data from the communal storage to fill their depots (a local cache of data from the communal storage). The database remains available while scaling and the process takes minutes rather than hours to complete.

Node subscriptions are slightly more complicated than shown in the previous diagram. To ensure K-Safety, each node actually subscribes to a second shard to act as a backup. See Shards and Subscriptions for details.

If the number of shards in the communal storage is equal to or higher than the new number of nodes (as shown in the previous diagram), then query performance improves after expanding the cluster. Each node is responsible for processing less data, so the same queries will run faster after you scale the cluster up.

If the number of shards in communal storage is smaller than the new number of nodes, multiple nodes subscribe to the same shard. In this case, individual query performance won't improve. However, query throughput improves—multiple queries can run simultaneously on separate sets of nodes, so more queries get processed by the database.
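The two cases above reduce to a comparison between the shard count and the new node count. The following sketch encodes that rule as stated in the text; the function name and the returned labels are hypothetical and are not part of any Vertica API.

```python
def scaling_effect(shards, new_nodes):
    """Classify the expected benefit of expanding a cluster.

    Follows the rule described in the text: if there are at least as
    many shards as nodes after the expansion, each node processes less
    data, so individual queries get faster. Otherwise several nodes
    subscribe to the same shard and the gain is in query throughput.
    """
    if shards < 1 or new_nodes < 1:
        raise ValueError("shards and new_nodes must be positive")
    if shards >= new_nodes:
        return "faster individual queries"
    return "higher query throughput"


if __name__ == "__main__":
    print(scaling_effect(shards=6, new_nodes=6))
    print(scaling_effect(shards=3, new_nodes=6))
```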

Vertica recommends that you use subclusters to group nodes so that no subcluster has more nodes than shards. See Subclusters for more information.

Having more nodes in a subcluster than the number of shards in the database reduces the efficiency of query throughput scaling. If you want to have more nodes in a subcluster than shards in your database, contact Vertica support to discuss configuration settings that can improve performance in this configuration.

The number of nodes in your Eon Mode database should always be a multiple of the number of shards, or the reverse: the number of shards should be a multiple of the number of nodes. A six-node database should have three, six, or twelve shards. Vertica recommends you never have more than two shards per node.
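As a quick check of this guideline, the sketch below tests whether a given pairing of node and shard counts satisfies both conditions: one count is a multiple of the other, and there are no more than two shards per node. The function name is hypothetical; it only restates the rule from the paragraph above.

```python
def counts_are_compatible(nodes, shards):
    """Return True if the node and shard counts follow the guideline.

    Conditions, as stated in the text:
      * nodes is a multiple of shards, or shards is a multiple of nodes
      * there are no more than two shards per node
    """
    if nodes < 1 or shards < 1:
        raise ValueError("nodes and shards must be positive")
    multiple = nodes % shards == 0 or shards % nodes == 0
    at_most_two_per_node = shards <= 2 * nodes
    return multiple and at_most_two_per_node


if __name__ == "__main__":
    for shards in (3, 4, 6, 12, 18):
        print(shards, counts_are_compatible(nodes=6, shards=shards))
```

For the six-node example, three, six, and twelve shards pass the check, while four shards (not a multiple) and eighteen shards (three per node) do not.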

A mismatch between the number of shards and the number of nodes can impact performance. For example, if the cluster in the previous diagram were expanded from three to only five nodes, one shard would not have a second subscriber. This means the two new nodes would not be able to independently process a query to improve query throughput.
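The five-node example can be worked through with a small model. The sketch below assigns each node one shard in round-robin order and counts how many complete sets of nodes exist that together cover every shard. The round-robin assignment and the function names are simplifying assumptions for illustration, and the model ignores the secondary subscriptions that nodes hold for K-Safety.

```python
def subscribers_per_shard(shards, nodes):
    """Count subscribers per shard when each node takes one shard in turn."""
    if shards < 1 or nodes < shards:
        raise ValueError("model assumes at least one node per shard")
    counts = [0] * shards
    for node in range(nodes):
        counts[node % shards] += 1
    return counts


def independent_node_sets(shards, nodes):
    """Number of disjoint node sets that each cover every shard.

    Each such set could process a query on its own, so this number is
    a rough indicator of query throughput in this simplified model.
    """
    return min(subscribers_per_shard(shards, nodes))


if __name__ == "__main__":
    for nodes in (3, 5, 6):
        print(nodes, subscribers_per_shard(3, nodes),
              independent_node_sets(3, nodes))
```

With three shards, five nodes leave one shard with a single subscriber, so the model still has only one complete set of nodes. A sixth node completes the second set.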

Scaling down an Eon Mode database works similarly. The remaining nodes in the database subscribe to any shards that are no longer subscribed to by the nodes being removed. This process is fast, and the database remains running while it is happening. Depending on the ratio of shards to nodes in the downsized cluster, query performance or query throughput will be lower.

Expandability

As you load more data into your database, you may eventually need to expand its data storage. Because Eon Mode databases separate compute from storage, you can often expand their storage without changing the number of nodes.

In a cloud environment, you usually do not have a limit on storage. For example, an AWS S3 bucket can store as much data as you want. As long as you are willing to pay for additional storage charges, you do not have to worry about expanding your database's storage.

When you install Eon Mode on premise, how you expand storage depends on the object store you are using. For example, Pure Storage FlashBlades support hot plugging new blades to add additional storage. This feature lets you expand the storage in your Eon Mode database with no downtime.

In most cases, you query only a subset of the data in your database (called the working data set). Eon Mode's decoupling of compute and storage lets you size your compute (the number of nodes in your database) to the working data set and your desired performance rather than to the entire data set.

For example, if you are performing time series analysis in which the active data set is usually the last 30 days, you can size your cluster to manage 30 days' worth of data. Data older than 30 days simply grows in communal storage. The only reason you need to add more nodes to your Eon Mode database is to meet additional workloads. On the other hand, if you want very high performance on a small data set, you can add as many nodes as you need to obtain the performance you want.

In an Enterprise Mode database, nodes are responsible for storage as well as compute. Because of the tight coupling between compute and storage, the best way to expand storage in an Enterprise Mode database is to add new nodes. As mentioned in the Scalability section, adding nodes to an Enterprise Mode database requires rebalancing the existing data in the database.

Due to the disruption rebalancing causes to the database, you usually expand the storage in an Enterprise Mode database infrequently. When you do expand its storage, you usually add significant amounts of storage to allow for future growth.

Adding nodes to increase storage has the downside that you may be adding compute power to your cluster that isn't really necessary. For example, suppose you are performing time-series analysis that focuses on recent data and your current cluster offers you enough query performance to meet your needs. However, you need to add additional storage to keep historical data. In this case, adding new nodes to your database for additional storage adds computing power you really don't need. Your queries may run a bit faster. However, the slight benefit of faster results probably does not justify the costs of adding more computing power.