Comparing Eon and Enterprise Modes

Vertica databases running in Eon and Enterprise modes store their data differently:

  • Eon Mode databases use communal storage for their data.
  • Enterprise Mode databases store data locally in the file system of nodes that make up the database.

These different storage methods lead to a number of important differences between the two modes.

Performance

Eon Mode and Enterprise Mode databases have roughly the same performance in the same environment when properly configured.

Usually, an Eon Mode database's performance relies on each node having a local cache of data from the communal store that it uses when processing queries. When this cache (called the depot) contains the data the node needs to process queries, the Eon Mode database's performance is the same as an Enterprise Mode database, where each node stores a portion of the database locally. In both cases, the nodes are accessing locally-stored data to resolve queries. A depot that has the necessary data to process most queries is called a "warm" depot.

If the depot does not contain the data it needs to resolve a query, the node must retrieve the data from the communal store. In cloud environments, accessing the communal store has a performance penalty because cloud-based object stores such as Amazon's S3 have high latency. If the Eon Mode database has to access the communal store frequently, its query performance will be slower.

You will often see slower query performance when one or more nodes have a "cold" (empty) depot. The depot is cold when you start a new database cluster or subcluster, or add or restart nodes. In this case, the cluster will have slower performance than a Enterprise Mode database would have under the same conditions because the nodes have to fetch data from the communal storage to process queries. As the depot fills with this retrieved data, and the nodes have to make fewer fetches from the communal store, the database's query performance improves.

To limit the impact of cold depots, newly added or restarted nodes perform a process called depot warming. These nodes load relevant data from communal storage into their depots before they need it. They determine what data to load based on the current contents of other node's depots. Once they have warmed their depots, they rely less on fetching data from communal storage to process queries.

You can see poor query performance in an Eon Mode database if you make its depot too small. A small depot increases the chance that a query will require data that is not in the depot. That results in nodes having to retrieve data from communal storage more frequently.

Some Vertica users compare the performance of their Eon Mode database and Enterprise Mode database performance conclude that Eon Mode has much worse performance. In these cases, they are usually comparing a cloud-based Eon Mode database to an on-premises Enterprise Mode database. Here, the performance difference isn't due to the different between the two database modes. Instead, it is due to the overall performance impact of a shared cloud-based virtual environment compared to on-premises dedicated hardware. An Enterprise Mode database running in the same cloud would have the same performance as the Eon Mode in most cases.

Installation

An Eon Mode database must have an object store to store its data communally. An Enterprise Mode database does not require any additional storage hardware beyond the storage installed on its nodes. Depending on the environment  you've chosen for your Vertica database (especially if you are installing on-premises), the need to configure an object store may make your installation a bit more complex.

Because Enterprise Mode does not need additional hardware for data storage, it can be a bit simpler to install. An on-premises Eon Mode install needs additional hardware and additional configuration for the object store that provides the communal storage.

Enterprise Mode is especially useful for development environments because it does not require additional hardware beyond the nodes you use to run it. You can even create a single-node Enterprise Mode database, either on physical hardware or on a virtual machine. You can download a preconfigured single-node Enterprise Mode virtual machine that is ready to run. See Downloading and Starting the Vertica Community Edition Virtual Machine for more information.

Installing an Eon Mode database in a cloud environment is usually simpler than an on-premises install. The cloud environments provide their own object store for you. For example, when you install an Eon Mode database in Amazon's AWS, you just need to create an S3 bucket for the communal data store. You then provide the S3 URL to Vertica when creating the database. There is no need to install and configure a separate data store.

Installing an Enterprise Mode database in the cloud is similar to installing one on-premises. The virtual machines you create in the cloud must have enough local storage to store your database's data.

Workload Isolation

You often want to prevent intensive workloads from interfering with other potentially time-sensitive workloads. For example, you may want to isolate ETL workloads from querying workloads. Groups of users that rely on real-time analytics can be isolated from groups that are running batched reports.

Eon Mode databases offer the best workload isolation option. It allows you to create groups of nodes called subclusters that isolate workloads. A query only runs on the nodes in a single subcluster. It does not affect nodes outside the subcluster. You can assign different groups of users a different subcluster to use.

In an Eon Mode database, subclusters and scalability work hand in hand. You often add, remove, stop, and start entire subclusters of nodes, rather than scaling nodes individually.

Enterprise Mode does not offer subclusters to isolate workloads. You can use features such as resource pools and other settings to give specific queries priority and access to more resources. However, these features do not truly isolate workloads as subclusters do. See Managing Workloads for an explanation of managing workloads using these features.

Scalability

You can scale a Vertica database by adding or removing nodes to meet changing analytic needs. Scalability is usually more important in cloud environments where you are paying by the hour for each node in your database. If your database isn't busy, there is no reason to have underused nodes costing you money. You can reduce the number of nodes in your database during quiet times (weekends and holidays, for example) to save money.

Scalability is usually less important for on-premises installations. There are limited additional costs involved in having nodes running when they are not fully in use.

An Enterprise Mode database scales less efficiently than an Eon Mode one. When an Enterprise Mode database scales, it must re-segment (rebalance) its data to be spread among the new number of nodes.

Rebalancing is an expensive operation. When scaling the database up, Vertica must break up files and physically move a percentage of the data from the original nodes to the new nodes. When scaling down, Vertica must move the data off of the nodes that are being removed and distribute it among the remaining nodes. The database is not available during rebalancing. This process can take 12, 24, or even 36 hours to complete, depending on the size of the database. After scaling up an Enterprise Mode database, queries should run faster because each node is responsible for less data. Therefore, each node has less work to do to process each query. Scaling down an Enterprise Mode database usually has the opposite effect—queries will run slower. See Elastic Cluster for more information on scaling an Enterprise Mode database.

Eon Mode databases scale more efficiently because data storage is separate from the computing resources.

When you scale up an Eon Mode database, the database's data does not need to be re-segmented. Instead, the additional nodes subscribe to preexisting segments (called shards) of data in communal storage. When expanding the cluster, Vertica rebalances the shards assigned to each node, rather than physically splitting the data storage and moving in between nodes. The new nodes prepare to process queries by retrieving data from the communal storage to fill their depots (a local cache of data from the communal storage). The database remains available while scaling and the process takes minutes rather than hours to complete.

Node subscriptions are slightly more complicated than shown in the previous diagram. To ensure K-Safety, each node actually subscribes to a second shard (or shards, if there are more shards than nodes) to act as a backup in case the node subscribing to those shards goes down. See Shards and Subscriptions for details.

If the number of shards in the communal storage is equal to or higher than the new number of nodes (as shown in the previous diagram), then query performance improves after expanding the cluster. Each node is responsible for processing less data, so the same queries will run faster after you scale the cluster up.

You can also scale your database up to improve query throughput. Query throughput improves the number of queries processed by your database in parallel. You usually care about query throughput when your workload contains many, shorter-running queries ("dashboard queries"). To improve throughput, add more nodes to your database in a new subcluster. The subcluster isolates queries run by clients connected to it from the other nodes in the database. Subclusters work independently and in parallel. Isolating the workloads means that your database runs more queries simultaneously.

If a subcluster contains more nodes than the number of shards in communal storage, multiple nodes subscribe to the same shard. In this case, Vertica uses a feature called elastic crunch scaling to execute the query faster. Vertica divides the responsibility for the data in each shard between the subscribing nodes. Each node only needs to process a subset of the data in the shard it subscribes to. Having less data to process means that each node usually finishes its part of the query faster. This often translates into the query finishing its executing sooner.

Always make the number of nodes in your Eon Mode subclusters a multiple of the number of shards in the database, or an even divisor of the number of shards. For example, in a six-shard database, your subclusters should have three, six, or twelve shards. Vertica recommends you never have more than two shards per node.

A mismatch between the number of shards and the number of nodes can impact performance. For example, suppose you have a six-shard database. If you expand a subcluster from three to five nodes, one node would still be the only subscriber for two shards. This means that node has to do twice the work of the other nodes in the subcluster during queries. In this case, you see no benefit from adding the two new nodes, because the node subscribing to two shards becomes a bottleneck.

Scaling down an Eon Mode database works similarly. Shutting down entire subclusters reduced your database's query throughput. If you remove nodes from a subcluster, the remaining nodes subscribe to any shards that do not have a subscriber. This process is fast, and the database remains running while it is happening.

Expandability

As you load more data into your database, you may eventually need to expand its data storage. Because Eon Mode databases separate compute from storage, you often expand its storage without changing the number of nodes.

In a cloud environment, you usually do not have a limit on storage. For example, an AWS S3 bucket can store as much data as you want. As long as you are willing to pay for additional storage charges, you do not have to worry about expanding your database's storage.

When you install Eon Mode on-premises, how you expand storage depends on the object store you are using. For example, Pure Storage FlashBlades support hot plugging new blades to add additional storage. This feature lets you expand the storage in your Eon Mode database with no downtime.

In most cases, you usually query a subset of the data in your database (called the working data set). Eon Mode's decoupling of compute and storage let you size your compute (the number of nodes in your database) to the working data set and your desired performance rather than to the entire data set.

For example, if you are performing time series analysis in which the active data set is usually the last 30 days, you can size your cluster to manage 30 days' worth of data. Data older than 30 days simply grows in communal storage. The only reason you need to add more nodes to your Eon Mode database is to meet additional workloads. On the other hand, if you want very high performance on a small data set, you can add as many nodes as you need to obtain the performance you want.

In an Enterprise Mode database, nodes are responsible for storage as well as compute. Because of the tight coupling between compute and storage, the best way to expand storage in an Enterprise Mode database is to add new nodes. As mentioned in the Scalability section, adding nodes to an Enterprise Mode database requires rebalancing the existing data in the database.

Due to the disruption rebalancing causes to the database, you usually expand the storage in an Enterprise Mode database infrequently. When you do expand its storage, you usually add significant amounts of storage to allow for future growth.

Adding nodes to increase storage has the downside that you may be adding compute power to your cluster that isn't really necessary. For example, suppose you are performing time-series analysis that focuses on recent data and your current cluster offers you enough query performance to meet your needs. However, you need to add additional storage to keep historical data. In this case, adding new nodes to your database for additional storage adds computing power you really don't need. Your queries may run a bit faster. However, the slight benefit of faster results probably does not justify the costs of adding more computing power.