Sizing your Vertica Cluster for Eon Mode

Running Vertica in Eon Mode decouples the cluster size from the data volume and lets you configure for your compute needs independently from your storage needs.

Before You Begin

Vertica in Eon Mode works on Amazon Web Services (AWS). As a Vertica administrator setting up a production cluster in Eon Mode, you must choose EC2 instance types and a cluster size that meet your needs. This topic provides guidelines and best practices for selecting instance types and cluster sizes for a Vertica database running in Eon Mode. It assumes a basic understanding of the Eon Mode architecture and of Eon Mode concepts such as communal storage, depot, and shards. If you need to refresh your understanding, see Eon Mode Architecture.

Cluster Sizing Guidelines

As a starting point, consider how Enterprise Mode clusters are sized. In Enterprise Mode, cluster size depends largely on the total compressed data size. Most Vertica implementations achieve 2:1 or better compression on disk. To estimate the number of nodes in your cluster, divide the total compressed data size by the storage capacity of each node. Vertica recommends that you store no more than 10 TB of compressed data per node.
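
As a minimal sketch of that arithmetic, assuming the 2:1 compression and 10 TB-per-node guidelines above (the function name and example figures are hypothetical):

```python
import math

def enterprise_node_count(raw_data_tb: float,
                          compression_ratio: float = 2.0,
                          per_node_capacity_tb: float = 10.0) -> int:
    """Estimate an Enterprise Mode node count from the raw data size,
    using the 2:1 compression and 10 TB/node guidelines above."""
    compressed_tb = raw_data_tb / compression_ratio
    return max(1, math.ceil(compressed_tb / per_node_capacity_tb))

# Hypothetical example: 70 TB raw -> 35 TB compressed -> 4 nodes.
print(enterprise_node_count(70))  # 4
```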

Depending on the complexity of your workload and expected concurrency, choose instance types with sufficient CPU and memory. For production clusters, Vertica recommends a minimum of 16 cores and 128 GB of RAM per node, and at least 3 nodes for high availability.

In Eon Mode, communal storage on AWS S3 acts like a data lake that can store a virtually unlimited volume of data, and it holds the durable copy of the database. However, S3 is slower than local disk. To compensate for S3's performance, each Vertica node in Eon Mode needs local storage for the depot (data cache), catalog, and temporary data. Sizing for Eon Mode depends on the following factors:

  • Working Data Size: The amount of data on which most of your queries will operate, often called "hot" data.
  • Depot Location: To get the fastest response time for frequently executed queries, provision a depot large enough to hold your working data set. By default, when loading data, Vertica writes uncommitted ROS files into the depot before uploading them to communal storage. If the depot does not have enough free space, Vertica evicts files from it to make room for new files. Alternatively, you can load data directly into S3. The amount of data concurrently loaded into Vertica cannot exceed the sum of depot location sizes across all nodes (see the sketch after this list).

    In addition to storing the depot's files, Vertica uses the filesystem that holds the depot for temporary storage. By default, 60% of the filesystem is set aside for depot use, leaving the rest for temporary files. You can change the disk space allocated to the depot (see ALTER_LOCATION_SIZE). However, Vertica prevents you from allocating more than 80% of the filesystem to the depot, to make sure there is room for temporary files.

  • Data Location: The data location holds files that belong to temporary tables and temporary data from sort operators that spill to disk. When loading data into Vertica, the sort operator may spill to disk; depending on the size of the load, Vertica may perform the sort in multiple merge phases. The amount of data concurrently loaded into Vertica cannot exceed the sum of temp location sizes across all nodes divided by 2 (also reflected in the sketch after this list).
  • Catalog Location: The catalog size depends on the number of database objects per shard and the number of shard subscriptions per node.
  • Local disk sizing recommendation: Vertica recommends a minimum local storage capacity of 2 TB per node, of which 60% is reserved for the depot and the remaining 40% is shared between the catalog and data locations.
  • Concurrency and throughput scaling: Choose the instance type based on the complexity of the queries in your workload and the expected concurrency. In Eon Mode, you can increase throughput by adding more nodes to your cluster, then remove those nodes to save money when demand goes down. Adding and removing nodes according to workload demand is called "elastic throughput scaling." If the increase in demand involves a different kind of workload, for example different queries accessing different data, consider creating a subcluster for the new nodes. To create a subcluster, you define a fault group. For best results, the number of nodes in a subcluster should evenly divide the number of shards, or be a multiple of it, so that every node is responsible for the same number of shards.
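
To make these limits concrete, the following sketch combines the guidelines above: the default 60% depot share of the local filesystem (capped at 80%), the remainder left for temp and catalog data, and the two concurrent-load ceilings (total depot size, and half the total temp size). The function name and example values are hypothetical, and the temp share is an approximation because the non-depot space is also shared with the catalog.

```python
def eon_load_capacity(node_count: int,
                      local_fs_tb_per_node: float,
                      depot_fraction: float = 0.60) -> dict:
    """Estimate the per-node disk split and cluster-wide concurrent-load
    limits, following the guidelines above."""
    if not 0 < depot_fraction <= 0.80:  # Vertica caps the depot at 80%
        raise ValueError("depot fraction must be in (0, 0.80]")

    depot_tb_per_node = local_fs_tb_per_node * depot_fraction
    temp_tb_per_node = local_fs_tb_per_node - depot_tb_per_node

    total_depot_tb = depot_tb_per_node * node_count
    total_temp_tb = temp_tb_per_node * node_count

    # Concurrent loads are bounded by the total depot size and by half
    # the total temp size, whichever is smaller.
    max_concurrent_load_tb = min(total_depot_tb, total_temp_tb / 2)

    return {
        "depot_tb_per_node": depot_tb_per_node,
        "temp_tb_per_node": temp_tb_per_node,
        "max_concurrent_load_tb": max_concurrent_load_tb,
    }

# Hypothetical example: 6 nodes with 2 TB of local storage each.
# -> depot 1.2 TB/node, temp 0.8 TB/node, max concurrent load ~2.4 TB.
print(eon_load_capacity(6, 2.0))
```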

To choose an instance type and node count for a Vertica cluster running in Eon Mode, you must know the size of your working data set. In general, the number of shards that you choose at database creation determines the maximum number of compute nodes that can execute your queries in parallel. For this reason, base the shard count on how many nodes are needed to process your most complex queries within the target timeframe.

You cannot change the number of shards after you create an Eon Mode database. Vertica recommends that you select your shard count and initial node count based on your working data size, as shown in the following table:

Cluster Type   Working Data Size   Number of Shards   Initial Node Count
Small          Up to 24 TB         6                  3
Medium         Up to 48 TB         12                 6
Large          Up to 96 TB         24                 12
Extra large    Up to 192 TB        48                 24
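
The sizing table above can be expressed as a simple lookup, as in this sketch. The tier names, thresholds, shard counts, and node counts come directly from the table; the function name is illustrative.

```python
# Tiers from the sizing table: (name, max working data TB, shards, initial nodes)
SIZING_TIERS = [
    ("Small", 24, 6, 3),
    ("Medium", 48, 12, 6),
    ("Large", 96, 24, 12),
    ("Extra large", 192, 48, 24),
]

def recommend_cluster(working_data_tb: float):
    """Return (cluster type, shard count, initial node count) for a working set."""
    for name, max_tb, shards, nodes in SIZING_TIERS:
        if working_data_tb <= max_tb:
            return name, shards, nodes
    raise ValueError("Working data sets over 192 TB fall outside this table")

# Hypothetical example: a 40 TB working set -> ('Medium', 12, 6).
print(recommend_cluster(40))
```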

A 2:1 shard-to-node ratio is a performance recommendation rather than a hard limit. If you attempt to go above 3:1, the Management Console (MC) warns you to make sure you have considered all aspects of shard count, because once set, the shard count cannot be changed.
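
Before settling on a shard count, it can help to sanity-check candidate node counts against it. The sketch below encodes the even-subscription and ratio guidance above; the function name and warning text are illustrative, not part of any Vertica tooling.

```python
def check_shard_node_fit(shards: int, nodes: int) -> list[str]:
    """Flag shard/node combinations that go against the guidance above."""
    warnings = []
    # Every node should be responsible for the same number of shards.
    if shards % nodes != 0 and nodes % shards != 0:
        warnings.append("node count does not evenly divide the shard count")
    # 2:1 shards:nodes is the recommendation; above 3:1, MC warns.
    if shards / nodes > 3:
        warnings.append("shard-to-node ratio exceeds 3:1")
    return warnings

print(check_shard_node_fit(6, 3))    # [] -- 2:1 ratio, evenly divisible
print(check_shard_node_fit(12, 5))   # uneven shard subscriptions
print(check_shard_node_fit(48, 12))  # ratio exceeds 3:1
```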

Planning for Scaling Your Cluster

The number of shards you choose for your database can impact your ability to scale your database in the future. You have two options when growing your cluster above the initial cluster size shown in the previous table. Which you choose depends on the types of queries you run:

  • To increase the performance of complex (analytic) queries, add nodes in addition to the initial node count shown in the previous table. These additional nodes usually improve the overall performance of these complex queries by splitting the load across more compute nodes.
  • To increase the throughput of multiple short-term queries (often called "dashboard queries"), improve your cluster's parallelism by adding nodes and assigning them to a subcluster. Subclusters can work independently and in parallel on these shorter queries.

These two approaches affect the number of shards you choose when creating your database. Complex analytic queries perform better on clusters with more nodes: 6 nodes with 6 shards perform better than 3 nodes with 6 shards. Dashboard-type queries operating on smaller data sets may not see much difference between 3 nodes with 6 shards and 6 nodes with 6 shards.

In general, choose a shard count that matches your expected working data size 6–12 months in the future.

However, as your shard count increases, so do inter-node communication and catalog complexity, so be conservative when choosing a shard count.

Choosing Instance Types

Choose instance types that support ephemeral instance storage or EBS volumes for your depot, depending on cost and availability. An EBS-backed depot is not mandatory because, in Eon Mode, a durable copy of the data resides in communal storage. Vertica recommends either r4 or i3 instances for production clusters.

The following table provides information to help you decide between instances with ephemeral instance storage and instances with EBS-only storage. Check with AWS for the latest cost per hour.

Storage Type: Instance storage
Instance Type: i3.8xlarge
Pros/Cons:

  • Instance storage offers better performance than EBS-attached storage, even across multiple EBS volumes. Instance-store volumes can be striped (RAIDed) together to increase throughput and load balance I/O.
  • Data stored in instance-store volumes is not persistent through instance stops, terminations, or hardware failures.

Storage Type: EBS-only storage
Instance Type: r4.8xlarge with a 600 GB EBS volume attached
Pros/Cons:

  • Newer instance types from AWS offer only the EBS option. In most AWS regions, it is easier to provision a large number of EBS-backed instances.
  • You can terminate an instance but keep its EBS volume for a faster revive.
  • EBS volumes let you take advantage of full-volume encryption.

If you select instances that use instance store and later terminate them, there is a potential for data loss. For Eon Mode, MC displays an alert warning of this potential data loss when you terminate instances that use instance store.

Use Cases

Let’s look at some use cases to learn how to size your Eon Mode cluster to meet your own particular requirements.

Use Case 1: Save compute by provisioning close to need, rather than peak times

This use case highlights the elastic throughput scaling (ETS) feature of Eon Mode by scaling a cluster from 6 to 18 nodes with 3 subclusters of 6 nodes each. In this use case, you need to support a highly concurrent, short-query workload on a working data set of 24 TB or less. You create an initial cluster with 6 nodes and 6 shards. You scale out throughput on demand by adding one or more subclusters on certain days of the week, or for specific date ranges when you expect a peak load. You then shrink the cluster back to its initial size by dropping nodes or subclusters when the workload returns to normal. With Vertica in Eon Mode, you save compute by provisioning close to the need rather than for peak times.

With Eon Mode, adding and removing nodes is quick and easy because doing so does not require reshuffling data via rebalancing, as you must do in Vertica Enterprise Mode.

Use Case 2: Complex analytic workload requires more compute nodes

This use case showcases the idea that complex analytic workloads on large working data sets benefit from high shard count and node count. You create an initial cluster of type large with 24 nodes and 24 shards. As needed, you can add and remove nodes to improve throughput scaling.

Use Case 3: Workload isolation

This use case showcases the idea of having separate subclusters to isolate ETL and report workloads. You create an initial cluster with 6 nodes and 6 shards for servicing queries, and add another 6-node subcluster to support ETL workloads. You may need to configure an AWS network load balancer to separate the ETL workload from SELECT queries. Workload isolation can also be useful for isolating different users with varying Vertica skills.

Use Case 4: Shrink your cluster to save costs

This example showcases the idea that you can shrink your cluster by half during off-peak hours to save costs. To shrink the cluster, drop nodes from it; Vertica automatically rebalances the shards among the remaining nodes. When you shrink the cluster below the initial cluster size (where the number of nodes equals the number of shards), the nodes may subscribe to more than two shards, with the following impacts:

  • The catalog size is larger because nodes are subscribing to more shards.
  • The depot is shared by more shard subscriptions, which may lead to the eviction of files.
  • Each node processes more data, which may have performance impact on queries.
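
As a rough illustration of the first impact, the following sketch approximates how per-node shard subscriptions grow as nodes are dropped. It assumes primary subscriptions are spread evenly and that each shard has one secondary subscriber for fault tolerance; this is a simplification, not Vertica's actual subscription-assignment logic, and the function name is illustrative.

```python
import math

def subscriptions_per_node(shards: int, nodes: int) -> int:
    """Approximate shard subscriptions per node: an even share of primary
    subscriptions, plus an equal share of secondary subscriptions."""
    per_node_share = math.ceil(shards / nodes)
    return 2 * per_node_share  # primary share + secondary share

# Hypothetical example: a 6-shard cluster shrinking from 6 nodes to 3.
print(subscriptions_per_node(6, 6))  # 2 subscriptions per node
print(subscriptions_per_node(6, 3))  # 4 subscriptions per node
```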

For more information, see Using Eon Mode.