Maintaining Data Integrity and High Availability in an Eon Mode Database

The nodes in your Eon Mode database's primary subclusters, collectively called the database's primary nodes, are responsible for maintaining the data in your database. They perform operations that require committing transactions, such as loading and deleting data. Primary nodes can be spread across multiple primary subclusters, and their health is key to maintaining the data integrity of your database.

The nodes in secondary subclusters play no role in maintaining data. They cannot commit transactions, load data, or perform other data maintenance tasks. Your database cluster can lose all of its secondary nodes and not endanger the data integrity of your database.

Maintaining data integrity is the top goal of your database. If your database loses too many primary nodes, it cannot safely process data. In this case, it shuts down to prevent data inconsistency or corruption.

High availability (continuing to run even if individual nodes are lost) is another goal of Vertica. To limit shutdowns caused by the loss of primary nodes, Vertica has data redundancy features. With these features enabled, your database continues to run even if it loses a primary node. In many cases, your database can continue to run even if it loses more than one primary node.

Eon Mode Databases and K-safety

K-safety is a measure of how resilient your database is to losing a primary node. Vertica recommends that your database always have a K-safety value of 1 (K=1). The value K is the number of redundant copies of metadata and subscriptions the primary nodes in your database cluster maintain. In a K=1 database, each primary node maintains a copy of another primary node's data and subscriptions in addition to its own (becoming the "buddy" of the other node). If a primary node fails or is shut down, the node with its redundant data takes over processing for it. Because this node is now performing the work of two nodes, your database's performance may suffer until you recover or replace the missing primary node.

In a K=1 database, the loss of a single primary node does not cause a shutdown. There is always a primary node able to fill in for the down node. The loss of an additional node could result in a database shutdown depending on which additional node fails. You should always replace or recover down primary nodes as fast as possible to prevent this possibility.

A database with a K-safety value of 1 may be able to continue running if more than one node fails. See Primary Node Requirements for Database Operation below for details.

When Vertica Sets the K-safety Value in an Eon Mode Database

When you have three or more primary nodes in your database, Vertica automatically sets the database's K-safety to 1 (K=1). It also automatically configures shard subscriptions so that each node can act as a backup for another node.

This behavior is different from an Enterprise Mode database, where you must design your database's physical schema to meet several criteria before you can have Vertica mark the database as K-safe. See Difference Between Enterprise Mode and Eon Mode K-safe Designs below for details.

Databases with fewer than three primary nodes have no data redundancy (K=0). Vertica recommends that you use a database with fewer than three primary nodes only for testing.
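The rule described above can be summarized in a few lines of Python. This is only an illustrative sketch of the rule as stated in this section, not Vertica code; the function name automatic_k_safety is hypothetical.

```python
def automatic_k_safety(primary_node_count: int) -> int:
    """Return the K-safety value this section says Vertica sets automatically.

    Hypothetical helper for illustration only; it is not part of Vertica.
    """
    if primary_node_count < 0:
        raise ValueError("primary_node_count cannot be negative")
    # Three or more primary nodes: K=1. Fewer than three: no redundancy (K=0).
    return 1 if primary_node_count >= 3 else 0


for count in (1, 2, 3, 6):
    print(count, "primary node(s) -> K =", automatic_k_safety(count))
```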

Primary Node Requirements for Database Operation

Because your database relies on its primary nodes to maintain data, there are several requirements they must meet for your database to continue running safely:

  • When your database's K-safety is set to 1 (K=1), there must be at least three primary nodes in your database cluster. Having at least three primary nodes in your database allows Vertica to maintain data integrity if a primary node goes down.

    If you have manually set the K-safe value to 2 (see Difference Between Enterprise Mode and Eon Mode K-safe Designs below) you must have at least 5 primary nodes.

    Vertica prevents you from removing primary nodes if your cluster would fall below the lower limit for your database's K-safety setting. If you want to remove nodes in a database at this lower limit, you must lower the K-safety level using the MARK_DESIGN_KSAFE function.

  • More than half of the database's primary nodes must be up. For example, a database with five primary nodes needs at least three of them up. Having more than 50% of the primary nodes up is referred to as having a quorum of primary nodes.
  • Every shard in the database must have at least one primary node that is up subscribing to it. Having every shard with an active subscriber is called having full shard coverage. If your database has a shard with no active subscriber, the data in that shard is inaccessible.

If your database does not meet all of these conditions, it shuts down to prevent potential data inconsistency. These K-safety requirements are similar to those of an Enterprise Mode database.
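The quorum and shard coverage conditions can be modeled with simple set operations. The following Python sketch is an illustration of the rules above, not how Vertica implements them. The node names, shard names, and the ring-style subscription layout (each shard subscribed to by two neighboring primary nodes) are assumptions made for the example.

```python
from typing import Dict, Set


def has_quorum(primary_nodes: Set[str], up_nodes: Set[str]) -> bool:
    """True if more than half of the primary nodes are up."""
    up_primaries = primary_nodes & up_nodes
    return 2 * len(up_primaries) > len(primary_nodes)


def has_full_shard_coverage(
    subscriptions: Dict[str, Set[str]],
    primary_nodes: Set[str],
    up_nodes: Set[str],
) -> bool:
    """True if every shard has at least one up primary node subscribed to it."""
    up_primaries = primary_nodes & up_nodes
    return all(subscribers & up_primaries for subscribers in subscriptions.values())


def database_can_run(
    subscriptions: Dict[str, Set[str]],
    primary_nodes: Set[str],
    up_nodes: Set[str],
) -> bool:
    """Both conditions must hold; secondary nodes in up_nodes are ignored."""
    return has_quorum(primary_nodes, up_nodes) and has_full_shard_coverage(
        subscriptions, primary_nodes, up_nodes
    )


# Hypothetical five-node K=1 layout: each shard has two subscribers.
primaries = {"n1", "n2", "n3", "n4", "n5"}
subs = {
    "s1": {"n1", "n2"},
    "s2": {"n2", "n3"},
    "s3": {"n3", "n4"},
    "s4": {"n4", "n5"},
    "s5": {"n5", "n1"},
}

print(database_can_run(subs, primaries, primaries - {"n1"}))        # one node down
print(database_can_run(subs, primaries, primaries - {"n1", "n3"}))  # two non-neighbors down
print(database_can_run(subs, primaries, primaries - {"n1", "n2"}))  # both subscribers of s1 down
```

In this assumed layout, losing n1 and n3 leaves every shard covered and a quorum of three out of five, so the model reports that the database can run. Losing n1 and n2 leaves shard s1 without a subscriber, so it cannot.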

Because Vertica does not rely on them to maintain data, the nodes in secondary subclusters have no impact on whether the database can continue running.

Critical Nodes and Subclusters

Vertica designates any node or subcluster in the database whose loss would cause a shutdown as critical. For example, in a K=1 database, if a node goes down, the node that takes over processing for it becomes critical. Vertica maintains a list of critical nodes and subclusters in two system tables: CRITICAL_NODES and CRITICAL_SUBCLUSTERS. Before stopping nodes or subclusters, check these tables to ensure the node or subcluster you intend to stop is not critical.
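To see why a node becomes critical, you can model the definition directly: a node is critical if removing it from the set of up nodes would break quorum or shard coverage. The Python sketch below is an illustration only. In a real database you would query the CRITICAL_NODES system table instead. The node names, shard names, and subscription layout are assumptions made for the example.

```python
from typing import Dict, Set


def can_run(primaries: Set[str], up: Set[str], subscriptions: Dict[str, Set[str]]) -> bool:
    """Quorum (more than half of primaries up) plus full shard coverage."""
    up_primaries = primaries & up
    quorum = 2 * len(up_primaries) > len(primaries)
    coverage = all(subs & up_primaries for subs in subscriptions.values())
    return quorum and coverage


def critical_nodes(
    primaries: Set[str], up: Set[str], subscriptions: Dict[str, Set[str]]
) -> Set[str]:
    """Up primary nodes whose loss alone would force a shutdown."""
    return {
        node
        for node in primaries & up
        if not can_run(primaries, up - {node}, subscriptions)
    }


# Hypothetical five-node K=1 layout: each shard has two subscribers.
primaries = {"n1", "n2", "n3", "n4", "n5"}
subscriptions = {
    "s1": {"n1", "n2"},
    "s2": {"n2", "n3"},
    "s3": {"n3", "n4"},
    "s4": {"n4", "n5"},
    "s5": {"n5", "n1"},
}

print(sorted(critical_nodes(primaries, primaries, subscriptions)))           # all nodes up
print(sorted(critical_nodes(primaries, primaries - {"n1"}, subscriptions)))  # n1 down
```

With all nodes up, the model finds no critical nodes. With n1 down, n2 and n5 become critical because each is the only remaining subscriber of a shard it shared with n1.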

Difference Between Enterprise Mode and Eon Mode K-safe Designs

In an Enterprise Mode database, you use the MARK_DESIGN_KSAFE function to enable high availability in your database. You call this function after you have designed your database's physical schema to meet all the requirements for K-safe design (often, by running the Database Designer). If you attempt to mark your database as K-safe when the physical schema does not support the level of K-safety you pass to MARK_DESIGN_KSAFE, it returns an error. See Designing Segmented Projections for K-Safety for more information.

In Eon Mode, you do not need to use the MARK_DESIGN_KSAFE function because Vertica automatically makes the database K-safe when you have three or more primary nodes. You can still use this function to change the K-safety level of your database. In an Eon Mode database, this function changes how Vertica configures shard subscriptions. You can call MARK_DESIGN_KSAFE with any level of K-safety you want. The change takes effect only when you call REBALANCE_SHARDS to update the shard subscriptions for the nodes in your database.
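The two-step sequence described above (set the K-safety level, then rebalance shards) can be sketched as a small Python helper that builds the SQL statements to run. The helper name, the pre-flight node-count check, and the restriction of K to 0, 1, or 2 are assumptions for illustration. The check is not something Vertica performs when you call MARK_DESIGN_KSAFE; the minimum node counts come from the primary node requirements given earlier in this section.

```python
from typing import List

# Minimum primary node counts stated in this section for K=1 and K=2.
MIN_PRIMARY_NODES = {1: 3, 2: 5}


def ksafety_change_statements(k: int, primary_node_count: int) -> List[str]:
    """Build the SQL statements for changing K-safety in an Eon Mode database.

    Hypothetical client-side helper; it only assembles strings and does not
    connect to a database.
    """
    if k not in (0, 1, 2):
        raise ValueError("this sketch assumes K is 0, 1, or 2")
    required = MIN_PRIMARY_NODES.get(k, 0)
    if primary_node_count < required:
        raise ValueError(
            f"K={k} needs at least {required} primary nodes, "
            f"but only {primary_node_count} are available"
        )
    return [
        f"SELECT MARK_DESIGN_KSAFE({k});",
        # The new level only takes effect once shard subscriptions are updated.
        "SELECT REBALANCE_SHARDS();",
    ]


for statement in ksafety_change_statements(2, primary_node_count=5):
    print(statement)
```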

Usually, you do not use a K-safety value higher than 1 in Eon Mode, because adding replacement nodes to a cluster is easy in a cloud environment.