Data Integrity and High Availability in an Eon Mode Database

The nodes in your Eon Mode database's primary subclusters are responsible for maintaining the data in your database. These nodes (collectively called the database's primary nodes) plan the Tuple Mover mergeout operations that manage the data in the shards. They can also execute these operations if they are the best candidate to do so (see The Tuple Mover in Eon Mode Databases).

The primary nodes can be spread across multiple primary subclusters—they all work together to maintain the data in the shards. The health of the primary nodes is key for your database to continue running normally.

The nodes in secondary subclusters do not plan Tuple Mover operations. They can execute Tuple Mover mergeout operations if a primary node assigns it to them. Your database cluster can lose all of its secondary nodes and still maintain the data in the shards.

Maintaining data integrity the is top goal of your database. If your database loses too many primary nodes, it cannot safely process data. In this case, it goes into read-only mode to prevent data inconsistency or corruption.

High availability (having the database continue running even if individual nodes are lost) is another goal of Vertica. It has several redundancy features to help it prevent downtime. With these features enabled, your database continues to run even if it loses one or more primary nodes.

There are two requirements for the database to continue normal operations: maintaining quorum, and maintaining shard coverage.

Maintaining Quorum

The basic requirement for the primary nodes in your Eon Mode database is maintaining a quorum of primary nodes running at all times. To maintain quorum, more than half of the primary nodes (50% plus 1) must be up. For example, in a database with 6 primary nodes, at least 4 of them must be up. If half or more of the primary nodes are down, your database goes into read-only mode to prevent potential data integrity issues. In a database with 6 primary nodes, the database goes into read-only if it loses 3 or more of them. See Database Read-Only Mode below.

Vertica only counts the primary nodes that are currently part of the database when determining whether the database has quorum. Removing primary nodes cannot result in a loss of quorum. During the removal process, Vertica adjusts the node count to prevent the loss of quorum.

At a minimum, your Eon Mode database must have at least one primary node to function. In most cases, it needs more than one. See Minimum Node Requirements for Eon Mode Database Operation below.

Maintaining Shard Coverage

In order to continue to process data, your database must be able to maintain the data in its shards. To maintain the data, each shard must have a subscribing primary node that is responsible for running the Tuple Mover on it. This requirement is called having shard coverage. If one or more shards do not have a primary node maintaining its data, your database loses shard coverage and goes into read-only mode (explained below) to prevent possible data integrity issues.

The measure of how resilient your Eon Mode database is to losing a primary node is called its K-safety level. The value K is the number of redundant shard subscriptions your Eon Mode database cluster maintains. It also represents the number of primary nodes in your database that can fail and still be able to run safely. In many cases, your database can lose more than K nodes and still continue to run normally, as long as it maintains shard coverage.

Vertica recommends that your database always have a K-safety value of 1 (K=1). In a K=1 database, each shard has two subscribers: a primary subscriber that is responsible for the shard, and a secondary subscriber that can fill in if the primary subscriber is lost. The primary subscriber is responsible for running the Tuple Mover on the data in the shard. The secondary subscriber maintains a copy of the shard's catalog metadata. so it can fill in if the primary subscriber is lost.

If a shard's primary subscriber fails, the secondary subscriber fills in for it. Because it does not maintain a separate depot for its secondary subscription, the secondary subscriber always directly accesses the shard's data in communal storage. This direct access impacts your database's performance while a secondary subscriber fills in for a primary subscriber. For this reason, always restart or replace a down primary node as soon as possible.

With primary and secondary subscribers in a K=1 database, the loss of a single primary node does not affect the database's ability to process and maintain data. However, if the secondary subscriber fails while standing in for the primary subscriber, your database would lose shard coverage. and be forced to go into read-only mode.

Elastic K-safety

Vertica uses a feature called elastic K-safety to help limit the possibility of shard coverage loss. By default, if either the primary or secondary subscriber to a shard fails, Vertica subscribes an additional primary node to the shard. This subscription takes time to be established, as the newly-subscribed node must get a copy of the shard's metadata. If the shard's sole subscriber fails while the new subscriber is getting the shard's metadata, the database loses shard coverage and can shut down. Once the newly-subscribed node gets a copy of the metadata, it is able to take over maintenance of the shard in case the other subscriber fails. At this point, your database once again has two subscribers for the shard.

Once the down nodes recover and rejoin the subcluster, Vertica removes the subscriptions it added for elastic K-safety. Once all of the nodes rejoin the cluster, the shard subscriptions are the same as they were before the node loss.

With elastic K-safety, your database could manage to maintain shard coverage through the gradual loss of primary nodes, up to the point that it loses quorum. As long as there is enough time for newly-subscribed nodes to gather the shard's metadata, your database is able to maintain shard coverage. However, your database could still be forced into read-only mode due to loss of shard coverage if it lost the primary and secondary subscribers to a shard before a new primary node could complete the process of subscribing to the shard.

Vertica stops adding new subscriptions when your database gets close to losing quorum. It continues to add new subscriptions if your cluster has more than N ÷ 2 + K + 1 primary nodes up, where N is the total number of primary nodes in the database. For example, if you have 10 primary nodes in your K=1 database, Vertica adds new subscriptions as long as the number of primary nodes that are up is greater than 7 (10 ÷ 2 + 1 + 1). If the number of up primary nodes falls to 6 in this database, adding an additional subscription does not make sense. Losing another primary node would force the database to shut down due to a loss of quorum.

Database Read-Only Mode

If your database loses either quorum or primary shard coverage, it goes into read-only mode. This mode prevents potential data corruption that could result when too many nodes are down or unable to reach other nodes in the database. Read-only mode prevents changes being made to the database that require updates to the global catalog.

DML and DDL in Read-only Mode

In read-only mode, statements that change the global catalog (such as most DML and DDL statements) fail with an error message. For example, executing DDL statements such as CREATE TABLE while the database is in read-only mode results in the following error:

=> CREATE TABLE t (a INTEGER, b VARCHAR);
ERROR 10428:  Transaction commit aborted since the database is currently in read-only mode
HINT:  Commits will be restored when the database restores the quorum

DML statements such as COPY return a different error. Vertica stops them from executing before they perform any work:

=> COPY warehouse_dimension from stdin;
ERROR 10422:  Running DML statements is not possible in read-only mode
HINT:  Running DMLs will be restored when the database restores the quorum

By returning the error early, Vertica avoids performing all of the work required to load data, only to fail when it tries to commit the transaction.

DDL and DML statements that do not affect the global catalog still work. For example, you can create a local temporary table and load data into it while the database is in read-only mode.

Queries in Read-only Mode

Queries can run on any subcluster that has shard coverage. For example, suppose you have an Eon Mode database with a 3-node primary and a 3-node secondary subcluster. If two of the primary nodes go down, the database loses quorum and goes into read-only mode. The primary subcluster also loses shard coverage, because two of its nodes are down. The remaining node does not have a subscription to at least some of the shards. In this case, queries on the remaining primary node (except for some system table queries) always fail:

=> SELECT * FROM warehouse_dimension;
ERROR 9099:  Cannot find participating nodes to run the query

The secondary subcluster still has shard coverage so you can execute queries on it.

Monitoring Read-only Mode

Besides noticing DML and DDL statements returning errors, you can determine whether the database has gone into read-only mode by monitoring system tables:

  • The NODES system table has a column named is_readonly that becomes true for all nodes when the database is in read-only mode.

    => SELECT node_name, node_state, is_primary, is_ro, subcluster_name FROM nodes;
    
          node_name       | node_state | is_primary | is_readonly |  subcluster_name   
    ----------------------+------------+------------+-------------+--------------------
     v_verticadb_node0001 | UP         | t          | t           | default_subcluster
     v_verticadb_node0002 | DOWN       | t          | t           | default_subcluster
     v_verticadb_node0003 | DOWN       | t          | t           | default_subcluster
     v_verticadb_node0004 | UP         | f          | t           | analytics
     v_verticadb_node0005 | UP         | f          | t           | analytics
     v_verticadb_node0006 | UP         | f          | t           | analytics
    (6 rows)
  • When the database goes into read-only mode, every node that is still up in the database records a Cluster Read-only event (event code 20). You can find these events by querying the event monitoring system tables such as ACTIVE_EVENTS:

    => \x
    Expanded display is on.
    
    => SELECT * FROM ACTIVE_EVENTS WHERE event_code = 20;
    -[ RECORD 1 ]-------------+--------------------------------------------------------------------------
    node_name                 | v_verticadb_node0001
    event_code                | 20
    event_id                  | 0
    event_severity            | Critical
    event_posted_timestamp    | 2021-11-22 15:57:24.514475+00
    event_expiration          | 2089-12-10 19:11:31.514475+00
    event_code_description    | Cluster Read-only
    event_problem_description | Cluster cannot perform updates due to quorum loss and can only be queried
    reporting_node            | v_verticadb_node0001
    event_sent_to_channels    | Vertica Log
    event_posted_count        | 1
    -[ RECORD 2 ]-------------+--------------------------------------------------------------------------
    node_name                 | v_verticadb_node0004
    event_code                | 20
    event_id                  | 0
    event_severity            | Critical
    event_posted_timestamp    | 2021-11-22 15:57:24.515022+00
    event_expiration          | 2089-12-10 19:11:31.515022+00
    event_code_description    | Cluster Read-only
    event_problem_description | Cluster cannot perform updates due to quorum loss and can only be queried
    reporting_node            | v_verticadb_node0004
    event_sent_to_channels    | Vertica Log
    event_posted_count        | 1
    -[ RECORD 3 ]-------------+--------------------------------------------------------------------------
    node_name                 | v_verticadb_node0005
    event_code                | 20
    event_id                  | 0
    event_severity            | Critical
    event_posted_timestamp    | 2021-11-22 15:57:24.515019+00
    event_expiration          | 2089-12-10 19:11:31.515019+00
    event_code_description    | Cluster Read-only
    event_problem_description | Cluster cannot perform updates due to quorum loss and can only be queried
    reporting_node            | v_verticadb_node0005
    event_sent_to_channels    | Vertica Log
    event_posted_count        | 1
    -[ RECORD 4 ]-------------+--------------------------------------------------------------------------
    node_name                 | v_verticadb_node0006
    event_code                | 20
    event_id                  | 0
    event_severity            | Critical
    event_posted_timestamp    | 2021-11-22 15:57:24.515172+00
    event_expiration          | 2089-12-10 19:11:31.515172+00
    event_code_description    | Cluster Read-only
    event_problem_description | Cluster cannot perform updates due to quorum loss and can only be queried
    reporting_node            | v_verticadb_node0006
    event_sent_to_channels    | Vertica Log
    event_posted_count        | 1
    

    See Monitoring Events

Recover From Read-Only Mode

To recover from read-only mode, restart the down nodes. Restarting the nodes resolves loss of quorum or loss of primary shard coverage that caused the database to go into read-only mode.

Once the down nodes restart and rejoin the database, Vertica restarts on the nodes that were in read-only mode. This step is necessary to allow the nodes to resubscribe to their shards. During this restart, client connections to these nodes will drop. For example, users connected via vsql to one of the nodes where Vertica is restarting see the message:

server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

Users using vsql to connect to nodes as Vertica restarts see the message:

vsql: FATAL 4149:  Node startup/recovery in progress. Not yet ready 
to accept connections

Once Vertica restarts on the nodes, the database resumes normal operation.

When Vertica Sets the K-safety Value in an Eon Mode Database

When you have three or more primary nodes in your database, Vertica automatically sets the database's K-safety to 1 (K=1). It also automatically configures shard subscriptions so that each node can act as a backup for another node, as described in Maintaining Shard Coverage.

This behavior is different than an Enterprise Mode database, where you must design your database's physical schema to meet several criteria before you can have Vertica mark the database as K-safe. See Difference Between Enterprise Mode and Eon Mode K-safe Designs below for details.

Databases with less than three primary nodes have no data redundancy (K=0). Vertica recommends you only use a database with less than three primary nodes for testing.

Minimum Node Requirements for Eon Mode Database Operation

The K-safety level of your database determines the minimum number of primary nodes it must have:

  • When K=0, your database must have at least 1 primary node. Setting K to 0 allows you to have a single-node Eon Mode database. Note that in a K=0 database, the loss of a primary node will result in the database going into read-only mode.

  • When K=1 (the most common case), your database must have at least three primary nodes. This number of primary nodes allows Vertica to maintain data integrity if a primary node goes down.

  • If you want to manually set the K-safe value to 2 (see Difference Between Enterprise Mode and Eon Mode K-safe Designs below) you must have at least 5 primary nodes.

Vertica prevents you from removing primary nodes if your cluster would fall below the lower limit for your database's K-safety setting. If you want to remove nodes in a database at this lower limit, you must lower the K-safety level using the MARK_DESIGN_KSAFE function and then call REBALANCE_SHARDS.

Critical Nodes and Subclusters

Vertica designates any node or subcluster in the database whose loss would cause the database to go into read-only mode as critical. For example, in an Eon Mode database, when a primary node goes down, nodes with secondary subscriptions to its shards take over maintaining the shards' data. These nodes become critical. Their loss would cause the database to lose shard coverage and be forced to go into read-only mode.

When elastic K-safety is enabled (which is the default) Vertica subscribes additional primary nodes to a down primary node's shards. After these nodes finish subscribing by getting a copy of the shard's metadata, they are ready to fill in if the secondary subscriber also goes down. When this happens, the node filling in for the down node is no longer considered critical.

Vertica maintains a list of critical nodes and subclusters in two system tables: CRITICAL_NODES and CRITICAL_SUBCLUSTERS. Before stopping nodes or subclusters, check these tables to ensure the node or subcluster you intend to stop is not critical.

Difference Between Enterprise Mode and Eon Mode K-safe Designs

In an Enterprise Mode database, you use the MARK_DESIGN_KSAFE function to enable high availability in your database. You call this function after you have designed your database's physical schema to meet all the requirements for K-safe design (often, by running the database designer). If you attempt to mark your database as K-safe when the physical schema does not support the level K-safety you pass to MARK_DESIGN_KSAFE, it returns an error. See Designing Segmented Projections for K-Safety for more information.

In Eon Mode, you do not need to use the MARK_DESIGN_KSAFE because Vertica automatically makes the database K-safe when you have three or more primary nodes. You can use this function to change the K-safety level of your database. In an Eon Mode database, this function changes how Vertica configures shard subscriptions. You can call MARK_DESIGN_KSAFE with any level of K-safety you want. It only has an effect when you call REBALANCE_SHARDS to update the shard subscriptions for the nodes in your database.

Usually, you do not use a K-safety value of greater than 1 in a cloud-based Eon Mode database. Adding replacement nodes to a cluster is easy in a cloud environment.