Reviving an Eon Mode Database Cluster

If you have terminated your Eon Mode database's cluster, but have not deleted the database's communal storage, you can revive your database. Reviving the database restores it to its pre-shutdown state. The revival process requires creating a new database cluster and configuring it to use the database's communal storage location. See Stopping, Starting, Terminating, and Reviving Eon Mode Database Clusters for more information.

You can also use the revive process to restart a database when its nodes do not have persistent local storage. You may choose to configure your node's instances in your cloud-based Eon Mode cluster with non-persistent local storage to reduce cost. Cloud providers such as AWS and GCP charge less for instances when they are not required to retain data when you shut them down.

You revive a database using either the Management Console or admintools. The MC and admintools offer different revival methods:

  • The MC always revives onto a newly-provision cluster that it creates itself. It cannot revive onto an existing cluster. Use the MC to revive a database when you do not have a cluster already provisioned for your database.
  • admintools only revives onto an existing database cluster. You can manually create a cluster to revive your database. See Installing Manually.

    You can also revive a database whose hosts use instance storage where data is not persistently stored between shutdowns. In this case, admintools treats the existing database cluster as a new cluster, because the hosts do not contain the database's catalog data.

  • Currently, only admintools lets you revive just the primary subclusters in a database cluster. This option is useful if you want to revive the minimum number of nodes necessary to start your database. See Reviving Only Primary Subclusters below.

    The MC always revives the entire database cluster.

You cannot revive a database from a communal storage location that is currently running on another cluster. The revive process fails if it detects that there is a cluster already running the database. Having two instances of a database running on separate clusters using the same communal storage location leads to data corruption.

Reviving Using the Management Console

You can use a wizard in the Management Console to provision a new cluster and revive a database onto it from a browser. For details, see:

Revive Using admintools

You can use admintools to revive your Eon Mode database on an existing cluster.

Cluster Requirements

This existing cluster must:

  • Have the same version (or later version) of Vertica installed on it. You can repurpose an existing Vertica cluster whose database you have shut down. Another option is to create a cluster from scratch by manually installing Vertica (see Installing Manually).
  • Contain a number of hosts in the cluster that is equal to or greater than either:
    • The total number of nodes that the database cluster had when it shut down.
    • The total number of primary nodes the database cluster had when it shut down. When you supply a cluster that matches the number of primary nodes in the database, admintools revives just the primary nodes.

When reviving, you supply admintools with a list of the hosts in the cluster to revive the database onto. The number of hosts in this list must match either the total number of nodes or the number of primary nodes in the database when it shut down. If the number of nodes you supply does not match either of these values, admintools returns an error.

You do not need to use all of the hosts in the cluster to revive the database. You can revive a database onto a subset of the hosts in the cluster. But you must have at least enough hosts to revive all of the primary nodes.

For example, suppose you want to revive a database that had 16 nodes when it was shut down, with four of those nodes being primary nodes. In that case, you can revive:

  • Just the primary nodes onto a cluster that contains at least four nodes.
  • All of the 16 nodes onto a cluster that contains at least 16 nodes.

You may choose to revive your database onto a cluster with more nodes that is necessary in cases where you want to quickly add new nodes. You may also want to revive just the primary nodes in a database onto a larger cluster. In this case, you can use the extra nodes in the cluster to start one or more secondary subclusters.

Required Database Information

To revive the database, you must know:

  • The name of the database to revive (note that the database name is case sensitive)
  • The version of Vertica that created the database, so you can use the same or later version
  • The total number of all nodes or the number of primary nodes in the database when it shut down
  • The URL and credentials for the database's communal storage location
  • The user name and password of the database administrator
  • The IP addresses of all hosts in the cluster you want to revive onto

If you do not know what version of Vertica created the database or are unsure how many nodes it had, see Getting Database Details From a Communal Storage Location below.

Required Database Settings

Before starting the revive process, verify the following conditions are true for your Eon Mode environment:

Eon environment Revived database requirements
All
  • The uppermost directories of the catalog, data, and depot directories on all nodes exist and are owned by the database dbadmin
  • The cluster has no other database running on it
Azure

If your database does not use Azure managed identities to authenticate with the communal storage blob container, the following values must be set:

  • AzureStorageCredentials
  • AzureStorageEndpointConfig

See Azure Blob Storage Object Store for details.

S3: AWS, on‑premises

The following configuration parameters are set:

If migrating to an on-premises database, set configuration parameter AWSEnableHttps to be compatible with the database TLS setup: AWSEnableHttps=1 if using TLS, otherwise 0. If settings are incompatible, the migration returns with an error.

GCP

The following configuration parameters are set:

Getting Database Details From a Communal Storage Location

To revive a database, you must know:

  • The version of Vertica that created it (so you can use the same or a later version)
  • The total number of nodes (when reviving both primary and secondary nodes) or primary nodes (when just reviving the primary nodes) in the database's cluster when it shut down.

If you do not know these details, you can determine them based on the contents of the communal storage location.

If you are not sure which version of Vertica created the database stored in a communal storage location, examine the cluster_config.json file. This file is stored in the communal storage location in the folder named metadata/databasename. For example, suppose you have a database named mydb stored in the communal storage location s3://mybucket/mydb. Then you can download and examine the file s3://mybucket/mydb/metadata/mydb/cluster_config.json.

In the cluster_config.json, the Vertica version that created the database is stored with the JSON key named DatabaseVersion near the top of the file:

{
   "CatalogTruncationVersion" : 804,
   "ClusterLeaseExpiration" : "2020-12-21 21:52:31.005936",
   "Database" : {
      "branch" : "",
      "name" : "verticadb"
   },
   "DatabaseVersion" : "v10.1.0",
   "GlobalSettings" : {
      "TupleMoverServices" : -33,
      "appliedUpgrades" : [
 . . . 

In this example, you can revive the storage location using Vertica version 10.1.0 or later.

If you do not know how many nodes or primary nodes the cluster had when it shut down, use the --display-only option of the admintools revive_db tool. Adding this option prevents admintools from reviving the database. Instead, it validates the files in the communal storage and reports details about the nodes that made up the database cluster. Parts of this report show the total number of nodes in the cluster and the number of primary nodes:

$ admintools -t revive_db --display-only --communal-storage-location \ 
             s3://mybucket/verticadb -d verticadb
Attempting to retrieve file: [s3://mybucket/verticadb/metadata/verticadb/cluster_config.json]

Validated 6-node database verticadb defined at communal storage s3://mybucket/verticadb.

Expected layout of database after reviving from communal storage: s3://mybucket/verticadb

== Communal location details: ==
{
 "communal_storage_url": "s3://mybucket/verticadb",
 "num_shards": "3",
 "depot_path": "/vertica/data",
   . . .
]

Number of primary nodes: 3

You can use grep to find just the relevant lines in the report:

$ admintools -t revive_db --display-only --communal-storage-location \
             s3://mybucket/verticadb -d verticadb | grep  'Validated\|primary nodes'
Validated 6-node database verticadb defined at communal storage s3://mybucket/verticadb.
Number of primary nodes: 3

Creating a Parameter File

For Eon Mode deployments that are not on AWS, you must create a configuration file to pass the parameters listed in the table in the previous section to admintools. Traditionally this file is named auth_params.conf although you can choose any file name you want.

For on-premises Eon Mode databases, this parameter file is the same one you used when initially installing the database. See the following links for instructions on creating a parameter file for the communal storage solution you are using for your database:

For databases running on Microsoft Azure, the parameter file is only necessary if your database does not use managed identities. This file is the same format that you use to manually install an Eon Mode database. See Manually Creating an Eon Mode Database on Azure for more information.

To revive an Eon Mode database on GCP manually, create a configuration file to hold the GCSAuth parameter and optionally, the GCSEnableHttp parameter.

You must supply the GCSAuth parameter to enable Vertica to read from the communal storage location stored in GCS. The value for this parameter is the HMAC access key and secret:

GCSAuth = HMAC_access_key:HMAC_secret_key

See Creating an HMAC Key for more information about HMAC keys.

If your Eon Mode database does not use encryption when accessing communal storage on GCS, then disable HTTPS access by adding the following line to auth_params.conf:

GCSEnableHttps = 0

Running the revive_db Tool

Use the admintools revive_db tool to revive the database:

  1. Use SSH to access a cluster host as an administrator.
  2. Depending on your environment, run one of the following admintools command lines:
    • AWS:

      $ admintools -t revive_db \
       --communal-storage-location=s3://communal_store_path \
       -s host1,... -d database_name
    • For on premises and other environments, run the command:
      $ admintools -t revive_db -x auth_params.conf \
        --communal-storage-location=storage-schema://communal_store_path \ 
        -s host1_ip,... -d database_name

This example revives a six-node on-premises database:

$ admintools -t revive_db -x auth_params.conf \
   --communal-storage-location=s3://mybucket/mydir \
   -s 172.16.116.27,172.16.116.28,172.16.116.29,172.16.116.30,\
   172.16.116.31,172.16.116.32 -d VMart

The following example demonstrates reviving a three-node database hosted on GCP:

$ admintools -t revive_db -x auth_params.conf \
--communal-storage-location gs://mybucket/verticadb \
-s 10.142.0.35,10.142.0.38,10.142.0.39 -d VerticaDB

Attempting to retrieve file: 
   [gs://mybucket/verticadb/metadata/VerticaDB/cluster_config.json]
Validated 3-node database VerticaDB defined at communal storage 
  gs://mybucket/verticadb .
Cluster lease has expired.
Preparation succeeded all hosts
Calculated necessary addresses for all nodes.
Starting to bootstrap nodes. Please wait, databases with a large 
  catalog may take a while to initialize.
>>Calling bootstrap on node v_verticadb_node0002 (10.142.0.38)
>>Calling bootstrap on node v_verticadb_node0003 (10.142.0.39)
Load Remote Catalog succeeded on all hosts
Database revived successfully.

Reviving Only Primary Subclusters

You can revive just the primary subclusters in an Eon Mode database. Make the list of hosts you pass to the admintools revive_db tool's --hosts (or -s) argument match the number of primary nodes that were in the database when it shut down. For example, if you have a six-node Eon Mode database that had three primary nodes, you can revive just the primary nodes by supplying three hosts in the --hosts argument:

$ admintools -t revive_db --communal-storage-location=s3://verticadb -d verticadb \
             -x auth_params.conf --hosts node01,node02,node03
Attempting to retrieve file: [s3://verticadb/metadata/verticadb/cluster_config.json]
Consider reviving to only primary nodes: communal storage indicates 6 nodes, while 
  3 nodes were specified

Validated 3-node database verticadb defined at communal storage s3://verticadb.
Cluster lease has expired.
Preparation succeeded all hosts

Calculated necessary addresses for all nodes.
Starting to bootstrap nodes. Please wait, databases with a large catalog may take a 
  while to initialize.
>>Calling bootstrap on node v_verticadb_node0002 (192.168.56.103)
>>Calling bootstrap on node v_verticadb_node0003 (192.168.56.104)
Load Remote Catalog succeeded on all hosts

Database revived successfully.

In a database where you have revived only the primary nodes, the secondary nodes are down. Their IP address is set to 0.0.0.0 so they are not part of the database. For example, querying the NODES system table in the database revived in the previous example shows the secondary nodes are all down:

=> SELECT node_name,node_state,node_address,subcluster_name FROM NODES;
      node_name       | node_state |  node_address  |  subcluster_name   
----------------------+------------+----------------+--------------------
 v_verticadb_node0001 | UP         | 192.168.56.102 | default_subcluster
 v_verticadb_node0002 | UP         | 192.168.56.103 | default_subcluster
 v_verticadb_node0003 | UP         | 192.168.56.104 | default_subcluster
 v_verticadb_node0004 | DOWN       | 0.0.0.0        | analytics
 v_verticadb_node0005 | DOWN       | 0.0.0.0        | analytics
 v_verticadb_node0006 | DOWN       | 0.0.0.0        | analytics

Secondary nodes that have not been revived may cause error messages if your database has the large cluster feature enabled. (See Large Cluster for more information about the large cluster feature.)

For example, adding a node to a secondary subcluster can fail if the new node would be assigned a control node that has not been revived. In this case, Vertica reports that adding the node failed because the control node has an invalid IP address.

If you encounter errors involving control nodes with invalid IP addresses, consider reviving the unrevived secondary subcluster, as explained below.

Because Vertica considers these unrevived nodes to be down, it may not allow you to remove them or remove their subcluster while they are in their unrevived state. The best way to remove the nodes or the secondary subcluster is to revive them first.

Reviving Unrevived Secondary Subclusters

If you revived just the primary subclusters in your database, you can later choose to revive some or all of the secondary subclusters. Your cluster must have hosts that are not nodes in the database that Vertica can use to revive the unrevived nodes. If your cluster does not have enough of these non-node hosts, you can add more hosts. See Adding Hosts to a Cluster.

You revive a secondary subcluster by using the admintools' restart_subcluster tool. You supply it with the list of hosts in the --hosts argument where the nodes will be revived. The number of hosts in this list must match the number of nodes in the subcluster. You must revive all nodes in the subcluster at the same time. If you pass restart_subcluster a list with fewer or more hosts than the number of nodes defined in the subcluster, it returns an error.

The follow example demonstrates reviving the secondary subcluster named analytics shown in the previous examples.

$ admintools -t restart_subcluster -d verticadb --hosts node04,node05,node06 \
             -p 'password' -c analytics
Updating hostnames of nodes in subcluster analytics.
	Replicating configuration to all nodes
	Generating new configuration information and reloading spread
Hostnames of nodes in subcluster analytics updated successfully.
*** Restarting subcluster for database verticadb ***
	Restarting host [192.168.56.105] with catalog [v_verticadb_node0004_catalog]
	Restarting host [192.168.56.106] with catalog [v_verticadb_node0005_catalog]
	Restarting host [192.168.56.107] with catalog [v_verticadb_node0006_catalog]
	Issuing multi-node restart
	Starting nodes: 
		v_verticadb_node0004 (192.168.56.105)
		v_verticadb_node0005 (192.168.56.106)
		v_verticadb_node0006 (192.168.56.107)
	Starting Vertica on all nodes. Please wait, databases with a large catalog may take a while to initialize.
	Node Status: v_verticadb_node0004: (DOWN) v_verticadb_node0005: (DOWN) v_verticadb_node0006: (DOWN) 
	Node Status: v_verticadb_node0004: (DOWN) v_verticadb_node0005: (DOWN) v_verticadb_node0006: (DOWN) 
	Node Status: v_verticadb_node0004: (DOWN) v_verticadb_node0005: (DOWN) v_verticadb_node0006: (DOWN) 
	Node Status: v_verticadb_node0004: (INITIALIZING) v_verticadb_node0005: (INITIALIZING) v_verticadb_node0006: (INITIALIZING) 
	Node Status: v_verticadb_node0004: (UP) v_verticadb_node0005: (UP) v_verticadb_node0006: (UP) 
Syncing catalog on verticadb with 2000 attempts.

See Also