Copying the Database to Another Cluster

It is important to secure backup locations and strictly limit access to backups to users who are already permitted to access all data in the database. Compromising a backup means compromising the database.

You can use vbr to copy the entire database to another Vertica cluster. This feature helps you perform tasks such as copying a database between a development and a production environment. Copying your database to another cluster is essentially a simultaneous backup and restore operation. The data is backed up from the source database cluster and restored to the destination cluster in a single operation.

The copycluster task is not compatible with HDFS Storage Locations. Copycluster uses the Linux rsync tool to copy files from the source cluster to the target cluster. HDFS storage backup and restore is based on use of snapshots. Data in an HDFS storage location is backed up to HDFS itself. Vertica cannot transfer data to a remote HDFS cluster the same way that it can for a Linux cluster.

The directory locations for the Vertica catalog, data, and temp directories must be identical on the source and target database. Use the following vsql query to view the source database directory locations. This example sets expanded display, for illustrative purposes, and lists the columns of most interest: node_name, storage_path, and storage_usage.

=> \x
Expanded display is on.
VMart=> select node_name,storage_path, storage_usage from disk_storage;
-[ RECORD 1 ]-+-----------------------------------------------------
node_name     | v_vmart_node0001
storage_path  | /home/dbadmin/VMart/v_vmart_node0001_catalog/Catalog
storage_usage | CATALOG
-[ RECORD 2 ]-+-----------------------------------------------------
node_name     | v_vmart_node0001
storage_path  | /home/dbadmin/VMart/v_vmart_node0001_data
storage_usage | DATA,TEMP
-[ RECORD 3 ]-+-----------------------------------------------------
node_name     | v_vmart_node0001
storage_path  | home/dbadmin/SSD/schemas
storage_usage | DATA
-[ RECORD 4 ]-+-----------------------------------------------------
node_name     | v_vmart_node0001
storage_path  | /home/dbadmin/SSD/tables
storage_usage | DATA
-[ RECORD 5 ]-+-----------------------------------------------------
node_name     | v_vmart_node0001
storage_path  | /home/dbadmin/SSD/schemas
storage_usage | DATA
-[ RECORD 6 ]-+-----------------------------------------------------
node_name     | v_vmart_node0002
storage_path  | /home/dbadmin/VMart/v_vmart_node0002_catalog/Catalog
storage_usage | CATALOG
-[ RECORD 7 ]-+-----------------------------------------------------
node_name     | v_vmart_node0002
storage_path  | /home/dbadmin/VMart/v_vmart_node0002_data
storage_usage | DATA,TEMP
-[ RECORD 8 ]-+-----------------------------------------------------
node_name     | v_vmart_node0002
storage_path  | /home/dbadmin/SSD/tables
storage_usage | DATA
.
.
.

Notice the directory paths for the Catalog, Data, and Temp storage. These paths are the same on all nodes in the source database and must be the same in the target database.

When copying a database to another cluster, if the target data is different, vbr overwrites all existing data. To retain existing data on the target cluster, create a full database backup of the target before invoking the copycluster vbr task.

Identifying Node Names for Target Cluster

Before you can configure the target cluster, you need to know the exact names that admintools supplied to all nodes in the source database.

To see the node names, run a query such as:

=> select node_name from nodes;
  node_name             
------------------
 v_vmart_node0001 
 v_vmart_node0002 
 v_vmart_node0003 
(3 rows)    

You can also find the node names by running admintools from the command line. For example, for the VMart database, you can enter a command such as:

$ /opt/vertica/bin/admintools -t node_map -d VMART 
DATABASE | NODENAME         | HOSTNAME 
-----------------------------------------------
VMART    | v_vmart_node0001 | 192.168.223.xx 
VMART    | v_vmart_node0002 | 192.168.223.yy 
VMART    | v_vmart_node0003 | 192.168.223.zz 

Configuring the Target Cluster

Configure the target to allow the source database to connect to it and restore the database. The target cluster must:

  • Have the same number of nodes as the source cluster.
  • Have a database with the same name as the source database. The target database can be completely empty.
  • Have the same node names as the source cluster. The node names listed in the NODES system tables on both clusters must match.
  • Be accessible from the source cluster.
  • Have the same database administrator account, and all nodes must allow a database administrator of the source cluster to log in through SSH without a password.

    Passwordless access within the cluster is not the same as passwordless access between clusters. The SSH ID of the administrator account on the source cluster and the target cluster are likely not the same. You must configure each host in the target cluster to accept the SSH authentication of the source cluster.

  • Have adequate disk space for the vbr --task copycluster command to complete.

Creating a Configuration File for CopyCluster

You must create a configuration file specifically for copying your database to another cluster. In the configuration file, specify the host names of nodes in the target cluster as the backup hosts. You must define backupHost; however, vbr ignores the backupDir option and always stores the data in the catalog and data directories of the target database.

You cannot use an object-level backup with the copycluster command. Instead, you must perform a full database backup.

The following example shows how you can set up vbr to copy a database on a 3-node cluster, v_vmart, to another cluster, test-host.

[Misc]
snapshotName = CopyVmart
tempDir = /tmp/vbr

[Database]
dbName = vmart
dbUser = dbadmin
dbPassword = password
dbPromptForPassword = False

[Transmission]
encrypt = False
port_rsync = 50000

[Mapping]
; backupDir is not used for cluster copy
v_vmart_node0001= test-host01
v_vmart_node0002= test-host02
v_vmart_node0003= test-host03

Copying the Database

You must stop the target cluster before you invoke copycluster.

To copy the cluster, run vbr from a node in the source database using the database administrator account:

$ vbr -t copycluster -c copycluster.ini 
Starting copy of database VMART.
Participating nodes: vmart_node0001, vmart_node0002, vmart_node0003, vmart_node0004.
Enter vertica password: 
Snapshotting database.
Snapshot complete.
Determining what data to copy.
[==================================================] 100%
Approximate bytes to copy: 987394852 of 987394852 total.
Syncing data to destination cluster.
[==================================================] 100%
Reinitializing destination catalog.
Copycluster complete!

If the copycluster task is interrupted, the destination cluster retains any data files that have already transferred. If you attempt the operation again. Vertica does not need to resend these files.