Copying the Database to Another Cluster

You can use the vbr utility to copy the entire database to another Vertica cluster. This feature helps you perform tasks such as copying a database between a development and a production environment. Copying your database to another cluster is essentially a simultaneous backup and restore operation. The data is backed up from the source database cluster and restored to the destination cluster in a single operation.

Note: The copycluster task is not compatible with HDFS Storage Locations. Copycluster uses the Linux rsync command to copy files from the source cluster to the target cluster. HDFS storage backup and restore is based on use of snapshots. Data in an HDFS storage location is backed up to HDFS itself. Vertica cannot transfer data to a remote HDFS cluster the same way that it can for a Linux cluster.

The directory locations for the Vertica catalog , data, and temp directories must be identical on the source and target database. Use the following vsql query to view the source database directory locations. This example sets expanded display, for illustrative purposes, and lists the columns of most interest, node_name, storage_path, and storage_usage.

VMart=> \x
Expanded display is on.
VMart=> select node_name,storage_path, storage_usage from disk_storage;
-[ RECORD 1 ]-+-----------------------------------------------------
node_name     | v_vmart_node0001
storage_path  | /home/dbadmin/VMart/v_vmart_node0001_catalog/Catalog
storage_usage | CATALOG
-[ RECORD 2 ]-+-----------------------------------------------------
node_name     | v_vmart_node0001
storage_path  | /home/dbadmin/VMart/v_vmart_node0001_data
storage_usage | DATA,TEMP
-[ RECORD 3 ]-+-----------------------------------------------------
node_name     | v_vmart_node0001
storage_path  | home/dbadmin/SSD/schemas
storage_usage | DATA
-[ RECORD 4 ]-+-----------------------------------------------------
node_name     | v_vmart_node0001
storage_path  | /home/dbadmin/SSD/tables
storage_usage | DATA
-[ RECORD 5 ]-+-----------------------------------------------------
node_name     | v_vmart_node0001
storage_path  | /home/dbadmin/SSD/schemas
storage_usage | DATA
-[ RECORD 6 ]-+-----------------------------------------------------
node_name     | v_vmart_node0002
storage_path  | /home/dbadmin/VMart/v_vmart_node0002_catalog/Catalog
storage_usage | CATALOG
-[ RECORD 7 ]-+-----------------------------------------------------
node_name     | v_vmart_node0002
storage_path  | /home/dbadmin/VMart/v_vmart_node0002_data
storage_usage | DATA,TEMP
-[ RECORD 8 ]-+-----------------------------------------------------
node_name     | v_vmart_node0002
storage_path  | /home/dbadmin/SSD/tables
storage_usage | DATA
.
.
.

Notice the directory paths for the Catalog, Data, and Tempstorage. These paths are the same on all nodes in the source database and must be the same in the target database.

Note: When copying a database to another cluster, if the target data is different, vbr overwrites all existing data. To retain existing data on the target cluster, create a full database backup of the target before invoking the copycluster vbr task.

Identifying Node Names for Target Cluster

Before you can configure the target cluster, you need to know the exact names that admintools supplied to all nodes in the source database.

To see the node names, run a query such as:

VMart=> select node_name from nodes;
  node_name             
------------------
 v_vmart_node0001 
 v_vmart_node0002 
 v_vmart_node0003 
(3 rows)    

You can also find the node names by running Admintools from the command line. For example, for the VMart database, you can enter a command such as:

$ /opt/vertica/bin/admintools -t node_map -d VMART 
DATABASE | NODENAME         | HOSTNAME 
-----------------------------------------------
VMART    | v_vmart_node0001 | 192.168.223.xx 
VMART    | v_vmart_node0002 | 192.168.223.yy 
VMART    | v_vmart_node0003 | 192.168.223.zz 

Configuring the Target Cluster

Configure the target to allow the source database to connect to it and restore the database. The target cluster must:

Creating a Configuration File for CopyCluster

You must create a configuration file specifically for copying your database to another cluster. In the configuration file, specify the host names of nodes in the target cluster as the backup hosts. When you use the copycluster command, the vbr requires that you define the backupHost. However, the vbr ignores the backupDir option, and always stores the data in the catalog and data directories of the target database.

You cannot use an object-level backup with the copycluster command. Instead, you must perform a full database backup.

The following example shows how you can set up the vbrto copy a database on a 3-node cluster, v_vmart, to another cluster, test-host.

[Misc]
snapshotName = CopyVmart
restorePointLimit = 5
tempDir = /tmp/vbr
retryCount = 5
retryDelay = 1

[Database]
dbName = vmart
dbUser = dbadmin
dbPassword = password
dbPromptForPassword = False

[Transmission]
encrypt = False
checksum = False
port_rsync = 50000

[Mapping]
; backupDir is not used for cluster copy
v_vmart_node0001= test-host01
v_vmart_node0002= test-host02
v_vmart_node0003= test-host03

Copying the Database

You must stop the target cluster before you invoke copycluster.

To copy the cluster, run vbr from a node in the source database using the database administrator account. The following example demonstrates how you can copy a cluster using a configuration file located in the current directory.

$ vbr -t copycluster -c copycluster.ini 
Starting copy of database VMART.
Participating nodes: vmart_node0001, vmart_node0002, vmart_node0003, vmart_node0004.
Enter vertica password: 
Snapshotting database.
Snapshot complete.
Determining what data to copy.
[==================================================] 100%
Approximate bytes to copy: 987394852 of 987394852 total.
Syncing data to destination cluster.
[==================================================] 100%
Reinitializing destination catalog.
Copycluster complete!

If the copy cluster task is interrupted, the destination cluster retains any data files that have already transferred. If you attempt the operation again.Vertica does not need to resend these files.

Related Tasks