Vertica on Amazon Web Services Backup and Restore Guide

Introduction

This document outlines three best-practice approaches for backing up and restoring Vertica clusters on Amazon Web Services (AWS) and the advance preparation required for each.

The examples in this document recover a sample database, VMart, from multiple failures.

To complete the steps in this guide, you should have a Vertica cluster on AWS with data loaded on it.

For more information about backups, visit the Backing Up and Restoring Vertica Databases section of the Vertica documentation.

K-Safe Cluster Configuration

You can protect against isolated node failures with a K-safe cluster configuration. A K-safe cluster stores redundant copies of data, called buddy projections, on other nodes in the cluster to prevent data loss if a node fails. For a cluster to be K-safe, it must consist of at least three nodes.

To learn more about K-safe cluster configurations, visit the Designing For K-Safety section of the Vertica documentation.
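
To confirm that a running cluster is actually K-safe, you can query its designed and current fault tolerance. This is a minimal sketch, assuming the SYSTEM monitoring table in your Vertica version exposes these columns:

    # Run from any UP node as dbadmin; a fault tolerance of 1 tolerates one node failure
    vsql -c "SELECT designed_fault_tolerance, current_fault_tolerance FROM system;"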

Recover from a K-Safe Failure

To recover from an isolated node failure, your cluster must still be in the UP state. Recovering from a K-safe failure does not require a backup.

If a node goes down and you cannot reconnect to it, you must re-create the node. Follow these steps to re-create the failed node and recover from a K-safe failure:

Note: If your primary node fails, you must reassign your Elastic IP and copy your .pem key file to one of your other running nodes before you can recover.

Create a Target Node

Create a target AWS instance, ensuring the following (an AWS CLI sketch follows this list):

  • Subnet, network, and VPC—Your target instance must be in the same subnet, network, and VPC as the nodes in your source cluster, and it must have the same network configuration.
  • Cluster placement and availability zone—Your target instance must be in the same cluster placement group and availability zone as the nodes in your source cluster.
  • IP address—The internal IP address of your target instance must be the same as the IP address of the source node. Use the Network Interfaces option during instance creation to assign internal IP addresses.
  • Version compatibility—Your target instance must use the same Vertica AMI version and hotfix version as the nodes in your source cluster.
  • Instance type—Your target instance must use the same instance type as the nodes in your source cluster.
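
As an illustration only, the following AWS CLI sketch creates a replacement instance that meets these requirements. The AMI ID, instance type, subnet ID, placement group, key name, and IP address are placeholder values you must replace with your own:

    # All values below are placeholders; match each one to your source cluster.
    aws ec2 run-instances \
        --image-id ami-xxxxxxxx \
        --instance-type c3.4xlarge \
        --subnet-id subnet-xxxxxxxx \
        --private-ip-address 10.0.10.15 \
        --placement "GroupName=my-placement-group,AvailabilityZone=us-east-1a" \
        --key-name mykey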

Recover a Node

  1. On all your nodes, delete the failed source node's host key entries from the following locations (one approach is sketched after these steps):
    • /root/.ssh/known_hosts
    • /home/dbadmin/.ssh/known_hosts
  2. On your main node, run the install_vertica script, specifying:
    • your own key file (-i)
    • the -Y and --point-to-point options
    • a disabled DBA user password (--dba-user-password-disabled)
    • dbadmin as the DBA user (--dba-user):
    sudo /opt/vertica/sbin/install_vertica -i ~/userkey.pem -Y --point-to-point \
    --dba-user-password-disabled --dba-user dbadmin
  3. Connect to your target node with SSH, and configure its storage to match your source cluster's storage.
  4. Configure your target node:
    1. Create an empty catalog and data directory matching the source node:

      mkdir -p /vertica/data/VMart/v_vmart_node0003_catalog
    2. Change the owner of the catalog and data directory to dbadmin (group verticadba):

      sudo chown dbadmin:verticadba /vertica/data/VMart
  5. Restart the target node using admintools, specifying the IP address of your target node and your database name:

    admintools -t restart_node -s 10.0.10.15 -d VMart
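
For step 1, one way to remove the stale host keys is with ssh-keygen rather than editing the files by hand. The IP address is a placeholder for your failed node:

    # Remove the failed node's host key from the dbadmin known_hosts file
    ssh-keygen -R 10.0.10.15 -f /home/dbadmin/.ssh/known_hosts

    # Remove it from root's known_hosts file as well
    sudo ssh-keygen -R 10.0.10.15 -f /root/.ssh/known_hosts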

Your target node will now recover using data from its buddy, making your cluster K-safe once again. Depending on the size of your database, recovery may take some time.
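
To watch the recovery, you can check cluster state with admintools and poll recovery progress from an UP node. A sketch, assuming the v_monitor.recovery_status system table is available in your Vertica version:

    # Show the state of every node in the cluster
    admintools -t view_cluster -d VMart

    # Check per-node recovery progress from an UP node
    vsql -c "SELECT node_name, recover_epoch, recovery_phase FROM v_monitor.recovery_status;"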

Full Backup

A full backup captures a complete image of your database at a specific point in time. This option is the safest and most stable backup approach. A full backup lets you recover from a non K-safe loss, such as a multi-node failure or a total cluster failure.

[Image: Full backup of a Vertica cluster on AWS]

Storage Considerations

When you choose storage for your full backup, be aware of these considerations:
  • If your cluster uses ephemeral storage, a full backup is your only backup option; the hard-link approaches described later require EBS volumes.
  • You must use EBS storage for the backup volume.

To perform a full backup, you need the following:

  • Properly formatted and mounted backup volumes
  • An AWS snapshot of your backup volumes
  • A backup configuration file

Find more information about full backups in the Types of Backups section of the Vertica documentation.

The process of creating a full backup on AWS requires the following tasks:

Prepare Backup Volumes

  1. Find the size of the data catalog on each source node:

    df -h /vertica/data/ 

    Look for the number in the Used column.

  2. Find the largest data catalog of all the nodes in your source cluster, and add a 20–50% safety margin.
  3. Create a new EBS volume for each source node, sized to your largest data catalog plus its safety margin, and attach it to the node (an AWS CLI sketch follows these steps).
  4. Verify that your newly mounted volumes appear on each source node:

    ls /dev
  5. Format all volumes, using a Vertica-supported file system:

    sudo mkfs.ext4 /dev/xvdf
  6. Create the backup folder. Specify the location where you mount the new backup volume:

    sudo mkdir /vertica/backup 
  7. Make the mount persistent by appending it to /etc/fstab, and then mount the volume to /vertica/backup:

    sudo bash -c "echo '/dev/xvdf /vertica/backup ext4 defaults 0 0' >> /etc/fstab"
    sudo mount /vertica/backup
  8. Set dbadmin as the owner of /vertica/backup:

    sudo chown dbadmin:verticadba /vertica/backup
  9. Verify the success of the mount operation by entering:

    df -h
  10. Repeat the formatting and mounting steps on all nodes.
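
For step 3, you can create and attach each backup volume with the AWS CLI. This sketch assumes a 100 GB volume (your largest catalog plus margin), placeholder IDs, and the /dev/xvdf device name used above:

    # Create the backup volume in the same availability zone as the node
    aws ec2 create-volume --size 100 --availability-zone us-east-1a --volume-type gp2

    # Attach it to the node as /dev/xvdf, using the volume ID returned above
    aws ec2 attach-volume --volume-id vol-xxxxxxxx \
        --instance-id i-xxxxxxxx --device /dev/xvdf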

Create a Backup Configuration File

Create a backup configuration file by running vbr.py with the --setupconfig option and answering its prompts:

/opt/vertica/bin/vbr.py --setupconfig

For more information about creating a backup configuration file, see Creating vbr.py Configuration Files in the Vertica documentation.
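
The generated file depends on your answers to the vbr.py prompts. For the three-node VMart examples in this guide, a minimal full-backup configuration might look like the following sketch; the snapshot name, IP addresses, and paths are illustrative:

    [Misc]
    snapshotName = vmart_full_backup
    restorePointLimit = 1

    [Database]
    dbName = VMart
    dbUser = dbadmin

    [Mapping]
    v_vmart_node0001 = 10.0.10.13:/vertica/backup
    v_vmart_node0002 = 10.0.10.14:/vertica/backup
    v_vmart_node0003 = 10.0.10.15:/vertica/backup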

Perform a Full Backup

  1. Run vbr.py, specifying the --config-file and --task backup options:

    /opt/vertica/bin/vbr.py --config-file vmart_backup.ini --task backup
    If your configuration file relies on a password file, you may need to copy your password file to each of your nodes to run vbr.py.
  2. Verify that vbr.py completes without errors, and then make a snapshot of the backup volume on each of your nodes (a CLI sketch follows these steps).
  3. Save your admintools configuration file, which has information on your node IP addresses and mapping:

    /opt/vertica/conf/admintools.conf
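
For step 2, one way to create the snapshots is with the AWS CLI; the volume ID is a placeholder, and you repeat the command for the backup volume on each node:

    # Snapshot the backup volume attached to this node
    aws ec2 create-snapshot --volume-id vol-xxxxxxxx \
        --description "VMart full backup, node 1"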

The snapshots, your backup configuration file, and your admintools configuration file make up your complete initial backup. Later, you can create an incremental backup by repeating steps 1 and 2.

  • Incremental snapshots usually take less time than the initial snapshot because only the blocks that changed since the previous snapshot are saved. If the differences are small, subsequent backups and snapshots can be quite fast.
  • Snapshots are taken asynchronously, so you can start the next backup while the previous snapshot is still completing.

For more information on EBS snapshots, visit the AWS documentation.

See the Creating Full and Incremental Backups section of the Vertica documentation for additional information on incremental backups.

Restore from a Full Backup

If a cluster fails, use your backup configuration file and backup volume snapshots to restore your cluster from your last full backup.

The process of restoring from a full backup on AWS requires the following tasks:

Create a Target Cluster

Create a target cluster on AWS. Your new instances must match your source cluster in the following ways:

  • Network and VPC—The target instances must be in the same network and VPC as each other. However, they do not have to be in the same network and VPC as your source cluster.
  • Number of instances and nodes—You must use the same number of instances and nodes as your source cluster.
  • IP addresses—The internal IP addresses of the target nodes must be the same as the internal IP addresses of the nodes in the source cluster.
  • Version compatibility—Your target cluster must be running the same AMI and Vertica versions and using the same hotfix version as your source cluster.

Your new instances may differ from the source cluster in the following ways:

  • Your target cluster may be in a different availability zone.
  • You may use a different instance type for your target cluster.

Restore a Database from a Full Backup

After you create a target cluster, you must restore the database that was on the source cluster:

  1. Using your source cluster backup snapshots, create one volume for each node and attach it to the corresponding node in your target cluster (a CLI sketch follows these steps). Verify that each snapshot from the source cluster is matched with its respective node on the target cluster.
    You must use the correct device mapping. For example, a backup snapshot taken for node 1 must be re-created in the new cluster at node 1.
  2. Mount the backup location on all nodes of the target cluster with the same file path as your source cluster:

    sudo mkdir /vertica/backup
    sudo bash -c "echo '/dev/xvdf /vertica/backup ext4 defaults 0 0' >> /etc/fstab"
    sudo mount /vertica/backup
  3. Verify the success of your mounting operation by checking for data in your backup folder:

    ls /vertica/backup/
  4. Using your admintools.conf file backup as a reference, create an empty database on the target cluster with the same dbadmin username, password, data path, and database name as your source database.
  5. Stop the database, if it is running.
  6. Run a restore operation:

    /opt/vertica/bin/vbr.py --config-file vmart_backup.ini --task restore
  7. Start the database to conclude the restoration process.
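
For step 1, a sketch of re-creating and attaching one volume from its snapshot with the AWS CLI; the snapshot, volume, and instance IDs and the availability zone are placeholders:

    # Create a volume from the node 1 backup snapshot in the target node's zone
    aws ec2 create-volume --snapshot-id snap-xxxxxxxx \
        --availability-zone us-east-1a --volume-type gp2

    # Attach it to target node 1 with the same device name used on the source
    aws ec2 attach-volume --volume-id vol-xxxxxxxx \
        --instance-id i-xxxxxxxx --device /dev/xvdf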

Hard-Link Backup on AWS RAID-0

Performing a hard-link backup on AWS differs from the hard-link backup procedure on traditional bare-metal Vertica installations because AWS clusters use a software RAID-0 device. Very small timing differences when taking snapshots of the EBS volumes that make up a RAID-0 device can cause inconsistencies that invalidate the backup.

Before performing a hard-link backup, you must freeze or unmount the RAID-0 file system.

You can choose between two options for performing a hard-link backup:

  • Stop the cluster with admintools.
  • Freeze the cluster with the fsfreeze command.

Note: You cannot perform a hard-link backup on AWS RAID-0 if your installation uses ephemeral volumes.

Hard-Link Backup with admintools

One way you can perform a hard-link backup is to stop the cluster with admintools. If you back up the volumes while the database is down, you can restore them without running the vbr.py backup and restore script. However, if you want to maintain multiple point-in-time backups, you can still use vbr.py.

Use this approach if your service-level agreements allow you to stop the database for a period of time long enough to initiate snapshots.

[Image: Hard-link backup using admintools]

Perform Hard-Link Backup Using admintools

Before performing this task, identify the instance IP addresses in your source cluster and make note of your RAID volumes for later use. Then, create a backup configuration file with hard-link backup enabled.

For information about enabling hard-link backup within a backup configuration file, see Configuring the Hard Link Local Parameter in the full Vertica documentation.

  1. Stop the database:

    admintools -t stop_db -d VMart
  2. On each node, unmount your data volume:

    umount /vertica/data
  3. Create a snapshot of the RAID-0 volumes across your cluster, and make note of each volume's corresponding snapshot ID. You need this information to assign snapshots to their correct node and volume designation during the restore process.
  4. On each node, mount your data volume:

    mount /vertica/data
  5. Start your database:

    admintools -t start_db -d VMart
  6. Save the following:
    • Your RAID-0 configuration file for each node:

      /etc/mdadm.conf
    • Your backup configuration file:

      vmart_backup.ini
    • [Optional] Your admintools configuration file, which has information on your node IP addresses and mapping:

      /opt/vertica/conf/admintools.conf

Hard-Link Backup with the fsfreeze Command

Another way you can perform a hard-link backup is to freeze the cluster with the fsfreeze command. Because you can perform this kind of backup without stopping the cluster, users experience minimal performance effects. You must use vbr.py to restore from a backup performed with the fsfreeze command.

[Image: Hard-link backup using fsfreeze]

Perform a Hard-Link Backup with the fsfreeze Command

Before you can perform this procedure, you must identify the instance IP addresses in your source cluster, and make note of your RAID volumes. You also must create a backup configuration file with hard-link backup enabled. For information about enabling hard-link backup within a backup configuration file, see Configuring the Hard Link Local Parameter in the full Vertica documentation.

  1. Make a hard-link backup of the RAID-0 device:

    /opt/vertica/bin/vbr.py --config-file vmart_backup.ini --task backup
  2. Freeze the RAID-0 volume across the cluster to get a consistent snapshot of the EBS volumes that constitute the device. Freezing halts all database/SQL operations until you unfreeze the volume.

    Do not create a snapshot of a RAID-0 volume without freezing it first. Performing a snapshot without freezing your RAID volume invalidates your snapshot. Always check the return code of the fsfreeze command to ensure the device is frozen before you proceed (a consolidated sketch follows these steps).
    for IP in 10.0.10.13 10.0.10.14 10.0.10.15; do ssh $IP sudo fsfreeze --freeze /vertica/data || echo "freeze failed on $IP"; done
  3. Create a snapshot of the RAID-0 volumes across your cluster. Make note of each volume's corresponding snapshot ID. You will need this information to assign snapshots to their correct volume designation during the restore process.
  4. After the snapshot has started for all EBS volumes on all nodes, unfreeze the file system:

    for IP in 10.0.10.13 10.0.10.14 10.0.10.15; do ssh $IP sudo fsfreeze --unfreeze /vertica/data; done
    You do not need to wait for the snapshot to complete before unfreezing.
  5. Save the following:
    • Your RAID-0 configuration file for each node:

      /etc/mdadm.conf
    • Your backup configuration file:

      vmart_backup.ini
    • [Optional] Your admintools configuration file, which has information on your node IP addresses and mapping:

      /opt/vertica/conf/admintools.conf
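
Putting steps 2 through 4 together, the following sketch freezes every node, starts the snapshots, and unfreezes, checking the fsfreeze return code as the warning above requires. The IP addresses and volume ID are placeholders:

    #!/bin/bash
    # Freeze the RAID-0 file system on every node; stop if any freeze fails
    for IP in 10.0.10.13 10.0.10.14 10.0.10.15; do
        ssh "$IP" sudo fsfreeze --freeze /vertica/data || { echo "freeze failed on $IP"; exit 1; }
    done

    # Start a snapshot of each EBS volume in the RAID-0 device (repeat per volume)
    aws ec2 create-snapshot --volume-id vol-xxxxxxxx --description "VMart hard-link backup"

    # Unfreeze as soon as every snapshot has started
    for IP in 10.0.10.13 10.0.10.14 10.0.10.15; do
        ssh "$IP" sudo fsfreeze --unfreeze /vertica/data
    done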

Restore from a Hard-Link Backup

If a cluster fails, use your backup configuration file and backup volume snapshots to restore your cluster from your last hard-link backup.

The process of restoring from a hard-link backup on AWS requires the following tasks:

Create a Target Cluster

Create a target cluster on AWS. Your new instances must match your source cluster in the following ways:

  • Network and VPC—The target instances must be in the same network and VPC as each other. However, they do not have to be in the same network and VPC as your source cluster.
  • Number of instances and nodes—You must use the same number of instances and nodes as your source cluster.
  • IP addresses—The internal IP addresses of the target nodes must be the same as the internal IP addresses of the nodes in the source cluster.
  • Version compatibility—Your target cluster must be running the same AMI and Vertica versions and using the same hotfix version as your source cluster.

Your new instances may differ from the source cluster in the following ways:

  • Your target cluster may be in a different availability zone.
  • You may use a different instance type for your target cluster.

Restore a Database from a Hard-Link Backup

  1. Create EBS volumes from your backup snapshots.
    Before you attach your new volumes, stop and detach any existing RAID volumes mounted to the target cluster.
  2. Attach the new EBS volumes to their respective nodes and volumes on the target cluster.
    You must use the correct device mapping. For example, a backup snapshot taken for node 1, volume /dev/xvdf must be re-created in the new cluster at node 1, volume /dev/xvdf.
  3. Rebuild your RAID-0 device by restoring your RAID configuration file (a sketch follows these steps):
    /etc/mdadm.conf
  4. When the volumes have finished attaching, remount the RAID:

    for IP in 10.0.10.13 10.0.10.14 10.0.10.15; do ssh $IP sudo mount /vertica/data; done
  5. Create an empty database on the target cluster with the same dbadmin username, password, data path, and database name as your source database.
  6. Run a restore operation:

    /opt/vertica/bin/vbr.py --config-file vmart_backup.ini --task restore
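
For step 3, one approach is to copy the saved configuration file back into place and let mdadm reassemble the array from it. This sketch assumes the saved mdadm.conf describes the original RAID-0 members:

    # Restore the saved RAID configuration on each node
    sudo cp mdadm.conf /etc/mdadm.conf

    # Reassemble the RAID-0 device described in the configuration file
    sudo mdadm --assemble --scan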
