Always Have a Backup Plan

Posted May 6, 2015 by Sarah Lemaire, Manager, Vertica Documentation

Three 3D arrows, different colors pointing in different directions

In my never-ending quest to watch every heist movie ever made, I’ve learned a thing or two. Most notably, I’ve learned this: always have a backup plan. That, and never trust any character played by Edward Norton. But mainly, always have a backup plan.

Why should you always have a backup plan? Because something, inevitably, goes wrong. Whether you experience something as simple as a deleted record, or as serious as a full system deletion, you need a backup.

Humans created databases for a reason. There’s no way we could keep millions of records in our heads or establish a filing system that could tell us what we need to know in milliseconds. Instead, we can store all that data on hardware and use platforms like Vertica to analyze it. But what if something happens to that data? What if it becomes corrupted or someone accidentally wipes it out? It’s unlikely that we have people or papers that can accurately replicate all that data. That’s why it’s so important to perform regular database backups.

Luckily, Vertica supplies a comprehensive utility, the vbr.py Python script that lets you back up, restore, list backups, and copy your database to another cluster.

So when should you run this script to back up your database? It’s a good idea to perform backups regularly. For example, you might run incremental backups nightly or weekly, depending on business needs and available resources. Additionally, you should create backups in the following situations:

  • After loading or altering a large volume of data
  • Before performing database maintenance
  • Before upgrading to another database version
  • Before and after adding, removing, or replacing nodes

Backup Basics

In Vertica, you can create full and incremental database backups, as well as backups of schemas or tables (object-level backups). Running a full backup backs up your whole database. Additionally, you can run object-level backups, which result in the backing up of specific objects like individual schemas or tables. After running an initial backup, you can run incremental backups that will back up changes made since the last backup.

No matter what type of backup you want to create, you’ll go through these basic steps:

  1. Create a configuration file.
  2. Run the backup.
  3. Run additional backups if needed.
  4. Restore backup when needed.

Creating a configuration file

The vbr.py utility looks to a configuration file for the information required to back up and restore a full- or object-level backup or to copy a cluster. Vertica doesn’’t provide a default configuration file, so, the first step is always creating one specific to your needs.

The vbr.py utility prompts you to answer questions about what to back up and then creates the configuration file. We’’ll go over specific responses to questions in the Scenarios section of this blog.

Alternatively, you can copy and edit an existing configuration file and save the changes to a different file name.

Running the vbr.py Utility

The database must be running for you to create a backup. The first time you execute the vbr.py utility with a new configuration file, it creates a set of backup directories as specified in your configuration file. When you run the vbr.py utility again, the utility will create subdirectories, but within the structure defined during the first invocation.

Running Additional Backups

If you want to create incremental backups, run the utility again with the same confirmation file. This procedure will essentially add to your previous backups.

Restoring Data from a Backup

The requirements for restoring data vary depending on if you are restoring a full database backup or an object-level backup.

To restore a full database backup, ensure that:

  • The target database is down. You can’t restore a full backup when the database is running.
  • All of the backup hosts are up and available.
  • The backup directory exists and contains the backups from which to restore.
  • The cluster to which you’re restoring the backup has the same number of hosts as the one used to create the backup. The node names and the IP addresses of the target host must match what is specified in the configuration file.
  • The database you’re restoring must already exist on the cluster to which you are restoring data. The database can be completely empty without any data or schema. As long as the database name matches the name in the backup, and all of the node names match the names of the nodes in the configuration file, you can restore to it.

To restore an object-level backup, make sure that:

  • The database exists and is up. You cannot restore an object-level backup into an empty database.
  • The vbr.py configuration file you use for the restore task specifies the backup to restore.

You cannot restore part of an object-level backup. Likewise, you cannot restore an object if the only backup you have is a full database backup.

To restore data, run the vbr.py script again with the –task restore and -config-file directives.

Scenarios

Now let’’s look at some common backup and restore scenarios.

Scenario 1: Creating a full database backup and adding to that backup on a regular basis.

The Situation:

In this scenario, we create a full backup snapshot named ‘fullBackup’. You don’t need to include a date in the backup name; Vertica automatically augments the backup subdirectories with date and time indicators. Also notice that since we want to create a full backup and not an object-level one, we leave the Specify objects line blank (by pressing Enter). We also accept the default Config file name (by pressing Enter) because the file name is automatically populated with what we specified as the snapshot name.

The Execution:

1. Create the configuration file by running the vbr.py utility with the –setupconfig directive:
/opt/vertica/bin/vbr.py --setupconfig

2. Answer the prompt, as follows:

The resulting configuration file looks like:

[Misc]
snapshotName = fullBackup
restorePointLimit = 1
[Database]
dbName = VMart
dbUser = dbadmin
[Transmission]
[Mapping]
v_vmart_node0001 = 123.45.67.89:/home/dbadmin

3. Run the vbr.py utility with the –task backup and –config-file  directives:

vbr.py --task backup --config-file fullBackup.ini
[dbadmin@localhost ~]$ vbr.py --task backup --config-file fullBackup.ini
Preparing...
Found Database port: 5433
Copying...
[==================================================] 100%
ll child processes terminated successfully.
Committing changes on all backup sites...backup done!
4. Create incremental backups by running the backup script again:
vbr.py --task backup --config-file fullBackup.ini

View the backups using the vbr.py –listbackup task, with your specific configuration file:

To restore the full database backup, run the vbr.py utility with the –task restore and -config-file directives, as shown here:
vbr.py --task restore --config-file fullBackup.ini

Scenario 2: Create a full backup of a database on one cluster (cluster A) and restore the backup on another cluster (cluster B).

The Situation:

You can restore a backup to a cluster that is different than your source cluster using the copycluster directive. However, you can do so ONLY if the following conditions are true:

  • The source and target cluster are properly configured. The source cluster must fulfil the requirements mentioned here and the target cluster must fulfill the requirements mentioned here.
  • The location of the catalog, data, and temp directories are identical on the source and target database.
  • The target cluster and source cluster have the same node names.
  • The target cluster is accessible from the source cluster.
  • The target and source clusters have the same database administrator account, and all nodes must allow a database administrator of the source cluster to login through SSH without a password.
  • The target cluster has adequate disk space for the vbr.py –task copycluster command to complete.

The Execution

1. Backup the target cluster if you want to retain existing data on the target cluster.

2. Create a configuration file and specify the host names of nodes in the target cluster as the backup hosts, as shown here:
The copy_cluster.ini configuration file looks like this:

[Mapping]
v_vmart_node0001= target-host01:/home/dbadmin/backups
v_vmart_node0002= target-host02:/home/dbadmin/backups
v_vmart_node0003= target-host03:/home/dbadmin/backups

3. Copy the database using the copycluster command:
vbr.py --task copycluster --config-file copy_cluster.iniThe copycluster command is essentially a simultaneous backup and restore — the data is   backed up from the source database cluster and restored to the destination cluster in a single operation.

Scenario 3: Create backups of specific schemas or tables and then restore a schema or table at a later date.

The Situation

You can create a backup of specific objects by specifying the objects in the configuration file. In this scenario, we want to back up the store schema.

The Execution

1. Create the configuration file and specify the objects you want to back up:
Specify objects (no default): public, storeThis is what the resulting configuration file looks like. Notice the addition of the objects line:

To backup a specific table, just use the table name:
Specify objects (no default): store.store_orders_fact

2. Restore an object-level backup by running the vbr.py utility with the restore command with the object-level configuration file:
vbr.py --task restore --config-file storeBackup.ini

In Closing

Hopefully, your day-to-day operations don’t play out like a heist movie and hopefully you’ll never experience a situation where all your data is compromised. But just in case, you now have a backup plan.

For more information, check out our other media related to Backup and Restore here.

You can also read our full documentation here.