Configuring Backup Hosts
The vbr utility lets you back up your database to one or more hosts, known as backup hosts, that can be outside of your database cluster.
You can use one or more backup hosts or a single S3 bucket to back up your database. Use the vbr configuration file to specify which backup host each node in your cluster should use.
Before you back up to hosts outside of the local cluster, configure the target backup locations to work with the vbr utility. The backup hosts you use must:
- Have sufficient backup disk space.
- Be accessible from your database cluster through SSH.
- Have passwordless SSH access for the Database Administrator account.
- Set AllowTcpForwarding = Yes in your sshd_config file. If TCP forwarding is not enabled, backups fail with the following message:
"Errors connecting to remote hosts: Check SSH settings, and that the same Vertica version is installed on all nodes."
- Have either the Vertica rpm or Python 2.7 and rsync 3.0.5 or later installed.
- If you are using a stateful firewall, configure your tcp_keepalive_time and tcp_keepalive_intvl sysctl settings to use values less than your firewall timeout value.
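Two of the requirements above lend themselves to a quick script check. The following is a minimal sketch, not part of vbr: the function names are hypothetical, and the firewall timeout is a value you supply yourself. It only inspects top-level lines in the file it is given and does not account for Match blocks or repeated keywords.

```shell
#!/usr/bin/env bash
# Hypothetical helper names; these are not part of vbr or Vertica.

# Succeeds if the given sshd_config file has an uncommented
# AllowTcpForwarding line whose value is "yes" (case-insensitive).
tcp_forwarding_enabled() {
    local config_file="$1"
    grep -Eiq '^[[:space:]]*AllowTcpForwarding[[:space:]=]+yes[[:space:]]*$' "$config_file"
}

# Succeeds if both keepalive values (in seconds) are below the
# firewall timeout (also in seconds).
keepalive_below_timeout() {
    local keepalive_time="$1" keepalive_intvl="$2" firewall_timeout="$3"
    [ "$keepalive_time" -lt "$firewall_timeout" ] &&
        [ "$keepalive_intvl" -lt "$firewall_timeout" ]
}
```

On a Linux backup host you might call tcp_forwarding_enabled /etc/ssh/sshd_config, and pass the output of sysctl -n net.ipv4.tcp_keepalive_time and sysctl -n net.ipv4.tcp_keepalive_intvl to keepalive_below_timeout together with your firewall's timeout.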
Creating Configuration Files for Backup Hosts
Create separate configuration files for full or object-level backups, using distinct names for each configuration file. Also, use the same node, backup host, and directory location pairs. Specify different backup directory locations for each database.
Note: For optimal network performance when creating a backup, Open Text Corporation recommends that you give each node in the cluster its own dedicated backup host.
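To illustrate the one-backup-host-per-node layout, the sketch below prints a mapping section that pairs each node with its own host. It assumes the "node = host:directory" syntax for the [Mapping] section of a vbr configuration file; verify that syntax against the vbr configuration reference for your Vertica version. The function name, node names, host names, and directory are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a [Mapping] section that gives each database
# node its own backup host.
# ASSUMPTION: the "node = host:directory" mapping syntax. Check the vbr
# configuration reference for your Vertica version before relying on it.
# Usage: print_mapping BACKUP_DIR NODE=HOST [NODE=HOST ...]
print_mapping() {
    local backup_dir="$1"; shift
    local pair
    echo "[Mapping]"
    for pair in "$@"; do
        # ${pair%%=*} is the node name; ${pair#*=} is the backup host.
        echo "${pair%%=*} = ${pair#*=}:${backup_dir}"
    done
}
```

For example, print_mapping /home/dbadmin/backups v_mydb_node0001=backup01 v_mydb_node0002=backup02 prints one line per node, each pointing at a different host but the same directory path.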
Preparing Backup Host Directories
Before vbr can back up a database, you must prepare the target backup directory. Run vbr with a task type of init to create the necessary manifests for the backup process. You need to perform the init process only once. After that, Vertica maintains the manifests automatically.
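If you maintain several configuration files, the one-time init step could be scripted along these lines. This is a sketch only: the function name and the VBR override variable are hypothetical, and the configuration file names are whatever you chose for your site.

```shell
#!/usr/bin/env bash
# Run the one-time init task for each vbr configuration file given.
# Set VBR="echo vbr" for a dry run that only prints the commands.
# (VBR and init_backup_locations are hypothetical names.)
init_backup_locations() {
    local config
    for config in "$@"; do
        # VBR is left unquoted on purpose so "echo vbr" splits into two words.
        ${VBR:-vbr} --task init --config-file "$config" || return 1
    done
}
```

For example, VBR="echo vbr" init_backup_locations full_backup.ini shows the command that would run without touching the backup host.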
Estimating Backup Host Disk Requirements
Wherever you plan to save data backups, consider the disk requirements for incremental backups at your site. Also, if you use more than one archive, multiple archives potentially require more disk space. Open Text Corporation recommends that each backup host have space for at least twice the database node footprint size. Follow this recommendation regardless of the specifics of your site's backup schedule and retention requirements.
To estimate the database size from the used_bytes column of the storage_containers system table:
VMart=> select sum(used_bytes) as total_size from storage_containers where node_name='v_mydb_node0001';
 total_size
------------
  302135743
(1 row)
If your site uses multiple backup host locations, you can estimate the database size requirements per node. Use a query, such as the following, substituting a backup host name for node_name:
select node_name,sum(used_bytes) as size_in_bytes from v_monitor.storage_containers group by node_name;
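The recommendation of twice the node footprint can be turned into a per-node figure with a small filter. The sketch below assumes its input arrives as "node_name|size_in_bytes" rows, which is what an unaligned, tuples-only vsql run (for example, vsql -A -t -c "...") would be expected to produce; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Read "node_name|size_in_bytes" rows on stdin and print, for each node,
# the minimum backup host space under the twice-the-footprint guideline.
# (backup_space_needed is a hypothetical name.)
backup_space_needed() {
    local node bytes
    while IFS='|' read -r node bytes; do
        [ -n "$node" ] || continue       # skip blank lines
        echo "${node}|$(( bytes * 2 ))"
    done
}
```

With the single-node figure shown earlier, echo 'v_mydb_node0001|302135743' | backup_space_needed prints v_mydb_node0001|604271486, or about 604 MB.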
Making Backup Hosts Accessible
You must verify that any firewalls between the source database nodes and the target backup hosts allow connections for SSH and rsync on port 50000.
The backup hosts must be running identical versions of rsync and Python as those supplied in the Vertica installation package.
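The firewall requirement in this section can be spot-checked from a database node with a simple TCP probe. The sketch below assumes that SSH listens on its default port 22 and relies on bash's /dev/tcp feature and the coreutils timeout command; the function names are hypothetical. A failed probe on port 50000 may only mean that nothing is listening there at that moment, so treat it as a prompt to investigate rather than proof that a firewall is blocking traffic.

```shell
#!/usr/bin/env bash
# Succeeds if a TCP connection to host:port can be opened within 5 seconds.
# Needs bash (for /dev/tcp) and the coreutils "timeout" command.
port_open() {
    local host="$1" port="$2"
    timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Probe SSH (ASSUMED to be on default port 22) and rsync port 50000.
# Returns nonzero if either probe fails.
check_backup_host() {
    local host="$1" port status=0
    for port in 22 50000; do
        if port_open "$host" "$port"; then
            echo "${host}:${port} reachable"
        else
            echo "${host}:${port} NOT reachable"
            status=1
        fi
    done
    return "$status"
}
```

Running check_backup_host backup01 from each database node gives a quick view of which node-to-host paths need attention.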
Setting Up Passwordless SSH Access
To run the vbr utility against a backup host, the database administrator must meet two requirements:
- Have an account on each backup host, with write permissions to the backup directory.
- Have passwordless SSH access from each database cluster host to the corresponding backup host.
How you fulfill these requirements depends on your platform and infrastructure.
SSH access among the backup hosts, and access from a backup host to a database node, is not necessary.
If your site does not use a centralized login system (such as LDAP), you can usually add a user with the useradd command or through a GUI administration tool. See the documentation for your Linux distribution for details.
If your platform supports it, you can enable passwordless SSH logins using the ssh-copy-id command to copy a database administrator's SSH identity file to the backup location from one of your database nodes. For example, to copy the SSH identity file from a node to a backup host named backup01:
> ssh-copy-id -i dbadmin@backup01
Password:
Try logging into the machine with "ssh 'dbadmin@backup01'". Then, check the contents of the ~/.ssh/authorized_keys file to verify that you have not added extra keys that you did not intend to include.
> ssh backup01
Last login: Mon May 23 11:44:23 2011 from host01
Repeat the steps to copy a database administrator's SSH identity to all backup hosts you use to back up your database.
After copying a database administrator's SSH identity, you should be able to log in to the backup host from any of the nodes in the cluster. You are not prompted for a password.
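When there are many backup hosts, the copy-and-verify steps above could be looped as in the following sketch. The function name and the RUN dry-run variable are hypothetical. The sketch uses the default identity file and the BatchMode=yes option, which makes ssh fail rather than prompt for a password, so a successful second command indicates that passwordless login works.

```shell
#!/usr/bin/env bash
# Copy the current user's SSH identity to each backup host, then confirm
# that a non-interactive login succeeds.
# Set RUN=echo for a dry run that prints the commands instead of running them.
# Usage: setup_passwordless_ssh USER HOST [HOST ...]
setup_passwordless_ssh() {
    local user="$1"; shift
    local host
    for host in "$@"; do
        ${RUN:-} ssh-copy-id "${user}@${host}" || return 1
        # BatchMode=yes disables password prompts, so this fails if a
        # password would still be required.
        ${RUN:-} ssh -o BatchMode=yes "${user}@${host}" true || return 1
    done
}
```

For example, RUN=echo setup_passwordless_ssh dbadmin backup01 backup02 lists the commands first; drop RUN=echo to execute them.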
Increasing the SSH Maximum Connection Settings for a Backup Host
If your configuration requires backing up multiple nodes to one backup host (n:1), increase the number of concurrent SSH connections to the SSH daemon (sshd). By default, the number of concurrent SSH connections on each host is 10, as set in the sshd_config file with the MaxStartups keyword. The MaxStartups value for each backup host should be greater than the total number of hosts being backed up to this backup host.
To increase the MaxStartups value:
- Log on as root to access the config file.
- Open the SSH configuration file (/etc/ssh/sshd_config) in a text editor.
- If the MaxStartups line is commented out with #, delete that character. Replace the current value, which may be a single integer or three integers like 10:30:60, with the new value. For example, to back up a 50-node cluster to one machine, use a value of 60:
MaxStartups 60
For more information on configuring MaxStartups, refer to the man page for that parameter.
- Save the file.
- Reload the file using the following command:
sudo /etc/init.d/sshd reload
If you are using Red Hat 7/CentOS 7, use the following command instead:
sudo /bin/systemctl reload sshd.service
- Exit from root.
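The manual edit above could also be scripted. The following is a rough sketch with hypothetical function names. The headroom of 10 simply mirrors the 50-node example and is an assumption, not a Vertica recommendation. The replacement rewrites every line that begins with an optionally commented MaxStartups keyword, so review the result if your file contains more than one such line.

```shell
#!/usr/bin/env bash
# Print a MaxStartups value greater than the number of nodes that back up
# to one host. The default headroom of 10 is an ASSUMPTION.
suggest_max_startups() {
    local node_count="$1" headroom="${2:-10}"
    echo $(( node_count + headroom ))
}

# Set MaxStartups in the given sshd_config-style file. An existing line,
# commented out or not, is replaced; otherwise the setting is appended.
set_max_startups() {
    local config_file="$1" value="$2" tmp
    tmp="$(mktemp)" || return 1
    if grep -q '^[#[:space:]]*MaxStartups[[:space:]]' "$config_file"; then
        sed "s/^[#[:space:]]*MaxStartups[[:space:]].*\$/MaxStartups ${value}/" \
            "$config_file" > "$tmp"
    else
        { cat "$config_file"; echo "MaxStartups ${value}"; } > "$tmp"
    fi
    # Overwrite in place so the file keeps its owner and permissions.
    cat "$tmp" > "$config_file"
    rm -f "$tmp"
}
```

It is safer to try set_max_startups on a copy of the file first. Before reloading the daemon, running sshd -t as root checks the configuration file for validity.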