Vertica Analytics Platform Version 9.2.x Documentation
Database branching lets you create a snapshot of your current database state. You can revive a database branch as you would a full Vertica database. You usually revive a branch on a different cluster while the master database (the original database where you created the branch) continues to run. Branches are intended to be short-lived duplicates of your database for use in testing, workload isolation, or maintenance.
A branch is a completely separate instance of your database. Changes to your master database after you create the branch do not affect the branch. Also, changes to a branch do not affect the master database or other branches.
Currently, branches are always data immutable: you cannot make any changes to the branch that alter the ROS containers holding its data.
How Database Branching Works
When you create a database branch, Vertica stops removing stale information from communal storage. When you revive the branch, Vertica uses this preserved information to reconstruct a copy of the catalog for the branch. The branch uses the same ROS containers that the master database was using at the time you created the branch. Even if you have altered data on the master database after you created the branch (for example, by dropping a table), the branch has the same data that the master database had at the moment you created the branch.
Branches prevent your database from deleting old ROS containers. Without the ability to remove old containers, your database's communal data storage use will grow over time. For this reason, always use branches as short-lived instances of your database. They are not suitable for longer-term uses such as backups or keeping historical snapshots. Once you drop all of the branches in your database, Vertica can once again delete stale information from communal storage.
A branch can alter its own copy of the catalog. Therefore, you can perform actions on the branch that only change the catalog, such as adding or dropping users. The underlying data containers never change, however. Vertica responds to statements that change a ROS container (such as using INSERT or COPY) with an error.
The ability to alter the catalog can sometimes lead to confusion if you are not aware of which statements only change the catalog. For example, you can create a table on a data immutable branch because tables are defined in the catalog. However, Vertica responds with an error when you try to insert data into it:
=> CREATE TABLE t (a INT);
CREATE TABLE
=> INSERT INTO t VALUES (10);
ERROR 8849: Feature is unsupported on a data immutable branch
TEMP and LOCAL TEMP tables are fully supported on branches, as they do not alter ROS containers. See below.
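As a minimal sketch of the temporary-table behavior described above, the following statements create a session-scoped temporary table on a branch and load a row into it. The table name tmp_t is hypothetical, and ON COMMIT PRESERVE ROWS is an assumption chosen so that the row remains visible after a commit. Output is omitted because it was not captured from a running branch.

```sql
-- Hypothetical table name, for illustration only.
-- A LOCAL TEMPORARY table is visible only to the session that creates it.
-- ON COMMIT PRESERVE ROWS keeps the rows after COMMIT (the default deletes them).
CREATE LOCAL TEMPORARY TABLE tmp_t (a INT) ON COMMIT PRESERVE ROWS;

-- Unlike the INSERT into a regular table shown earlier, this targets a temporary table.
INSERT INTO tmp_t VALUES (10);

SELECT a FROM tmp_t;

-- Clean up explicitly; the table would also disappear when the session ends.
DROP TABLE tmp_t;
```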
Branch Restrictions and Requirements
Data-immutable branches have the following restrictions and requirements:
- The cluster you use to revive a branch must have the same number of nodes as the master database did when you created the branch. After reviving the branch, you can add or remove nodes as needed.
- As with your master database, you can only have a single instance of a branch running at a time. If you want to run multiple instances of the same branch simultaneously, you must create multiple branches based on the same version of the master database. See Creating Multiple Branches at Once.
- You cannot create more than two branches if you are using a Community Edition license.
- You cannot create more than 100 branches when using any other Vertica license.
- You cannot run backup or restore on a database branch.
- You also cannot use backup or restore on the master database when it has branches. You must drop all branches before either backing up or restoring.
- Because branches are data immutable, you cannot perform the following actions that would cause Vertica to alter a data container:
  - Adding or altering data using statements such as INSERT, COPY, DELETE, UPDATE, REFRESH, MERGE, ALTER TABLE, TRUNCATE TABLE, and ADD or DROP COLUMN
  - Adding or removing table constraints
  - Running the Tuple Mover's mergeout function. Vertica disables the Tuple Mover on branches.
  - Running the Database Designer
  - Merging or dropping partitions, or performing other partition management operations
  - Machine Learning operations that create and upload DFS files to shared storage
  - Creating or dropping a branch. You can only create branches from the master database.
Other operations on a branch, such as CREATE TABLE, may succeed but are of little practical use, or may work only partially.
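The node-count requirement in the list above can be checked before you provision the cluster that will revive a branch. The following is a sketch only, assuming that the NODES system table in the V_CATALOG schema reflects the master database's current membership. Run it on the master database at the time you create the branch, since later node changes on master do not alter what the branch requires.

```sql
-- Run on the master database around the time the branch is created.
-- The reviving cluster needs the same number of nodes as this count.
SELECT COUNT(*) AS node_count FROM v_catalog.nodes;

-- Optional: list each node and its state to confirm that all nodes are accounted for.
SELECT node_name, node_state
FROM v_catalog.nodes
ORDER BY node_name;
```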
Because branches have their own copy of the catalog, you can perform actions that only change the catalog, including:
- Performing all operations (CREATE, INSERT, DROP, and so on) on temporary tables
- Adding, dropping, and altering users
- Adding or removing nodes
Dropping a table does not free the underlying ROS containers used by the branch. Only dropping all branches allows Vertica to delete ROS containers.
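As an illustration of catalog-only changes, the statements below manage a user on a branch. This is a sketch, not captured output: the user name and passwords are hypothetical, and the statements assume you are connected to the revived branch as a superuser.

```sql
-- Hypothetical user name and passwords, for illustration only.
-- User definitions live in the catalog, so none of these statements touch ROS containers.
CREATE USER branch_analyst IDENTIFIED BY 'example_password';

-- Change the user's password (a catalog-only change).
ALTER USER branch_analyst IDENTIFIED BY 'new_example_password';

-- Remove the user when it is no longer needed on the branch.
DROP USER branch_analyst;
```

Because the branch has its own copy of the catalog, users created this way exist only on the branch and not on the master database.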
Common Uses For Database Branches
Branches are useful in cases where you might otherwise have to create a copy of your Vertica database. They are more efficient because the branch uses the same data as the master database. You do not copy all of your database's data to another S3 bucket. Instead, you just create the branch and then revive it on a new cluster.
Ways you can use database branches include:
- Creating a stand-in read-only database for use while your master database is down for maintenance or an upgrade. You can create a branch and then revive it for use by clients that only need to read data (or only write data to temporary tables). After the maintenance is over, you can drop the branch and make your master database available for all operations. If you have an external load balancer, you can use it to transparently replace your master database with the branch while you are performing maintenance.
- Starting one or more branches to perform analytics without impacting the performance of your master database. You can even scale out as many clusters as you need to perform analytics in parallel. Once your analytics are done, you can drop the branches and terminate the clusters.
- Performing workload testing in a sandbox before rolling out new scripts to the master database. You can monitor the performance impact of new queries on your database using actual data before you run them on your master database.