Sidestepping Catastrophes with Vertica Backup and Recovery

Posted April 5, 2016 by Sanjay Baronia

Accidents do happen! Data can become corrupted. It can be unintentionally deleted and, in some rare cases, you can lose all your data.

Do you have a plan B to recover data in a timely manner?

According to Veeam, organizations suffer an average of 13 incidents of unplanned application downtime per year. A survey from the Ponemon Institute, sponsored by Emerson Network Power, puts the average cost of unplanned downtime at approximately $7,900 per minute. Thus, the traditional practice of backing up and protecting your data continues to be instrumental in reducing downtime in today's changing IT landscapes.
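To put those two figures together, here is a rough back-of-the-envelope sketch. The incident count and per-minute cost come from the surveys cited above; the 30-minute average outage length is purely a hypothetical assumption for illustration, so substitute your own numbers.

```python
def annual_downtime_cost(incidents_per_year, minutes_per_incident, cost_per_minute):
    """Estimate yearly downtime cost as incidents x duration x cost per minute."""
    return incidents_per_year * minutes_per_incident * cost_per_minute


# Figures quoted in the text above.
INCIDENTS_PER_YEAR = 13
COST_PER_MINUTE = 7_900  # US dollars

# Hypothetical assumption: the surveys quoted here do not give an outage length.
ASSUMED_MINUTES_PER_INCIDENT = 30

estimate = annual_downtime_cost(
    INCIDENTS_PER_YEAR, ASSUMED_MINUTES_PER_INCIDENT, COST_PER_MINUTE
)
print(f"Estimated annual downtime cost: ${estimate:,}")
```

Under that assumed outage length, the estimate comes to roughly $3 million per year, which is why recovery time matters as much as having a backup at all.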

With the rapid adoption of social media, mobile applications, the Internet of Things, and cloud storage, organizations are generating large volumes of data at unprecedented speeds in a variety of formats. Many organizations trust Vertica to manage and analyze critical data, generating new insights and offerings for their customers. These data analytics environments are growing rapidly and becoming integral to the “new way” of running a business: analyzing data for near real-time decision making, whether for marketing campaigns, stock trades, or security applications. As these Big Data infrastructures continue to increase in value, organizations need a proven strategy for backup and recovery that complies with both the business recovery point objective (RPO) and recovery time objective (RTO).

How will you balance risk tolerance against cost to determine a viable protection level for your Big Data environment?

Reasons to Use Backup and Recovery

1. Service Level Agreements (SLA) for Recovery

To manage the impact of downtime and to provide business continuity, almost all organizations have SLAs to recover data in a timely manner, and some SLAs even carry penalties written into customer contracts. For example, if you have to recover an entire database, the RTO could be several hours, depending on the size of the database. If you are recovering a specific table, the RTO could range from minutes to under an hour.

2. Routine Maintenance

Customers periodically perform software and hardware upgrades to their Big Data environment. This could involve not only upgrading the Vertica software but also upgrading the underlying Linux operating system, virtualization software if applicable, or the application software. This maintenance could also include upgrading the local disk or adding and/or deleting nodes. With so many pieces of the analytical stack to consider and maintain, you risk potential data loss. For this reason, a general best-practice mitigation strategy is to back up Vertica prior to the maintenance window.

3. Application Development Life Cycle Acceleration

Organizations clone the production database from time to time. This enables the development and test teams, as part of the software release cycle, to develop and test applications on the latest dataset. To do this, DBAs leverage backups to create and refresh the development and test copies of the database, accelerating the application development life cycle.

4. Granular Recovery

Depending on their varying needs, DBAs either back up the entire database or perform a targeted backup of specific tables that are critical and need to be protected. Typically, for recoveries, DBAs recover specific tables that may be affected rather than recovering the entire database. Very rarely do customers recover the entire database. For this and many other reasons, granular recovery capability is an important use case and is available in Vertica v7.2 and above.

Backup and Disaster Recovery Options

Vertica offers a suite of backup and recovery options. Since Vertica stores data in files that are never modified after being written, all backup and recovery operations are consistent.

Vertica enables you to create low-cost, space-efficient copies of the Vertica database within the local cluster infrastructure. We refer to these as “hard link local backups.” These copies are fast because no actual user data is copied to an external backup environment. Vertica copies only the catalog data and initiates Linux-based hard links within the file system. The backups that are created share the same set of storage blocks and each tracks its own point-in-time copy, so there is no duplication of storage consumption. This approach protects against user and application errors. However, this approach does not protect against hardware failures.
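To illustrate the general file-system idea only (this is not Vertica's implementation or its backup tooling), here is a minimal Python sketch of how hard links let a "snapshot" share storage with the original files. The directory layout, the file name `block_0001.dat`, and the helper `hard_link_snapshot` are all hypothetical.

```python
import os
import tempfile


def hard_link_snapshot(data_dir, snapshot_dir):
    """Snapshot data_dir by hard-linking each regular file into snapshot_dir."""
    os.makedirs(snapshot_dir, exist_ok=True)
    for name in os.listdir(data_dir):
        src = os.path.join(data_dir, name)
        if os.path.isfile(src):
            # A hard link is a second directory entry for the same inode,
            # so no file contents are copied.
            os.link(src, os.path.join(snapshot_dir, name))


def demo():
    with tempfile.TemporaryDirectory() as root:
        data_dir = os.path.join(root, "data")
        snap_dir = os.path.join(root, "snapshot")
        os.makedirs(data_dir)

        # Hypothetical storage file that is written once and never modified.
        path = os.path.join(data_dir, "block_0001.dat")
        with open(path, "wb") as f:
            f.write(b"immutable storage file")

        hard_link_snapshot(data_dir, snap_dir)
        snap_path = os.path.join(snap_dir, "block_0001.dat")

        same_inode = os.stat(path).st_ino == os.stat(snap_path).st_ino
        link_count = os.stat(path).st_nlink

        # Simulate a user error: the original file is deleted.
        os.remove(path)
        with open(snap_path, "rb") as f:
            survived = f.read() == b"immutable storage file"

        return same_inode, link_count, survived


same_inode, link_count, survived = demo()
print(same_inode, link_count, survived)
```

Because both names point at the same inode on the same disk, the snapshot survives an accidental delete but not a failed disk, which matches the trade-off described above.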

 

To protect your database against hardware failures, Vertica enables you to back up your data to an external storage location. Be mindful that establishing an external backup environment is somewhat more costly, because you need to purchase and allocate additional storage to keep these backups. With Vertica, you can create hot backups, create incremental copies with an indefinite number of restore points, and back up either the entire database or a subset of it (schemas, tables). Further, Vertica offers the flexibility and granularity to restore specific objects (schemas, tables) from a full backup.

 

To protect against unplanned data center outages, Vertica provides the option to perform remote backups. Because this is the most expensive option, you'll typically want to leverage the infrastructure at the remote site for other business operations, making the added investment worthwhile. To create a remote copy, Vertica uses the “Copy Cluster” functionality to replicate a Vertica database from a source data center to a target data center. In the past, Vertica required the target database to be down during the replication process, rendering the remote copy unusable for business operations. However, with the recent release of Vertica “Excavator”, you can replicate individual objects (tables and schemas) to a target database that can actively service read operations for analytic applications.

 

Finally, with Big Data, losing data or even access to it can be catastrophic to your business if you can't restore it quickly, negatively affecting productivity and market share. It is critical that you have mechanisms to sidestep these disasters. Vertica provides a robust backup and disaster recovery solution to recover both entire databases and specific tables in a timely manner. For more information on Vertica Backup and Recovery, please refer to the Vertica documentation and knowledgebase article.

And if you are looking to evaluate Vertica, please download the Vertica Community Edition and give it a spin. You can use it for an unlimited time with up to 1 TB of data across three nodes.