Vertica Integration with Talena: Connection Guide
For Vertica 7.1.x and 7.2.x
Vertica connection guides provide basic information about setting up connections to Vertica from software that our technology partners create. Each guide is written against one specific version of Vertica and one specific version of the third-party vendor’s software. Other versions of the third-party product may work with Vertica but may not have been tested. This document provides guidance using the latest versions of Vertica and Talena as of June 2016.
Vertica users can deploy Talena to provide self-service access to production data for sandbox use and to automate the backup and recovery process. The Talena metadata catalog, named Talena FastFind, allows users to quickly identify data sets they want to recover from backups. Talena stores backups in a de-duplicated and compressed format, and the backups are erasure coded to protect against hardware failures. Talena allows development and data science teams to quickly gain access to production data in sandboxes to support application iteration needs.
You can use Talena in a Vertica environment for a variety of data management purposes, including the following:
- Backup and Data Recovery
- Data Pipeline
- Data Mirroring
This document is based on the results of testing the following versions:
- Vertica 7.1.0-1 with Talena 1.6 on CentOS 6.6
- Vertica 7.2.0-1 with Talena 1.6 on CentOS 6.6
For product downloads, Talena documentation, and licensing, contact the Talena team through the Talena site.
The Vertica JDBC client driver is built into Talena. You do not need to download or install drivers for Talena to connect to Vertica.
You must create a Talena Repository for each Vertica database you want to use as a target location in backups, restores, pipelines, or mirrors. To do so, use the Talena main menu Data Repositories selection. A repository definition specifies the connection information for the Vertica database, including a list of the hosts in the database, the port, the database administrator username, and the database password. After you create the repository, you can use it in the Talena workflow jobs you define.
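The connection information that a repository definition holds can be pictured with a minimal Python sketch. This is not Talena's configuration format: the class, its field names, the host addresses, and the credentials are hypothetical and only mirror the fields listed above. The URL follows the standard Vertica JDBC form `jdbc:vertica://host:port/database`, and 5433 is the default Vertica client port.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VerticaRepository:
    """Hypothetical stand-in for the fields a repository definition holds."""
    name: str          # label for the repository (assumed field)
    hosts: List[str]   # hosts in the Vertica database
    port: int          # Vertica client port
    database: str      # database name (assumed field)
    username: str      # database administrator username
    password: str      # database password

    def jdbc_url(self, host_index: int = 0) -> str:
        """Build a Vertica JDBC URL for one of the listed hosts."""
        host = self.hosts[host_index]
        return f"jdbc:vertica://{host}:{self.port}/{self.database}"


# Placeholder values for illustration only.
repo = VerticaRepository(
    name="vmart-prod",
    hosts=["10.0.0.11", "10.0.0.12", "10.0.0.13"],
    port=5433,
    database="VMart",
    username="dbadmin",
    password="example-password",
)
print(repo.jdbc_url())
```

Listing every host, rather than a single one, gives a client alternatives to try if the first host is unavailable.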
The following is an example of a repository definition:
Talena uses policies, which are user-defined configuration parameter sets, to govern when and how workflows are executed. You can set up one or more policies for backup and recovery. To do so, use the Talena main menu Policies selection. When you set up a workflow, you can choose the policy you want to govern it.
Backup policies allow you to define how long a backup is preserved as a restore point, whether the backup runs immediately or on a schedule, and the priority of the backup. You can also restrict backups to specific hours and days of the week, as shown in the following:
You can also select the data you want to back up, as shown in the following:
Recovery policies allow you to define an immediate or scheduled recovery, and you can set a priority level for each recovery policy, as shown in the following:
Talena also allows you to pipeline or mirror a database to another repository, such as a disaster recovery site, a test environment, or a development sandbox.
Pipelining allows you to use the backup and recovery policies to create an immediate or individually scheduled backup and recovery that copy data between repositories. The following is an example of a data pipeline:
Mirroring is similar to pipelining, but it performs the data copy in a single unified workflow. You can schedule the backup, but you cannot schedule the recovery. The recovery starts as soon as the backup completes. The following shows an example of mirroring:
The dashboard and detail screens allow you to monitor running workflows and review the outcome and details of older workflows, as shown in the following:
The dashboard also contains warnings and logs if issues occurred during the workflow run, as shown in the following:
The following section explains possible errors and workarounds.
You may see the following error after attempting to create a new Vertica repository:
ROLLBACK 5702: Couldn't create new UDx side process: Java Binary not found: /home/dbadmin/Voltage_Java_UDF/jdk1.7.0_79/bin/java
If you see this error, set the JavaBinaryForUDx configuration parameter to the path of the Java executable on your Vertica hosts.
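One way to set the parameter is with an `ALTER DATABASE ... SET` statement. The Python sketch below only assembles that statement as text so that it can be reviewed before being run in vsql. The helper name is hypothetical, `mydb` and `/usr/bin/java` are placeholders, and the exact syntax should be checked against the documentation for your Vertica version.

```python
import re


def java_binary_statement(database: str, java_path: str) -> str:
    """Return an ALTER DATABASE statement that sets JavaBinaryForUDx."""
    # Accept only plain identifiers so the name can be used unquoted.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", database):
        raise ValueError(f"unexpected database name: {database!r}")
    # Double any single quotes so the path stays a valid SQL string literal.
    escaped = java_path.replace("'", "''")
    return f"ALTER DATABASE {database} SET JavaBinaryForUDx = '{escaped}';"


# Placeholder database name and Java path.
print(java_binary_statement("mydb", "/usr/bin/java"))
```

The path must point to a Java executable that exists on every node, because the error above is raised when Vertica cannot find the binary at the configured location.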
You may see the following error after attempting to restore:
com.talena.agent.vertica.exception.VerticaConnectionException: java.sql.SQLSyntaxErrorException: [Vertica][VJDBC](4136) ERROR: Node "v_vmart_node0001" does not exist
To work around this error, select only the schemas or specific objects that you want to restore, and make sure not to mark any unsegmented projections for restore. For more information, see the Known Limitations section in this document.
You may see the following warning on a completed backup job:
[Vertica][VJDBC](5924) ERROR: Insufficient resources to get resource from JVM pool [Timedout waiting for resource request: Request exceeds limits: Memory(KB) Exceeded: Requested = 247691, Free = 0 (Limit = 990764, Used = 990764) (queueing threshold)]
You may need to increase the MAXMEMORYSIZE setting of the JVM resource pool in Vertica. For more information, see Monitoring Resource Pools in the Vertica documentation.
Contact Talena Support for tuning recommendations.
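The numbers in the warning show how far the pool is over its limit. As a minimal sketch, the following Python snippet pulls the four memory fields (in KB) out of the message quoted above and reports the shortfall. The function name is hypothetical, and the arithmetic is only a reading aid, not a sizing recommendation.

```python
import re

ERROR = (
    "[Vertica][VJDBC](5924) ERROR: Insufficient resources to get resource "
    "from JVM pool [Timedout waiting for resource request: Request exceeds "
    "limits: Memory(KB) Exceeded: Requested = 247691, Free = 0 "
    "(Limit = 990764, Used = 990764) (queueing threshold)]"
)


def parse_memory_fields(message: str) -> dict:
    """Extract the Requested/Free/Limit/Used values (in KB) from the message."""
    fields = {}
    for key in ("Requested", "Free", "Limit", "Used"):
        match = re.search(rf"{key} = (\d+)", message)
        if match is None:
            raise ValueError(f"field {key!r} not found in message")
        fields[key.lower()] = int(match.group(1))
    return fields


mem = parse_memory_fields(ERROR)
# Memory the request still needs beyond what the pool has free.
shortfall_kb = mem["requested"] - mem["free"]
print(mem)
print(f"pool is {mem['used'] / mem['limit']:.0%} used; "
      f"request is short by {shortfall_kb} KB")
```

In this message the pool is fully used, so the whole 247691 KB request has to wait until memory is released or the limit is raised.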
If you upgrade Vertica to a newer version, you may see an error such as the following:
Cause : [Vertica][VJDBC](6999) ERROR: The library [TalenaVerticaParallelExportLibrary] for the function [TalenaParallelExport(varchar, varchar, varchar)] was compiled with an incompatible SDK Version [v7.1.0-1]
This error can occur if you upgrade to a Vertica version that is higher than the latest version Talena supports. Contact Talena with the specific version you were running and the version you upgraded to. Talena evaluates the version and, if necessary, recreates its UDx so that it is compatible with the new SDK.
To verify compatibility, consult the Talena documentation or contact Talena Support prior to upgrading Vertica.
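A pre-upgrade check of this kind can be sketched in Python by comparing version strings in the format shown in the error message. The function names are hypothetical, and the "latest supported" value used here (7.2.0-1, the highest version tested in this guide) is an assumption; the authoritative value comes from Talena.

```python
import re
from typing import Tuple


def parse_vertica_version(text: str) -> Tuple[int, ...]:
    """Turn a string such as 'v7.1.0-1' or '7.2.0-1' into a tuple of ints."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)(?:-(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    # A missing hotfix suffix is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def exceeds_supported(target: str, latest_supported: str) -> bool:
    """Return True if the upgrade target is newer than the supported version."""
    return parse_vertica_version(target) > parse_vertica_version(latest_supported)


LATEST_SUPPORTED = "7.2.0-1"  # assumption: highest version tested in this guide
print(exceeds_supported("7.1.0-1", LATEST_SUPPORTED))
print(exceeds_supported("7.2.3-0", LATEST_SUPPORTED))
```

Comparing tuples of integers avoids the ordering mistakes that plain string comparison makes, for example ranking 7.10 below 7.2.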
Known Limitations
You cannot recover unsegmented projections unless the target location has the same database name, the same number of nodes, and the same node names as the source. Unsegmented projections contain all of these elements in their names and definitions. Talena attempts to run the DDL that creates the unsegmented projections, and the DDL fails if the target topology differs from the source. Talena then attempts to recover the data using a default segmented superprojection. The unsegmented projections might not be recreated during the Talena recovery, but you can create them manually to fit the target topology and refresh them after the recovery.
If you are recovering from a K-Safe repository to a non-K-Safe repository, the buddy projections are restored. After the recovery completes, you must manually identify and remove the buddy projections.
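The three conditions for recovering unsegmented projections can be expressed as a small Python check. This is an illustrative sketch, not part of Talena: the `Topology` class and the function are hypothetical, and the database and node names are sample values in the style of the node name that appears in the restore error earlier in this document.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Topology:
    """Hypothetical summary of a Vertica database's name and node names."""
    database: str
    node_names: List[str]


def topology_mismatches(source: Topology, target: Topology) -> List[str]:
    """List the reasons unsegmented projections would fail to recover."""
    problems = []
    if source.database != target.database:
        problems.append("database name differs")
    if len(source.node_names) != len(target.node_names):
        problems.append("node count differs")
    if sorted(source.node_names) != sorted(target.node_names):
        problems.append("node names differ")
    return problems


source = Topology("vmart", ["v_vmart_node0001", "v_vmart_node0002", "v_vmart_node0003"])
sandbox = Topology("sandbox", ["v_sandbox_node0001"])
print(topology_mismatches(source, sandbox))
print(topology_mismatches(source, source))
```

An empty list means the target matches the source on all three points; any entry indicates that the unsegmented projections would need to be recreated manually after the recovery.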
Data Masking and Filtering
Talena does not currently support the data masking and sampling features for pipelining and mirroring with Vertica.
Ownership and Grants
Where possible, Talena retains the original grants on recovered objects, depending on which users and roles exist in the target repository. Currently, Talena does not preserve the original ownership of recovered objects.
|For More Information About…|See…|
|Big Data and Analytics Community|https://vertica.com/big-data-analytics-community-content/|