Vertica Integration with Alation: Connection Guide

 

For Vertica 7.x

 

About Vertica Documentation

Vertica connection documents provide basic information about setting up connections to Vertica from software that third-party vendors create. These documents provide guidance using one specific version of Vertica and one version of the third-party vendor's software. Other versions of the third-party product may work with Vertica . However, other versions may not have been tested.

Overview

Integrated with your Vertica database, Alation’s data accessibility platform offers solutions in collaborative analytics, effective data governance, data search and discovery, and data optimization.

  • Data analysts can find data and compose queries faster using an optimized query tool. This tool includes capabilities such as SmartSuggest (table suggestions based on query patterns), table sampling, and semantic search of data documentation.
  • Data warehouse managers can review data usage within their Vertica instances for data optimization, usage analysis and auditing, and increased security.
  • Data analysts can track documentation progress and identify data usage with automated data governance for increased analyst productivity and compliance.
  • Accelerated time-to-insight is available to all employees through semantic data search and discovery of all tables and queries.

This document helps you to understand how you can leverage Alation to gain insight into data usage.

Alation uses the connection to Vertica to automatically create a data dictionary, calculate which database objects are used most often by the queries that users execute, and provide a collaborative knowledge-sharing and query-writing environment.

This document assumes that the reader is familiar with both Alation and Vertica .

This document is based on the results of testing Alation 2.08 with Vertica 7.x.

Before You Begin

Before you can connect Alation to Vertica , you must install the Alation software. Alation provides the Vertica 6.1.3 JDBC drivers with Alation, so you do not need to download them. As soon as you install Alation, you can log in on the Sign In page and create a connection to your Vertica database.

Connecting Alation to Vertica

Before you can connect to Vertica , define the connection using the following steps:

  1. To create a new connection, click Add Data Source.alation_add_data_source_screen.png
  2. Select Vertica from the Select a Database Type dropdown list.alation_database_type_menu.png
  3. To register the database, on the Register a new database window, enter the host name or IP address and the Vertica database name and title. Alation uses the database title as the name of the connection to Vertica .alation_register_database.png
  4. Click Add to register your Vertica database with Alation. Once the database is registered, Alation displays the following window:alation_general_settings.png
  5. Before Alation can calculate the popularity of your database objects, it needs metadata about the schemas, tables, views, and columns of your Vertica database. To extract the metadata about a database for Alation, select the Metadata Extraction tab. Enter your Vertica credentials and click Save.alation_fetch_metadata.png
  6. To import the metadata from your database, click Fetch from DB Now. Alation submits a job that extracts all the schema details from Vertica . Monitor the job status on the Metadata Extraction tab until it completes. alation_metadata_extraction_done.png
  1. After extracting the metadata, Alation profiles each database object. By default, the profiling skips any views. If you want Alation to profile the views, on the Per-Object Parameters tab, uncheck Skip Views.alation_per_object_parameters.png
  2. To start profiling your database, select the Data Profiling tab and click Launch Job Now.alation_data_profiling.png
  3. Monitor the data profiling job on the Data Profiling tab. After the profiling completes, Alation displays metadata for the Vertica database objects. The following example shows schema data for the vmart database:alation_after_profiling.png

The next window lists information about all the tables in the vmart.public schema. This information includes the columns and number of rows.

alation_after_profiling2.png

The following window displays information about each column in a table: data type and content. (On its dashboard, Alation refers to database columns as attributes.)

alation_after_profiling3.png

Calculating Database Object Popularity with Alation

Alation defines a database object's popularity as how frequently certain schemas, tables, and columns are used in queries that your users run. Alation calculates the popularity as the number of times user queries access a given schema, table, or column.

After you extract the Vertica database metadata into Alation, the Popularity fields are initially empty.

alation_popularity1.png

To calculate an Vertica data object's popularity, Alation needs to capture the queries that are executing on Vertica . After it has the queries, Alation calculates the popularity for each schema, table, and column by running its engine on top of the captured queries.

To enable Alation to calculate the popularity, you need to first save information about user queries.

  1. To store the query information, create a table in your Vertica database. This table contains information from two system tables:
    • DC_REQUEST_ISSUED
    • QUERY_REQUESTS
=> DROP TABLE alation_qrylog_table;
=> SELECT * INTO alation_qrylog_table
FROM (
SELECT ri.session_id, ri.user_name, qr.start_timestamp, qr.end_timestamp, qr.request_duration_ms, ri.request_id, ri.request
FROM v_internal.dc_requests_issued ri
JOIN v_monitor.query_requests qr
ON  ri.transaction_id = qr.transaction_id AND ri.request_id = qr.request_id AND ri.session_id = qr.session_id
WHERE date(start_timestamp) > ?
) AS alation_querylog;
-- alation user is a non-admin user
=> GRANT SELECT ON alation_qrylog_table TO alation_user; 
  1. On the Query Log Ingestion tab, configure Alation to retrieve the following information from Vertica :
    • Specify the database tables where the query information is stored, in the preceding example, alation_qrylog_table.
    • Specify whether to manually ingest the queries, or to schedule the query log ingestion on a daily, weekly, or monthly basis.

Schedule the query log ingestion at a time when your Vertica database is not in heavy use.

alation_query_log_ingestion.png

At the bottom of this page, view the status of the query log ingestion operation. After it completes, Alation displays the popularity at the schema, table, and column levels, as shown in the following graphics. This information helps you identify the most-used and least-used database objects.

In the following figure, you can see that user queries access the public schema more than other schemas.

alation_popularity15.png

The next figure shows that queries access the product_dimension table more than other tables.

alation_popularity5.png

Alation also calculates the popularity for table columns. (Alation calls them attributes.) In the following example, the most accessed columns in the product_dimension table are:

  • department_description
  • fat_content
  • product_price

alation_popularity4.png

For More Information

For more information about how Alation works with Vertica , see www.alation.com.