End-To-End Data Transformation Guide: Using the dbt-vertica Adapter for Analytics

This video explains the integration of dbt and Vertica using the open-source dbt-vertica-adapter. Check out the video to understand how you can quickly transform your raw data directly in Vertica and take advantage of the new features in the adapter.

dbt or Data Build Tool, is a data transformation tool that streamlines the process of transforming raw data into curated data sets that are clean, easy to query, and ready for analysis. In dbt, data transformations and business logic are expressed using SQL statements and Jinja language. Jinja turns dbt into a programing environment for SQL.

You can use dbt and OpenText™ Vertica™ to build a robust and efficient data transformation pipeline. dbt code is processed in-database leveraging Vertica’s high performance. In addition, dbt provides software development best practices such as, version control, built-in testing framework, documentation, and code reusability that allows SQL developers to work like Software Engineers.

The dbt adapter for Vertica enables the connection to Vertica from dbt. dbt-vertica is an open-source project available in Vertica’s GitHub repository. It is written in Python and yml, and uses the vertica-python driver.

Vertica + dbt Data Stack

This is a diagram of dbt and Vertica data stack. dbt transforms data that is already loaded into Vertica and uses the dbt-vertica adapter to connect to your database. Unlike traditional data transformation tools, dbt data transformations are executed directly in your database. No data is transferred out of Vertica. The transformed data can then be used by BI tools and other technologies for further analysis.

Solution Overview

This document provides an example of a data transformation pipeline using VMart and includes the steps to configure the environment and connect to Vertica using the dbt-vertica adapter.

The solution presented in this document is categorized in the following sections:

You can expand and collapse topics for more details as you go along reading the guide.

Scope

Familiarity with dbt-core, Vertica, SQL, Jinja, and VS Code is required and assumed.

This document does not cover

  • Installation of Vertica and the VMart database.

  • Data loading: The process to load the raw data into Vertica.

  • Data visualization: The steps to visualize the transformed data.

  • dbt cloud.

Environment

This is the environment we used to execute the examples in this document:

Important dbt-core and vertica-python are automatically installed when you install the dbt-vertica adapter.

    Vertica environment:

    • Vertica Analytical Database 23.3.0 installed on a 3-node Linux VM cluster.

    • VMart example database

    dbt environment on Windows:

    • Visual Studio Code (vscode) version 1.76.2

    • git version 2.40.0

    • python version 3.11.2

    • dbt-vertica adapter version 1.5.0

    • vertica-python driver version 1.3.1

    • dbt-core version 1.5.0

    A database SQL client that can connect to the database:

    • In our testing, we used DBVisualizer version 13.0.4 to examine the data in tables or views in a user-friendly manner.

Configuring the Environment to Use dbt with Vertica

This section explains how to install the dbt-vertica adapter and initialize a dbt project. The configuration is based on a Windows environment.

Configuring a dbt Profile to Connect to Vertica

This section explains how you can configure the connection to Vertica also known as the dbt profile. The dbt profiles file is generated automatically when you initialize a project as described in the previous section. The dbt profiles file is in the following directory: C:\Users\<my_user>\.dbt\profiles.yml.

Example of a Data Transformation Pipeline Using VMart

The following solution consists of a data transformation pipeline with the purpose of business analytics and reporting. We want to calculate the on-time delivery rate, quantity accuracy rate, and the perfect order rate by vendor. We will call this final model vendor_performance. We will show how the vendor_performance model is created throughout a series of transformation steps that begin with the raw data.

Summary and Conclusions

We created a comprehensive example of a data transformation pipeline that uses business logic to transform raw data into a dimension and fact table for business analytics and reporting.

Vertica + dbt constitute a powerful solution for in-database data transformation that leverages the speed of your database.

For More Information