How to Use Database Designer to Optimize Vertica Physical Data Model in Six Steps

Posted June 16, 2023 by Maurizio Felici, Field Chief Technologist

A key with a key hole and Unlock Vertica Potential in text at the bottom

Vertica provides out-of-the-box a powerful tool to tune its Physical Data Model: Database Designer (DBD from now on). You can run DBD using the Management Console GUI but in this article we will focus on what they call “programmatic interface” (read… plain SQL) and – as practical examples are often more enlightening than “theoretical” documentation I am going to show how to use DBD via SQL to optimize the standard TPCH schema.

We will start by creating the initial TPCH model as defined in the standard TPCH documentation. So, for example, table “PART” is defined as:

As you can see we didn’t use any projection attribute: no encoding, no segmentation criteria, no sort order, etc… This means that projections associated with this table will be created automatically the first time we load some data (we call them auto-projections) using default encoding, segmentation and sort order as described in the standard Vertica documentation.

Then we will have to load some data using – for example – a script like this:

It’s now time to use DBD…

Step 1. Create a new design model:

Step 2. Tell DBD which tables we want to consider; we need to create an optimal model for ALL tables under “tpch” schema so…

Step 3. Now we define the queries we want to consider. You can provide them to DBD one by one or concatenate them all in a single file (separated by semi-colon). I prefer the second option:

The last parameter (‘true’) is to get a detailed report with the number of accepted/rejected queries.

Step 4. Then we tell DBD which kind of design we need: either an initial/replacement design (we call it comprehensive design) or adding new projections to an existing model (we call it incremental design). In our case we opt for a comprehensive model:

Step 5. Now we define our optimization objective: optimize query performance (query) rather than minimise database size and optimise load performance (load) or something in between (balanced). We opt for query performance:

Step 6. Finally we ask DBD to produce – and optionally deploy – the model. Normally I tend to have a look to the model before deploying it but in this case I decided to deploy it directly:

That’s all! Now our system is using an optimized data model!
You might wonder how long it took to go through all these steps, well this is the answer:

Less than one second! Less than one second to optimize and deploy the physical model. Not too bad… isn’t it?