Vertica

Vertica's Blog

Ad-Hoc Queries — Cascading Pool to the Rescue

To understand HP Vertica workload management, you should have a basic understanding of Vertica resource pools. For details about HP Vertica’s resource pool architecture and the parameter definitions, see the product documentation.

Because HP Vertica is a columnar database, you’ll rarely encounter an I/O bottleneck. Memory consumption is often the resource bottleneck that workloads have to contend with in an HP Vertica database. You can tune resource pool parameters to control memory allocation and runtime priority among these competing workloads.
In HP Vertica, you can define custom resource pools that can be configured to limit memory usage, concurrency, and runtime priority. Optionally, you can restrict each database user to use a specific resource pool to control memory consumption. And, in HP Vertica 7.1, we introduced a new feature to meet the customer requirement for ad-hoc queries—cascading pool.

Prior to version 7.1, we recommended redirecting truly ad-hoc queries to different sets of custom pools. Because ad-hoc queries are difficult to anticipate, this was almost impossible to do in practice. So, to simplify things for the customer and to integrate better with third-party BI tools, HP Vertica introduced the cascading pool feature in release 7.1.

Here’s how cascading pools work. Let’s assume there are two resource pools: R1 (a starter pool) and R2 (a secondary/cascading pool). When a query’s execution time exceeds the pre-set RUNTIMECAP in R1, it cascades to R2. When that happens, all the resources are released from pool R1 and moved to pool R2 (from an accounting perspective). The query continues to execute without interruption. This, of course, assumes that there are enough resources available in pool R2; otherwise, the query has to wait in the queue or re-plan.
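As a minimal sketch of the two-pool arrangement just described, the statements below define a secondary pool and a starter pool that cascades to it. The pool names r1 and r2 follow the text. The 10-second cap is an illustrative assumption, and every other pool parameter is left at its default, so this is not a recommended configuration.

```sql
-- Secondary pool, created first here (mirroring the order used in the demo
-- definitions later in this post). No runtime cap, so queries finish here.
CREATE RESOURCE POOL r2 RUNTIMECAP NONE;

-- Starter pool: a query that exceeds the cap cascades to r2.
-- The 10-second value is an assumption chosen for illustration.
CREATE RESOURCE POOL r1 RUNTIMECAP '10 SECONDS' CASCADE TO r2;
```

The cascade is declared on the starter pool only; the secondary pool needs no knowledge of where its queries come from.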

How does this feature help an HP Vertica customer? A typical HP Vertica customer often has two or more different types of workloads in their production environments. Prior to HP Vertica 7.1, customers needed to figure out a way to properly classify a query based on certain criteria (memory footprint by profiling, for example). Customers then had to use a program or script to direct the query to a resource pool. But for ad-hoc queries, this was a very difficult task.
With cascading pools, customers can route all queries through the starter pool R1 and let the queries cascade to the secondary pool R2 automatically.

Furthermore, this feature means that users need not know the existence of the secondary cascading pools. After secondary cascading pools are configured, they work in the background; you don’t need to grant end users explicit access to these secondary pools. So in some sense, one pool – the starter pool – is all that HP Vertica customers and third-party BI tools need.
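To illustrate the point about access, here is a hedged sketch of how a user might be pointed at the starter pool alone. The account name bi_user is hypothetical, and r1 stands for the starter pool called R1 in the text; no grant is issued on the secondary pool.

```sql
-- bi_user is a hypothetical account used only for this example.
-- Allow the user to run queries in the starter pool.
GRANT USAGE ON RESOURCE POOL r1 TO bi_user;

-- Make the starter pool the user's default pool.
ALTER USER bi_user RESOURCE POOL r1;
```

A BI tool connecting as this user would then submit every query to the starter pool without any pool-selection logic of its own.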

The HP Big Data Platform Corporate Presales group has developed a video that demonstrates the new cascading pool feature in Vertica 7.1. The video shows how to manage complex workloads using cascading pools.

This workload management demo was performed on a 3-node HP Vertica cluster running version 7.1. Each node has 16 CPU cores and 64 GB of memory.

The test workload consists of:

  • 7500 small queries
  • 700 medium queries
  • 150 large queries

To simulate a real-life scenario, these queries are submitted to HP Vertica on different schedules and in different batch sizes.

The same workload was run in two different scenarios:

  • Using the “out-of-the-box” default GENERAL pool
  • Using the cascading pools

Figure 1 shows the test results from using the default GENERAL pool and the cascading pools. On average, there is a 30% performance improvement for all three types of workloads when using the cascading pools.

Figure 1: Total runtime using the default GENERAL pool and the cascading pools.

So what do these cascading pools look like?

-- Final pool in the chain: no runtime cap, low runtime priority.
create resource pool pool_long priority 10 memorysize '0%' maxmemorysize '60%'
    plannedconcurrency 4 maxconcurrency NONE executionparallelism 16 queuetimeout 3600
    RUNTIMEPRIORITY LOW RUNTIMEPRIORITYTHRESHOLD 0 RUNTIMECAP NONE
    CPUAFFINITYSET NONE CPUAFFINITYMODE DEFAULT;

-- Middle pool: cascades to pool_long once the 100-second cap is reached.
create resource pool pool_medium priority 30 memorysize '0%' maxmemorysize '60%'
    plannedconcurrency 8 maxconcurrency NONE executionparallelism 8 queuetimeout 3600
    RUNTIMEPRIORITY MEDIUM RUNTIMEPRIORITYTHRESHOLD 0 RUNTIMECAP '100 SECONDS' CASCADE TO pool_long
    CPUAFFINITYSET '4-15' CPUAFFINITYMODE SHARED;

-- Starter pool: cascades to pool_medium once the 10-second cap is reached.
create resource pool pool_fast priority 50 memorysize '3G' maxmemorysize '20%'
    plannedconcurrency 24 maxconcurrency NONE executionparallelism 1 queuetimeout 3600
    RUNTIMEPRIORITY HIGH RUNTIMEPRIORITYTHRESHOLD 0 RUNTIMECAP '10 SECONDS' CASCADE TO pool_medium
    CPUAFFINITYSET '0-3' CPUAFFINITYMODE SHARED;
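After creating the pools, it can be useful to confirm that the cascade chain is wired as intended. The query below is a sketch that reads the pool definitions back from the catalog. It assumes the resource_pools catalog table exposes the runtimecap and cascadeto columns under those names; verify them against the documentation for your version.

```sql
-- Read back the cap and cascade target for each pool in the chain.
-- Column names (runtimecap, cascadeto) are assumed; check your version.
SELECT name, runtimecap, cascadeto
FROM v_catalog.resource_pools
WHERE name IN ('pool_fast', 'pool_medium', 'pool_long');
```

Each pool should point to the next one in the chain, and pool_long should show no cascade target.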

Let’s highlight some of the key design considerations for the cascading pools used in this demo.

  • The demo uses three pools: pool_fast, pool_medium, and pool_long. All queries start in pool_fast. If a query takes more than 10 seconds to finish in pool_fast, it cascades to pool_medium and continues running there. If it runs for another 90 seconds and still does not finish, it cascades to pool_long. The runtime is cumulative as a query moves from one pool to another, which is why the 100-second RUNTIMECAP on pool_medium is reached after 90 more seconds.
  • pool_fast is meant to catch all the short-running tactical queries. It runs in “high priority” mode and shares 4 CPU cores (of a total of 16) with other pools.
  • The three pools are designed so that the secondary pools have larger memory budgets than the source pools. This ensures that when a query moves from one pool to another, the secondary/cascading pool has enough resources to keep the query running uninterrupted. That way, the query does not have to wait in the queue for resources to be freed up or retry itself. Minimizing query retries during the pool move improves performance.
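The cumulative runtime rule in the first bullet can be made concrete with a small worked example. The query below does not touch any resource pool; it only applies the demo's 10-second and 100-second caps to three made-up total runtimes to show where each query would finish. It ignores queue wait time, and treating a runtime exactly equal to a cap as staying in the pool is an assumption.

```sql
-- Map hypothetical total runtimes (in seconds) to the pool where the
-- query would finish, using the demo's cumulative caps of 10 s and 100 s.
SELECT runtime_seconds,
       CASE
           WHEN runtime_seconds <= 10  THEN 'pool_fast'
           WHEN runtime_seconds <= 100 THEN 'pool_medium'
           ELSE 'pool_long'
       END AS finishing_pool
FROM (
    SELECT 4 AS runtime_seconds      -- short tactical query
    UNION ALL SELECT 45              -- medium query
    UNION ALL SELECT 600             -- long-running query
) AS sample_queries;
```

The 4-second query finishes in pool_fast, the 45-second query in pool_medium after spending its first 10 seconds in pool_fast, and the 600-second query in pool_long after passing through both earlier pools.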

By simulating a real-life scenario with three types of query workloads (large, medium, and small), you can see that, compared to the GENERAL pool, a carefully designed set of cascading pools can produce a 30% performance gain on average for all workload types. A best practice in designing HP Vertica cascading pools is to minimize query retries during the pool move.

For truly ad-hoc types of workloads, in prior HP Vertica releases, a good knowledge of query profiles and memory footprints was needed in order to take advantage of HP Vertica resource pools for the optimal mixed workload performance. With HP Vertica 7.1, by using cascading pools, you can point all queries to one pool and let HP Vertica do the heavy lifting in the background automatically.

Po Hong and Satish Sathiyavageswaran are respectively Senior Solutions Architect and Solutions Architect in HP Big Data Platform Corporate Presales, which specializes in HP Vertica performance tuning and workload management.

HP Vertica and Qlik Team Up to Hack for a Cause

On November 17, 2014, HP Vertica took part in the annual Qlik Hackathon in Orlando, Florida. The hackathon, which took place the day before the Qlik World Conference, brought together engineers, developers, and Qlik experts from all over the world. The challenge was simple: leverage features from Qlik and its partners to create an application to aid the United Nations in its commitment to maintain international peace and security.

Qlik Hackathon: Bringing people together

Qlik and HP Vertica: The Perfect Fit

According to a Senior Political Affairs Officer at the United Nations Department of Political Affairs:

Internet information streams offer an instant snapshot, in real time, of the state of Read More »

Tech Support Series: Optimizing Projections

Welcome to another installment of our Top Tech Support Questions Answered blog series. In our first blog, we discussed ways to optimize your database for deletes. In this installment, we’ll talk about optimizing projections.

People often ask, “How can I optimize my projections for maximum query performance?” As with many questions, the answer is “It depends.” This is because every database has a different structure and uses data in very different ways.

But fear not, we do have some general guidelines you can use to optimize your projections.

Your first step should always be to run Database Designer. The HP Vertica Database Designer creates optimized projections based on sample queries and sample data that you provide. However, you Read More »

Financial Analytics with HP Vertica and Vichara Technologies

If you’re an investor on Wall Street, you want to know with a good amount of certainty that you aren’t throwing your money into a crapshoot. As it turns out, the government feels the same way. Since the financial crisis of 2008, the government has added regulations for FSIs (Financial Services Institutions) to follow in order to prevent a repeat scenario. Financial organizations rely on FSIs like Vichara Technologies in order to analyze risk and to provide a comprehensive, aggregated view of a firm’s potential to default at any point in time. As you can imagine, those analyses require looking at a LOT of data.

Take this particular case:


HP Discover: Introducing HP Vertica OnDemand

Yesterday the sun rose over the beautiful city of Barcelona, and with it, HP Discover. Back in Cambridge, MA, the excitement is just as high as we finally get to reveal to the world what we’ve all been tirelessly working on. We have some great announcements for you from the HP Vertica platform, one of which is the exciting launch of HP Vertica OnDemand.

So let’s say you’ve got a startup with an awesome product, but you still have a pretty tight budget. We understand; we’ve been there. Small organizations like yours face challenges in Big Data and analytics every day. But now there’s good news: HP Vertica OnDemand!

HP Read More »

New Configuration Parameter Storage and Setting Options

New Configuration Parameter Storage from Vertica Systems on Vimeo.

Set and Clear Configuration Parameters from Vertica Systems on Vimeo.

HP Vertica 7.1.x introduces a new way to store and set configuration parameters.

Previously, configuration parameter values were stored in individual vertica.conf files on each node. As of HP Vertica 7.1.0, these values are stored in the database catalog, allowing the values to stay consistent over all nodes.

Why is this better? Well, when the values were stored at the node level, it was possible that down nodes could have inconsistent values when returning to the cluster because the SET_CONFIG_PARAMETER statement only acts on up nodes. Additionally, there was no way Read More »

Connecting HP Vertica to DbVisualizer

In our latest video entry, we show you how to connect the DbVis Software database tool, DbVisualizer, to your HP Vertica database. DbVisualizer is a database tool that allows you to manage and administer your database visually. To learn more about DbVisualizer, visit their webpage at dbvis.com. You can also go to vertica.com/documentation to find more HP Vertica documentation.
