To understand HP Vertica workload management, you should have a basic understanding of Vertica resource pools. For details about HP Vertica’s resource pool architecture and the parameter definitions, see the product documentation.
Because HP Vertica is a columnar database, you’ll rarely encounter an I/O bottleneck. Instead, memory is often the resource that workloads contend for in an HP Vertica database. You can tune resource pool parameters to control memory allocation and runtime priority among these competing workloads.
In HP Vertica, you can define custom resource pools that limit memory usage, concurrency, and runtime priority. Optionally, you can restrict each database user to a specific resource pool to control memory consumption. And in HP Vertica 7.1, we introduced cascading pools, a new feature that addresses customer requirements for ad-hoc queries.
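As a minimal sketch of the two capabilities just described (a custom pool with memory, concurrency, and priority limits, and a user restricted to that pool), the statements below use the hypothetical pool name adhoc_pool and user name analyst. The parameter values are illustrative only, not recommendations.

```sql
-- Hypothetical names (adhoc_pool, analyst); values are illustrative only.
-- Memory limits (per node), a concurrency cap, and a runtime priority.
CREATE RESOURCE POOL adhoc_pool
    MEMORYSIZE '2G'
    MAXMEMORYSIZE '8G'
    MAXCONCURRENCY 10
    RUNTIMEPRIORITY MEDIUM;

-- Allow the user to run queries in the pool, then make it the user's default pool.
GRANT USAGE ON RESOURCE POOL adhoc_pool TO analyst;
ALTER USER analyst RESOURCE POOL adhoc_pool;
```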
Before version 7.1, we recommended routing truly ad-hoc queries to different sets of custom pools. However, because ad-hoc queries are difficult to anticipate, this is almost impossible to do in practice. So, to simplify things for customers and to integrate better with third-party BI tools, HP Vertica introduced the cascading pool feature in release 7.1.
Here’s how cascading pools work. Let’s assume there are two resource pools: R1 (a starter pool) and R2 (a secondary/cascading pool). When a query’s execution time exceeds the pre-set RUNTIMECAP in R1, it cascades to R2. When that happens, all the resources are released from pool R1 and moved to pool R2 (from an accounting perspective). The query continues to execute without interruption. This, of course, assumes that there are enough resources available in pool R2; otherwise, the query has to wait in the queue or re-plan.
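A minimal sketch of the R1/R2 example, assuming illustrative values: R2 has no runtime cap, and R1 hands long-running queries off to R2 after 30 seconds. The pool names mirror the example; the 30-second cap and the memory percentages are hypothetical.

```sql
-- Hypothetical values for the R1/R2 example.
-- Create the secondary pool first so that R1's CASCADE TO clause references an existing pool.
CREATE RESOURCE POOL r2
    MAXMEMORYSIZE '60%'
    RUNTIMECAP NONE;

-- Starter pool: a query still running after 30 seconds moves to r2 and keeps
-- executing there, provided r2 has enough free resources.
CREATE RESOURCE POOL r1
    MAXMEMORYSIZE '20%'
    RUNTIMECAP '30 SECONDS'
    CASCADE TO r2;
```

Giving r2 a larger memory ceiling than r1 follows the same reasoning as the demo design later in this post: the cascade target should be able to absorb queries arriving from the starter pool.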
How does this feature help an HP Vertica customer? A typical HP Vertica customer runs two or more types of workloads in production. Before HP Vertica 7.1, customers needed to classify each query based on certain criteria (for example, its memory footprint as determined by profiling) and then use a program or script to direct the query to the appropriate resource pool. For ad-hoc queries, this was very difficult.
With cascading pools, customers can route all queries through the starter pool R1 and let them cascade to the secondary pool R2 automatically.
Furthermore, users need not know that the secondary cascading pools exist. After the secondary pools are configured, they work in the background; you don’t need to grant end users explicit access to them. So, in a sense, the starter pool is the only pool that HP Vertica customers and third-party BI tools need.
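For example, assuming a hypothetical BI user named bi_user and the starter pool R1 and secondary pool R2 described earlier, only the starter pool is granted. Per the behavior described above, no grant on R2 should be required for cascaded queries to run there.

```sql
-- bi_user is a hypothetical user; r1 is the starter pool, r2 its cascade target.
-- Only the starter pool is granted; there is deliberately no GRANT on r2.
GRANT USAGE ON RESOURCE POOL r1 TO bi_user;

-- A session (for example, one opened by a BI tool) can then select the starter pool explicitly.
SET SESSION RESOURCE_POOL = r1;
```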
The HP Big Data Platform Corporate Presales group has developed a video that demonstrates the new cascading pool feature in Vertica 7.1. The video shows how to manage complex workloads using cascading pools.
This workload management demo was performed on a 3-node HP Vertica cluster running version 7.1. Each node has 16 CPU cores and 64 GB of memory.
The test workload consists of:
- 7500 small queries
- 700 medium queries
- 150 large queries
To simulate a real-life scenario, these queries were submitted to HP Vertica on different schedules and in different batch sizes.
The same workload was run in two different scenarios:
- Using the “out-of-the-box” default GENERAL pool
- Using the cascading pools
Figure 1 shows the test results from using the default GENERAL pool and the cascading pools. On average, there is a 30% performance improvement for all three types of workloads when using the cascading pools.
Figure 1: Total runtime using the default GENERAL pool and the cascading pools.
So what do these cascading pools look like?
-- Create the last pool in the chain first, so each CASCADE TO clause references an existing pool.
create resource pool pool_long priority 10 memorysize '0%' maxmemorysize '60%' plannedconcurrency 4 maxconcurrency NONE executionparallelism 16 queuetimeout 3600 RUNTIMEPRIORITY LOW RUNTIMEPRIORITYTHRESHOLD 0 RUNTIMECAP NONE CPUAFFINITYSET NONE CPUAFFINITYMODE DEFAULT;
-- Queries still running after 100 seconds (cumulative) cascade to pool_long.
create resource pool pool_medium priority 30 memorysize '0%' maxmemorysize '60%' plannedconcurrency 8 maxconcurrency NONE executionparallelism 8 queuetimeout 3600 RUNTIMEPRIORITY MEDIUM RUNTIMEPRIORITYTHRESHOLD 0 RUNTIMECAP '100 SECONDS' CASCADE TO pool_long CPUAFFINITYSET '4-15' CPUAFFINITYMODE SHARED;
-- Starter pool: queries still running after 10 seconds cascade to pool_medium.
create resource pool pool_fast priority 50 memorysize '3G' maxmemorysize '20%' plannedconcurrency 24 maxconcurrency NONE executionparallelism 1 queuetimeout 3600 RUNTIMEPRIORITY HIGH RUNTIMEPRIORITYTHRESHOLD 0 RUNTIMECAP '10 SECONDS' CASCADE TO pool_medium CPUAFFINITYSET '0-3' CPUAFFINITYMODE SHARED;
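After creating the pools, one way to confirm the cascade chain is to query the RESOURCE_POOLS system table. This sketch assumes the table exposes RUNTIMECAP and CASCADETO columns, as documented for Vertica 7.1 and later; check the column names against your version.

```sql
-- List the three demo pools with their memory ceilings, runtime caps, and cascade targets.
SELECT name, maxmemorysize, runtimecap, cascadeto
FROM v_catalog.resource_pools
WHERE name IN ('pool_fast', 'pool_medium', 'pool_long')
ORDER BY priority DESC;
```

If the definitions were applied as shown, pool_fast should list pool_medium as its cascade target, pool_medium should list pool_long, and pool_long should have none.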
Let’s highlight some of the key design considerations for the cascading pools used in this demo.
- The demo uses three pools: pool_fast, pool_medium, and pool_long. All queries start in pool_fast. If a query takes more than 10 seconds to finish in pool_fast, it cascades to pool_medium and continues running in pool_medium. If it runs for another 90 seconds and still does not finish, then it cascades to pool_long. The runtime is cumulative as a query moves from one pool to another.
- pool_fast is meant to catch short-running tactical queries. It runs at HIGH runtime priority and uses CPU cores 0-3 (4 of the 16 cores per node) in SHARED mode, so it shares those cores with other pools.
- These three pools are designed so that the secondary pools have larger memory budgets than the source pools. This ensures that when a query moves from one pool to another, the secondary/cascading pool has enough resources to keep the query running uninterrupted. That way, the query does not have to wait in the queue for resources to be freed up or retry itself. Minimizing query retries during the pool move improves performance.
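Because runtime is cumulative, pool_medium’s 100-second RUNTIMECAP is measured from the start of execution, not from the moment the query arrives in pool_medium. The query below is a hypothetical illustration rather than a Vertica system view: it maps a few sample total runtimes to the pool in which each query would finish, assuming enough resources are free at every cascade step.

```sql
-- Sample total runtimes in seconds (hypothetical values).
SELECT runtime_s,
       CASE
           WHEN runtime_s <= 10  THEN 'pool_fast'    -- within pool_fast's 10-second cap
           WHEN runtime_s <= 100 THEN 'pool_medium'  -- up to 90 more seconds in pool_medium
           ELSE 'pool_long'                          -- pool_long has RUNTIMECAP NONE
       END AS finishing_pool
FROM (SELECT 5 AS runtime_s
      UNION ALL SELECT 45
      UNION ALL SELECT 250) AS sample_runtimes
ORDER BY runtime_s;
```

With these sample values, the 5-second query finishes in pool_fast, the 45-second query in pool_medium, and the 250-second query in pool_long.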
This demo simulated a real-life mix of three query workload types (large, medium, and small). Compared to the GENERAL pool, a carefully designed set of cascading pools produced an average 30% performance gain for all workload types. A best practice in designing HP Vertica cascading pools is to minimize query retries during the pool move.
In prior HP Vertica releases, getting optimal mixed-workload performance from resource pools for truly ad-hoc workloads required good knowledge of query profiles and memory footprints. With HP Vertica 7.1, by using cascading pools, you can point all queries to one pool and let HP Vertica do the heavy lifting in the background automatically.
Po Hong and Satish Sathiyavageswaran are respectively Senior Solutions Architect and Solutions Architect in HP Big Data Platform Corporate Presales, which specializes in HP Vertica performance tuning and workload management.