Workload Management Metrics: A Golden Triangle

Posted July 17, 2014 by Po Hong

Modern databases are often required to process many different kinds of workloads, ranging from short, tactical queries, to medium-complexity ad-hoc queries, to long-running batch ETL jobs, to extremely complex data mining jobs (see my previous blog on workload classification for more information). DBAs must ensure that all concurrent workloads, along with their respective Service Level Agreements (SLAs), can coexist well with each other while maximizing a system's overall performance.

So what is concurrency? Why should a customer care about concurrency?

Concurrency is a term used to describe having multiple jobs running in an overlapping time interval in a system. It doesn't necessarily mean that they are, or ever will be, running at the same instant. Concurrency is synonymous with multitasking, and it is fundamentally different from parallelism, a common point of confusion. Parallelism describes a state in which two or more jobs are running at the exact same instant. The simplest example is a single-CPU computer, on which you can, in theory, run multiple jobs by context-switching between them. This gives the user the illusion of parallelism: it looks as if multiple jobs are running on the single CPU at the same time. However, if you take a snapshot at any given instant, you'll find one and only one job running. In contrast, actual parallel processing is enabled by multiple working units (e.g., the multiple CPUs/cores in a modern database server such as the HP DL380p). Because Vertica is an MPP columnar database and an inherently multi-threaded application, it can take advantage of this multi-CPU/core server architecture to process queries both concurrently and in parallel.
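To make the distinction concrete, here's a minimal Python sketch (my own illustration, nothing to do with Vertica internals). The threaded version merely interleaves CPU-bound jobs, because CPython's global interpreter lock allows only one thread to execute bytecode at any given instant; the process version can run jobs at the same instant on separate cores:

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def job(n):
    """A stand-in for a query: burn a little CPU, return a result."""
    total = 0
    for i in range(2_000_000):
        total += i * n
    return total

if __name__ == "__main__":
    jobs = range(8)

    # Concurrency: the jobs overlap in time, but the GIL means at most
    # one thread runs Python bytecode at any given instant.
    start = time.time()
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(job, jobs))
    print(f"threads (concurrent): {time.time() - start:.2f}s")

    # Parallelism: separate processes really do run at the same instant
    # on a multi-core machine, so wall-clock time drops.
    start = time.time()
    with ProcessPoolExecutor(max_workers=4) as pool:
        list(pool.map(job, jobs))
    print(f"processes (parallel): {time.time() - start:.2f}s")
```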

Most customers do not care about concurrency directly. Rather, they have a specific requirement to execute a certain workload in a database, governed by a set of throughput and response time (latency) objectives. Throughput (TP) is the number of queries or jobs a database can complete in a unit of time, and it is the most commonly used metric for measuring a database's performance. Response time (latency) is the sum of queuing time and runtime; as such, it depends on both concurrency (a proxy for overall system load) and query performance (the inverse of runtime).

For a given workload, the three metrics of throughput (TP), concurrency, and performance are related by a simple equation:

Throughput = Concurrency × Performance

Knowing any two of these three metrics, you can derive the third. This relationship can be visually illustrated by the following Workload Management Metrics Triangle:
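Since any two metrics determine the third, the triangle is easy to encode. Here is a minimal Python sketch (the function names are mine, not from any product), with performance expressed as queries per second, i.e., the inverse of average runtime:

```python
def throughput(concurrency: float, avg_runtime_s: float) -> float:
    """TP = Concurrency * Performance, where Performance = 1 / avg runtime.
    Returns queries per second."""
    return concurrency / avg_runtime_s

def required_concurrency(tp_qps: float, avg_runtime_s: float) -> float:
    """Rearranged: Concurrency = TP / Performance = TP * avg runtime."""
    return tp_qps * avg_runtime_s

def performance(tp_qps: float, concurrency: float) -> float:
    """Rearranged: Performance = TP / Concurrency (queries/s per job slot)."""
    return tp_qps / concurrency
```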

[Figure: the Workload Management Metrics Triangle]

Concurrency is often NOT a direct customer requirement, because it depends on query performance and the throughput SLA. Customer requirements usually take a form like this: "We need to process 10K queries in one hour with an average response time of 1 minute or less." So throughput (TP) is often the metric the customer is interested in, and concurrency is a "derived" metric.
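Plugging that hypothetical SLA into the formula shows how concurrency falls out as a derived number. This back-of-the-envelope sketch treats the 1-minute response time as pure runtime (i.e., it assumes no queuing, which is my simplification):

```python
# Hypothetical SLA from above: 10K queries/hour, 1-minute average response.
tp_qps = 10_000 / 3_600               # ≈ 2.78 queries/s
avg_runtime_s = 60.0                  # response time taken as pure runtime
concurrency = tp_qps * avg_runtime_s  # Concurrency = TP * avg runtime
print(f"implied concurrency: {concurrency:.0f}")  # ≈ 167
```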

Let's consider a hypothetical customer POC requirement of processing 1,200 queries in one minute (20 queries per second), and assume that there are two competing systems, X and Y.

On System X, executing such a workload would require a concurrency level of 40, with an average query runtime of 2 s (because 20/s = 40 × 1/2s).

On System Y, assuming an average query response time of 100 ms, executing the same workload requires a concurrency level of only 2 (because 20/s = 2 × 1/100ms).

What does this mean for the customer? Clearly, System Y, with its superior query processing capability, needs far less concurrency to satisfy the SLA than System X, and hence it is the better platform (from a purely technical perspective).
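The same arithmetic written out, using the Concurrency = TP × runtime rearrangement (a worked check of the numbers above, not a benchmark):

```python
tp_qps = 1_200 / 60    # POC target: 1,200 queries/minute = 20 queries/s

# System X: 2 s average runtime -> Concurrency = TP * runtime
print(tp_qps * 2.0)    # 40.0

# System Y: 100 ms average runtime
print(tp_qps * 0.1)    # 2.0
```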

To summarize: for a given throughput (TP) SLA, the better the query/job performance, the less concurrency is needed. Less concurrency generally means lower, more efficient resource usage and better overall system performance (since more system resources are left over to process other workloads). The goal of any workload performance tuning exercise should never be to increase concurrency. Instead, it should focus on minimizing a query's resource usage, improving its performance, and applying the lowest concurrency level that satisfies the customer's throughput (TP) and response time (latency) requirements.

Po Hong is a senior pre-sales engineer in Vertica's Corporate Systems Engineering (CSE) group with a broad range of experience in various relational databases such as Vertica, Neoview, Teradata and Oracle.