Author Archive

Ad-Hoc Queries — Cascading Pool to the Rescue


To understand HP Vertica workload management, you should have a basic understanding of Vertica resource pools. For details about HP Vertica’s resource pool architecture and the parameter definitions, see the product documentation.

Because HP Vertica is a columnar database, you’ll rarely encounter an I/O bottleneck. Memory consumption is often the resource bottleneck that workloads have to contend with in an HP Vertica database. You can tune resource pool parameters to control memory allocation and runtime priority among these competing workloads.
In HP Vertica, you can define custom resource pools that can be configured to limit memory usage, concurrency, and runtime priority. Optionally, you can restrict each database user to use a specific resource pool to control memory consumption. And, in HP Vertica 7.1, we introduced a new feature to meet the customer requirement for ad-hoc queries—cascading pool.

Prior to version 7.1, we recommended redirecting truly ad-hoc queries to different sets of custom pools. Because they are difficult to anticipate, it is almost impossible to do this. So, to simplify things for the customer, to integrate better with third-party BI tools, HP Vertica introduced the cascading pool feature in release 7.1.

Here’s how cascading pools work. Let’s assume there are two resource pools: R1 (a starter pool) and R2 (a secondary/cascading pool). When a query’s execution time exceeds the pre-set RUNTIMECAP in R1, it cascades to R2. When that happens, all the resources are released from pool R1 and moved to pool R2 (from an accounting perspective). The query continues to execute without interruption. This, of course, assumes that there are enough resources available in pool R2; otherwise, the query has to wait in the queue or re-plan.

How does this feature help an HP Vertica customer? A typical HP Vertica customer often has two or more different types of workloads in their production environments. Prior to HP Vertica 7.1, customers needed to figure out a way to properly classify a query based on certain criteria (memory footprint by profiling, for example). Customers then had to use a program or script to direct the query to a resource pool. But for ad-hoc queries, this was a very difficult task.
With cascading pools, customer can route all queries through the starter pool R1 and let the queries cascade to the secondary pool R2 automatically.

Furthermore, this feature means that users need not know the existence of the secondary cascading pools. After secondary cascading pools are configured, they work in the background; you don’t need to grant end users explicit access to these secondary pools. So in some sense, one pool – the starter pool – is all that HP Vertica customers and third-party BI tools need.

The HP Big Data Platform Corporate Presales group has developed a video that demonstrates the new cascading pool feature in Vertica 7.1. The video shows how to manage complex workloads using cascading pools.

This workload management demo was performed on a 3-node HP Vertica cluster running version 7.1. Each node has 16 CPU cores and 64 GB of memory.

The test workload consists of:

  • 7500 small queries
  • 700 medium queries
  • 150 large queries

To simulate the real-life scenario, these queries are submitted to HP Vertica based on different schedules and batch sizes.

The same workload was run in two different scenarios:

  • Using the “out-of-the-box” default G pool
  • Using the cascading pools

Figure 1 shows the test results from using the default GENERAL pool and the cascading pools. On average, there is a 30% performance improvement for all three types of workloads when using the cascading pools.

Cascading pools

Figure 1: Total runtime using the default GENERAL pool and the cascading pools.

So what do these cascading pools look like?

create resource pool pool_long priority 10 memorysize ‘0%’ maxmemorysize ‘60%’ plannedconcurrency 4 maxconcurrency NONE executionparallelism 16 queuetimeout 3600 RUNTIMEPRIORITY LOW RUNTIMEPRIORITYTHRESHOLD 0 RUNTIMECAP NONE CPUAFFINITYSET NONE CPUAFFINITYMODE DEFAULT;

create resource pool pool_medium priority 30 memorysize ‘0%’ maxmemorysize ‘60%’ plannedconcurrency 8 maxconcurrency NONE executionparallelism 8 queuetimeout 3600 RUNTIMEPRIORITY MEDIUM RUNTIMEPRIORITYTHRESHOLD 0 RUNTIMECAP ‘100 SECONDS’ CASCADE TO pool_long CPUAFFINITYSET ‘4-15’ CPUAFFINITYMODE SHARED;

create resource pool pool_fast priority 50 memorysize ‘3G’ maxmemorysize ‘20%’ plannedconcurrency 24 maxconcurrency NONE executionparallelism 1 queuetimeout 3600 RUNTIMEPRIORITY HIGH RUNTIMEPRIORITYTHRESHOLD 0 RUNTIMECAP ’10 SECONDS’ CASCADE TO pool_medium CPUAFFINITYSET ‘0-3’ CPUAFFINITYMODE SHARED;

Let’s highlight some of the key design considerations for the cascading pools used in this demo.

  • The demo uses three pools: pool_fast, pool_medium, and pool_long. All queries start in pool_fast. If a query takes more than 10 seconds to finish in pool_fast, it cascades to pool_medium and continues running in pool_medium. If it runs for another 90 seconds and still does not finish, then it cascades to pool_long. The runtime is cumulative as a query moves from one pool to another.
  • pool_fast is meant for catching all these short-running tactical queries. It runs run in “high priority” mode and shares 4 CPU cores (of a total of 16 cores) with other pools.
  • These three pools are designed in such a way that the secondary pools have larger memory budgets than the source pools. This technique ensures to make sure that when a query moves from one pool to another, the secondary/cascading pool has enough resources to keep the query running uninterrupted. That way, the query does not have to wait in the queue for resources to be freed up or retry itself. Minimizing query retries during the pool move improves performance.

By simulating the real-life scenario of managing three different types of query workloads, large, medium, and small, you can see that, compared to the GENERAL pool, a carefully designed set of cascading pools can produce on average 30% performance gain for all workload types. A best practice in designing HP Vertica cascading pools is to minimize query retries during the pool move.

For truly ad-hoc types of workloads, in prior HP Vertica releases, a good knowledge of query profiles and memory footprints was needed in order to take advantage of HP Vertica resource pools for the optimal mixed workload performance. With HP Vertica 7.1, by using cascading pools, you can point all queries to one pool and let HP Vertica do the heavy lifting in the background automatically.

Po Hong and Satish Sathiyavageswaran are respectively Senior Solutions Architect and Solutions Architect in HP Big Data Platform Corporate Presales, which specializes in HP Vertica performance tuning and workload management.

Workload Management Metrics – A Golden Triangle

Modern databases are often required to process many different kinds of workloads, ranging from short/tactical queries, to medium complexity ad-hoc queries, to long-running batch ETL jobs to extremely complex data mining jobs (See my previous blog on workload classification for more information.) DBAs must ensure that all concurrent workload, along with their respective Service Level Agreements (SLAs), can co-exist well with each other while maximizing a system’s overall performance.

So what is concurrency? Why should a customer care about concurrency?

Concurrency is a term used to describe having multiple jobs running in an overlapping time interval in a system. It doesn’t necessarily mean that they are or ever will be running at the same instant. Concurrency is synonymous to multi-tasking and it is fundamentally different from parallelism, which is a common point of confusion. Parallelism represents a state in which two or more jobs are running at the exact same instant. The simplest example might be a single CPU computer. On such a computer, you can, in theory, run multiple jobs by context-switching between them. This gives the user the illusion of virtual parallelism or that multiple jobs are running on the single CPU at the same time. However if you take a snapshot at any given instant , you’ll find there is one and only one job running. In contrast, actual parallel processing is enabled by multiple working units (e.g. multiple cpu/cores in a modern database server such as the HP DL380p). Because Vertica is an MPP columnar database and an inherent multi-threaded application, it can take advantage of this multiple-CPU/core server architecture to process queries in both a concurrent and a parallel manner.

Most customers do not usually care about concurrency directly. Rather, they have a specific requirement to execute a certain workload in a database governed by a set of throughput and response time (latency) objectives. Throughput (TP) is defined as the number of queries/jobs that a database can perform in a unit of time and is the most commonly used metric to measure a database’s performance. Response time (or latency) is the sum of queuing time and runtime and as such it depends on both concurrency (as a proxy for overall system load) and query performance (= inverse of runtime).

For a given workload, the three metrics: throughput (TP), concurrency, and performance are related to each other by the simple equation:
Throughput = Concurrency * Performance

Knowing any two of these three metrics, you can derive the third. This relationship can be visually illustrated by the following Workload Management Metrics Triangle:


Concurrency is often NOT a direct customer requirement because it depends on query performance and throughput SLA. Customer requirements are usually in the form of something like this: “We need to process 10K queries in one hour with an average response time of 1 min or less.” So throughput (TP) is often the metric that customer is interested in and concurrency is a “derived” metric.

Let’s consider a hypothetical customer POC requirement of processing twelve hundred queries in one minute and assume that there are two competing systems, X and Y.

On System X, executing such a workload would require a currency level of 40 with an average query runtime of 2s.

On System Y, assuming average query response is 100ms, executing the same workload, requires a concurrency level of only 2 (because 20/s=2*1/100ms).

What does this mean for the customer? Clearly System Y with its superior query processing capability needs far less concurrency to satisfy the SLA than System X and hence it is a better platform (from a purely technical perspective).

To summarize, for a given throughput (TP) SLA, the better the query/job performance, the less concurrency it needs. Less concurrency generally means less or more efficient resource usage and better overall system performance (since there will be more spare system resources to process other workloads). The goal of any workload performance tuning exercise should never be about increasing concurrency. Instead it should focus on minimizing a query’s resource usage, improving its performance and applying the lowest possible concurrency level to satisfy a customer’s throughput (TP) and response time (latency) requirement.

Po Hong is a senior pre-sales engineer in HP Vertica’s Corporate Systems Engineering (CSE) group with a broad range of experience in various relational databases such as Vertica, Neoview, Teradata and Oracle.

A Method for Vertica Workload Classification


Modern analytic databases such as HP Vertica often need to process a myriad of workloads ranging from the simplest primary-key lookup to complex analytical queries that include dozens of large tables and joins between them. Different types of load jobs (such as batch type ETL jobs and near real-time trickle loads) keep the data up-to-date in an enterprise data warehouse (EDW). Therefore, an enterprise class database like HP Vertica must have a robust yet easy-to-use mixed-workload management capability.

The Resource Manager

HP Vertica manages complex workloads using the Resource Manager. With the Resource Manager, you manage resource pools, which are pre-defined subsets of system resources with an associated queue. HP Vertica is preconfigured with a set of built-in resource pools that allocate resources to different request types. The General pool allows for a certain concurrency level based on the RAM and cores in the machines.

HP Vertica provides a sophisticated resource management scheme that allows diverse, concurrent workloads to run efficiently on the database. For basic operations, the built-in general pool is usually sufficient. However, you can customize this pool to handle specific workload requirements.

In more complex situations, you can also define new resource pools that can be configured to limit memory usage, concurrency, and query priority. You can even optionally restrict each database user to use a specific resource pool to control memory consumption of their requests.

Understanding and Classifying Workloads

Before you start thinking about resource pools and workload optimization in HP Vertica, you must first develop a solid understanding of the customer’s workloads and know how to properly classify them.

What should you use for classification criterion? You could pick apart a complex query, studying its structure in detail, counting and adding up the number of tables, joins and aggregate functions, number and types of derived tables and analytical functions to come up with some weighted score. However this kind of approach is extremely tedious and subjective and, as a result is not a practical option.

What if we use the standalone runtime of a query as the criterion? This method is also problematic because a query that runs in one minute while using up 80% of a system’s resources should obviously not be in the same category as another query that runs in the same amount of time (one minute) but uses < 0.1% of the resources. In HP Vertica, the best proxy for query complexity is its memory usage. Being a modern MPP columnar database, HP Vertica is rarely, if ever, I/O bound. HP Vertica is also less likely to hit the CPU bottleneck because of the tremendous power and speed of modern multi-core CPUs. Therefore, the most common resource bottleneck in a production HP Vertica cluster running a complex mixed-workload is often memory. Because of this, the HP Vertica Resource Manager focuses on establishing equitable memory allocation among different workloads or pools. This ensures that no resource pool is starved out of memory in the worst-case scenario -- under full system load. Determining Memory Requirements for a Query

If you can somehow determine quickly how much memory a query requires per node, then you can use that value to classify an HP Vertica query (or any other job). Based on extensive real-world performance tuning experience gained from working with some of HP Vertica’s biggest and most demanding customers, I have found the following classification rules to be very easy to use and effective:

Small: <200 MB Medium: 200-800 MB Large: >800MB

How can you quickly determine a query’s memory requirement? It turns out that HP Vertica has a convenient profiling option (similar to EXPLAIN.) You can use the PROFILE statement to get the total memory required for the query (among other things). As a best practice, you should set up a small and dedicated profiling pool for this purpose, as shown in the following example:


Creating a specific profiling pool forces your query to borrow from the default general pool for any extra memory that it needs to execute. If you use the general pool (a common mistake), depending on the detailed pool settings, the reserved memory may be more than a query actually needs. HP Vertica could be “fooled” to report on reserved memory as opposed to the actual allocated/used memory under certain circumstances and this would skew your result.

For more information on the Resource Manager, see Managing Worloads in the HP Vertica Documentation set.

Po Hong is a senior pre-sales engineer in HP Vertica’s Corporate Systems Engineering (CSE) group with a broad range of experience in various relational databases such as Vertica, Neoview, Teradata and Oracle.

How to make ROLLUP fly in Vertica?

ROLLUP screenshot

ROLLUP is a very common Online Analytic Processing (OLAP) function and is part of ANSI SQL. Many customers use ROLLUP to write reports that automatically perform sub-total aggregations across multiple dimensions at different levels in one SQL query. The following is an example:

WHERE B.DATE_KEY BETWEEN 20120301 AND 20120331

Note: ROLLUP syntax is not supported in the current version of Vertica 7.0. This
is used only as an illustration. See blog text for more details

The business question the above query intends to answer is: For the month of March, 2014, show the total sales across both departments (dept_code) and stores (store_code) to generate sub-total/total sales figures at three levels:

  1. At the detailed level of each pair of (dept_code, store_code)
  2. At the level of only (dept_code) but aggregating across all stores
  3. Finally a grand sales total aggregating across all departments and stores

Assuming performance is not an issue, clearly ROLLUP is a very powerful and useful function that can do a lot of data crunching in the background and deliver a quite complex report.

Unlike the more familiar GROUP BY SQL function in which the column order is not material, ROLLUP works by creating subtotals that “roll up” from the most detailed level to a grand total and requires an ordered list of grouping expressions to be supplied as arguments. First ROLLUP calculates the standard aggregate values specified in the standard GROUP BY (without ROLLUP) and then ROLLUP moves from right to left through the list of grouping columns, starting with the lowest level of aggregation to create progressively higher-level subtotals. This process continues until all grouping columns have been consumed and aggregations computed. The operator essentially “rolls up” its grouping operations, culminating in a grand total in the end.

If N is the number of grouping columns, ROLLUP function will create (N+1) levels of subtotals (including a final grand total). The column order in a ROLLUP operator is critical since it works by removing the left most column at each step. For example:

ROLLUP(A, B, C) creates 4 groups: (A, B, C), (A, B), (A), ().

Note there are NO groups such as (A, C) and (B, C).

Unfortunately the current version of Vertica 7.0 does not support ROLLUP. So what is the alternative or workaround for the existing Vertica customers? Will the workaround perform on a large data set? Fortunately, the answers to both questions are yes. However it does take some ingenuity in physical design and SQL re-write to make ROLLUP work and perform in the currently supported Vertica releases (version 5.1 and later).

The key considerations are to create and pre-load a local temp table which is sorted on ALL the ROLLUP columns in the same exact order, rewrite SQL at each level using GROUP BY into a sub-query and finally UNION ALL of these sub-queries together. By applying this strategy to the ROLLUP SQL posted at the beginning of this blog, we arrive at the following alternative/workaround:


/*+ DIRECT */
WHERE B.DATE_KEY BETWEEN 20140301 AND 20140331



The fact that temp table AGG_TEMP is sorted on the ROLLUP columns (dept_code, store_code) is important for performance because this means that all the GROUP BY sub-queries at individual levels will be able to exploit the highly efficient GROUPBY PIPELINED operator (explicit in the query’s explain plan) in Vertica to get the job done quickly and accurately.

A nice property of GROUPBY PIPELINED operator (in contrast to the more resource intensive GROUPBY HASH) is that its performance is largely independent of the actual table size. As a result this workaround can easily scale up to include tables with 10+B or even 100+B rows without any noticeable drop in query performance (as measured in first fetch time).

To conclude I would like to mention that there has been serious work going on in Vertica Engineering to implement ROLLUP (and many other OLAP functions) in the up-coming Vertica releases. Stay tuned!

Po Hong is a senior pre-sales engineer in HP Vertica’s Corporate Systems Engineering (CSE) group with a broad range of experience in various relational databases such as Vertica, Neoview, Teradata and Oracle.

Facebook and Vertica: A Case for MPP Databases

I have just come back from a business trip to China where I visited several large Chinese telecom customers to talk about the recent big Vertica win at Facebook. Two questions these customers had constantly asked me were: What’s the future of MPP databases? Will Hadoop become one database that rules the whole analytic space?

These seemed to be odd questions considering that Facebook, one of the juggernauts in the Open Source community in general and Hadoop world in particular, has recently picked Vertica to be the anchoring database to satisfy its ever-increasing analytical demands and has since put the biggest Vertica cluster (with ~300 nodes and effective data storage of 6+ PB) into production. It tells me that if a Hadoop power-house and the inventor of Hive (the most popular SQL-on-Hadoop database) like Facebook, with its teams of brilliant programmers and bound-less resources, still thinks that it needs a MPP database like Vertica in its “Big Data” technology stack in the foreseeable future, it sends a clear and strong message. Obviously Facebook thinks the answers to both questions are NO, not so fast. In the meantime, Facebook will continue to use Hive/HBase and other Hadoop technologies for the tasks they are good at: ETL, handling unstructured data and conducting complex data-mining types of deep analysis.
So why does Facebook think that it needs a MPP database? Facebook has been running an EDW (Oracle Exadata ~50TB) for some time but feels that their existing EDW is running out of steam because it cannot keep up with the rapid data growth especially as mobile platform becomes more and more popular. Facebook would like to take advantage of the established commercial MPP databases for lower cost, robust eco-system, improved data security and better scalability/performance. Their main reasons for going with an MPP database can be summarized as follows:

  • Rapidly expanding analytical needs at Facebook,
  • MapReduce is too slow, plus security concerns
  • In-Memory Database (IMDB) is too expensive and too immature
  • Current SQL-on-Hadoop databases are not good enough and too immature

Facebook has invited four MPP vendors (including Vertica) to participate in two rounds of competitive POCs before declaring Vertica as the ultimate winner on the basis of Vertica’s low TCO, ease of management and superior ad-hoc query performance.

There have recently been many SQL-on-Hadoop offerings in the last couple of years, both open source and proprietary, including but not limited to Hive, Hadapt, Citus, Impala, Stinger and Apache Drill. Though their effort in making Hadoop more SQL friendly is welcome, my general impression is that they are still a long way off in terms of closing the performance gap to the popular MPP databases in the marketplace (e.g. Vertica). Depending on your perspective, you may argue that this gap is not exactly getting narrower at any pace that foretells its closing any time soon.

There is strong reason for me to believe that the SQL-on-Hadoop camp may have over-estimated the effectiveness of bolting/wrapping around open source SQL optimizers (e.g. PostgreSQL) to HDFS and severely underestimated the effort and time it takes to produce an enterprise quality MPP database whose core optimizer/execution engine technology requires years of intensive real world use to mature, and 100s (if not 1000s) of customers to validate and millions of cases to test and train. This is certainly more about practice than theory or concept. Query optimization is fundamentally a software problem and there is a limit to what any “brute force” hardware-based approach can do. To echo and rephrase what the authors of the MapReduce and Parallel Databases: Friends or Foes?” said, smart software (like MPP databases) is still a good idea in the age of Hadoop and “Big Data” and there is plenty of room and opportunity for MPP databases to thrive for a long time to come….

Po Hong is a senior pre-sales engineer in HP Vertica’s Corporate Systems Engineering (CSE) group with a broad range of experience in various relational databases such as Vertica, Neoview, Teradata and Oracle.

Can Vertica Climb a Tree?


The answer is YES if it is the right kind of tree. Here “tree” refers to a common data structure that consists of parent-child hierarchical relationship such as an org chart. Traditionally this kind of hierarchical data structure can be modeled and stored in tables but is usually not simple to navigate and use in a relational database (RDBMS). Some other RDBMS (e.g. Oracle) has a built-in CONNECT_BY function that can be used to find the level of a given node and navigate the tree. However if you take a close look at its syntax, you will realize that it is quite complicated and not at all easy to understand or use.

For a complex hierarchical tree with 10+ levels and large number of nodes, any meaningful business questions that require joins to the fact tables, aggregate and filter on multiple levels will result in SQL statements that look extremely unwieldy and can perform poorly. The reason is that such kind of procedural logic may internally scan the same tree multiple times, wasting precious machine resources. Also this kind of approach flies in the face of some basic SQL principles, simple, intuitive and declarative. Another major issue is the integration with third-party BI reporting tools which may often not recognize vendor-specific variants such as CONNECT_BY.

Other implementations include ANSI SQL’s recursive SQL syntax using WITH and UNION ALL, special graph based algorithms and enumerated path technique. These solutions tend to follow an algorithmic approach and as such, they can be long on theory but short on practical applications.
Since SQL derives its tremendous power and popularity from its declarative nature, specifying clearly WHAT you want to get out of a RDBMS but not HOW you can get it, a fair question to ask is: Is there a simple and intuitive approach to the modeling and navigating of such kind of hierarchical (recursive) data structures in a RDBMS? Thankfully the answer is yes.

In the following example, I will discuss a design that focuses on “flattening” out such kind of hierarchical parent-child relationship in a special way. The output is a wide sparsely populated table that has extra columns that will hold the node-ids at various levels on a tree and the number of these extra columns is dependent upon the depth of a tree. For simplicity, I will use one table with one hierarchy as an example. The same design principles can be applied to tables with multiple hierarchies embedded in them. The following is a detailed outline of how this can be done in a program/script:

  1. Capture the (parent, child) pairs in a table (table_source).
  2. Identify the root node by following specific business rules and store this info in a new temp_table_1.
    Example: parent_id=id.
  3. Next find the 1st level of nodes and store them in a temp_table_2. Join condition:
  4. Continue to go down the tree and at the end of each step (N), store data in temp_table_N.
    Join condition:, where M=N+1.
  5. Stop at a MAX level (Mevel) when there is no child for any node at this level (leaf nodes).
  6. Create a flattened table: table_flat by adding in total (Mlevel+1) columns named as LEVEL,
  7. A SQL insert statement can be generated to join all these temp tables together to load
    into the final flat table: table_flat.

  8. When there are multiple hierarchies in one table, the above procedures can be repeated for each
    hierarchy to arrive at a flattened table in the end.


This design is general and is not specific to any particular RDBMS architecture, row or column or hybrid. However the physical implementation of this design naturally favors columnar databases such as Vertica. Why? The flattened table is usually wide with many extra columns and these extra columns tend to be sparsely populated and they can be very efficiently stored in compressed format in Vertica. Another advantage is that when a small set of these columns are included in the select clause of an SQL, because of Vertica’s columnar nature, the other columns (no matter how many there are) will not introduce any performance overhead. This is as close to “free lunch” as you can get in a RDBMS.

Let’s consider the following simple hierarchical tree structure:

Vertica Tree diagram

There are four levels and the root node has an ID of 1. Each node is assumed to have one and only one parent (except for the root node) and each parent node may have zero to many child nodes. The above structure can be loaded into a table (hier_tab) having two columns: Parent_ID and Node_ID, which represent all the (parent, child) pairs in the above hierarchical tree:

CHart 1

It is possible to develop a script to “flatten” out this table by starting from the root node, going down the tree recursively one level at a time and stopping when there is no data left (i.e. reaching the max level or depth of the tree). The final output is a new table (hier_tab_flat):

Chart 2

What’s so special above this “flattened” table? First, this table has the same key (Node_ID) as the original table; Second, this table has several extra columns named as LEVEL_N_ID and the number of these columns is equal to the max number of levels (4 in this case) plus one extra LEVEL column; Third, for each node in this table, there is a row that includes the ID’s of all of its parents up to the root (LEVEL=1) and itself. This represents a path starting from a node and going all the way up to the root level.The power of this new “flattened” table is that it has encoded all the hierarchical tree info in the original table. Questions such as finding a level of a node and all the nodes that are below a give node, etc. can be translated into relatively simple SQL statements by applying predicates to the proper columns.

Example 1: Find all the nodes that are at LEVEL=3.Select Node_ID From hier_tab_flat Where LEVEL=3;Example 2: Find all the nodes that are below node= 88063633.

This requires two logical steps (which can be handled in a front-end application to generate the proper SQL).

Step 2.1. Find the LEVEL of node= 88063633 (which is 3).

Select LEVEL From hier_tab_flat Where Node_ID=88063633;

Step 2.2. Apply predicates to the column LEVE_3_ID:

Select Node_ID From hier_tab_flat Where LEVE_3_ID =88063633;

Complex business conditions such as finding all the nodes belonging to node=214231509 but excluding the nodes that are headed by node=88063633 can now be translated into the following SQL:

Select Node_ID
From hier_tab_flat
Where LEVE_2_ID=214231509
And LEVE_3_ID <> 88063633 ;

By invoking the script that flattens one hierarchy repeatedly, you can also flatten a table with multiple hierarchies using the same design. With this flattened table in your Vertica tool box, you can climb up and down any hierarchical tree using nothing but SQL.

Po Hong is a senior pre-sales engineer in HP Vertica’s Corporate Systems Engineering (CSE) group with a broad range of experience in various relational databases such as Vertica, Neoview, Teradata and Oracle

Get Started With Vertica Today

Subscribe to Vertica