Vertica

Archive for March, 2014

Database Designer in HP Vertica 7

With the HP Vertica 7, you can use Database Designer with Management Console. As in previous releases, you can still run Database Designer from Admin Tools, but its integration with Management Console offers an additional easy-to-use method for creating a database design.

Database Designer optimizes query performance and minimizes the disk storage that the database uses. It does this by analyzing your logical schema, sample data, and, optionally, your sample queries. Then, Database Designer creates a physical schema design (a set of projections) that can be deployed automatically or manually.

Check out the following demo to get started with the new Database Designer feature in Management Console.

* When using this new feature, remember that, to create the design, you must be a DBADMIN user or have the DBUSER role assigned to you with write access to the tables in your schema.


For more information, visit www.vertica.com/documentation.

Viewing Query Plans and Profile Data in Management Console 7

With HP Vertica 7.0, Management Console offers a new way to visualize your query plans and get profile information about your queries. You can run EXPLAIN on a query using Management Console’s Query Plan Visualizer, and it provides you with a visual representation of the query plan with the lowest cost. Management Console highlights and links to key information in the output, so you can spot issues at a glance. When you profile a query, Management Console provides a graphical view of what resources were used when HP Vertica executed the query.

Our new video tutorial walks you through using Management Console to view query plans and profile data. See the video below, and make sure to check out the other video tutorials we offer here.


You can also read more about the feature in this blog post: Visualizing Your Query Plan with Management Console 7.

HP Vertica presents three talks at ICDE, Chicago, March 31-April 4, 2014

g10278016032009_jpghighres (1)

For the past three years, HP Vertica has presented innovative topics at prestigious database conferences such as the International Conference on Data Engineering (ICDE), the Very Large Databases (VLDB) conference, and the Extremely Large Database (XLDB) conference. This year, our engineering team proudly announces three talks at the upcoming ICDE held in Chicago, IL, USA, March 31-April 4, 2014.

  • On 3/31/2014, at SMDB, Ben Vandiver introduces Flex Zone, one of the new features of our recent release, HP Vertica Analytics Platform 7. Flex Zone enables the
    smooth data load and exploration flexibility of NoSQL solutions while maintaining a unified SQL interface over structured and semi-structured data.
  • On 4/1/2014, Ramakrishna Varadarajan presents Vertica’s customizable physical design tool, called the Database Designer, which produces designs optimized for various scenarios and applications. For a given workload and space budget, Database Designer automatically recommends a physical design that optimizes query performance, storage footprint, fault tolerance, and database recovery to meet different customer requirements.
  • On 4/2/2014, Jaimin Dave introduces HP Vertica SQL Query Optimizer. The Query Optimizer was written from the ground up for the HP Vertica Analytic Database. Jaimin will discuss its design and the tradeoffs encountered during its implementation. He’ll also argue that the full power of today’s database systems can be realized only with a carefully designed custom Query Optimizer, written specifically for the system in which it operates.

Click here to find details and schedule. Please do attend the talks and stop by and say hello to our presenters and their co-authors. We’ll be happy to tell you more about our designs and the trade-offs we encountered.

Can Vertica Climb a Tree?

big_basin_0939_mg_1143

The answer is YES if it is the right kind of tree. Here “tree” refers to a common data structure that consists of parent-child hierarchical relationship such as an org chart. Traditionally this kind of hierarchical data structure can be modeled and stored in tables but is usually not simple to navigate and use in a relational database (RDBMS). Some other RDBMS (e.g. Oracle) has a built-in CONNECT_BY function that can be used to find the level of a given node and navigate the tree. However if you take a close look at its syntax, you will realize that it is quite complicated and not at all easy to understand or use.

For a complex hierarchical tree with 10+ levels and large number of nodes, any meaningful business questions that require joins to the fact tables, aggregate and filter on multiple levels will result in SQL statements that look extremely unwieldy and can perform poorly. The reason is that such kind of procedural logic may internally scan the same tree multiple times, wasting precious machine resources. Also this kind of approach flies in the face of some basic SQL principles, simple, intuitive and declarative. Another major issue is the integration with third-party BI reporting tools which may often not recognize vendor-specific variants such as CONNECT_BY.

Other implementations include ANSI SQL’s recursive SQL syntax using WITH and UNION ALL, special graph based algorithms and enumerated path technique. These solutions tend to follow an algorithmic approach and as such, they can be long on theory but short on practical applications.
Since SQL derives its tremendous power and popularity from its declarative nature, specifying clearly WHAT you want to get out of a RDBMS but not HOW you can get it, a fair question to ask is: Is there a simple and intuitive approach to the modeling and navigating of such kind of hierarchical (recursive) data structures in a RDBMS? Thankfully the answer is yes.

In the following example, I will discuss a design that focuses on “flattening” out such kind of hierarchical parent-child relationship in a special way. The output is a wide sparsely populated table that has extra columns that will hold the node-ids at various levels on a tree and the number of these extra columns is dependent upon the depth of a tree. For simplicity, I will use one table with one hierarchy as an example. The same design principles can be applied to tables with multiple hierarchies embedded in them. The following is a detailed outline of how this can be done in a program/script:

  1. Capture the (parent, child) pairs in a table (table_source).
  2. Identify the root node by following specific business rules and store this info in a new temp_table_1.
    Example: parent_id=id.
  3. Next find the 1st level of nodes and store them in a temp_table_2. Join condition:
    temp_table_1.id=table_source.parent_id.
  4. Continue to go down the tree and at the end of each step (N), store data in temp_table_N.
    Join condition: temp_table_M.parent_id=temp_table_N.id, where M=N+1.
  5. Stop at a MAX level (Mevel) when there is no child for any node at this level (leaf nodes).
  6. Create a flattened table: table_flat by adding in total (Mlevel+1) columns named as LEVEL,
    LEVEL_1_ID,….LEVEL_Mlevel_ID.
  7. A SQL insert statement can be generated to join all these temp tables together to load
    into the final flat table: table_flat.

  8. When there are multiple hierarchies in one table, the above procedures can be repeated for each
    hierarchy to arrive at a flattened table in the end.

 

This design is general and is not specific to any particular RDBMS architecture, row or column or hybrid. However the physical implementation of this design naturally favors columnar databases such as Vertica. Why? The flattened table is usually wide with many extra columns and these extra columns tend to be sparsely populated and they can be very efficiently stored in compressed format in Vertica. Another advantage is that when a small set of these columns are included in the select clause of an SQL, because of Vertica’s columnar nature, the other columns (no matter how many there are) will not introduce any performance overhead. This is as close to “free lunch” as you can get in a RDBMS.

Let’s consider the following simple hierarchical tree structure:

Vertica Tree diagram

There are four levels and the root node has an ID of 1. Each node is assumed to have one and only one parent (except for the root node) and each parent node may have zero to many child nodes. The above structure can be loaded into a table (hier_tab) having two columns: Parent_ID and Node_ID, which represent all the (parent, child) pairs in the above hierarchical tree:

CHart 1

It is possible to develop a script to “flatten” out this table by starting from the root node, going down the tree recursively one level at a time and stopping when there is no data left (i.e. reaching the max level or depth of the tree). The final output is a new table (hier_tab_flat):

Chart 2

What’s so special above this “flattened” table? First, this table has the same key (Node_ID) as the original table; Second, this table has several extra columns named as LEVEL_N_ID and the number of these columns is equal to the max number of levels (4 in this case) plus one extra LEVEL column; Third, for each node in this table, there is a row that includes the ID’s of all of its parents up to the root (LEVEL=1) and itself. This represents a path starting from a node and going all the way up to the root level.The power of this new “flattened” table is that it has encoded all the hierarchical tree info in the original table. Questions such as finding a level of a node and all the nodes that are below a give node, etc. can be translated into relatively simple SQL statements by applying predicates to the proper columns.

Example 1: Find all the nodes that are at LEVEL=3.Select Node_ID From hier_tab_flat Where LEVEL=3;Example 2: Find all the nodes that are below node= 88063633.

This requires two logical steps (which can be handled in a front-end application to generate the proper SQL).

Step 2.1. Find the LEVEL of node= 88063633 (which is 3).

Select LEVEL From hier_tab_flat Where Node_ID=88063633;

Step 2.2. Apply predicates to the column LEVE_3_ID:

Select Node_ID From hier_tab_flat Where LEVE_3_ID =88063633;

Complex business conditions such as finding all the nodes belonging to node=214231509 but excluding the nodes that are headed by node=88063633 can now be translated into the following SQL:

Select Node_ID
From hier_tab_flat
Where LEVE_2_ID=214231509
And LEVE_3_ID <> 88063633 ;

By invoking the script that flattens one hierarchy repeatedly, you can also flatten a table with multiple hierarchies using the same design. With this flattened table in your Vertica tool box, you can climb up and down any hierarchical tree using nothing but SQL.

Po Hong is a senior pre-sales engineer in HP Vertica’s Corporate Systems Engineering (CSE) group with a broad range of experience in various relational databases such as Vertica, Neoview, Teradata and Oracle

HP Vertica Tutorials You Asked, We Listened.

Over recent months, we’ve heard our community request short, instructional videos and tutorials to help them learn more about the rich and powerful features of the HP Vertica Analytics Platform.

Well, we heard you, and have developed and posted some initial videos to help you maximize your investment in HP Vertica. We’ve posted a new videos that highlight new features in HP Vertica 7 (“Crane”). Among the videos we’ve posted are:

  • A two-part series on the HP Vertica Connector for HCatalog. Part 1 provides an overview. Part 2 includes a demonstration:
  • A demonstration of the power of HP Vertica Flex Zone:
  • A tuturial on how to run Database Designer in Management Console 7:
  • A five-part series that demonstrates how to set up HP Vertica 7 with Amazon Web Services
  • An introduction to HP Vertica 7 Fault Groups

You can see these and all video tutorials here. Here’s a sample:


Stay tuned in the weeks ahead. We’ll be posting new videos that highlight new features in Management Console, how to use Fault Groups to set up large clusters, and more.

We’d love to hear more from you! If you have any suggestions or ideas for topics for future videos, let us know. You can post your ideas on our forum at community.vertica.com, or you can send ideas to vertica-docfeedback@hp.com

We’re committed to your success! Check back soon to see what’s new in HP Vertica Tutorials!

Gartner Magic Quadrant Released – HP Vertica Enters the Leader’s Quadrant

The Gartner Magic Quadrant has long been recognized as critical research that organizations rely on to weigh, evaluate, and ultimately select vendors as the infrastructure for their IT initiatives.

Yesterday, Gartner released the 2014 Gartner Magic Quadrant for Data Warehouse and Database Management Systems. We are very proud to announce that the HP Vertica Analytics Platform has entered the Leaders Quadrant, gaining in both terms of execution as well as its ability to fulfill our vision of storing, exploring, and serving data and insights to thousands of organizations.

We encourage you to read this complimentary report, as you consider HP Vertica to handle your most extreme Big Data analytics initiatives. And, we especially want to thank all of our innovative customers that push us each and every day to build the best, most scalable, and open analytics platform on the planet.

Want to get started with HP Vertica? Download our Community Edition – it’s free up to 1 TB with no time limit.

Vertica Meet Up – 3/12/14

Yesterday’s (3/12/14) meetup at the HP Vertica Cambridge office was a huge success. With some pretty phenomenal networking (and quesadillas) we packed the top floor with people buzzing about everything Big Data and HP Vertica. After the introductions, our VP & GM Colin Mahony kicked off the presentations, sharing with us his passion for analytics, data warehousing, and the exciting news that HP Vertica has made it into Leaders Quadrant of the Gartner Magic Quadrant. Following that, the rest of the team demoed several of the new and exciting innovations HP Vertica has to offer. On a final note, dont’t forget to sign up for the upcoming HP Vertica Big Data Conference 2014 coming this August!

We wanted to thank everyone for showing up last night, both new friends and old, and can’t wait to do it again soon.

Below are some of the snap shots that yours truly took of the festivities, see you again next time!

Get Started With Vertica Today

Subscribe to Vertica