Vertica

Archive for the ‘Vertica 7’ Category

Facebook and Vertica: A Case for MPP Databases

I have just come back from a business trip to China, where I visited several large Chinese telecom customers to talk about the recent big Vertica win at Facebook. Two questions these customers asked me constantly were: What’s the future of MPP databases? Will Hadoop become the one database that rules the whole analytic space?
These seemed odd questions, considering that Facebook, one of the juggernauts of the open source community in general and the Hadoop world in particular, recently picked Vertica as the anchoring database to satisfy its ever-increasing analytical demands and has since put the biggest Vertica cluster (~300 nodes and 6+ PB of effective data storage) into production. To me, if a Hadoop powerhouse like Facebook, the inventor of Hive (the most popular SQL-on-Hadoop database), with its teams of brilliant programmers and boundless resources, still believes it needs an MPP database like Vertica in its “Big Data” technology stack for the foreseeable future, that sends a clear and strong message. Obviously, Facebook thinks the answer to both questions is: NO, not so fast. In the meantime, Facebook will continue to use Hive/HBase and other Hadoop technologies for the tasks they are good at: ETL, handling unstructured data and conducting complex, data-mining-style deep analysis.

So why does Facebook think that it needs an MPP database? Facebook has been running an EDW (Oracle Exadata, ~50 TB) for some time but feels that it is running out of steam because it cannot keep up with rapid data growth, especially as mobile platforms become more and more popular. Facebook would like to take advantage of established commercial MPP databases for their lower cost, robust ecosystem, improved data security and better scalability/performance. Its main reasons for choosing an MPP database can be summarized as follows:

  • Rapidly expanding analytical needs at Facebook
  • MapReduce is too slow, and it raises security concerns
  • In-memory databases (IMDBs) are too expensive and too immature
  • Current SQL-on-Hadoop databases are not yet good enough and are too immature

Facebook invited four MPP vendors (including Vertica) to participate in two rounds of competitive POCs before declaring Vertica the ultimate winner on the basis of its low TCO, ease of management and superior ad-hoc query performance.

Many SQL-on-Hadoop offerings, both open source and proprietary, have appeared in the last couple of years, including but not limited to Hive, Hadapt, Citus, Impala, Stinger and Apache Drill. Though their efforts to make Hadoop more SQL-friendly are welcome, my general impression is that they are still a long way from closing the performance gap with the popular MPP databases in the marketplace (e.g. Vertica). Depending on your perspective, you may even argue that the gap is not narrowing at a pace that suggests it will close any time soon.

I have strong reason to believe that the SQL-on-Hadoop camp may have overestimated the effectiveness of bolting open source SQL optimizers (e.g. PostgreSQL's) onto HDFS, and severely underestimated the effort and time it takes to produce an enterprise-quality MPP database, whose core optimizer/execution engine technology requires years of intensive real-world use to mature, hundreds (if not thousands) of customers to validate, and millions of cases to test and train. This is more about practice than about theory or concept. Query optimization is fundamentally a software problem, and there is a limit to what any “brute force” hardware-based approach can do. To echo and rephrase what the authors of “MapReduce and Parallel Databases: Friends or Foes?” said, smart software (like MPP databases) is still a good idea in the age of Hadoop and “Big Data”, and there is plenty of room and opportunity for MPP databases to thrive for a long time to come.

Po Hong is a senior pre-sales engineer in HP Vertica’s Corporate Systems Engineering (CSE) group with a broad range of experience in various relational databases such as Vertica, Neoview, Teradata and Oracle.

Database Designer in HP Vertica 7

With HP Vertica 7, you can use Database Designer with Management Console. As in previous releases, you can still run Database Designer from Admin Tools, but its integration with Management Console offers an additional, easy-to-use method for creating a database design.

Database Designer optimizes query performance and minimizes the disk storage that the database uses. It does this by analyzing your logical schema, sample data, and, optionally, your sample queries. Then, Database Designer creates a physical schema design (a set of projections) that can be deployed automatically or manually.

Check out the following demo to get started with the new Database Designer feature in Management Console.

* When using this new feature, remember that to create a design you must be a DBADMIN user or have the DBDUSER role assigned to you, with write access to the tables in your schema.


For more information, visit www.vertica.com/documentation.

Viewing Query Plans and Profile Data in Management Console 7

With HP Vertica 7.0, Management Console offers a new way to visualize your query plans and get profile information about your queries. You can run EXPLAIN on a query using Management Console’s Query Plan Visualizer, and it provides you with a visual representation of the query plan with the lowest cost. Management Console highlights and links to key information in the output, so you can spot issues at a glance. When you profile a query, Management Console provides a graphical view of what resources were used when HP Vertica executed the query.

Our new video tutorial walks you through using Management Console to view query plans and profile data. See the video below, and make sure to check out the other video tutorials we offer here.


You can also read more about the feature in this blog post: Visualizing Your Query Plan with Management Console 7.

The ConvergedSystems 300 video


Yesterday, we posted a new video to the site featuring our very own product marketing manager, Steve Sarsfield. The video breaks down the key advantages of the new ConvergedSystems 300 as an all-in-one platform for your big data analytics needs. Check out the video above or go to the main ConvergedSystems 300 page here to learn more.

HP Vertica Tutorials: You Asked, We Listened

Over recent months, we’ve heard our community request short, instructional videos and tutorials to help them learn more about the rich and powerful features of the HP Vertica Analytics Platform.

Well, we heard you, and we have developed and posted some initial videos to help you maximize your investment in HP Vertica, including several that highlight new features in HP Vertica 7 (“Crane”). Among the videos we’ve posted are:

  • A two-part series on the HP Vertica Connector for HCatalog. Part 1 provides an overview; Part 2 includes a demonstration
  • A demonstration of the power of HP Vertica Flex Zone
  • A tutorial on how to run Database Designer in Management Console 7
  • A five-part series that demonstrates how to set up HP Vertica 7 with Amazon Web Services
  • An introduction to HP Vertica 7 Fault Groups

You can see these and all of our video tutorials here.


Stay tuned in the weeks ahead. We’ll be posting new videos that highlight new features in Management Console, how to use Fault Groups to set up large clusters, and more.

We’d love to hear more from you! If you have any suggestions or ideas for topics for future videos, let us know. You can post your ideas on our forum at community.vertica.com, or you can send them to vertica-docfeedback@hp.com.

We’re committed to your success! Check back soon to see what’s new in HP Vertica Tutorials!

Gartner Magic Quadrant Released – HP Vertica Enters the Leaders Quadrant

The Gartner Magic Quadrant has long been recognized as critical research that organizations rely on to weigh, evaluate, and ultimately select vendors as the infrastructure for their IT initiatives.

Yesterday, Gartner released the 2014 Gartner Magic Quadrant for Data Warehouse and Database Management Systems. We are very proud to announce that the HP Vertica Analytics Platform has entered the Leaders Quadrant, gaining in terms of both execution and its ability to fulfill our vision of storing, exploring, and serving data and insights to thousands of organizations.

We encourage you to read this complimentary report as you consider HP Vertica to handle your most extreme Big Data analytics initiatives. And we especially want to thank all of our innovative customers who push us each and every day to build the best, most scalable, and most open analytics platform on the planet.

Want to get started with HP Vertica? Download our Community Edition – it’s free up to 1 TB with no time limit.

Visualizing Your Query Plan with Management Console 7

You run your newly crafted query and patiently wait for the results to appear on the terminal. You stare at your clock, waiting. 1 minute, 2 minutes, then 5, then 10. Your heart sinks. Why is it taking so long? The query should be done by now, you tell yourself. You built your projections to optimize the joins, and you’re sure there is enough memory to avoid spilling to disk. At this point you start to doubt yourself, so you decide to check.

You decide to run EXPLAIN to see if there’s anything obvious that the optimizer did incorrectly. You open a separate VSQL window and run EXPLAIN. You can see that there’s a hash join at Path ID 3, and that’s not good. Why isn’t this a merge join, you wonder? You could have sworn you were joining on sorted columns. You’d better check the sort order on the columns for your projections. What’s the query for that, again? Then again, the join may not be the bottleneck anyway, so you decide to check the profile information for the query instead. You try to remember: which table stores profile information, EXECUTION_ENGINE_PROFILES or QUERY_PLAN_PROFILES? Which columns? You should probably select all of them and see which columns you need.
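The question of why the optimizer chose a hash join over a merge join comes down to sort order: a merge join needs both inputs already sorted on the join keys. The Python sketch below encodes a simplified version of that check, assuming that each projection's join columns must be its leading sort columns, paired position by position. The column names are hypothetical, and this is not how the Vertica optimizer actually makes its decision; it only illustrates the question you are asking yourself at this point.

```python
def can_merge_join(left_sort_order, right_sort_order, join_pairs):
    """Simplified check: are both inputs sorted on the join keys?

    Assumption (not Vertica's real optimizer logic): the join columns must be
    the leading columns of each input's sort order, and position i of the left
    sort order must be joined to position i of the right sort order.
    """
    k = len(join_pairs)
    if k == 0 or len(left_sort_order) < k or len(right_sort_order) < k:
        return False
    # Pair up the leading k sort columns of each side, position by position.
    leading_pairs = set(zip(left_sort_order[:k], right_sort_order[:k]))
    # Every join predicate must be covered by exactly those leading pairs.
    return leading_pairs == set(join_pairs)


# Hypothetical projections: orders joined to customers on customer_id = id.
join = [("customer_id", "id")]
print(can_merge_join(["customer_id", "order_date"], ["id", "name"], join))  # True
print(can_merge_join(["order_date", "customer_id"], ["id", "name"], join))  # False
```

In the second call the join column is sorted second rather than first, so under this simplified rule the inputs are not usable for a merge join.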

And once you do find the columns you need, you may realize that trying to understand VSQL profile-metric outputs is not how you want to spend your afternoon.

But that doesn’t mean you are forever doomed to wade through dense text to get your answers…

Welcome to Management Console Query Plan Visualizer!

In HP Vertica Analytics Platform 7.0, Management Console (MC) offers a simple interface, the Query Plan Visualizer, for getting plan or profile information on your query. The Query Plan Visualizer provides a graphical view of the details of your query, allowing you to quickly identify and understand problem areas.

[Screenshot: Explain]

Let’s examine the same query mentioned previously using MC’s Query Plan Visualizer. Just paste in the query text and click Explain Plan. The results are shown here:

[Screenshot: Explain Plan results]

MC’s EXPLAIN output maintains the structure of the plan, and also highlights important information such as “No Statistics,” while linking to relevant metadata for the projections used and columns materialized. For example, we can see that Path ID 3 is a hash join, but now we can actually find out why.

[Screenshots: Explain output details]

So now we know why there was a hash join instead of a merge join. But how do we see how the query was actually executed? We can get the profile metrics for our query using either of these methods:

  1. We can click “Profile Query.” MC then executes the query and displays its profile information once it completes. However, because our query takes a while to run, this option may take some time.
  2. We can also examine past query activity via the Activity tab located at the bottom of the screen. Using the Activity tab, we can retrieve profile information for queries we’ve already run.

In this case, we’ll choose the second option.

To do so:

    1. Go to the Activity tab.
    2. Select Queries from the chart-type dropdown menu. This chart shows a graph of the number of queries run in the past.

Because we know our query was run recently, we’ll see it at the right side of the graph. Clicking that location brings up a table of query activity from the past few minutes. Sorting the queries by Elapsed brings our long-running query to the top.

[Screenshot: query activity table]
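Sorting the activity table by Elapsed amounts to a filter-then-sort over recent query records. The Python sketch below shows that operation; the QueryRecord fields, the sample queries, and the five-minute window are illustrative assumptions, not MC's or Vertica's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class QueryRecord:
    # Hypothetical record shape for illustration; not an MC or Vertica schema.
    query_text: str
    start_time: datetime
    elapsed_seconds: float


def recent_by_elapsed(records, now, window=timedelta(minutes=5)):
    """Keep queries started within `window` of `now`, longest-running first."""
    recent = [r for r in records if now - r.start_time <= window]
    return sorted(recent, key=lambda r: r.elapsed_seconds, reverse=True)


now = datetime(2014, 3, 1, 12, 0, 0)
records = [
    QueryRecord("SELECT 1", now - timedelta(minutes=1), 0.01),
    QueryRecord("SELECT ... big join ...", now - timedelta(minutes=3), 612.0),
    QueryRecord("SELECT 2", now - timedelta(hours=2), 900.0),  # outside the window
]
for r in recent_by_elapsed(records, now):
    print(f"{r.elapsed_seconds:8.2f}s  {r.query_text}")
```

The older query is dropped even though it ran longest, mirroring how the activity table only covers the past few minutes around the point you clicked.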

Clicking Explain/Profile on the far right of the table brings us back to the Query Plan Visualizer page and requests the profile information from the HP Vertica database.

[Screenshot: query profile]

The screen above shows a collapsed view of the profile information, which hides projection and column information. Metric information for each path appears to the right of the plan. We can measure five types of metrics for each path: disk usage, memory usage, data sent, data received, and time spent. Each blue bar represents a path's usage of a metric relative to all the other paths. For example, in the Time column, we can see that the row for Path ID 3 has the largest blue bar (about 35% full). This means that, of all the paths, Path ID 3 took 35% of the total execution time. Now we can easily see that it was indeed our hash join that took the most time. Additionally, we can see that the disk read on Path ID 6 was also responsible for a significant portion of the execution time.
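As described above, each bar is a path's value for a metric expressed as a fraction of that metric's total across all paths. The Python sketch below reproduces that calculation. The per-path timings are made up to mirror the 35% example, and the simple sum-and-divide model is an assumption; it ignores the fact that paths in a parallel plan can overlap in time.

```python
def relative_shares(metric_by_path):
    """Each path's share of a metric's total, as a percentage."""
    total = sum(metric_by_path.values())
    if total == 0:
        return {path_id: 0.0 for path_id in metric_by_path}
    return {path_id: 100.0 * value / total for path_id, value in metric_by_path.items()}


def largest_path(metric_by_path):
    """Path ID with the largest value, i.e. the fullest bar."""
    return max(metric_by_path, key=metric_by_path.get)


# Hypothetical per-path times in microseconds, chosen to mirror the example.
times = {1: 50_000, 2: 80_000, 3: 350_000, 4: 120_000, 5: 100_000, 6: 300_000}
shares = relative_shares(times)
for path_id, pct in sorted(shares.items()):
    print(f"Path ID {path_id}: {pct:5.1f}%")
print("Largest bar:", largest_path(times))  # Largest bar: 3
```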

So what about that pie chart? The pie chart shows how long the query took in each of its phases. As the query runs, it goes through multiple phases before it completes. Ideally, the query will spend most of its time in the “execution phase,” as the other phases should happen relatively quickly. So if your pie chart is mostly green, that’s good. Think of the chart as a sanity check that validates whether your query spent most of its time where it should.
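The pie-chart sanity check boils down to one ratio: time spent in the execution phase divided by the total time across all phases. The Python sketch below computes that ratio. The phase names, the timings, and the 50% threshold standing in for "most of its time" are illustrative assumptions, not values MC uses.

```python
def execution_fraction(phase_seconds, execution_phase="execution"):
    """Fraction of total query time spent in the execution phase."""
    total = sum(phase_seconds.values())
    if total == 0:
        return 0.0
    return phase_seconds.get(execution_phase, 0.0) / total


def spent_time_where_expected(phase_seconds, threshold=0.5, execution_phase="execution"):
    """True if the execution phase accounts for more than `threshold` of the time."""
    return execution_fraction(phase_seconds, execution_phase) > threshold


# Hypothetical phase timings in seconds (phase names are assumptions).
healthy = {"planning": 0.2, "initialization": 0.3, "execution": 9.5}
suspicious = {"planning": 6.0, "initialization": 1.0, "execution": 3.0}
print(round(execution_fraction(healthy), 2), spent_time_where_expected(healthy))        # 0.95 True
print(round(execution_fraction(suspicious), 2), spent_time_where_expected(suspicious))  # 0.3 False
```

A mostly green pie corresponds to the first case; if a non-execution phase dominates, as in the second, the time is going somewhere other than running the plan.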

Additionally, if you want to track the progress of a long-running query, you can profile it with “Enable Monitoring” checked. With monitoring enabled, the counter values on the right-hand side update at the interval you set and also show how much each one increased or decreased since the previous update. So rather than waiting for profiling to finish before you can see any profile metrics, you can get the latest information on which paths are currently being processed at each update interval.

[Screenshot: monitoring]
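What Enable Monitoring displays between refreshes is essentially the difference between two snapshots of the same counters. The Python sketch below computes those per-counter deltas; keying counters by (Path ID, metric name) and the sample values are assumptions for illustration, not MC's internal representation.

```python
def counter_deltas(previous, current):
    """Change in each counter since the previous snapshot.

    Counters are keyed by (path_id, metric_name). A counter that first
    appears in `current` is treated as having been 0 in `previous`.
    """
    return {key: value - previous.get(key, 0) for key, value in current.items()}


def poll(snapshots):
    """Yield (snapshot, deltas) for snapshots taken one update interval apart."""
    previous = {}
    for current in snapshots:
        yield current, counter_deltas(previous, current)
        previous = current


# Hypothetical time counters (ms) for two refreshes of a running query.
snapshots = [
    {(3, "time_ms"): 100, (6, "time_ms"): 50},
    {(3, "time_ms"): 180, (6, "time_ms"): 50, (4, "time_ms"): 10},
]
for snapshot, deltas in poll(snapshots):
    busiest = max(deltas, key=deltas.get)
    print(f"busiest path this interval: {busiest[0]} (+{deltas[busiest]} ms)")
```

A path whose counters keep growing from one refresh to the next is the one currently doing the work; a delta of zero (Path ID 6 in the second refresh) means that path made no progress during the interval.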

By removing the need to know the specific queries required to get profile information, and by making relevant data (projection metadata, query events) just a click away, the MC Query Plan Visualizer can greatly simplify the process of getting and understanding profiling information. If you’re still using a pre-7.0 version of MC, be sure to upgrade to Vertica 7.0 and give this a whirl :-)
