It’s another weekly edition of our Meet the Team feature! Our very own Amy Miller from support, shares with us the story of her career here at Vertica, what she loves about her job and her team, and how she wins national hockey tournaments.
I have just come back from a business trip to China where I visited several large Chinese telecom customers to talk about the recent big Vertica win at Facebook. Two questions these customers had constantly asked me were: What’s the future of MPP databases? Will Hadoop become one database that rules the whole analytic space?
These seemed to be odd questions considering that Facebook, one of the juggernauts in the Open Source community in general and Hadoop world in particular, has recently picked Vertica to be the anchoring database to satisfy its ever-increasing analytical demands and has since put the biggest Vertica cluster (with ~300 nodes and effective data storage of 6+ PB) into production. It tells me that if a Hadoop power-house and the inventor of Hive (the most popular SQL-on-Hadoop database) like Facebook, with its teams of brilliant programmers and bound-less resources, still thinks that it needs a MPP database like Vertica in its “Big Data” technology stack in the foreseeable future, it sends a clear and strong message. Obviously Facebook thinks the answers to both questions are NO, not so fast. In the meantime, Facebook will continue to use Hive/HBase and other Hadoop technologies for the tasks they are good at: ETL, handling unstructured data and conducting complex data-mining types of deep analysis.
So why does Facebook think that it needs a MPP database? Facebook has been running an EDW (Oracle Exadata ~50TB) for some time but feels that their existing EDW is running out of steam because it cannot keep up with the rapid data growth especially as mobile platform becomes more and more popular. Facebook would like to take advantage of the established commercial MPP databases for lower cost, robust eco-system, improved data security and better scalability/performance. Their main reasons for going with an MPP database can be summarized as follows:
Facebook has invited four MPP vendors (including Vertica) to participate in two rounds of competitive POCs before declaring Vertica as the ultimate winner on the basis of Vertica’s low TCO, ease of management and superior ad-hoc query performance.
There have recently been many SQL-on-Hadoop offerings in the last couple of years, both open source and proprietary, including but not limited to Hive, Hadapt, Citus, Impala, Stinger and Apache Drill. Though their effort in making Hadoop more SQL friendly is welcome, my general impression is that they are still a long way off in terms of closing the performance gap to the popular MPP databases in the marketplace (e.g. Vertica). Depending on your perspective, you may argue that this gap is not exactly getting narrower at any pace that foretells its closing any time soon.
There is strong reason for me to believe that the SQL-on-Hadoop camp may have over-estimated the effectiveness of bolting/wrapping around open source SQL optimizers (e.g. PostgreSQL) to HDFS and severely underestimated the effort and time it takes to produce an enterprise quality MPP database whose core optimizer/execution engine technology requires years of intensive real world use to mature, and 100s (if not 1000s) of customers to validate and millions of cases to test and train. This is certainly more about practice than theory or concept. Query optimization is fundamentally a software problem and there is a limit to what any “brute force” hardware-based approach can do. To echo and rephrase what the authors of the MapReduce and Parallel Databases: Friends or Foes?” said, smart software (like MPP databases) is still a good idea in the age of Hadoop and “Big Data” and there is plenty of room and opportunity for MPP databases to thrive for a long time to come….
Po Hong is a senior pre-sales engineer in HP Vertica’s Corporate Systems Engineering (CSE) group with a broad range of experience in various relational databases such as Vertica, Neoview, Teradata and Oracle.
With the HP Vertica 7, you can use Database Designer with Management Console. As in previous releases, you can still run Database Designer from Admin Tools, but its integration with Management Console offers an additional easy-to-use method for creating a database design.
Database Designer optimizes query performance and minimizes the disk storage that the database uses. It does this by analyzing your logical schema, sample data, and, optionally, your sample queries. Then, Database Designer creates a physical schema design (a set of projections) that can be deployed automatically or manually.
Check out the following demo to get started with the new Database Designer feature in Management Console.
* When using this new feature, remember that, to create the design, you must be a DBADMIN user or have the DBUSER role assigned to you with write access to the tables in your schema.
With HP Vertica 7.0, Management Console offers a new way to visualize your query plans and get profile information about your queries. You can run EXPLAIN on a query using Management Console’s Query Plan Visualizer, and it provides you with a visual representation of the query plan with the lowest cost. Management Console highlights and links to key information in the output, so you can spot issues at a glance. When you profile a query, Management Console provides a graphical view of what resources were used when HP Vertica executed the query.
Our new video tutorial walks you through using Management Console to view query plans and profile data. See the video below, and make sure to check out the other video tutorials we offer here.
Yesterday, we posted a new video to the site featuring our very own product marketing manager Steve Sarsfield. The video breaks down the key advantages of the new ConvergedSystems 300 as an all-in-one platform for your big data analytics needs. Check out the video above or go to the main ConvergedSystems 300 here to learn more.
Over recent months, we’ve heard our community request short, instructional videos and tutorials to help them learn more about the rich and powerful features of the HP Vertica Analytics Platform.
Well, we heard you, and have developed and posted some initial videos to help you maximize your investment in HP Vertica. We’ve posted a new videos that highlight new features in HP Vertica 7 (“Crane”). Among the videos we’ve posted are:
You can see these and all video tutorials here. Here’s a sample:
We’d love to hear more from you! If you have any suggestions or ideas for topics for future videos, let us know. You can post your ideas on our forum at community.vertica.com, or you can send ideas to firstname.lastname@example.org
We’re committed to your success! Check back soon to see what’s new in HP Vertica Tutorials!
The Gartner Magic Quadrant has long been recognized as critical research that organizations rely on to weigh, evaluate, and ultimately select vendors as the infrastructure for their IT initiatives.
Yesterday, Gartner released the 2014 Gartner Magic Quadrant for Data Warehouse and Database Management Systems. We are very proud to announce that the HP Vertica Analytics Platform has entered the Leaders Quadrant, gaining in both terms of execution as well as its ability to fulfill our vision of storing, exploring, and serving data and insights to thousands of organizations.
We encourage you to read this complimentary report, as you consider HP Vertica to handle your most extreme Big Data analytics initiatives. And, we especially want to thank all of our innovative customers that push us each and every day to build the best, most scalable, and open analytics platform on the planet.
Want to get started with HP Vertica? Download our Community Edition – it’s free up to 1 TB with no time limit.