Vertica

Vertica's Blog

The Power of Projections – Part 2

By Colin Mahony and Shilpa Lawande

Part II – Understanding the Simplicity of Projections and the Vertica Database Designer™

In Part I of this post, we introduced you to the simple concept of Vertica’s projections.  Now that you have an understanding of what they are, we wanted to go into more detail on how users interface with them, and introduce you to Vertica’s unique Database Designer tool.

For each table in the database, Vertica requires a minimum of one projection, called a “superprojection”. A superprojection is a projection for a single table that contains all the columns and rows in the table.  Although the data may be the same as a traditional base table, it has the advantages of segmentation (spreading the data evenly across the nodes in the cluster), sorting, and encoding (compressing the size of the data on a per column basis).  This leads to significant footprint reduction as well as load and query performance enhancements.  To give you a sense of the impact that Vertica’s projections have on size, most Vertica customers have at least a 50% reduction in footprint thanks to our compression.  This includes the high availability copy and on average 3-5 projections.  Again, contrast this to traditional row-store databases ballooning upwards of 5x their original size and that is a 10:1 difference in Vertica.

To get your database up and running quickly, Vertica automatically creates a default superprojection for each table created through the CREATE TABLE and CREATE TEMPORARY TABLE statements. This means that if database admins and users never want to know about a projection, they don’t have to – Vertica will automatically handle it under the covers. To further illustrate this point, users can simply pass in projection parameters such as Order By, Encodings, Segmentation, High Availability, and Partitioning right after the CREATE TABLE statement, never interfacing directly with a projection under the hood.

By creating a superprojection for each table in the database, Vertica ensures that all SQL queries can be answered. Default superprojections alone will do far better than a row-store, however, by themselves they may not fully optimize database performance and Vertica’s full potential.  Vertica recommends that you start with the default projections and then use Vertica’s nifty Database Designer™  to optimize your database.  Database Designer creates new projections that optimize your database based on its data statistics and the queries you use. Database Designer:

1. Analyzes your logical schema, sample data, and sample queries (optional).
2. Creates a physical schema design (projections) in the form of a SQL script that can be deployed automatically or manually.
3. Can be used by anyone without specialized database knowledge (even business users can run Database Designer).
4. Can be run and re-run anytime for additional optimization without stopping the database.

Designs created by the Database Designer provide exceptional query performance. The Database Designer uses sophisticated strategies to provide excellent ad-hoc query performance while using disk space efficiently. Of course, a proficient human may do even better than the Database Designer with more intimate knowledge of the data and the use-case – a small minority of our customers prefer to do manual projection design and can usually get a good feel for it after working with the product for a few weeks.

We’ve heard people ask if we need a projection for each query in Vertica, which we absolutely do not! Typically our customers use 3-5 projections and several are using the single superprojection only. A typical customer would have the superprojection along with a few smaller projections (often comprised of only a few columns each).  Unlike MVs and indexes, projections are cheap to maintain during load and due to Vertica’s compression, the resulting data size tends to be 5-25x smaller than the base data. Depending on your data latency needs (seconds to minutes) and storage availability you could choose to add more projections to further optimize the database.  Also important to note is that Vertica does not charge extra for projections, regardless of how many are deployed.  So whether a customer has 1 or 50 projections, their license fees are the same – entirely based on raw data.

As you can see, projections are very easy to work with, and if you are a business analyst who doesn’t know SQL/DDL, that’s okay, we created a tool that designs, deploys and optimizes the database automatically for you.  Our objective from day one has always been to enable customers to ask more questions and get faster answers from their data without having to constantly tune the underlying database.  Part III of this post goes into more detail on projections versus indexes and materialized views.

Read the rest of the 3-part series…

The Power of Projections – Part 1: Understanding Projections and What They Do
The Power of Projections – Part 3: Comparing and Contrasting Projections to Materialized Views and Indexes

The Power of Projections – Part 1

By Colin Mahony and Shilpa Lawande

Part I: Understanding Projections and What They Do

Many of us here at Vertica have been amazed and frankly flattered at how much FUD our competitors are putting out there regarding Vertica’s “projections”.  Having heard some incredibly inaccurate statements about them, we’ve decided to clarify what they are, how and why we have them, and the advantages they bring.  Actually, projections are a pivotal component of our platform, and a major area of differentiation from the competition.  Most importantly, Vertica’s customers love the benefits projections bring! In an effort to provide you with as much detail as possible, this blog is broken up into three posts with Parts II and III being more technical.

First, Read More »

On Both Sides of the Internship

Author: Lyric Doshi

Three cheers for Vertica Summer Interns 2011! .

As summer comes to an end, we bid goodbye to yet another amazing crop of summer interns. This years interns Hieu, Zhongliang, Ruchika and Zhijie were MS/PhD students from different schools along the east coast.  They worked on projects to extend the Vertica SDK, enhance Hadoop/Pig connectivity, and create internal developer productivity tools.  We plan to incorporate much of this work into a future major release of Vertica. In fact, there was so much excitement surrounding their work that in addition to the traditional presentation to engineering, they were asked to present to the entire company after having lunch with Vertica VP&GM Chris Lynch.

Once a picky an intern Read More »

The Right Tool for the Job: Using Hadoop with Vertica for Big Data Analytics

by Mingsheng Hong, Vertica Product Marketing Engineer

I have an entrepreneur friend who used to carry a butter knife around.  He claimed this “almighty” tool was the only one he ever needed!  While the butter knife does serve a wide range of purposes (especially with a stretch of the imagination), in practice it doesn’t always yield optimal results.  For example, as a screwdriver, it may work for common screws, but certainly not a Phillips (unless you push down very hard and hope not to strip the screw).  As a hammer, you may be able to drive finishing nails, but your success and mileage may Read More »

Reports of SQL’s Death Are Greatly Exaggerated

Apache Log Analysis in Vertica .

I am a proud new father and of course the first thing I did was to put pictures of my daughter online on our webserver because I wanted to see who had been looking at them. I could have gone the Google Analytics route, but being a geek I wanted to explore the data myself rather than just get a static report.

Before Vertica, for this kind of analysis I probably would have written a perl script (because it has the best regexp support of any language I know), but as soon as I start doing anything more complicated than summarizing, things get ugly quickly. Specifically I wanted to group the web Read More »

Christopher Lynch Commencement Address to the Bentley University McCallum Graduate School of Business

Christopher Lynch, VP and General Manager of Vertica, an HP Company, delivered the commencement address to the Bentley University McCallum Graduate School of Business on Saturday, May 21st. Knowing Chris and appreciating the enthusiasm, passion, and commitment to all things he undertakes, I wouldn’t have missed the opportunity to watch the commencement. As is usual with Chris, this was not your typical commencement speech. I think the team at Bentley summed it up very nicely in the Bentley University recap of the day:

In an impassioned speech at the 36th annual commencement ceremony, Lynch recounted how early experiences shaped his choices. “I grew up modestly, but in a home full of love and so I was happy… therefore Read More »

Monte Carlo Analysis Using the Vertica Analytics Platform: Part 1

This post is part of a new series of blogs on harnessing the power of the Vertica Analytics Platform to solve problems that, at first glance, do not appear to be database-centric.

Introduction

The main ingredient in any Monte Carlo method is, as the name suggests, some process akin to going to the casino and repeatedly throwing the dice to see if the odds are in your favor or stacked towards the house.  While this description is simple, the true beauty lies in the application of brute force; instead of applying a complex analysis to the rules and physics of a game, just go play it a number of times and see what happens.

Illustrated graphically, suppose we have a Read More »