Author Archive

Tales from a Cocktail Party: How Customers Use Vertica

On Wednesday, I gave one of the lightning talks at the 4th Extremely Large Databases Conference (XLDB) at Stanford University about how real customers use Vertica to store and analyze their ‘extremely large’ databases. The 5-minute format is tough. I had to pick only three customers, but I hope I got across how cool the things they are doing with Vertica are.

Given the proximity to San Francisco, it occurs to me that Vertica is basically in the same business Levi Strauss was in during the 19th-century gold rush: selling supplies to a booming and lucrative market. Our modern-day gold rush is a race between businesses to see who can use the massive amounts of data they collect to build the best X possible, where X is anything from ‘television distribution network’ to ‘massive online game’ and everything in between.

Given that we are in the system software business, we sell a (vital) component of an end product, so I rarely see complete applications. I most typically see the SQL those applications generate, and on most days I am focused on the nitty-gritty details of making it run as fast as possible. Sometimes, though, I get to step back and see what is really being done with our products, and it’s usually quite cool.

So, in this blog post, I wanted to give you some cool cocktail party tidbits about what our customers do with their data…

One telecom company told me that (unsurprisingly) most of those 500 channels on your cable box never get watched.  So most of the time, the company doesn’t waste their network’s bandwidth sending all of the channels to your house.  Instead, they use switched video and then use Vertica to analyze the patterns of who watches what and when in every local distribution group, so that they can make network planning decisions like “how much switched video capacity do we really need for those 100 subscribers?”
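To make that concrete, here is a toy Python sketch of the kind of question such tune-by-minute data can answer. This is not the telco’s actual pipeline; the record layout, group names, and numbers are invented for illustration. It computes the peak number of distinct channels in simultaneous use within a distribution group, which bounds the switched-video capacity that group needs.

```python
from collections import defaultdict

# Hypothetical tune records: (minute, distribution_group, box_id, channel).
records = [
    (0, "group-a", "box1", "ESPN"),
    (0, "group-a", "box2", "ESPN"),
    (0, "group-a", "box3", "HBO"),
    (1, "group-a", "box1", "CNN"),
    (1, "group-a", "box2", "ESPN"),
]

def peak_concurrent_channels(records):
    """Peak number of distinct channels tuned at once, per distribution group."""
    in_use = defaultdict(set)  # (group, minute) -> set of channels being watched
    for minute, group, _box, channel in records:
        in_use[(group, minute)].add(channel)
    peak = defaultdict(int)
    for (group, _minute), channels in in_use.items():
        peak[group] = max(peak[group], len(channels))
    return dict(peak)

print(peak_concurrent_channels(records))  # → {'group-a': 2}
```

Even though three boxes are tuned in at minute 0, only two distinct channels are in use, so two switched-video streams suffice for this (tiny) group.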

Cocktail item: This telco has an (anonymized) record of what station each television box is tuned to for every minute of every day.

Another of our customers analyzes SMS message content for big mobile service providers. To be honest, I don’t know exactly what they are doing with this information, but you can imagine they have interesting data and reach interesting conclusions, and they use Vertica to analyze it.

Cocktail item: Question: What is the most common text message? Answer: The single letter ‘k.’

My aunt is a devoted online farmer, but I doubt she realizes how much technical firepower is used to analyze her online buying habits. Zynga, the developer of FarmVille, Mafia Wars and a number of other popular online games, has a massive Vertica cluster that they use to analyze past in-game decisions with the goal of making their games more fun to play. This means you will spend more time playing the games and, hopefully, more money with Zynga.

Cocktail Item: Zynga’s revenue is massive, estimated by various sources to be $200M or greater in 2009, and also reportedly on track for a killer 2010.  Who would have thought virtual farming or playing a gangster would be so lucrative?

Vertica’s customers are from a wide range of industries – gaming, telecommunications, financial, healthcare, and more – but they share one thing in common.  They all use Vertica to analyze massive amounts of data in real time, converting it into usable information that helps drive business decisions.  And now, I feel like a cocktail!

There Will Come Soft Rains

I am an early riser, especially on summer weekends when the sun comes up early. This means I get 2-3 hours of time to myself before anyone else is awake. Much to my wife’s chagrin, I often spend the time catching up on Vertica email and tasks that didn’t quite get done during the week. This is one of the only times I can write email without getting semi-immediate responses back.

This morning, I did get several emails, though, from our automated performance test systems, which reminded me of Ray Bradbury’s short story “There Will Come Soft Rains,” in which the humans are gone but their automated systems still remain.

Of course the story’s main points aren’t about software automation or early-morning emails, but sitting here in the quiet before the day wakes, it seems somehow appropriate.

The automation systems we have at Vertica please me. Of course I am biased: I mostly wrote what we call “ServerFarm,” which runs regression tests regularly (every few checkins) and lets developers run tests on demand on a farm of machines. This system came about the way most good productivity-enhancing systems do: I was annoyed. Specifically, I was annoyed at having to run our hour-long test suite on my laptop before checking in. I had taken to saving patches for my changes and running them manually at night so that I didn’t tie up my development machine during the day. ServerFarm started out as a way for me to automate that process and has grown from there.

Automated and regular regression testing is, of course, software best practice, but it is cool to see it working well at Vertica. By lowering the bar for using ServerFarm, even the crustiest of our developers now uses the Farm. The one thing that could be better is reporting of the results, especially over time. Thankfully we have a clever intern this summer helping us out with that. I am totally psyched to see what he comes up with.

Vertica Under the Hood: The Query Optimizer

As we bring our 4.0 release to market, we are starting a series of educational blog posts to provide an in-depth look at Vertica’s core technology. We start with one of our crown jewels – the Vertica Query Optimizer.

The goal of query optimizers in general is to let users get maximal performance from their database without worrying about the details of how it gets done.  At Vertica, we take this goal to heart in everything that we build.  From day one, the Vertica Optimizer team has focused on creating a product that reduces the need for manual tuning as much as possible.  This lets users focus on their business needs rather than on tuning our technology.

Before we dive into the unique innovations within our optimizer, let’s get a few simple facts straight:

  • The Vertica Optimizer is not limited to classic Star and Snowflake schemas – it hasn’t been since version 2.5. Many of our 130+ customers in production today use non-star schemas with great success.  In fact, our Optimizer handles everything from workloads as simple as TPC-H, containing only relatively simple star queries over a few tables, to complex queries containing hundreds of joins with mixes of INNER/OUTER joins and a variety of predicates and sub-queries.
  • It is not common and certainly not necessary to have one projection per query to get great performance from Vertica. While the Optimizer understands and chooses the optimal plan in the presence of several choices, few customers have found it necessary to do custom tuning for individual queries except in very unusual circumstances.  It is far more typical to have great performance without such tuning at all.
  • The Vertica Optimizer is the only true columnar optimizer developed from scratch to make best use of a column store engine.  Unlike some other column store vendors, we do not use any part of the Postgres optimizer.

Why? Because fundamentally, we believe that no amount of retrofitting can turn a row-oriented optimizer into a column-oriented one.

For the optimizer geeks out there, here are some of the capabilities that we believe give the Vertica Optimizer that special edge over others, even mature ones:

  • The entire Optimizer is designed as a set of extensible modules so that we can change the brains of the optimizer without rewriting much of the code. This means we can incorporate knowledge gleaned from end-user experiences into the Optimizer, without a lot of engineering effort.  After all, when you build a system from scratch, you can build it smarter and better!
  • Unlike standard optimizers that determine the optimal single-node plan and then introduce parallelizing operators into it as an afterthought, our patent-pending optimizer algorithms account for data distribution during the join order enumeration phase of the optimizer. We use sophisticated heuristics based on knowledge of the physical properties of the available projections to control the explosion in search space.
  • Unlike standard optimizers that restrict the join search space to left-deep plans, the Vertica Optimizer considers bushy plans very naturally.
  • The Vertica Optimizer is cost-based with a cost-model based not just on I/O but also CPU and Network transfer costs and takes into account the unique details of our columnar operators and runtime environment.
  • The Vertica Optimizer employs many techniques that take advantage of the specifics of our sorted columnar storage and compression – for example, late materialization, compression aware costing and planning, stream aggregation, sort elimination, merge joins, etc.
  • The Vertica Database designer works hand-in-glove with the optimizer by producing a physical design that can take advantage of the many clever optimizations available to the optimizer.
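As an illustration of the kind of search described above, here is a generic textbook sketch in Python, not Vertica’s actual algorithm: a dynamic program over subsets of relations that enumerates bushy as well as left-deep join plans. The cost model and cardinality estimate are deliberately simplistic stand-ins (cost is the sum of intermediate result sizes, and a join’s output is estimated as its smaller input, as if joining on a key); the row counts are invented.

```python
from itertools import combinations

# Hypothetical base-table row counts.
rows = {"A": 1000, "B": 10, "C": 500, "D": 20}

def best_plan(relations):
    """Find the cheapest join plan by dynamic programming over subsets.

    Because both sides of a candidate join may themselves be multi-relation
    plans, this naturally considers bushy plans, not just left-deep ones."""
    # subset -> (output_rows, cost, plan_string); singletons are free scans
    best = {frozenset([r]): (rows[r], 0, r) for r in relations}
    for size in range(2, len(relations) + 1):
        for subset in map(frozenset, combinations(relations, size)):
            # Split the subset every possible way into a left and right side.
            for k in range(1, size // 2 + 1):
                for left in map(frozenset, combinations(sorted(subset), k)):
                    right = subset - left
                    lrows, lcost, lplan = best[left]
                    rrows, rcost, rplan = best[right]
                    out = min(lrows, rrows)        # toy cardinality estimate
                    cost = lcost + rcost + out     # toy cost: intermediate rows
                    if subset not in best or cost < best[subset][1]:
                        best[subset] = (out, cost, f"({lplan} JOIN {rplan})")
    return best[frozenset(relations)]

out_rows, cost, plan = best_plan(list(rows))
print(plan, cost)  # the cheapest plans join the 10-row table B early
```

A real optimizer prunes this exponential search with heuristics (and, as noted above, folds data-distribution decisions into the same enumeration), but the subset-DP skeleton is the standard starting point.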

While innovating on the core algorithms, we have also incorporated many of the best practices developed over the past 30 years of optimizer research, such as:

  • Using histograms to calculate selectivity.
  • Optimizing queries to favor co-located joins where possible.  Note that the optimizer can handle physical designs with arbitrary distribution properties and uses distribution techniques such as re-segmented or broadcast joins.
  • Transformations such as converting outer joins to inner joins, taking advantage of primary/foreign key and null constraints, sub-query de-correlation, view flattening, introducing transitive predicates based on join keys and automatically pruning out unnecessary parts of the query.
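For example, histogram-based selectivity estimation along the lines of the first bullet can be sketched as follows. This is a generic illustration with an invented equi-width histogram, not Vertica’s internal representation; it assumes values are spread uniformly within each bucket.

```python
# Toy histogram: bucket i covers [bucket_bounds[i], bucket_bounds[i+1]).
bucket_bounds = [0, 10, 20, 30, 40, 50]
bucket_counts = [100, 300, 400, 150, 50]  # rows per bucket; 1000 rows total

def selectivity_less_than(value):
    """Estimate the fraction of rows satisfying `column < value`,
    assuming a uniform spread of values inside each bucket."""
    total = sum(bucket_counts)
    qualifying = 0.0
    for i, count in enumerate(bucket_counts):
        lo, hi = bucket_bounds[i], bucket_bounds[i + 1]
        if value >= hi:
            qualifying += count  # the whole bucket qualifies
        elif value > lo:
            # Partial bucket: linearly interpolate within [lo, hi).
            qualifying += count * (value - lo) / (hi - lo)
    return qualifying / total

print(selectivity_less_than(25))  # → 0.6: 100 + 300 + half of 400 = 600 rows
```

The optimizer multiplies estimates like this into table cardinalities to cost candidate plans; the better the histograms, the better the join-order and join-method choices.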

As a testament to the quality of our optimizer, we are proud to say that customers rarely override the plans it produces.  This removes an entire class of management from the DBA and lets our algorithms take full advantage of our ever-improving execution engine. That being said, we believe that performance and ease-of-use speak for themselves, and so we invite you to Test Drive the Vertica Database on your schema, your queries and your data!

Get Started With Vertica Today

Subscribe to Vertica