Vertica

Archive for the ‘updates & deletes’ Category

What’s New in Vertica 4.1

Vertica announced a new version of its Vertica Analytics Platform software, version 4.1, on Tuesday, November 9th at TDWI Orlando. You can read more about Vertica 4.1 in the press release, but I wanted to give you a few highlights of the features that make 4.1 so important to our customers, or anyone looking to make the most of their data.

What’s New in Vertica 4.1 from Vertica Systems on Vimeo.

Here are some highlights from the video:

  • What’s New Intro
  • Third-Party Tools Integration – 0:43
  • SQL Macros – 2:14
  • Enhanced Security & Authentication – 2:47
  • Updates & Deletes – 3:27
  • Vertica 4.1 Wrap Up – 3:50
We hope you enjoy the video!

The Vertica Connector for Hadoop

At Hadoop World today, Vertica is announcing our second-generation connector for Hadoop and Pig.  You can read more about the Vertica Connector for Hadoop in the press release, but we wanted to give you the basics about the new Connector here on the blog.

About a year ago, Vertica was the first analytic database company to deliver a parallel connector for Hadoop.  Today, Vertica has over 25 customers using Vertica and Hadoop together.  These are real customers, with real-world applications, who are combining the strengths of Hadoop and the Vertica Analytics Platform.

With the Vertica Connector for Hadoop, users have unprecedented flexibility and speed in loading data from Hadoop to Vertica and querying data from Vertica in Hadoop.  As part of our ongoing commitment to Hadoop, we’ve also announced an upcoming backup option to the Hadoop File System (HDFS), and plans to publish a set of libraries that will allow Hadoop users to leverage Vertica’s advanced sorting, compression and encoding natively in the backup files on HDFS for additional processing and data exploration.

The Vertica 4.0 Connector for Hadoop is open source, supported by Vertica, and available for download here.

You can read more on the web site, but here’s a short summary of what the Vertica 4.0 Connector for Hadoop offers right now:

  • Enhanced integration of Vertica 4.0 with stable versions of Hadoop and Pig
  • Improved transfer performance

And, in a separate release available in the first half of 2011, we’ll be providing:

  • An innovative, low-cost backup option
  • Libraries to read Vertica’s native format from HDFS

As part of our Hadoop push, we’re going to be presenting a couple of webinars around our combined Vertica/Hadoop solution.  Watch for an announcement here on the blog or in your email (if you are on the Vertica list).

Vertica Under the Hood: The Query Optimizer

As we bring our 4.0 release to market, we are starting a series of educational blog posts to provide an in-depth look at Vertica’s core technology. We start with one of our crown jewels – the Vertica Query Optimizer.

The goal of query optimizers in general is to let users get maximum performance from their database without worrying about the details of how it gets done.  At Vertica, we take this goal to heart in everything that we build.  From day one, the Vertica Optimizer team has focused on creating a product that reduces the need for manual tuning as much as possible.  This lets users focus on their business needs rather than on tuning our technology.

Before we dive into the unique innovations within our optimizer, let’s get a few simple facts straight:

  • The Vertica Optimizer is not limited to classic Star and Snowflake Schemas – it hasn’t been since version 2.5. Many of our 130+ customers in production today are using non-star schemas with great success.  In fact, our Optimizer easily handles very complicated queries – from workloads as simple as TPC-H, containing only relatively simple Star queries with a few tables, to complex queries containing hundreds of joins with mixes of INNER/OUTER joins and a variety of predicates and sub-queries.
  • It is not common and certainly not necessary to have one projection per query to get great performance from Vertica. While the Optimizer understands and chooses the optimal plan in the presence of several choices, few customers have found it necessary to do custom tuning for individual queries except in very unusual circumstances.  It is far more typical to have great performance without such tuning at all.
  • The Vertica Optimizer is the only true columnar optimizer developed from scratch to make best use of a column store engine.  Unlike some other column store vendors, we do not use any part of the Postgres optimizer.

Why? Because fundamentally, we believe that no amount of retrofitting can turn a row-oriented optimizer into a column-oriented one.
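To make the projection point above concrete, here is a minimal, purely illustrative sketch of the kind of choice the optimizer faces: given several projections of the same table, pick the one whose sort order best matches a query.  The names and data structures below are hypothetical, not Vertica internals, and the real optimizer is cost-based rather than score-based.

```python
# Hypothetical sketch: choosing among projections by how well their
# sort order matches a query's predicate/grouping columns.
# Not Vertica code -- the real optimizer is cost-based.

def projection_score(sort_order, query_columns):
    """Count the leading sort-order columns the query can exploit."""
    score = 0
    for col in sort_order:
        if col not in query_columns:
            break  # only a leading prefix of the sort order helps
        score += 1
    return score

def choose_projection(projections, query_columns):
    """Pick the projection whose sort order best matches the query."""
    return max(projections,
               key=lambda p: projection_score(p["sort"], query_columns))

# Two hypothetical projections of a sales table, sorted differently.
projections = [
    {"name": "sales_by_date",     "sort": ["sale_date", "store_id"]},
    {"name": "sales_by_customer", "sort": ["customer_id", "sale_date"]},
]

# A query filtering on customer_id and sale_date can use the second
# projection's sort order end to end.
best = choose_projection(projections, {"customer_id", "sale_date"})
```

Because each projection stores the same logical data in a different physical order, a handful of well-chosen projections can serve many queries – which is why per-query tuning is rarely needed.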

For the optimizer geeks out there, here are some of the capabilities that we believe give the Vertica Optimizer that special edge over others, even mature ones:

  • The entire Optimizer is designed as a set of extensible modules so that we can change the brains of the optimizer without rewriting much of the code. This means we can incorporate knowledge gleaned from end-user experiences into the Optimizer, without a lot of engineering effort.  After all, when you build a system from scratch, you can build it smarter and better!
  • Unlike standard optimizers that determine the optimal single-node plan and then introduce parallelizing operators into it as an afterthought, our patent-pending optimizer algorithms account for data distribution during the join order enumeration phase of the optimizer. We use sophisticated heuristics based on knowledge of the physical properties of the available projections to control the explosion in search space.
  • Unlike standard optimizers that restrict the join search space to left-deep plans, the Vertica Optimizer considers bushy plans very naturally.
  • The Vertica Optimizer is cost-based with a cost-model based not just on I/O but also CPU and Network transfer costs and takes into account the unique details of our columnar operators and runtime environment.
  • The Vertica Optimizer employs many techniques that take advantage of the specifics of our sorted columnar storage and compression – for example, late materialization, compression aware costing and planning, stream aggregation, sort elimination, merge joins, etc.
  • The Vertica Database Designer works hand-in-glove with the optimizer by producing a physical design that can take advantage of the many clever optimizations available to the optimizer.
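The bushy-plan and search-space points above can be made concrete with a toy enumeration (again, illustrative Python, not Vertica code): for a four-table join, restricting the search to left-deep trees yields 24 candidate plans, while admitting all binary (bushy) tree shapes yields 120 – which is exactly why heuristics are needed to keep enumeration tractable as table counts grow.

```python
# Toy join-tree enumeration, for illustration only.
from itertools import permutations

def left_deep_plans(relations):
    """Left-deep plans: each join's right input is a base relation."""
    for order in permutations(relations):
        plan = order[0]
        for rel in order[1:]:
            plan = (plan, rel)   # e.g. ((('A','B'),'C'),'D')
        yield plan

def all_plans(relations):
    """All binary join trees (bushy shapes included) over the relations."""
    rels = list(relations)
    if len(rels) == 1:
        yield rels[0]
        return
    n = len(rels)
    # Split the leaf set into every non-empty left/right pair.
    for mask in range(1, 2 ** n - 1):
        left  = [r for i, r in enumerate(rels) if mask >> i & 1]
        right = [r for i, r in enumerate(rels) if not mask >> i & 1]
        for lp in all_plans(left):
            for rp in all_plans(right):
                yield (lp, rp)

rels = ["A", "B", "C", "D"]
n_left_deep = len(set(left_deep_plans(rels)))  # 4! = 24 plans
n_all = len(list(all_plans(rels)))             # 120 bushy + left-deep shapes
```

The gap between the two counts grows explosively with the number of tables, so an optimizer that considers bushy plans must prune the space intelligently rather than enumerate it exhaustively.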

While innovating on the core algorithms, we have also incorporated many of the best practices developed over the past 30 years of optimizer research, such as:

  • Using histograms to calculate selectivity.
  • Optimizing queries to favor co-located joins where possible.  Note that the optimizer can handle physical designs with arbitrary distribution properties and uses distribution techniques such as re-segmented or broadcast joins.
  • Transformations such as converting outer joins to inner joins, taking advantage of primary/foreign key and null constraints, sub-query de-correlation, view flattening, introducing transitive predicates based on join keys and automatically pruning out unnecessary parts of the query.
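For readers unfamiliar with the first technique on that list, here is a small, self-contained sketch of histogram-based selectivity estimation – the classic textbook approach, not Vertica’s actual implementation: build an equi-width histogram over a column, then estimate what fraction of rows a range predicate matches by assuming values are spread uniformly within each bucket.

```python
# Illustrative equi-width histogram selectivity estimation.
# Not Vertica's implementation -- just the classic technique.

def build_histogram(values, n_buckets):
    """Equi-width histogram: a list of (low, high, count) buckets."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_buckets or 1
    counts = [0] * n_buckets
    for v in values:
        idx = min(int((v - lo) / width), n_buckets - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, c)
            for i, c in enumerate(counts)]

def selectivity_less_than(hist, x, total):
    """Estimate the fraction of rows with value < x, assuming a
    uniform spread of values inside each bucket."""
    rows = 0.0
    for lo, hi, count in hist:
        if x >= hi:
            rows += count                        # whole bucket qualifies
        elif x > lo:
            rows += count * (x - lo) / (hi - lo)  # partial bucket
    return rows / total

values = list(range(100))            # synthetic column: 0..99
hist = build_histogram(values, 10)
sel = selectivity_less_than(hist, 25, len(values))  # true answer: 0.25
```

Selectivity estimates like this feed the cost model: a predicate estimated to keep 1% of rows leads to very different join orders and methods than one estimated to keep 90%.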

As a testament to the quality of our optimizer, we are proud to say that customers rarely override the plans it produces.  This removes an entire class of management tasks from the DBA and lets our algorithms take full advantage of our ever-improving execution engine. That being said, we believe that performance and ease-of-use speak for themselves, and so we invite you to Test Drive the Vertica Database on your schema, your queries and your data!
