Archive for December, 2009

Vertica 3.5 FlexStore: The next generation of Column-stores

Last month Vertica delivered release 3.5 of our analytic database software. Over the past few years, Vertica has continued to innovate and mature its core database functionality with major releases every 6 months. I would like to thank all our customers and partners whose feedback has been instrumental in developing our product. The centerpiece of the Vertica 3.5 release is FlexStore, our next-generation columnar database architecture. With FlexStore, Vertica now combines all the benefits of a columnar database with several benefits traditionally considered the forte of row-oriented databases.

There are three main ideas introduced by FlexStore:

  1. grouping of multiple (possibly all) columns into a single file
  2. automatically choosing the disk storage format based on data load patterns
  3. ability to differentiate storage media by their performance characteristics and to enable intelligent placement of data based on usage patterns

Let us look at a couple of practical examples where Vertica can derive huge performance gains from patent-pending FlexStore features.

One of the key innovations in Vertica has been eliminating the need to batch load data warehouses: most Vertica customers trickle load data throughout the day while providing real-time data latency to their end users. There are no indexes to build or materialized views to refresh. Data is trickle loaded into the Write-Optimized-Store (WOS), a fully queryable in-memory store, and over time is moved to the Read-Optimized-Store (ROS) on disk. The migration of data between the WOS and ROS, and access to it, is completely transparent to the user and managed automatically by the Tuple Mover. Data from the WOS and ROS is also combined automatically whenever a query needs it.
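The interplay between the two stores can be pictured with a small toy model. This is only an illustrative sketch, not Vertica's implementation: the `ToyStore` class and its method names are hypothetical, and Python lists stand in for memory and disk.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Toy model of a write-optimized store (WOS) and a read-optimized store (ROS)."""
    wos: list = field(default_factory=list)  # stands in for the in-memory store
    ros: list = field(default_factory=list)  # stands in for sorted on-disk storage

    def trickle_load(self, row):
        # New rows land in memory and are queryable immediately.
        self.wos.append(row)

    def move_out(self):
        # Stand-in for the Tuple Mover: migrate WOS rows into the sorted ROS.
        self.ros = sorted(self.ros + self.wos)
        self.wos = []

    def query(self, predicate):
        # Queries read both stores, so the migration is invisible to readers.
        return sorted(r for r in self.ros + self.wos if predicate(r))


store = ToyStore()
store.trickle_load((2, "b"))
store.trickle_load((1, "a"))
before = store.query(lambda r: True)      # rows are visible before any migration
store.move_out()
store.trickle_load((3, "c"))
after = store.query(lambda r: r[0] >= 2)  # combines one ROS row and one WOS row
```

The point of the sketch is that `query` never needs to know where a row currently lives.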

FlexStore improves the efficiency of this trickle load process by enabling Vertica to choose whether incoming data is stored in a row-oriented format in the WOS or in a row- or column-oriented format in the ROS, depending on the size of the data being loaded. This determination is made automatically, without any user intervention. The row-oriented format on disk groups several columns into a single file and applies columnar compression within each file, which reduces the number of file accesses needed to read the data during queries. Over time, data from multiple small loads is combined if necessary and reorganized into a more highly optimized column-oriented format.
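As a rough illustration of a size-based decision like this one, the sketch below picks a storage target from the number of rows in a load. The function name, the labels, and both thresholds are hypothetical; the post does not state the actual rules or cutoffs Vertica uses.

```python
def choose_storage_format(row_count, wos_capacity=1_000, columnar_threshold=100_000):
    """Pick a storage target for an incoming load.

    Both thresholds are made-up values for illustration only.
    """
    if row_count <= wos_capacity:
        return "wos_row"      # small load: row-oriented, in memory
    if row_count < columnar_threshold:
        return "ros_row"      # medium load: grouped columns in one file on disk
    return "ros_column"       # large load: column-oriented format on disk


small = choose_storage_format(50)
medium = choose_storage_format(20_000)
large = choose_storage_format(5_000_000)
```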

FlexStore also allows user control over placement of columns on different media. This intelligent data placement also provides an opportunity to incorporate solid state drives for database storage in a cost-effective manner. Even if your architecture consists of a homogeneous set of disks, it is well known that storing data on inner versus outer tracks can result in different performance. By presenting the inner and outer tracks as two different storage locations to Vertica, FlexStore allows intelligent placement of columns, so that frequently accessed columns can be placed on the outer tracks and infrequently used columns on the inner tracks.
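One way to think about such a placement decision is to rank columns by how often they are accessed and give the fastest location to the top of the list. The sketch below is a hypothetical planning helper, not a Vertica feature or API; the column names and access counts are invented for the example.

```python
def place_columns(access_counts, fast_slots):
    """Assign the most frequently accessed columns to the fast location.

    access_counts maps column name -> number of accesses (illustrative data).
    fast_slots is how many columns fit on the fast location (an assumption).
    """
    # Rank by descending access count, breaking ties by name for determinism.
    ranked = sorted(access_counts, key=lambda col: (-access_counts[col], col))
    return {col: ("fast" if i < fast_slots else "slow") for i, col in enumerate(ranked)}


placement = place_columns(
    {"price": 900, "symbol": 800, "notes": 3, "audit_id": 1},
    fast_slots=2,
)
```

Here "fast" could stand for outer tracks or solid state drives, and "slow" for inner tracks or slower disks.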

Grouping of two or more columns into a single file is also available as a user directive when defining projections. Columns that are frequently accessed together can be grouped to reduce the number of file accesses necessary to retrieve the data. Columns that are related in a domain-specific way, such as bid and ask values in tick data, are candidates for the grouping directive. Grouping of columns also enables the use of interesting compression and encoding techniques across multiple columns.
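The effect of grouping on file accesses can be shown with a small counting exercise. This sketch assumes a simplified model in which every ungrouped column lives in its own file and every group shares one file; the `files_touched` helper is hypothetical.

```python
def files_touched(query_columns, groups):
    """Count the distinct files a query reads under a simplified storage model.

    groups is a list of column lists; each group is stored in one file.
    Any column not in a group is assumed to have a file of its own.
    """
    file_of = {}
    for group in groups:
        for col in group:
            file_of[col] = tuple(group)
    # Ungrouped columns map to a single-column "file".
    return len({file_of.get(col, (col,)) for col in query_columns})


ungrouped = files_touched(["bid", "ask"], groups=[])
grouped = files_touched(["bid", "ask"], groups=[["bid", "ask"]])
mixed = files_touched(["bid", "ask", "volume"], groups=[["bid", "ask"]])
```

With bid and ask grouped, a query that reads both touches one file instead of two.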

The grouped column feature can be combined with the data placement feature to fine-tune performance or storage efficiency. For instance, columns that are infrequently accessed can be combined into a single group and stored on slower disks or on the inner tracks of a disk. In this manner, FlexStore lays the foundation of Vertica's Information Lifecycle Management strategy, where data is differentiated based on its value over the course of its lifecycle. You can expect more advances from Vertica in this area over the course of the next few releases.

Vertica 3.5 also introduces our next-generation query optimizer and execution engine with improved performance and many new optimizations. For example, because Vertica stores data sorted, merge joins are extremely efficient, and in Vertica 3.5 merge joins can now be used in many more situations. Deletes and updates, traditionally considered a weakness of column-store architectures, are not only faster but also have little impact on query performance. Several of our early-adopter customers reported 30% faster out-of-the-box query performance and over 100x improvement in delete performance after upgrading to Vertica 3.5!
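To see why sorted storage makes merge joins cheap, here is a minimal textbook merge join over two inputs that are already sorted by key. It is a generic sketch of the technique, not Vertica's execution engine, and the sample rows are made up.

```python
def merge_join(left, right):
    """Inner-join two lists of (key, value) tuples that are already sorted by key."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == lk:
                for k in range(j, j_end):
                    out.append((lk, left[i][1], right[k][1]))
                i += 1
            j = j_end
    return out


left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = merge_join(left, right)
```

Because both inputs arrive in key order, each side is scanned once from start to finish, with no sort step and no hash table to build.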

Along with performance and scalability, Vertica continues to invest in enhancing and simplifying the operational capabilities of the database and improving the overall user experience. Vertica 3.5 introduces a rich set of monitoring tables that are fully queryable using SQL.

Watch this space to learn about more upcoming technical innovations from Vertica as we continue to build the fastest and simplest database for analytics!
