
When UPDATE is actually INSERT

At the VLDB 2012 conference a few weeks ago, we had a chance to hear Jiri Schindler give a tutorial about NoSQL. His interesting and informative presentation covered the fundamental architecture and I/O usage patterns of RDBMSs and of various NoSQL data management systems, such as HBase, Cassandra, and MongoDB.

During the presentation, Schindler listed the basic I/O access patterns of columnar databases on the slide below. It is hard to summarize the operation of the various columnar database systems on a single slide, and Schindler did a great job given the constraints of the presentation. However, while his characterization may hold for other columnar databases, the Vertica Analytic Database has a different I/O pattern for UPDATEs, which we would like to explain in more detail.

First, Vertica does not require synchronous writes to a recovery log. Unlike most other RDBMSs, Vertica implements durability and fault tolerance via distributed replication.

Second, since Vertica never modifies storage in place, it avoids the other I/O-intensive operations referenced in the slide.

When a user issues an UPDATE statement, Vertica performs the equivalent of a delete followed by an insert. The existing row is deleted by inserting a Delete Vector (a small record saying that the row was deleted), and a new copy of the row with the appropriately updated columns is inserted. Both the Delete Vector and the new version of the row are stored in a memory buffer known as the WOS (write optimized store). After sufficient data has accumulated in the WOS from INSERTs, UPDATEs, DELETEs, and COPYs (bulk loads), it is moved in bulk to disk storage known as the ROS (read optimized store).
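
To make that write path concrete, here is a minimal Python sketch of the idea. The names (Wos, insert, delete, update, moveout) are ours, chosen purely for illustration; this is not Vertica code, just the shape of the behavior described above.

    # Toy model of "UPDATE is actually INSERT" (illustrative names, not Vertica's API).

    class Wos:
        """In-memory write buffer: newly inserted rows plus Delete Vectors."""

        def __init__(self):
            self.rows = []            # new row versions awaiting moveout
            self.delete_vectors = []  # positions of rows marked deleted

        def insert(self, row):
            self.rows.append(row)

        def delete(self, position):
            # A delete is just a small record identifying the doomed row,
            # e.g. a (storage container, row index) pair.
            self.delete_vectors.append(position)

        def update(self, position, new_row):
            # UPDATE = delete the old version + insert the updated copy.
            self.delete(position)
            self.insert(new_row)


    def moveout(wos, ros_files, ros_delete_vectors):
        """Flush the WOS to disk as brand-new containers; nothing already
        stored in the ROS is modified in place."""
        if wos.rows:
            ros_files.append(sorted(wos.rows))  # sort (and, conceptually, encode)
        if wos.delete_vectors:
            ros_delete_vectors.append(list(wos.delete_vectors))
        wos.rows.clear()
        wos.delete_vectors.clear()


    # Updating a row simply records a Delete Vector and inserts the new copy.
    wos = Wos()
    wos.update(position=("ros_file_0", 0), new_row=("alice", 11))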

It is important to note that existing files in the ROS are not modified when data is moved from the WOS to the ROS; rather, a new set of sorted and encoded column files is created. To avoid a large number of files accumulating over time, the Tuple Mover regularly merges column files together using an algorithm that limits the number of times any tuple is rewritten and that relies on large contiguous disk operations, which perform quite well on most modern file and disk systems.
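
The merge itself is, in spirit, a streaming k-way merge of already-sorted runs into one new run. Here is a rough Python sketch of that idea (again with our own names, not the actual Tuple Mover implementation):

    import heapq

    def merge_column_files(sorted_files):
        """Merge several sorted column files into one new sorted file.
        The inputs are read sequentially and never modified; the output
        is written as a single large, contiguous stream."""
        # heapq.merge performs a streaming k-way merge of sorted inputs.
        return list(heapq.merge(*sorted_files))

    # Example: two sorted files become one new file; the originals can then
    # be dropped, having never been rewritten in place.
    merged = merge_column_files([[1, 4, 9], [2, 3, 10]])
    print(merged)  # [1, 2, 3, 4, 9, 10]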

This arrangement has several advantages:

  1. From the user’s point of view, the UPDATE statement completes quickly, and future queries return the expected answer because the original values are filtered out at runtime using the Delete Vectors (a sketch of this read-side filtering follows the list).
  2. The cost of sorting, encoding, and writing column files to disk is amortized over a large number of rows by using the in-memory WOS.
  3. I/O is always proportional to the number of rows inserted or modified – it is never the case that an update of a small number of rows causes I/O on a significant amount of previously stored data.
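
For the read-side filtering mentioned in point 1, a query conceptually scans both the ROS and the WOS and skips any row position covered by a Delete Vector. A minimal, self-contained sketch of that idea (hypothetical names, not Vertica code):

    def scan(ros_files, wos_rows, delete_vectors):
        """Return the rows visible to a query: every stored row whose
        position does not appear in a Delete Vector. A position is a
        (container name, row index) pair such as ("ros_file_0", 1)."""
        deleted = set(delete_vectors)
        visible = []
        for name, rows in ros_files.items():
            for i, row in enumerate(rows):
                if (name, i) not in deleted:
                    visible.append(row)
        for i, row in enumerate(wos_rows):
            if ("wos", i) not in deleted:
                visible.append(row)
        return visible

    # One ROS file, one freshly updated row still in the WOS, and one Delete
    # Vector hiding the old copy of the row that was updated.
    ros_files = {"ros_file_0": [("alice", 10), ("bob", 20)]}
    wos_rows = [("alice", 11)]
    delete_vectors = [("ros_file_0", 0)]
    print(scan(ros_files, wos_rows, delete_vectors))
    # [('bob', 20), ('alice', 11)]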

For more details about how data is stored, and about Vertica’s overall architecture and design decisions, please consider reading our VLDB 2012 paper.


VLDB 2012 – Istanbul Bound!

I’ll be giving a talk about Vertica next week at VLDB 2012. If you happen to be in Istanbul, please stop by (Nga and I have a T-shirt for you). Our paper can be found on the VLDB website:

The Vertica Analytic Database: C-Store 7 Years Later

http://vldb.org/pvldb/vol5/p1790_andrewlamb_vldb2012.pdf

At Vertica/HP, we pride ourselves on cutting-edge technology, informed by the latest academic research and applied with equally cutting-edge software craftsmanship. Over the years, we have benefited from close collaboration with academic researchers, befitting a company founded by Mike Stonebraker.

Vertica Systems was founded to commercialize the ideas from the C-Store research project, developed at MIT and other top universities and originally described in a VLDB 2005 paper. This year, I am proud that we have come full circle and published a rigorous technical description of the Vertica Analytic Database at VLDB 2012.

We look forward to many more years of technical breakthroughs and cool innovation in analytic database systems. Speaking of which, we are hiring! If you are a superstar (cliché, I know) and are interested in working with us to:

  • Design, build and test challenging distributed systems, database internals, and analytics systems software
  • Bring one of the very few new database engines to new customers who desperately need it

Drop us a line at marcia.langdon@hp.com
