A couple of months ago I went to SIGMOD 2012. One of the big award winners there was Bruce Lindsay (IBM Fellow Emeritus), a true patriarch of relational databases. (System R; enough said!)
I was somehow drawn to him before I figured out his name, and before I learned that he was an award winner. Maybe it was the hairdo and mannerisms.
Or maybe it was how he asked the presenters of the paper on “MCJoin” something along the lines of “So, I’ve written a few join algorithms in my day and one of the things that set me back a few months each time was OUTER JOINs”. Which, in my day, set me back a few months.
Back to the awards. Each recipient gave a talk. Bruce gave a very interesting presentation covering RDBMS, how it built up to something useful over the years, and then considered whether we are “losing our way”. I was a bit surprised that he listed “column stores” as a “detour” on the path of RDBMS progress. This is his slide (and, as you view it, try to imagine someone in the row in front of you cackling about how Mike Stonebraker would react to it…):
I loved it. Well, maybe I loved it up until the bit about column stores. This part just doesn’t make sense to me. Maybe it is time to go over the points one more time.
- Good for “write once data”. Yes, Vertica is. Though I doubt it is just because we are a column store; probably other design decisions are in play here. But I would point out that there is a real PILE of write once data. Think about it. For each conventional “transaction” where someone buys something, accounts debited, inventory levels adjusted, etc., how many clicks were made? How many ads were presented? How many things were examined, with no purchase? How many choices were presented and not even contemplated by the end user? All of these things are just logs of what happened; write once data. That log of web server accesses? It is a log of what happened, never corrected. That bit about every click made in every game on Facebook? Write once. The database of video on demand rentals, plays, pauses for bio breaks? Write once. The database of CDR/IPDR/xDRs? Write once. The history of all stock trades? Corrected infrequently. When you get to “big data”, you deal with not just what happened, but what was contemplated, and what was offered, and what could have happened. It is usually a history. A log. Write once.
- Lack of fast joins (and I think he verbally mentioned aggregates). At this point I should probably say that Bruce talked about “Money Queries”… those analytics that help you answer tough questions that will help you derive value from the data / make more money, usually involving joins and aggregates. (I paraphrase) “I’m a liberal, and we don’t exploit anybody! But it’s just data. It doesn’t have rights! You can exploit it!” So back to joins in particular… Well, because we can pick up columns we need from disk after the join is done and rows are filtered out, we can sometimes do joins faster than row stores. But often it is a matter of the join algorithm written by our smart guys vs. the algorithm written by the “Brand X” smart guys. It’s why we have a “patent pending” optimizer, sideways passing scheme in the executor, and so on. Then throw in our scale-out/MPP design, patent pending distributed aggregation techniques, analytics, timeseries extensions, etc., and you have a solid platform for monetizing/exploiting your “big data”.
- Projects. This is actually the area where Vertica does the best. If you only need a subset of columns to answer a query, you get an instant column-store boost! (And if you use them all, you don’t lose anything.) To make a column store suck at projects, you’d have to order the tuples differently in each column, requiring joins to reconstitute the tuple. (We don’t do it that way, because that would be silly and would be as bad as Bruce says!)
- Updates. A column store doesn’t theoretically have to be worse at updates than other designs (give or take the single-record retrieval point below). But we at Vertica made other design decisions that favor queries and loads (see the write-once point above for the reasoning). If you want to compress data until there is no blood left in the stone, that argues against in-place updates. If you keep data sorted all the time and don’t want holes that would cause performance deterioration, that precludes in-place updates. And you definitely don’t want updates locking out queries, or causing flaky answers. So the upshot is Vertica is not the best at fine-grained SQL UPDATEs, but it is not because it is a column store. (That said, I challenge you to update 10% of the records in any database and not notice a degradation in space usage and/or query performance.)
- This is not mentioned in Bruce’s slide, but the worst case for column stores is single record retrieval. A 20-column record lives in 20 places on disk. If you want to retrieve 1 record (say, a single customer profile), or maybe a handful of records (the 3 orders that customer placed this month), you have made an order of magnitude more work for a magnetic disk head. If you want to do this sort of thing, don’t get the current version of Vertica, get an in-memory database (maybe trendy NoSQL), a Key-Value store or at least a row store. And maybe shard it, so that it scales, if it isn’t a parallel database. (Don’t waste time with products that claim to do both columnar and row-store at once; the “jacks of all trades” are still masters of none.) But do send a feed of the data, in a log-style schema, to an analytic database so you can join it with web logs, etc., and monetize your data!
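To put rough numbers on the projection and single-record points above, here is a toy page-counting model. It is only a sketch: the table size, the page size, and the function names `row_store_pages` and `column_store_pages` are all made up for illustration, every value is assumed to be the same size, and compression, caching, and sort order are ignored. It says nothing about how Vertica or any other product actually lays out data.

```python
# Toy I/O model: count how many distinct "pages" a query has to touch.
# Every constant and name below is an illustrative assumption.

NUM_ROWS = 100_000
NUM_COLS = 20
VALUES_PER_PAGE = 1_000  # assume all values are the same size


def row_store_pages(row_ids, cols_needed):
    """Rows are stored whole, so touching a row drags in all of its columns.

    cols_needed is accepted only for symmetry; a row store cannot skip columns.
    """
    rows_per_page = VALUES_PER_PAGE // NUM_COLS
    return len({r // rows_per_page for r in row_ids})


def column_store_pages(row_ids, cols_needed):
    """Each column is stored separately, with rows in the same order in each."""
    pages_per_column = len({r // VALUES_PER_PAGE for r in row_ids})
    return pages_per_column * len(cols_needed)


all_rows = range(NUM_ROWS)
two_cols = [0, 1]
all_cols = list(range(NUM_COLS))

# Scan everything, but only two columns are needed: the column store wins.
print(row_store_pages(all_rows, two_cols), column_store_pages(all_rows, two_cols))

# Scan everything, all columns needed: the two layouts tie.
print(row_store_pages(all_rows, all_cols), column_store_pages(all_rows, all_cols))

# Fetch one whole record: the column store touches one page per column.
print(row_store_pages([42], all_cols), column_store_pages([42], all_cols))
```

Under these assumptions the two-column scan touches 2000 pages in the row layout versus 200 in the column layout, the all-column scan is a tie at 2000, and the single-record fetch costs 1 page versus 20. The last case is the "20 places on disk" problem in miniature.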
I learned a long time ago that there was more than one data structure. Arrays, a variety of lists and trees, hash tables, etc. They have relative strengths and weaknesses with regard to efficiency, on multiple axes (memory usage, lookups, updates, etc.). Database storage design choices aren’t any different. And, for “big data”, column stores just make sense.
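One such trade-off, the tension between heavy compression, sorted data, and in-place updates, can be shown in a few lines. The sketch below uses run-length encoding purely as a stand-in for "compression that benefits from sorted data"; it is not a description of Vertica's actual encodings, and the helper names `rle_encode`, `rle_decode`, and `update_in_place` are hypothetical.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into [value, run_length] pairs."""
    return [[value, len(list(group))] for value, group in groupby(values)]


def rle_decode(runs):
    """Expand [value, run_length] pairs back into a flat list."""
    return [value for value, length in runs for _ in range(length)]


def update_in_place(runs, position, new_value):
    """Overwrite one value without re-sorting, then re-encode the column."""
    values = rle_decode(runs)
    values[position] = new_value
    return rle_encode(values)


# A sorted column compresses into one run per distinct value.
column = ["CA"] * 4 + ["NY"] * 3 + ["TX"] * 3
runs = rle_encode(column)
print(runs)

# A single in-place update splits a run and breaks the sort order.
fragmented = update_in_place(runs, 1, "TX")
print(fragmented)
```

Here the sorted column encodes as three runs, and one in-place update leaves five runs and an unsorted column. Getting back to three runs means rewriting the data (and, in a real column store, reordering every other column of the affected rows too), which is why a design that wants maximum compression and sorted storage tends to steer away from in-place updates.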
Column stores aren’t a “side trip”; they’re a valuable tool in the arsenal, and are here to stay!!
If you’d like to understand more about Vertica’s architecture and design decisions, read our paper in the upcoming VLDB 2012 conference. And, of course, you are welcome to try it for yourself – download our Community Edition.