Why All Column Stores Are Not the Same

Posted May 22, 2018 by Soniya Shah, Information Developer

This blog post was authored by Steve Sarsfield.

Critical Capabilities for Big Data Databases

Imitation is the sincerest form of flattery, but when imitators try to match brilliant engineering with uninspired execution, just to satisfy a bullet on a PowerPoint slide, the imitation tends to be more of a parody, or even a perversion of the original intent. I find this to be the case in columnar databases. It’s why I wrote the white paper entitled “Why All Column Stores Are Not the Same”.

The fact is, it’s pretty hard to develop a columnar database, particularly one that is pushed as hard as Vertica clients push ours. The Vertica team has been at this for nine major releases now, writing Vertica from scratch to make the most of the columnar approach. Unlike other solutions, we are not attempting to take a row store engine and use it to store data in columns, because we know that there would be limited value in doing so. Nor do we reserve columnar storage for archiving. Others believe that it’s only for information lifecycle management and archiving old data; we believe columnar storage is the fastest and most efficient way to access stored data for many kinds of analytics. We know that columnar storage provides opportunities for incredible performance while also presenting certain challenges. Overcoming these challenges and exploiting the opportunity has been all we’ve thought about for more than a decade.

Columnar databases are not a commodity, yet I see the major analyst firms saying they are. If you get caught up in thinking that Oracle, for example, can do column store with the same efficiency, performance, and concurrency as Vertica, I think you can be led into making very bad decisions about your analytics. Even with databases that are true column stores, you may face some sort of major technical flaw, such as a single point of failure, memory management limitations, or a lack of analytics workload management to support concurrency.

If you really need to put big data analytics into production, there is no better solution than a columnar database, but please don’t believe for a second that they are all the same. Be better than the major analyst firms and take a look under the hood before you update your data warehouse or tackle your big data challenges.