Vertica’s Architectural Direction – Separation of Compute and Storage

Posted June 10, 2019 by Chuck Bear, Vertica Chief Architect

Drawing of 4 blue pillars: Separation, Automation, Prediction, Protection

For over a decade, Vertica has delivered industry-leading performance on analytic query workloads in on-premises “shared-nothing” environments. Thanks to several good (and several lucky) historical architectural decisions, Vertica in Eon Mode is also leading the market for elastic, on-demand databases in public clouds. This post reviews the key features of the Vertica architecture and describes Vertica’s ongoing journey to converge its Enterprise Mode and its new Eon Mode deployment options into one system that offers enterprise-level performance in the cloud and cloud-level flexibility on premises.

Historical Strengths
Back in 2005, when Vertica was founded, data warehouse and other analytic workloads (to the extent they existed) ran on relatively high-end hardware: clusters with storage area networks (SANs) and built-in hardware fault tolerance, or in some cases, combined software/hardware appliances. Vertica believed in a software-only approach, on the grounds that flexible, commodity hardware configurations would be a better value for the customer. In retrospect, although the software-only approach places more of the complexity in the software and leaves our support people with a lot of environmental problems to diagnose, it made Vertica much more adaptable to the public cloud, where SANs are not available and appliances are a non-starter.

In the early days, Vertica’s founding team had a very dim view of shared storage. Support for it would have happened only “OMDB”: Over Mike (Stonebraker’s) Dead Body. There was some good reasoning behind this. At the time, “shared storage” meant SANs, which were expensive, hard to configure, and could have performance problems. Occasionally customers insisted on using shared storage anyway, and the results lent further credibility to the notion that storage arrays were problematic. Our first use of a SAN, in the lab of a large server vendor and supported by experts, was misconfigured so that all the data went through one adapter rather than being load-balanced. When one of the early iSCSI vendors sent us a sample array to benchmark, our IT guy accidentally mapped the same LUNs to multiple computers (without using a shared file system), and they promptly clobbered each other’s data.

Current State
Of course, the public cloud has fundamentally changed the definition of “shared storage” at all levels. Public cloud storage uses a simplified object store design (rather than a block-based array), is offered at much lower and more flexible cost, and administering it mostly involves setting up security and making sure data is deleted when it is no longer wanted.

Fortunately, Vertica ended up well-positioned to take advantage of cloud object stores. Early on, Vertica realized that optimizing data compression ratios was at odds with handling incremental updates, and selected a storage model where files are never updated once written. This also led to an analytics-optimized transaction model, where queries never have to take any locks. In the cloud generation, the write-once, read-many model is a perfect fit, as it is the access pattern supported by HDFS, S3, and similar stores. And because accessing local storage or local memory is still faster than accessing shared storage, this update-free storage model permits relatively trouble-free caching: a file that never changes can never leave a stale copy in a node’s cache.
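To make the caching argument concrete, here is a minimal single-threaded Python sketch, not Vertica’s implementation. It assumes a hypothetical `ObjectStore` that refuses to overwrite files, a hypothetical `Catalog` that records which immutable files are visible at each commit epoch, and a hypothetical per-node `NodeCache`. Because no file ever changes, a query can read the file list for its epoch without taking locks, and a cached copy never needs invalidation. (A real multi-threaded system would still have to publish each new file list atomically.)

```python
class ObjectStore:
    """Hypothetical stand-in for a shared object store: write-once, read-many."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}
        self.reads = 0  # counts remote reads, to show the cache at work

    def put(self, name: str, data: bytes) -> None:
        # Files are never updated once written; new data needs a new name.
        if name in self._objects:
            raise ValueError(f"{name} already exists")
        self._objects[name] = data

    def get(self, name: str) -> bytes:
        self.reads += 1
        return self._objects[name]


class Catalog:
    """Hypothetical catalog: each commit epoch maps to an immutable tuple of file names."""

    def __init__(self) -> None:
        self.epoch = 0
        self._files_at: dict[int, tuple[str, ...]] = {0: ()}

    def commit(self, added: list[str], removed: tuple[str, ...] = ()) -> int:
        # A commit publishes a new file list; older epochs keep their own lists.
        gone = set(removed)
        kept = [f for f in self._files_at[self.epoch] if f not in gone]
        self.epoch += 1
        self._files_at[self.epoch] = tuple(kept + added)
        return self.epoch

    def snapshot(self, epoch: int) -> tuple[str, ...]:
        return self._files_at[epoch]


class NodeCache:
    """Per-node cache of immutable files: entries can never become stale."""

    def __init__(self, store: ObjectStore) -> None:
        self.store = store
        self._local: dict[str, bytes] = {}

    def read(self, name: str) -> bytes:
        if name not in self._local:  # miss: fetch once from shared storage
            self._local[name] = self.store.get(name)
        return self._local[name]


def scan(catalog: Catalog, cache: NodeCache, epoch: int) -> bytes:
    # A query reads the file list of its epoch; no lock is needed because
    # neither that list nor the files it names will ever change.
    return b"".join(cache.read(name) for name in catalog.snapshot(epoch))
```

In use, the first scan that touches a file misses the cache and reads from the store; later scans of the same file, at any epoch, are served locally. That warm-up is the only point where this model behaves differently from data that already lives on the node.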

These observations formed the basis of Vertica’s Eon Mode, where compute and storage can be scaled separately, with the performance that MPP database customers expect. Our cloud deployments can perform just like the original shared-nothing MPP Vertica architecture because, apart from transient behavior while caches are being filled, Eon Mode processes queries in exactly the same way.

Diagram of Enterprise Mode showing storage and compute together and Eon Mode showing separate storage with caching to bring data and compute together
Future Architecture
Users of Vertica’s Enterprise Mode, which is used both on premises and in multiple public clouds, want many of the features currently available only in Eon Mode, such as the ability to add storage without adding more compute nodes, to change the configuration of compute nodes more dynamically than the rebalance feature allows, or to spin up multiple clusters of compute nodes to query the same data set. Conversely, some Enterprise Mode features are not currently available in Eon Mode in the cloud, such as the Write Optimized Store (WOS) for small inserts, because the WOS uses node-local memory rather than shared storage.

Over a series of future releases, Vertica plans to converge Eon and Enterprise modes into a single experience. Vertica nodes will gain the ability to read each other’s files, a relatively simple step that will leave only one key difference between the options: for Enterprise Mode tables or partitions, the Vertica nodes are responsible for the master copy of the data, whereas for shared tables the master copy will live in shared storage, and the Vertica nodes need not keep a copy of the data at all times. Vertica already supports more than one storage location in Enterprise Mode today. In a converged future, Vertica will support multiple shared and local storage locations, selectable per schema, table, or partition.
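The per-schema, per-table, and per-partition selection, together with the master-copy distinction, can be sketched as a lookup where the most specific rule wins. This is a hypothetical Python model, not a Vertica API: `StoragePolicy`, `Location`, `Kind`, `node_must_keep_copy`, and the precedence order are all assumptions made for illustration, since the post does not say how overlapping settings would be resolved.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Kind(Enum):
    LOCAL = "local"    # nodes hold the master copy (Enterprise Mode style)
    SHARED = "shared"  # master copy lives in shared storage (Eon Mode style)


@dataclass(frozen=True)
class Location:
    name: str
    kind: Kind


Key = tuple[str, Optional[str], Optional[str]]


class StoragePolicy:
    """Hypothetical placement policy: the most specific rule wins
    (partition, then table, then schema, then the cluster default)."""

    def __init__(self, default: Location) -> None:
        self.default = default
        self._rules: dict[Key, Location] = {}

    def assign(self, location: Location, schema: str,
               table: Optional[str] = None, partition: Optional[str] = None) -> None:
        if partition is not None and table is None:
            raise ValueError("a partition rule needs a table")
        self._rules[(schema, table, partition)] = location

    def resolve(self, schema: str, table: str, partition: str) -> Location:
        # Check the most specific key first, falling back to broader scopes.
        for key in ((schema, table, partition), (schema, table, None), (schema, None, None)):
            if key in self._rules:
                return self._rules[key]
        return self.default


def node_must_keep_copy(location: Location) -> bool:
    # Local master: the node's copy *is* the data, so it must be retained.
    # Shared master: the node's copy is only a cache and may be evicted.
    return location.kind is Kind.LOCAL
```

For example, under these assumptions a schema could default to shared storage while one hot partition is pinned to local disk; `node_must_keep_copy` then reports that nodes must retain that partition but may evict their cached copies of the rest.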

Diagram showing state today and in future
Vertica’s long-standing design goals have set the product up for success in a variety of deployments, in on-premises, public cloud, and private cloud scenarios. We’re currently working to converge the two Vertica modes into one that combines columnar MPP performance with full storage flexibility on all platforms. Features such as adding storage without adding compute, or adding many more compute nodes to scale analytic query throughput, will be available in all environments.

Separation of compute and storage is only one of Vertica’s pillars for future development. We have long-term plans for everything from performance and scale to improving client drivers, from machine learning features to the partner ecosystem, from query optimization to query execution, from complex data type support to security and data protection, and more. So, stay tuned for more blog posts about the future of the Vertica product and remember, for Vertica, it’s never good enough.
