You may have noticed something interesting about the next-generation enterprise data warehouse (EDW) vendors. They’re all pushing to move all data to the same place for analytics, which of course means their particular data store. This worries me because it involves two levels of lock-in: 1) the underlying infrastructure from a single cloud vendor, and 2) the EDW vendor itself, whether or not the infrastructure and the EDW are offered under the same brand. Vertica has a different idea, which we announced last month but have been building for years. It’s called the “Unified Analytics Warehouse.”
As pioneers in big data, we know it may sound odd to leave the word “data” out of the name of our solution architecture. But that is an intentional choice (despite repeated encouragement to use “data analytics” for SEO). Our mission is to unify your analytics; that’s what’s important. With the Unified Analytics Warehouse (UAW), you can keep your data wherever it is, without excessive ETL or migration from one data store to another simply for the sake of analyzing it.
A single version of the truth 2.0
We believe that when customers need analytics across a range of data sets managed by their company, they don’t – and shouldn’t have to – care where that data is located. What they care about is getting a single version of the truth. This is what Vertica continues to deliver: a UAW that spans streaming data, batch data, and structured and semi-structured data, and that gives the data science community access to its preferred tools. Our ongoing mission is to support a combination of shared data storage, whether that’s HDFS, object storage, multi-cloud, on-premises, or various combinations, along with the ability to deliver on SLAs via the organization’s preferred visualization and AI tools.
This is what a Unified Analytics Warehouse is all about. It allows some data to be stored in Vertica, as needed, when that makes sense for optimal performance at scale. It allows other data to stay in external communal storage in open formats like JSON, ORC, and Parquet, which minimizes data duplication for applications that also need access to that data. We believe the data should be stored in the format that is most efficient for the enterprise, while the Unified Analytics Warehouse functions as a single source of analytical truth spanning every department and use case.
What does the UAW require?
First, the data requirements are broad, starting with support for many different data types. Second, these data types need to be manageable in batch, in motion, and at rest, combined with the advantages of optimized compression. Third, the system requires highly reliable governance, security, and resiliency. Resiliency is particularly important: if predictive analytics is going to drive proactive steps, an enterprise has to be confident that these are the right steps. Fourth, the system has to be available 24×7, every day. Yes, the infrastructure – cloud, network, on-prem resources, etc. – is a key part of this architecture, but these elements can’t restrict it. And fifth, organizations that have successfully achieved their goals of predictive analytics and proactive actions need performance at previously unimaginable scale, most likely by operationalizing machine learning on the full data set.
On this last point, given the different workloads, use cases, and communities that the UAW must support, it needs to have rich analytical functions, including full end-to-end machine learning capabilities. And these need to be available not only within the warehouse, but also on data in external tables on communal storage platforms like HDFS and S3, and through import and export of trained models, and through integration with tools like TensorFlow, so you truly unify the analytics across an enterprise. All those “and” requirements are what differentiate Vertica today.
The EDW and the data lake converge
I talk to analysts, read reports, and see predictions from industry leaders all the time, and I continue to see that the big data industry expects a single solution to the EDW and data lake divide. But in fact, both the data lake and the EDW are, together, a core part of the future. What wraps it all up is a unified approach to the analytics.
The right answer for any form of analytics has nothing to do with putting all your data in one place. That’s not in the best interest of an enterprise, since the time and labor required to move and transform data workloads are costly and, now more than ever, pose a real risk to future choices. We believe that a modern business needs to focus on unified analytics, which spans deployment options, data, and locations, with advanced analytics and machine learning functions all accessible in the language of your choice. That’s what will help any organization move to data-driven insights faster, and make decisions based on those insights with confidence. Given the importance of managing vast volumes of data, as well as performance, accuracy, and the need to unify the analyst and the data science communities, we strongly believe in Vertica’s Unified Analytics Warehouse.