The newly merged Cloudera recently announced its quarterly earnings, and the market was deeply disappointed: the stock dropped 42% amid a wave of negative media coverage. One headline that stuck out to me read: “Does Cloudera’s Stock Drop Signal the End of Big Data?” I thought, “No. Cloudera’s overspending and poor outlook signal nothing but an awakening of investors to the need for companies to actually earn more than they spend!”
Has the Term “Big Data” Jumped the Shark?
Nevertheless, the headline kept me thinking, because I do agree with one key point: “Big Data” is an outdated phrase and no longer a meaningful industry concept. What was once considered “big” is now normal (or even small, given the explosion of IoT use cases). It’s no longer just the sexy Silicon Valley social media, ride-sharing, adtech, fintech, and gaming companies that are gathering massive amounts of data. Virtually every company in every industry has massive amounts of data stored in every possible incarnation of a data repository, because that’s been the focus for the last few years: get the data!
Analytics Delivers the Value in the Face of Growing Hadoop and Cloud Costs
We are now facing a critical fork in the industry road ahead. The data itself, especially at this scale, is nothing more than a hefty cost center and a serious liability. Everyone knows that Hadoop infrastructure, maintenance, administration, and support are expensive. Everyone knows that the idea of moving to the cloud because it’s cheaper is a proven fantasy. And perhaps most importantly, everyone knows the risk of a massive data breach, especially in light of GDPR, the soon-to-take-effect California Consumer Privacy Act, and the many other data regulations around the world. Companies that fork to the left and continue to focus only on collecting and storing data will ultimately have to answer to their own investors for the financial costs and liabilities. But companies that take the right fork, shifting their focus from “big data” to the analytics needed to turn that data into business insight, will turn their cost centers into revenue streams and convert their liabilities into highly valuable assets.
Managed Cloud Services Aren’t the Answer
But this can be a challenging journey when you consider the options in the market. Companies currently have data stored in many different locations and formats. Many people call these data silos, and they’re right, but distributed data locations and disparate data formats are often justified by the need to protect a company against infrastructure lock-in and to allow multiple applications to access the data. Managed services like Snowflake require that all data be loaded into a single proprietary repository. Public cloud providers like AWS, Google, and Azure require that all the data be moved onto their infrastructure. Advanced analytical functions like time series, geospatial, and machine learning add still more complexity when they’re implemented on specialty platforms.
Following the Proven Path with Vertica
From the very first line of the C-Store paper to the very first line of code, Vertica was built for a LOT of data. Vertica’s MPP columnar architecture was designed for the demands of the early “Big Data” era, which is why so many of the early adopters and Silicon Valley data disruptors are powered by Vertica. But Vertica is driven by a team of architects and engineers who believe we must always meet the demands of the market and respect the evolution of the industry. That development culture led Vertica to extend its analytics to data lakes, including Hadoop HDFS and Amazon S3. It led us to read and write open source formats like ORC and Parquet. It led us to build advanced analytics functions like time series, geospatial, pattern matching, projections, and in-database machine learning into the Vertica platform. It inspired us to implement tools like Database Designer, the Query Optimizer, and Flattened Tables to continuously improve performance. All of these capabilities demonstrate respect for the need to unify data silos without demanding that all the data be sucked into one vendor’s environment. They reflect the recognition that advanced analytics and machine learning are frankly far less impactful if they can reach only a subset of the data.
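To make the data-lake idea concrete, here is a minimal sketch of what querying Parquet files in place looks like with an external table. The table name, columns, and S3 path are hypothetical; the sketch simply builds the DDL string in Python so you can see its shape before running it through a client such as the vertica-python driver:

```python
def external_table_ddl(table, columns, source_glob):
    """Build a CREATE EXTERNAL TABLE statement over Parquet files.

    columns is a list of (name, sql_type) pairs; source_glob is an
    HDFS or S3 path pattern. The data stays in the lake -- the table
    definition only points at it.
    """
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} ({cols}) "
        f"AS COPY FROM '{source_glob}' PARQUET;"
    )

# Hypothetical example: web clickstream data already sitting in S3.
ddl = external_table_ddl(
    "web_events",
    [("event_time", "TIMESTAMP"), ("user_id", "INT"), ("url", "VARCHAR(2048)")],
    "s3://my-bucket/events/*.parquet",
)
print(ddl)
# In practice you would execute this string through a database
# connection (e.g. the vertica-python client's cursor.execute),
# then query web_events like any other table.
```

The point of the design is in the last comment: nothing is loaded or copied, so the same Parquet files remain readable by Spark, Hive, or any other engine that shares the lake.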
But that’s not all. Because ours is an engineering-led culture, we understand the importance of development and test environments, of non-production and production clusters, of hot and cold standbys, and of all the other aspects of a mission-critical software platform that the world’s most data-dependent companies rely on to run their businesses. That’s why our licensing allows free development and test clusters, and why we allow duplicate clusters for non-production use and hot/cold standby.
The era of Big Data is over; “big” is the new normal. The choice for the next era is clear: Big Cost and Big Liability, or Big Revenue and Big Asset. I know which I’d choose.