Do you really want to outsource your data and analytics?

Posted February 23, 2021 by Marc Heimlich, Director, Technology Sector, Vertica

As a nine-year veteran of the Big Data business, I’ve seen the shift from transactional data stores like Postgres, SQL Server, and Oracle to OLAP offerings from the likes of Greenplum, Aster, Vertica, and ParAccel, to the rise and fall of Hadoop data lakes, and now to cloud, object storage, and serverless architectures with offerings from Snowflake, Google’s BigQuery, and Amazon’s Redshift.

This movement has been driven primarily by two trends – cheaper storage and faster compute.

The goal for all enterprises is the ability to get insights while the data is still relevant. Analytics are driving autonomous monitoring, autonomous vehicles, prescriptive medicine, just-in-time manufacturing, inventory guidance, sales and marketing insights, and human capital management.

It’s one thing to get a fast query response in the form of a report or a dashboard that a human must observe and analyze, but it’s transformative when you get a query result that you can take action on without human intervention. This comes from systems that give you fast and accurate insights, which in turn allow you to write scripts to automate a response – like serving up an ad, allocating more bandwidth, spinning up a new VM instance, or telling a car to turn right to avoid a pedestrian.
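
To make that loop concrete, here is a minimal sketch in Python, assuming a generic DB-API database connection; the service_metrics table, the threshold, and the scale_out() hook are hypothetical stand-ins for your own monitoring data and orchestration API.

```python
# Sketch of closing the loop between a query result and an automated action.
# The connection object, the service_metrics table, and scale_out() are
# hypothetical placeholders -- substitute your own warehouse driver and
# orchestration API.

BANDWIDTH_THRESHOLD_MBPS = 900.0

def scale_out(service: str) -> None:
    """Stand-in for a real orchestration call (e.g., adding capacity or a VM)."""
    print(f"requesting extra capacity for {service}")

def check_and_act(conn) -> None:
    """Query recent utilization and act on the result without human review."""
    cur = conn.cursor()
    cur.execute(
        "SELECT service, AVG(throughput_mbps) AS avg_mbps "
        "FROM service_metrics "
        "WHERE observed_at > NOW() - INTERVAL '5 minutes' "  # date syntax varies by engine
        "GROUP BY service"
    )
    for service, avg_mbps in cur.fetchall():
        if avg_mbps > BANDWIDTH_THRESHOLD_MBPS:
            scale_out(service)
```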

Now, the promise of cloud-managed services is making it easier to eliminate infrastructure cost and management, but is it really giving you market-defining analytics? What is the opportunity cost here? Are you OK with giving up control and allowing a third party to manage what should be the single most important competitive advantage you have? And that would be – your DATA! Let’s examine this from the perspective of data management and analysis, performance, security, and cost.

Data Management and Analysis Must Include ALL of Your Data

To get market-defining insights, enterprises must combine data from internal sources like finance, operations, HR, sales, and marketing with external data from customers, suppliers, and partners, and even third-party datasets from D&B, SFDC, Harte-Hanks, and S&P. Putting this data in a single cloud is no different from Teradata telling you to put all of your data in their appliance (could cloud be the new iteration of an appliance?). It’s unrealistic and unlikely to happen.

The reality is that this data will remain in multiple places – on-premises, in multiple clouds, and even on the edge. To get the fastest possible compute out of this data, it is best to leave the data in place and integrate and layer the compute on top of the data wherever it sits!
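
One way to picture layering the compute on top of the data is query push-down: each engine runs the heavy aggregation where its data already lives, and only the small result sets move. A rough Python sketch, with hypothetical connections and a hypothetical sales table:

```python
# Sketch: push the aggregation down to each engine where the data already lives
# and combine only the small per-source results. The connections and the sales
# table are hypothetical; any DB-API-style connection would work here.
from typing import Dict, Iterable

QUERY = "SELECT region, SUM(revenue) FROM sales GROUP BY region"

def source_totals(conn) -> Dict[str, float]:
    """Run the pushed-down aggregate on one engine; only aggregates come back."""
    cur = conn.cursor()
    cur.execute(QUERY)
    return {region: revenue for region, revenue in cur.fetchall()}

def combined_totals(connections: Iterable) -> Dict[str, float]:
    """Merge per-source aggregates; raw rows never leave their home system."""
    totals: Dict[str, float] = {}
    for conn in connections:
        for region, revenue in source_totals(conn).items():
            totals[region] = totals.get(region, 0.0) + revenue
    return totals
```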

So buyer beware – in order to maximize your insights you will need to move ALL of your data to these managed service cloud providers, which is unlikely, unrealistic, and costly.

And it’s not just combining these diverse datasets wherever they may reside; it’s also a matter of operationalizing the data science, or even basic analytics, in a seamless way. Where does data science sit in your organization today? If yours is like most enterprises, your data science effort is unlikely to be integrated with your data management. Most data scientists still sit off in their corner, taking whatever datasets they can get and finding insights that, by the time they are looped back into the business, are no longer relevant or accurate because the models were run on a fraction or subset of the data!

It is important to consider how you can operationalize your data science by running your analytics on ALL of the data, across multiple formats and data management environments. This will improve the accuracy of the outcomes, provide the insights while they are still timely, and minimize the complexity or silos that might exist.
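
As a rough illustration of scoring on all of the data rather than a sample, here is a hedged Python sketch; the customers table and the in-database predict_churn() function are hypothetical placeholders for whatever in-database ML or UDF mechanism your platform provides.

```python
# Sketch: score the model over ALL rows where they already live instead of
# exporting a sample to a notebook. The connection, the customers table, and the
# in-database predict_churn() function are hypothetical -- most analytical
# databases expose something equivalent through in-database ML functions or UDFs.
from typing import List

def high_risk_customers(conn, threshold: float = 0.8) -> List[int]:
    """Run scoring inside the database; only IDs and scores travel back out."""
    cur = conn.cursor()
    cur.execute(
        "SELECT customer_id, predict_churn(tenure, usage_gb, support_tickets) AS risk "
        "FROM customers"
    )
    return [customer_id for customer_id, risk in cur.fetchall() if risk >= threshold]
```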

Once you have some of your data in the cloud, these managed service providers will be more than happy to sell you additional data preparation, enrichment, modeling, and analytical tools, and in some cases, they will direct you to a third-party partner, which only adds to the cost and complexity equation that you were trying to avoid in the first place.

Performance Should Spell Competitive Advantage

Along with the stated goal of faster compute comes the expectation that certain performance metrics or SLAs will be met. If not, financial penalties or lost business can result. While having a managed service is easy from a data management perspective, what happens if internal or external stakeholders require a new schema with new data types from third-party data sources, and those inputs require specialized integration efforts with queries that need additional tuning and optimization? Well, a customer in the healthcare space recently called one of these managed service vendors, asked if they could better tune a particular query, and was told “we don’t do that” …

You should consider a cloud managed service no different from Walmart or Amazon, which cater to the masses. If a one-size-fits-all approach meets your data management needs, a cloud managed service should be sufficient; however, if you want to use data and analytics to compete, your enterprise must control its own destiny, ensuring not only the keenest insights but also the performance that lets the business act on those insights while the data is still relevant.

Security Means Controlled Data Access

By sending all of your data up into the cloud, you are relying on a third party to secure that data. Are you aware that some of these managed service providers put all of your metadata or catalog data in a central cloud services layer? What happens if that gets compromised? Not only are your data and your customers’ data exposed, but your ability to keep running queries may be compromised as well. An employee at a travel website also told me that even within an enterprise, access permissions can be compromised, with one group or one employee erroneously gaining access to privileged and confidential information. Do you really want a third party controlling access and security?

Your enterprise must own its cloud instance – and that is arguably the single biggest security consideration. You can’t leave protection and access privileges to a third party; the risk exposure could result in financial loss, reputational damage, and job loss.

So here is the thing – I’m not saying “Don’t use a managed service,” but make sure your data is in a ‘walled garden’ within that managed service, or buy your own cloud instance outright and have the managed service provider manage the environment!

Beware of Hidden Compute Costs

And finally, one of the big reasons for using the cloud is the ability to eliminate capital expenditure. Pore through the reviews of the managed cloud service vendors, though, and it is clear that their costs are an unknown. The autoscaling feature of Snowflake scales geometrically, so if performance is lagging and you are running on 64 nodes, you may suddenly find yourself running on 128 nodes and paying double that month.

A couple of weeks ago, I talked to one software vendor that built an entirely separate query layer outside of BigQuery just to avoid its compute costs. These managed services make it seem inexpensive to get onto their service, but then they penalize you every time you or your customers query your application or service. That’s a bit counterintuitive.

One customer in the Communications space told us that they are moving their workloads back to the data center, because three months of running their application in the cloud is equivalent to a year’s worth of data center colocation costs. (Stay tuned for their webinar.)

And what about the analytics? Snowflake tells you to integrate with third-party providers. We recently did an analysis for one cybersecurity vendor and showed that the cost of Vertica was one-fifth of what Redshift, Athena, and SageMaker would charge on 1.5 PB of data. So you also need to consider the complete TCO, not just for data preparation or data management or data science, but for all of the above!

So where does this lead? – Vertica, the Unified Analytics Platform

Vertica is the leading unified analytics platform, designed to deliver speed, scalability, and built-in machine learning for analytically intensive workloads that the enterprise can control. These workloads can reside in the public clouds, on-premises, on Hadoop, or any hybrid combination. At the core of the Vertica advanced analytics platform is a column-oriented, relational database purpose-built to handle today’s most demanding analytics requirements.

Vertica is a Unified Analytics Platform that:

  • Unifies analytics on HDFS data or object storage data lakes with optimized database storage to capitalize on storage investments and maximize business value (a rough sketch follows this list).
  • Unifies a company’s deployment options spanning multi-cloud and on-premises environments to embrace cloud innovations, prevent lock-in, and meet regulatory and security requirements.
  • Unifies the data science community and the business analyst and IT community, enabling each to continue to use their preferred tools and languages while operationalizing machine learning at scale for real-time predictive analytics.
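
As a rough illustration of the first point, here is a hedged sketch that joins a Parquet data lake in object storage with a native table in a single query, using the open-source vertica-python client; the host, bucket, and table and column names are invented, and the exact external-table syntax should be confirmed against the documentation for your Vertica version.

```python
# Sketch: one query spanning a Parquet data lake in object storage and a native
# Vertica table. Host, credentials, bucket path, and table/column names are
# illustrative; verify the external-table syntax for your Vertica version.
import vertica_python  # open-source Vertica Python client

conn_info = {
    "host": "vertica.example.com",  # hypothetical host
    "port": 5433,
    "user": "dbadmin",
    "password": "...",
    "database": "analytics",
}

DEFINE_LAKE_TABLE = """
CREATE EXTERNAL TABLE clickstream_lake (
    user_id    INT,
    event_time TIMESTAMP,
    page       VARCHAR(256)
) AS COPY FROM 's3://my-bucket/clickstream/*.parquet' PARQUET
"""

SEGMENT_QUERY = """
SELECT c.customer_segment, COUNT(*) AS page_views
FROM clickstream_lake l
JOIN customers c ON c.user_id = l.user_id   -- customers is a native table
GROUP BY c.customer_segment
"""

conn = vertica_python.connect(**conn_info)
try:
    cur = conn.cursor()
    cur.execute(DEFINE_LAKE_TABLE)   # data stays in object storage
    cur.execute(SEGMENT_QUERY)       # one plan spans lake and warehouse data
    for segment, views in cur.fetchall():
        print(segment, views)
finally:
    conn.close()
```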

Vertica is trusted by some of the most data-intensive organizations in the world, including AT&T, Cerner, Uber, The Trade Desk, and many other data-driven companies.

As a unified analytics platform, Vertica solves four current market challenges that every organization faces.

  • Despite the disappointment in Hadoop, HDFS data lakes represent a very significant investment for many companies, yet the value has not lived up to the original expectations.
  • With the explosion of cloud object storage on top of that, organizations struggle even more to analyze their scattered data.
  • In addition, organizations favor a multi-cloud or hybrid cloud and on-premises deployment strategy as they face the reality of cloud vendor lock-in, costs, and migration challenges.
  • Machine learning is no longer a science project but must be put into production to deliver predictive analytics and recommendations in time to allow proactive actions.

With Vertica, an enterprise gains the ability to control and secure its data management and analysis, technically and financially, to take advantage of insights while they are still relevant.