Let Business Intelligence and Data Science Coexist!

Posted May 11, 2020 by Mike Perrow, Senior Product Marketing Writer/Editor

Business intelligence (BI) is a well-established discipline, with mature tools and practices in play at most modern businesses, and for good reasons. First, reporting results is key to business success, since – to grossly paraphrase Santayana – we don’t want to repeat past mistakes. And, second, we want to see how trends in our historical data might lead to business gains in whatever category we desire to excel in. Those patterns are lessons from the past that we do want to repeat.

So, what about this scrappy, relative newcomer called “data science”? What makes it so special that it deserves a place alongside established BI routines? Rather than enumerating all the reasons here, let’s just agree on an underlying fact: Data sets have gotten really big since the early days of BI and the executive Decision Support Systems built on the first versions of the data warehouse. Today, we can collect and store more data in a day than the data warehouses of 1995 stored in a year. Consider the internet alone, which in 1995 had an estimated 16 million users. Today, that number is more than 4.5 billion.

“Big data” is the term for today’s opportunity in all that data growth, and frankly, it’s a beast that BI was not designed for.

BI and DS: Two different approaches to data

While data mart analytics, visualization, and trending have evolved in important ways since the inception of BI techniques, the BI focus is still on decision-making based on a current business model. It’s all about understanding what data is telling us in known categories. By contrast, data science (DS) offers a strange but exciting new territory to explore: the unknown categories of information that can become clearer through machine learning.

Based on its ability to find patterns in huge volumes of data, data science can tell us that a business selling peanut butter has, in fact, a greater future in global peanut distribution. That sort of thing really happens, and quickly. But it doesn’t happen within traditional BI, where the current business model is evaluated according to last quarter’s profits and related trends. There’s huge value in that. But there’s huge value in data science, too.

Bring data science projects into the limelight

There’s often a practical problem that stands in the way of data science projects, however: BI and data science frequently compete for resources, and BI got there first. When business organizations fund their BI systems, they’re thinking about turning operational data into insights that drive quarterly business goals. They’ve invested big bucks into the hardware, software, and people that will produce those reports and dashboards on a regular basis. This leaves data science teams scrambling for computing power to train their algorithms on new data. They have to try hard not to interfere with the BI teams who have a separate agenda, typically involving service-level agreements (SLAs) with the business teams they support. Some BI teams actually face penalties for failing to meet their SLAs.

And it isn’t just compute power that data science teams need. They also require separate copies of massive data sets, and a general separation of their workloads so that they do not interfere with BI processes.

Since 2018, Vertica has offered a solution to this dilemma, and with Vertica release 10.0, the solution has become even more compelling.

Use subclusters to share the wealth of compute resources

When you deploy Vertica in Eon Mode, slowing down BI queries no longer has to be an excuse for putting data science projects on hold. Eon Mode allows you to use subclusters to provide dedicated compute to separate workloads, with no data replication and copy maintenance required. Subclusters ensure that workloads will not impact one another, so heavy analytics workloads will not slow down BI dashboards. You can scale each subcluster to the workload it is servicing, then shut it down when not in use.

When Vertica is deployed in Eon Mode, a primary, durable data copy resides in your shared communal storage. An intelligent cache called the “Depot,” local to each compute node, ensures high performance by storing a copy of the data most relevant to the workload. There is no need to redistribute the primary copy, or create data replicas. You can configure clusters as your use cases demand, and allow heavy storage with minimal compute, or vice versa, depending on workload requirements.
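
To make the idea concrete, here is a toy Python sketch of a shared primary copy fronted by a small cache on each compute node. It is only an illustration, not Vertica code: the class names, the data set names, and the least-recently-used eviction policy are all assumptions made for this example, and the real Depot may choose what to keep quite differently.

```python
from collections import OrderedDict


class CommunalStorage:
    """Stand-in for the shared, durable primary copy (hypothetical)."""

    def __init__(self, objects):
        self._objects = dict(objects)
        self.reads = 0  # counts trips to shared storage

    def fetch(self, key):
        self.reads += 1
        return self._objects[key]


class NodeCache:
    """Hypothetical per-node cache; LRU eviction is an assumption."""

    def __init__(self, storage, capacity):
        self.storage = storage
        self.capacity = capacity
        self._cache = OrderedDict()

    def get(self, key):
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        value = self.storage.fetch(key)  # cache miss: read the shared copy
        self._cache[key] = value
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # evict the least recently used
        return value

    def cached_keys(self):
        return list(self._cache)


# One primary copy, two nodes serving different workloads.
storage = CommunalStorage(
    {"sales_q1": "rows-q1", "sales_q2": "rows-q2", "clickstream": "rows-clicks"}
)
bi_node = NodeCache(storage, capacity=2)
ds_node = NodeCache(storage, capacity=2)

for key in ["sales_q1", "sales_q2", "sales_q1"]:
    bi_node.get(key)  # the repeated dashboard query is served locally
ds_node.get("clickstream")  # the model-training read warms a different cache

print(storage.reads, bi_node.cached_keys(), ds_node.cached_keys())
```

In this sketch each node ends up holding only the data its own workload touched, and the repeated BI read never goes back to shared storage, which is the behavior the paragraph above describes in general terms.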

Vertica in Eon Mode brings peaceful coexistence to your BI and DS teams

Vertica in Eon Mode meets the demands of variable workloads through rapid scaling, and it simplifies database operations such as node recovery. Those features are great for your admin team, but Eon Mode also enables workload isolation, so that DBAs can protect the SLAs of BI workloads from the high compute demands of data science. This means that data science teams looking to dig deeper into data can have the compute power they need, without compromising routine enterprise data warehouse reporting.

Find out more about Vertica in Eon Mode here. And register for our next Vertica Under the Hood presentation, “A Clear Path to Eon Mode” scheduled for May 19, live at 1 pm EDT.


Related Links:
Data sheet – Vertica in Eon Mode
Data sheet – Vertica in Eon Mode for Pure Storage
Blog post – Vertica’s Eon Mode – New Architecture for Flexibility, Performance, Savings
Past Vertica Under the Hood webinar – Managing Varying Workloads with Vertica in Eon Mode
Future Vertica Under the Hood webinar – “What’s New in Vertica 10” scheduled for June 2, live at 1 pm EDT.