The De-mythification Series
Part 3: The Single-Solution Elf
In this part of the de-mythification series, I’ll address another common misconception in the big data marketplace: that a single piece of technology exists that will solve all big data problems. Whereas the first two entries in this series focused on market needs, this one focuses on the vendor side: how big data has driven technology development. It also offers some practical guidance on how an organization can better align its needs with its technology purchases.
Big Data is the Tail Wagging the Vendor
Big data is in the process of flipping certain technology markets upside-down. Ten or so years ago, vendors of databases, ETL, data analysis and similar tools could all focus on building technologies for discrete needs, with an evolutionary eye, concentrating on incremental advances and improvements. That’s changed very quickly as the world has become much more instrumented. Smartphones are a great example. Before the smartphone, the data stream from an individual throughout the day might consist of a handful of call-detail records and a few phone status records: a few kilobytes of data at most. The smartphone changed that. Today a smartphone user may generate megabytes, or even gigabytes, of data in a single day from the phone itself, the broadband connection, the OS, email, applications and so on. Multiply that across a variety of devices, instruments, applications and systems, and the result is a slice of what we commonly refer to as “Big Data”.
Most of the commentary on big data has focused on the impact on organizations. But vendors have been, in many cases, blindsided. With technology designed for orders of magnitude less data, sales teams accustomed to competing against a short list of well-established competitors, marketing messages focused on clearly identified use cases, and product pricing and packaging oriented towards a mature, slow-growth market, many have struggled to adapt and keep up.
Vendors have responded with updated product taglines (and product packaging) which often read like this:
“End-to-end package for big data storage, acquisition and analysis”
“A single platform for all your big data needs”
“Store and analyze everything”
Don’t these sound great?
But simple messages like these mask several realities. Big data analytics comprises distinct activities, and these activities come with different technology requirements. Much of today’s technology was also born in a very different time, so the likelihood of a single tool doing everything well is quite low. Let’s start with the analytic lifecycle, depicted in the figure below, and discuss how it has shaped the state of the technology.
The figure depicts the phases of an analytic lifecycle: from the creation and acquisition of data, through exploration and structuring, to analysis and modeling, and finally to putting the information to work. These phases often demand very different things from technology. Take the example of acquiring and storing large volumes of data with varying structure. Batch performance is often important here, as is cost to scale. Ease of use matters somewhat less, because load jobs tend to change less often than user queries, especially when the data is in a document-like format (e.g. JSON). By contrast, developing a predictive model requires a highly interactive technology that combines high performance with a rich analytic toolkit. Here, batch use is minimal, while ease of use is key.
Historically, many of the technologies required for big data analytics were built as stand-alone technologies: a database, a data mining tool, an ETL tool, etc. Because of this lineage, the time and effort required to re-engineer these tools to work effectively together as a single technology, with orders of magnitude more data, can be significant.
However a vendor packages its technology, organizations must ask themselves one question: what do we really need to solve our business problems? When it comes time to identify a technology portfolio to address big data challenges, I always recommend that customers start by framing things in terms of what they really need. This is surprisingly uncommon, because many organizations have grown accustomed to vendor messaging that focuses on what the vendor wants to sell as opposed to what the customer needs to buy. It may seem like a subtle distinction, but it can make all the difference between a successful project and a very expensive set of technology sitting unused on the shelf.
I recommend engaging in a thoughtful dialog with vendors to assess not only what you need today, but also things you might find helpful that you haven’t thought of yet. A good vendor will help you in this process. As part of this exercise, it’s important to avoid getting hung up on the notion that there’s one single piece of technology that will solve all your problems: the single-solution elf.
Once my colleagues and I dispel the single-solution myth, we can have a meaningful dialog with an organization and focus on the real goal: finding the best way to solve its problems with a technology portfolio that is sustainable and agile.
I’ve been asked more than once, “Why can’t there be a single solution? Things would be so much easier that way.” That’s a great question, which I’ll address in my next blog post as I discuss some common-sense perspectives on what technology should (and shouldn’t) do for you.
Next up: The Automagic Pixie