Vertica

Archive for the ‘Customers’ Category

Is Big Data Giving You Grief? Part 1: Denial

My father passed away recently, and so I’ve found myself in the midst of a cycle of grief. And, in thinking about good blog topics, I realized that many of the organizations I’ve worked with over the years have gone through something very much like grief as they’ve come to confront big data challenges…and the stages they go through even map pretty cleanly to the five stages of grief! So this series was born.

So it’ll focus on the five stages of grief: denial, anger, bargaining, depression, and acceptance. I’ll explore the ways in which organizations experience each of these phases when confronting the challenges of big data, and also present strategies for coping with these challenges and coming to terms with big data grief.

Part One: Denial

“We don’t have a big data problem. Our Oracle DBA says so.”

Big data is a stealth tsunami – it’s snuck up on many businesses and markets worldwide. As a result, they often believe initially that that they don’t need to change. In other words, they are in denial. In this post, I’ll discuss various forms of denial, and recommend strategies for moving forward.

Here are the three types of organizational “denial” that we’ve seen most frequently:

They don’t know what they’re missing

Typically, these organizations are aware that there’s now much more data available to them, but don’t see that how it represents opportunity to their business. Organizations may have listened to vendors, who often focus their message on use cases they want to sell into – which may not be the problem a business needs to solve. But it’s also common for an organization settle into its comfort zone; the business is doing just fine and the competition doesn’t seem to be gaining any serious ground. So, the reasoning goes, why change?

The truth is that, as much as those of us who work with it every day feel that there’s always a huge opportunity in big data, for many organizations it’s just not that important to them yet. They might know that every day, tens of thousands of people tweet about their brand, but they haven’t yet recognized the influence these tweets can have on their business. And they may not have any inkling that those tweets can be signals of intent – intent to purchase, intent to churn, etc.

They don’t think it’s worth doing

Organizations in denial may also question whether dealing with big data is worth doing. An organization might already be paying a technology vendor $1 million or more per year for technology…and this to handle just a few terabytes of data. When the team looks at a request to suddenly deal with multiple petabytes of data, it automatically assumes that the costs would be prohibitive and shuts down that line of thinking. This attitude often goes hand-in-hand with the first item…after all, if it’s outrageously expensive to even consider a big data initiative, it seems there’s no point in researching it further since it can’t possibly provide a strong return on investment.

Somebody is in the bunker

While the prior two items pertained largely to management decisions based on return on investment for a big data project, this one is different. Early in my career I learned to program in the SAS analysis platform. As I pursued this for several different firms, I observed that organizations would tend to build a team of SAS gurus who held the keys to this somewhat exotic kingdom. Key business data existed only in SAS datasets which were difficult to access from other systems. Also, programming in SAS required a specialized skillset that only a few possessed. Key business logic such as predictive models, data transformations, business metric calculations, etc. were all locked away in a large library of SAS programs. I’ve spoken with more than one organization who tells me that they’ve got a hundred thousand (or more!) SAS datasets, and several times that many SAS programs floating around their business…many of which contain key business logic and information. As a result, the SAS team often held a good position in the organizational food chain, and its members were well paid.

One day, folks began to discover that they could download other tools that did very similar things, didn’t care where the data resided, cost a fraction of SAS, and required less exotic programming skills.

Can you see where this is going?

I also spent some years as an Oracle DBA and database architect, and witnessed very similar situations. It’s not uncommon – especially given how disruptive big data technologies can be – to see teams go “into the bunker” and be very reluctant to change. Why would they volunteer to give up their position, influence and perks? And so we now are at the intersection of information technology and a classic change management challenge.

Moving forward past denial

For an organization, working through the denial stage can seem daunting, but it’s very do-able. Here are some recommendations to get started:

Be prepared to throw out old assumptions. The world is rapidly becoming a much more instrumented place, so there are possibilities today that literally didn’t exist ten years ago. The same will be true in another ten years (or less). This represents both opportunity and competitive threat. Not only might your current competitors leverage data in new ways, but entirely new classes of products may appear quickly that will change everything. For example, consider the sudden emergence in recent years of smartphones, tablets, Facebook, and Uber. In their respective domains, they’ve caused entire industries to churn. So it’s important to cast a broad net in terms of looking for big data projects to deliver value for your business.

Big data means not having to say “no.”  I’ve worked with numerous organizations who have had to maintain a high cost infrastructure for so long that they’re used to saying “no” when they ‘re approached for a new project. And they add an exclamation point (“no!”) when they’re approached with a big data project. Newer technologies and delivery models offer the chance to put much more in the hands of users. So, while saying no may sometimes be inevitable, it no longer needs to be an automatic response. When it comes to an organization’s IT culture, be ready to challenge the common wisdom about team organization, project evaluation and service delivery. The old models – the IT service desk, the dedicated analyst/BI team, organizing a technology team into technology centric silos such as the DBA team, etc. may no longer be a fit.

Big data is in the eye of the beholder. Just because vendors love to talk about Twitter (and I’m guilty of that too), doesn’t mean that Twitter is relevant to your business. Maybe you manufacture a hundred pieces of very complex equipment every year and sell them to a handful of very large companies. In this case, it’s probably best not worry overmuch about tweets. You might have a very different big data problem. For instance, you may need to evaluate data from your last generation of devices which had ten sensors that generate ten rows of data per second each. And, you know that the next generation will have ten thousand sensors generating a hundred rows per second each – so very soon it’ll be necessary to cope with around ten thousand times as much data (or more – the new sensors may provide a lot more information than the older ones). And if the device goes awry, your customer might lose a $100 million manufacturing run. So don’t dismiss the possibilities in big data just because your vendor doesn’t talk about your business. Push them to help you solve your problems, and the vendors worth partnering with will work with you to do this.

Data expertise is a good thing. Just because you might not need ten Oracle DBA’s in the new world doesn’t mean that you should lay eight of them off. The folks who have been working intimately with the data in the bunker often have very deep knowledge of the data. They frequently can retool and, in fact, find themselves having a lot more fun delivering insights and helping the business. It may be important to re-think the role of the “data gurus” in the new world.  In fact, I’d contend that this is where you may find some of your best data scientists.

While organizational denial is a tough place to be when it comes to big data, it happens often. And many are able to move past it. Sometimes voluntarily, and sometimes not – as I’ll describe in the next installment.  So stay tuned!

Next up:

Anger: “We missed our numbers last quarter because we have a big data problem! What the heck are we going to do about it?”

The Automagic Pixie

The “De-mythification” Series

Part 4: The Automagic Pixie

Au∙to∙mag∙ic: (Of a usually complicated technical or computer process) done, operating, or happening in a way that is hidden from or not understood by the user, and in that sense, apparently “magical”

[Source: Dictionary.com]

In previous installments of this series, I de-bunked some of the more common myths around big data analytics. In this final installment, I’ll address one of the most pervasive and costly myths: that there exists an easy button that organizations can press to automagically solve their big data problems. I’ll provide some insights as to how this myth has come about, and recommend strategies for dealing with the real challenges inherent in big data analytics.

Like the single-solution elf, this easy button idea is born of the desire of many vendors to simplify their message. The big data marketplace is new enough that all the distinct types of needs haven’t yet become entirely clear – which makes it tough to formulate a targeted message. Remember in the late 1990’s when various web vendors were all selling “e-commerce” or “narrowcasting” or “recontextualization”? Today most people are clear on the utility of the first two, while the third is recognized for what it was at the time – unhelpful marketing fluff. I worked with a few of these firms, and watched as the businesses tried to position product for a need which hadn’t yet been very well defined by the marketplace. The typical response by the business was to keep it simple – just push the easy button and our technology will do it for you.

I was at my second startup in 2001 (an e-commerce provider using what we would refer to today as a SaaS model) when I encountered the unfortunate aftermath of this approach. I sat down at my desk on the first day of the job, and was promptly approached by the VP of Engineering, who informed me that our largest customer was about to cancel its contract – we’d been trying to upgrade the customer for weeks, during which time their e-commerce system was down. Although they’d informed the customer that the upgrade was a push-button process, it wasn’t. In fact, at the time I started there, the team was starting to believe that an upgrade would be impossible and that they should propose re-implementing the customer from scratch. By any standard, this would be a fail.

Over the next 72 hours, I migrated the customer’s data and got them up and running.   It was a Pyrrhic victory at best – the customer cancelled anyhow, and the startup went out of business a few months later.

The moral of the story? No, it’s not to keep serious data geeks on staff to do automagical migrations. The lesson here is that when it comes to data driven applications – including analytics – the “too good to be true” easy button almost always is. Today, the big data marketplace is full of great sounding messages such as “up and running in minutes”, or “data scientist in a box”.

“Push a button and deploy a big data infrastructure in minutes to grind through that ten petabytes of data sitting on your SAN!”

“Automatically derive predictive models that used to take the data science team weeks in mere seconds! (…and then fire the expensive data scientists)!”

Don’t these sound great?

The truth is, as usual, more nuanced. One key point I like to make with organizations is that big data analytics, like most technology practices, involves different tasks. And those tasks generally require different tools. To illustrate this for business stakeholders, I usually resort to the metaphor of building a house. We don’t build a house with just a hammer, or just a screwdriver. In fact, it requires a variety of tools – each of which is suited to a different task. A brad nailer for drywall. A circular saw for cutting. A framing hammer for framing. And so on. And in the world of engineering, a house is a relatively simple thing to construct. A big data infrastructure is considerably more complex. So it’s reasonable to assume that an organization building this infrastructure would need a rich set of tools and technologies to meet the different needs.

Now that we’ve clarified this, we can get to the question behind the question. When someone asks me “Why can’t we have an easy button to build and deploy analytics?” What they’re really asking is “How can I use technological advances to build and deploy analytics faster, better and cheaper?

Ahh, now that’s an actionable question!

In the information technology industry, we’ve been blessed (some would argue cursed) by the nature of computing. For decades now we’ve been able to count on continually increasing capacity and efficiency. So while processors continue to grow more powerful, they also consume less power. As the power requirements for a given unit of processing become low enough, it is suddenly possible to design computing devices which run on “ambient” energy from light, heat, motion, etc. This has opened up a very broad set of possibilities to instrument the world in ways never before seen – resulting in dramatic growth of machine-readable data. This data explosion has led to continued opportunity and innovation across the big data marketplace. Imagine if each year, a homebuilder could purchase a saw which could cut twice as much wood with a battery half the size. What would that mean for the homebuilder? How about the vendor of the saw? That’s roughly analogous to what we all face in big data.

And while we won’t find one “easy button”, it’s very likely that we can find a tool for a given analytic task which is significantly better than one that was built in the past. A database that operates well at petabyte scale, with performance characteristics that make it practical to use. A distributed filesystem whose economics make it a useful place to store virtually unlimited amounts of data until you need it. An engine capable of extracting machine-readable structured information from media. And so on. Once my colleagues and I have debunked the myth of the automagic pixie, we can have a productive conversation to identify the tools and technologies that map cleanly to the needs of an organization and can offer meaningful improvements in their analytical capability.

I hope readers have found this series useful. In my years in this space, I’ve learned that in order to move forward with effective technology selection, sometimes we have to begin by taking a step backward and undoing misconceptions. And there are plenty! So stay tuned.

The Single-Solution Elf

The “De-mythification” Series

Part 3: The Single-Solution Elf

In this part of the de-mythification series, I’ll address another common misconception in the big data marketplace: that there exists a single piece of technology that will solve all big data problems. Whereas the first two entries in this series focused on market needs, this will focus more on the vendor side of things in terms of how big data has driven technology development, and give some practical guidance on how an organization can better align their needs with their technology purchases.

Big Data is the Tail Wagging the Vendor

Big data is in the process of flipping certain technology markets upside-down. Ten or so years ago, vendors of databases, ETL, data analysis, etc. all could focus on building tools and technologies for discrete needs, with an evolutionary eye – focused on incremental advance and improvement. That’s all changed very quickly as the world has become much more instrumented. Smartphones are a great example. Pre-smartphone, the data stream from an individual throughout the day might consist of a handful of call-detail records and a few phone status records. Maybe a few kilobytes of data at most. The smartphone changed that. Today a smartphone user may generate megabytes, or even gigabytes of data in a single day from the phone, the broadband, the OS, email, applications, etc. Multiply that across a variety of devices, instruments, applications and systems, and the result is a slice of what we commonly refer to as “Big Data”.

Most of the commentary on big data has focused on the impact to organizations. But vendors have been, in many cases, blindsided. With technology designed for orders of magnitude less data, sales teams accustomed to competing against a short list of well-established competitors, marketing messages focused on clearly identified use cases, and product pricing and packaging oriented towards a mature, slow-growth market, many have struggled to adapt and keep up.

Vendors have responded with updated product taglines (and product packaging) which often read like this:

“End-to-end package for big data storage, acquisition and analysis”

“A single platform for all your big data needs”

“Store and analyze everything”

Don’t these sound great?

But simple messages like these mask the reality that there are distinct activities that which comprise big data analytics, and that these activities come with different technology requirements, and much of today’s technology was born in a very different time – so the likelihood of there being a single tool that does everything well is quite low. Let’s start with the analytic lifecycle, depicted in the figure below, and discuss the ways this has driven the state of the technology.

analytic_lifecycle

This depicts the various phases of an analytic lifecycle from the creation and acquisition of data through the exploration and structuring to analysis and modeling, to putting the information to work. These phases often require very different things from technology. Let’s take the example of acquiring and storing of large volumes of data with varying structure. Batch performance is often important here, as is cost to scale. Somewhat less important is ease of use – load jobs tend to change at a lower rate than user queries, especially when the data in a document-like format (e.g. JSON). By contrast, the development of a predictive model requires a highly interactive technology which combines high performance with a rich analytic toolkit. So batch use will be minimal, while ease of use is key.

Historically, many of the technologies required for big data analytics were built as stand-alone technologies: a database, a data mining tool, an ETL tool, etc. Because of this lineage, the time and effort required to re-engineer these tools to work effectively together as a single technology, with orders of magnitude more data, can be significant.

Despite how a vendor packages technology, organizations must ask themselves this question: what do you really need to solve the business problems? When it comes time to start identifying a technology portfolio to address big data challenges, I always recommend that customers start by putting things in terms of what they really need. This is surprisingly uncommon, because many organizations have grown accustomed to vendor messaging which is focused on what the vendor wants to sell as opposed as to what the customer needs to buy. It may seem like a subtle distinction, but it can make all the difference between a successful project and a very expensive set of technology sitting on the shelf unused.

I recommend engaging in a thoughtful dialog with vendors to assess not only what you need today, but to explore things you might find helpful which you haven’t thought of yet. A good vendor will help you in this process. As part of this exercise, it’s important to avoid getting hung up on the notion that there’s one single piece of technology that will solve all your problems: the single solution elf.

Once my colleagues and I dispel the single solution myth, we can then have a meaningful dialog with an organization and focus on the real goal: finding the best way to solve their problems with a technology portfolio which is sustainable and agile.

I’ve been asked, more than once “Why can’t there be a single solution? Things would be so much easier that way.” That’s a great question, which I’ll address in my next blog post as I discuss some common sense perspectives on what technology should – and shouldn’t – do for you.

Next up: The Automagic Pixie

 

Two new case studies – Jobrapido & Supercell.

We’ve published two more case studies, featuring Job Rapido and Supercell. These are compelling examples of innovative companies that use the HP Vertica Analytics Platform to gain a competitive edge and dervive maximum business value from Big Data. The two summaries and respective full case study PDF’s provide details about each company’s goals, success, and ultimate outcomes using HP Vertica. To see more like these, visit the HP Vertica Case Studies page.

Jobrapido-Dark-RGB

Jobrapido scales its database
to the next level

Since its founding in 2006, Jobrapido has become one of the biggest online job search aggregators in the world, helping millions of users everywhere from Italy to the United states find the job that’s right for them. In 2012, they were acquired by Evenbase, a part of DMG media based in the UK. HP Vertica has proved invaluable to their success, performing above and beyond for their big data analytics needs. David Conforti Director of BI at Jobrapido describes HP Vertica as “like having a sort of magic mirror to ask to all the business questions that come to my mind,” one that has allowed him and his team to deliver their users both valuable insight and results, and a unique personal experience based on their analytics.

Download the full case study(.pdf)

Supercell_logo_black_on_white

Supercell performs real-time analytics

In 2012, Supercell delivered two top-grossing games on iOS with the titles “Clash of Clans” and “Hey Day,” just a year after its founding in 2011. Using HP Vertica big data analytics platform, Supercell has been able to engage in real-time gaming data analytics, allowing them to balance, adapt, and improve their gamers experiences on a day to day basis. “HP Vertica is an important tool in making sure that our games provide the best possible experience for our players” says Janne Peltola, a data scientist at Supercell. Using HP Vertica, Supercell is able to create gaming experiences that are fun and engaging for customers to keep coming back to, long after they have started playing.

Download the full case study (.pdf)

Our Users Validate the Value of Vertica

We recently allowed TechValidate, a trusted authority for creating customer evidence content, to survey the HP Vertica customer base. The firm reached out to nearly 200 customers across a variety of industries and came back with some extremely powerful results.

From the financial benefits to the performance advantages, the benefits of the HP Vertica Analytics platform were repeatedly and clearly detailed by our customers.

A sampling of some of the comments and results can be found below, but to see the full results set click here.

HP Vertica Software Rocks HP Vertica Software - the best in the market
Query performance increased by 100-500% or more

HP Vertica customers have achieved a wide range of benefitsMajority of Vertica users saved $100-500K or more

 

 

 

 

Welcoming Facebook to the growing family of HP Vertica customers!

Yesterday at HP Discover Barcelona, we were thrilled to welcome one of our newest Vertica customers – Tim Campos, CIO of Facebook, during George Kadifa’s keynote.

Facebook selected the HP Vertica Analytics Platform as one component of its big data infrastructure.  Vertica’s value to Facebook can be found in its ability to provide business insights with incredible speed and flexibility. HP Vertica supports Facebook’s business analysts and helps the company be more productive through dramatically reduced query time. It is also valuable for providing accurate forecasting and aiding data driven decisions.

According to Tim:

“Data is incredibly important: it provides the opportunity to create new product enhancements, business insights, and a significant competitive advantage by leveraging the assets companies already have. At Facebook, we move incredibly fast. It’s important for us to be able to handle massive amounts of data in a respectful way without compromising speed, which is why HP Vertica is such a perfect fit.”

As blogger Dana Gardner posted:

“HP, taking a new leap in its marathon to remake itself, has further assembled, refined and delivered the IT infrastructure, big data and cloud means by which other large enterprises can effectively remake themselves.

This week here at the HP Discover 2013 conference — despite the gulf of 70 years but originating in the same Silicon Valley byways — has found a kindred spirit in … Facebook. The social media juggernaut, also based in Palo Alto, is often pointed to with both envy and amazement at its new-found and secretive feats of IT data center scale, reach, efficiency and adaptability. It’s a mantle of technological respect that HP itself once long held.

So for Facebook’s CIO, Tim Campos, to get on stage in Europe and declare that, “A partner like HP Vertica thinks like we do” and is a “key part” of Facebook’s big data capabilities, is one the best endorsements, err … “likes,” that any modern IT infrastructure vendor could hope for. With Facebook’s data growing by 500 terabytes a day, this is quite a coup for HP’s analytics platform, which is part of its HAVEn initiative. “

Welcome aboard Facebook to the growing family of HP Vertica customers!

How MZI HealthCare identifies big data patient productivity gems using HP Vertica

As part of our continuing podcast series, Dana Gardner, president and principal analyst for Interarbor Solutions, recently conducted an interview with Greg Gootee, product manager at MZI HealthCare.   MZI HealthCare develops and provides sophisticated software solutions that are flexible, reliable, cost effective, and help reduce the complexities of the healthcare industry.

In a post on ZDNet, Dana shares some of the highlights from his podcast with Greg Gootee:

Doctors make informed decisions from their experience and the data that they have. So it’s critical that they can actually see all the information that’s available to them.

The other critical thing was speed, being able to deliver high-end analytics at the point of care, instead of two or three months later, and Vertica really produced. In fact, we did a proof of concept with them. It was almost unbelievable some of the queries that ran and the speed at which that data came back to us.

The ability to expand and scale the Vertica system along with the scalability that we get with the Amazon allows us to deliver that information. No matter what type of queries we’re getting, we can expand that automatically. We can grow that need, and it really makes a large difference in how we could be competitive in the marketplace.

Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy.

Get Started With Vertica Today

Subscribe to Vertica