Aside from the sheer quantity of digital data created every day—about 2.5 exabytes1 —there’s more to Big Data than volume. Big Data offers enterprise leaders the opportunity to dramatically change the way their organizations operate to gain competitive advantage and find new revenue opportunities. But realizing the value Big Data promises requires a new approach. Traditional data warehouses and business intelligence tools weren’t built for the scale of Big Data, and can’t provide insight quickly enough to be useful or even keep up.
But this isn’t just a case of data growth outstripping technology growth. Big Data embodies fundamental differences that necessitate new approaches and new technologies. Big Data takes many forms, three in particular we’ll discuss here:
- Transactional data
- Sentiment and perceptual data based on conversations taking place in social media
- Data from networked sensors—the so-called “Internet of Things”
As businesses have expanded—and expanded onto the Internet—the volume of business transactions has grown. The Economist reported in 2010 that Wal-Mart processes more than 1 million customer transactions every hour and maintains databases exceeding 2.5 petabytes (million gigabytes)2. Imagine how those numbers have grown since then.
What’s even more critical is that companies can now capture not just sales transactions, but the detailed histories and clickstreams that lead to the sale. From web-based clickstream analysis to call data records, pre- and post-transaction histories are more robust than ever—and our ability to collect, analyze and act on that data must adjust accordingly.
The social media explosion
Today’s online customer has progressed well beyond accessing information. Today’s consumers are not only interacting and collaborating with each other, but they’re talking about and interacting with your brand. Facebook has more than 1 billion active subscribers3, and it’s estimated they share almost 700,000 individual pieces on content every minute. On Twitter, more than a billion tweets go out every two to three days4. (You can watch them mapped geographically in real-time at tweetping.net.)
Product reviews, user communities, forums and blogs allow consumers to generate content that contains critical insight for the business. The proliferation of user-generated content in these social channels has lead to new techniques and tools for “sentiment analysis”—the ability to measure emotion to determine how your company and brand are perceived.
The Internet of Things
The amount of information generated by devices rather than people is also growing explosively.
Mobile devices—and the apps people use on them—regularly broadcast individuals’ location, performance and other factors to the network. Retailers and distributors are using radio frequency identification (RFID), bar and QR codes to track inventory and enhance their supply chain and inventory performance. The healthcare industry seeks to improve care and reduce costs through remote patient monitoring. The automotive industry is embedding sensors in vehicles. And utilities are beginning to rely on smart meters to track usage. McKinsey Global Institute reports that more than 30 million networked sensors are in use in the transportation, automotive, industrial, utilities and retail sectors—and the number is growing by 30 percent every year.5
We recently presented a webinar on the Internet of Things and the Power of Sensor Data, which delves into this exciting area in much more detail.
Disrupting conventional analytics – developing a ‘conversational relationship with data’
Using Big Data to make operations more efficient, improve competitiveness and increase revenue is not about generating traditional statistics or producing standard reports.
Just as important as systems to collect and store data are systems to analyze and extract insight from that data. Without insight, you can’t gain new knowledge into your markets, your products and your operations.
When you have this insight at your disposal, you can act faster and with greater probability of success.
Extracting business value from Big Data requires a new approach. We believe that Big Data analytics is an iterative process. We describe it as developing a conversational relationship with your data. Analytics becomes a continuous improvement loop, which uses the results of analyses to frame better, more meaningful analyses, which, in turn, produce more definitive results. When results are available in minutes, analysts can ask, “What if?”
When properly applied, Big Data analytics enables business leaders to:
- Understand market reaction and brand perception
- Identify key buying factors
- Segment populations to customize actions
- Enable experimentation
- Accurately predict outcomes
- Reinvent and enhance inventory and supply chain systems and processes
- Disrupt their industries, gain an edge over competitors and enable new business models
Big Data already proved its game-changing power during the 2012 U.S. presidential election. Obama campaign chairman Jim Messina said: “We were going to demand data on everything, we were going to measure everything…We were going to put an analytics team inside of us to study us the entire time to make sure we were being smart about things.”
And, in fact, Big Data analytics helped the Obama campaign ratchet up the three key levers in any election: voter registration, persuasion and turnout. Rolling Stone magazine singled out Messina and the campaign’s CTO, Harper Reed, as two among a handful of unsung heroes in Obama’s victory.
You can hear more about how HP Vertica contributed to the high-tech strategy behind Obama’s reelection in a recent webinar featuring Chris Wegrzyn, director of data architecture for the Democratic National Committee.
The traditional data warehouse won’t get it done
The concept of the data warehouse evolved in the 1980s. Then, data warehouses were simply databases into which data from multiple sources was consolidated for the purpose of query and reporting. But today, these systems fall short when confronted with the volume, velocity and variety of Big Data. Why? They fail to enable the conversational approach to data required by Big Data analytics.
Traditional databases and data warehouses don’t easily scale to the hundreds of terabytes or even petabytes needed for many Big Data applications. Data is often not compressed, so huge amounts of storage and I/O bandwidth are needed to load, store and retrieve data. Data is still stored in tables by row, so access to a single data element through many rows—a common operation in business analytics—requires retrieving practically all of the data in a dataset to extract the specific element(s) needed. That strains I/O bandwidth and extends processing time. We have seen cases where the velocity of incoming data exceeds the capacity of the system to load it into the database, and queries produce answers in hours rather than the seconds or minutes needed for iterative business analytics. As a result, systems cost too much to maintain, and they fail to deliver the insight business leaders seek.
Take sentiment analysis, for example. The goal is to extract meaningful information from unstructured data so results can be stored in databases and analyzed. But the formats of resulting data are less predictable, more varied and subject to change during iterative analytics. This requires frequent changes to relational database structure and to processes that load data into them. For IT, it means the iterative approach to extracting business insight from Big Data requires new approaches, new tools and new skills.
Challenges for business leaders
Big Data is not just a technical challenge. Gaining and applying business insight compels business leaders to adopt new and disruptive ways of thinking and working.
Successful leaders we have known in data-driven organizations become more familiar with the sources of data available to them. Rather than asking IT what information is available in the database, they view information as a key competitive asset and explore how insights might be extracted from it to offer immediate and sustainable competitive advantage.
A solution for Big Data analytics
HP Vertica Analytics Platform is a new kind of database designed from the ground up for business analytics at the scale of Big Data. Compared to traditional databases and data warehouses, it drives down the cost of capturing, storing and analyzing data. And it produces answers 50 to 1,000 times faster to enable the iterative, conversational analytics approach needed.
- HP Vertica Analytics Platform compresses data to reduce storage costs and speed access by up to 90 percent.
- It stores data by columns rather than rows and caches data in memory to make analytic queries 50 to 1,000 times faster.
- It uses massively parallel processing (MPP) to spread huge data volumes over any hardware, including low-cost commodity servers.
- It uses data replication, failover and recovery to achieve automatic high availability.
- It includes a pre-packaged, in-database analytics library to handle complex analytics and development framework.
- It supports the R statistical programming language so analysts can create user-defined analytics inside the database.
- It dynamically integrates with Hadoop to analyze large sets of structured, semi-structured and unstructured data.
HP Vertica Analytics Platform means better, faster business insight at less cost.
Test drive the HP Vertica Analytics Platform at www.vertica.com/evaluate.
 “Big Data: The Management Revolution,” Andrew McAfee and Erik Brynjolfsson, Harvard Business Review, October, 1012.
“Data, data everywhere,” The Economist, Feb 25, 2010.
Facebook key facts.
 “Big data: The next frontier for innovation, competition, and productivity,” The McKinsey Global Institute, June 2011.