Vertica

Archive for the ‘big data’ Category

Taking a Moonshot at Big Data Analytics for Everyone

HP Vertica is very excited about Monday’s announcement of the HP Moonshot system.

Why? Because we believe that the combination of the HP Vertica Analytics Platform running on HP Moonshot servers offers a truly game-changing value proposition for a variety of customers and for new segments of the market.

Moonshot is, simply put, a groundbreaking system that lets customers rapidly deploy, scale and manage infrastructure with dramatically lower space and energy requirements. While traditional IT services that support business functions will continue to run on general-purpose server infrastructure, specialized workloads require a new computing platform, one that can deliver innovative solutions to market at unprecedented speed and scale.


We’ve already successfully tested the HP Vertica Analytics Platform on HP Moonshot servers and achieved performance comparable to traditional Big Data analytics hardware across certain performance ranges. For a large segment of the market, that is more than sufficient to handle Big Data analytics workloads, while offering potentially significant cost, space and energy savings.

Running Vertica on Moonshot offers yet another proof point of the unmatched value provided by HP’s combination of Information Optimization solutions. It is also a great example of the kind of innovation that makes us so excited to be part of the greater OneHP.

To learn more about HP Project Moonshot, visit http://www.hp.com/go/moonshot

A Method to the March Madness?

The NCAA 2013 Men’s Basketball March Madness Tournament officially tips off on Thursday, March 21st. For those of you unfamiliar with the tournament, 64 teams from colleges and universities across the United States compete for the championship, which is awarded to just one winner in early April. Buzzer-beating upsets and schools from parts unknown are as common as fan face paint, which makes it challenging to pick the winner in your office tournament bracket.

To give you a sense of the tournament’s popularity and appeal, according to USA Today, “Last year’s championship game alone had about 20 million TV viewers. The overall tournament had 52 million visits across March Madness on Demand’s broadband and mobile platforms.”

So, what is the buzz on this year’s tournament on Twitter, and can social sentiment foreshadow ultimate success? A small team of us here — representing Autonomy, HP Vertica, and HP Information Management & Analytics (IM&A) — set out to answer that very question by building a March Madness Sentiment Tracker Demo to track the “sentiment of the crowd.”

The Technology Behind the March Madness Sentiment Tracker

Using HP Labs’ Academy Awards Meter demo as our guide, we created a framework in roughly a week based on Autonomy, HP Vertica, and Tibco Spotfire.

We unveiled the demonstration at the MIT Sloan Sports Analytics Conference. See Chris Selland’s blog post from that event, where he took part in the Big Data in Sports panel.

Since the MIT Sloan Sports Analytics Conference was held weeks before the tournament began, we first collected roughly half a million Tweets from February 20th to March 1st using Autonomy’s data aggregator. The Tweets included anything related to the Top 25 ranked teams at the time, as well as the top scorers. Our colleagues at Autonomy also used Autonomy IDOL to add structure to the data and assign sentiment to each Tweet. For example, a Tweet like “I am excited to watch my Jayhawks win #MarchMadness!” would carry a positive sentiment, while a Tweet like “I hate #MarchMadness – it interrupts my favorite TV shows!” would carry a negative sentiment.
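The sentiment labels in the demo came from Autonomy IDOL, whose models are proprietary and far richer than a word list. Purely to illustrate the idea of labeling a Tweet, here is a minimal lexicon-based sketch in Python. The POSITIVE_WORDS and NEGATIVE_WORDS sets and the score_sentiment function are hypothetical and are not part of IDOL.

```python
import re

# Hypothetical toy lexicons for illustration; a production engine such as
# Autonomy IDOL relies on far richer linguistic models than word lists.
POSITIVE_WORDS = {"excited", "love", "win", "great", "awesome"}
NEGATIVE_WORDS = {"hate", "interrupts", "lose", "boring", "awful"}


def score_sentiment(tweet: str) -> str:
    """Label a tweet positive, negative, or neutral by counting lexicon hits."""
    # Lowercase and extract letter runs, so "#MarchMadness!" becomes "marchmadness".
    tokens = re.findall(r"[a-z']+", tweet.lower())
    positives = sum(token in POSITIVE_WORDS for token in tokens)
    negatives = sum(token in NEGATIVE_WORDS for token in tokens)
    score = positives - negatives
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(score_sentiment("I am excited to watch my Jayhawks win #MarchMadness!"))  # positive
print(score_sentiment("I hate #MarchMadness – it interrupts my favorite TV shows!"))  # negative
```

A word list like this misses sarcasm, negation (“not excited”) and slang, which is why dedicated sentiment engines exist.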

Our very own Will Cairns, who presented on the main stage of the MIT Sloan Sports Analytics Conference, loaded the data into the HP Vertica Analytics Platform, ran some analytical queries and provided an output file for HP IM&A to build the visualization front-end with Tibco Spotfire. That is where the insight (and the conversation with the data) began.

Visualizing the Sentiment and Lessons Learned

HP IM&A created impressive visualizations that helped us (and attendees) to explore:

  • Volume of tweets by team
  • Volume of tweets by player
  • Positive, negative, and neutral sentiment groupings
  • Volume of tweets by U.S. city and by worldwide country
  • Volume of tweets by language (English, French, Spanish, etc.)
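Each of these views boils down to a count grouped by some attribute, the kind of aggregate an analytical query in HP Vertica would typically express with a SQL GROUP BY. As a hedged illustration of the same idea outside the database, here is a Python sketch using collections.Counter; the handful of tweet records is invented and is not data from the demo.

```python
from collections import Counter

# Invented (team, city, sentiment) records for illustration only;
# the real demo data came from Autonomy and was queried in HP Vertica.
tweets = [
    ("Kansas", "Chicago", "positive"),
    ("Duke", "Chicago", "negative"),
    ("Kansas", "Lawrence", "positive"),
    ("Michigan", "Chicago", "neutral"),
    ("Duke", "Durham", "positive"),
    ("Kansas", "Lawrence", "neutral"),
]

# Each Counter plays the role of SELECT attr, COUNT(*) ... GROUP BY attr.
volume_by_team = Counter(team for team, _, _ in tweets)
volume_by_city = Counter(city for _, city, _ in tweets)
sentiment_groups = Counter(sentiment for _, _, sentiment in tweets)

print(volume_by_team.most_common())  # [('Kansas', 3), ('Duke', 2), ('Michigan', 1)]
print(volume_by_city.most_common(1))  # [('Chicago', 3)]
print(sentiment_groups)
```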

They say that a picture is worth a thousand words. Well, the visualizations sparked great conversation. Some results were not surprising, such as perennial NCAA powers steeped in rich history, like Kansas and Duke, leading the total volume of tweets. Some players ranked higher than others in tweet volume, leading attendees to observe, “Well, Trey Burke had a monster game the other night, so that makes sense.”

But why did Chicago rank as the U.S. city with the highest number of tweets, despite there being no Illinois college or university ranked in the top 25 at the time? Well, the Big Ten is one of the more competitive conferences in the country this season, and Illinois schools (such as the University of Illinois) play Big Ten rivals Wisconsin, Indiana, Michigan, and Michigan State. Chicago is also one of the top five media hubs in the country.

Spirited debates and conversations aside, this exercise clearly demonstrated the power of sentiment analysis for a range of use cases in nearly every industry with a major product, brand, or service. In the telecommunications industry, for example, network providers actively track social media channels to measure customer satisfaction. If there is an issue with the service, say in a certain region of the country, you can bet that customer service will soon receive calls about it. Using sentiment analysis to address issues quickly, say by adding network bandwidth to improve service, can help reduce service costs, improve customer satisfaction, and minimize churn.

But can sentiment foreshadow success? I guess you will have to tune in to the games to find out, while tracking your favorite social media channel. Better yet, why not use HP Vertica’s tight integration with R to develop a statistical model based on hard basketball statistics available from ESPN and the like, such as field goal percentage, points allowed, head-to-head scoring, and more? You could then correlate that statistical data with sentiment data trending on Twitter.
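A first step toward that kind of model is measuring how strongly a statistic moves with sentiment. In R this would be a call to cor(); to keep the examples here in one language, the sketch below computes the Pearson correlation coefficient in plain Python. The per-team numbers are invented for illustration, and a real predictive model would need far more data and care than a single correlation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 items)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Guard against division by zero when a sequence has no variance.
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented per-team values, for illustration only:
# field goal percentage and share of tweets labeled positive.
fg_pct = [0.49, 0.47, 0.45, 0.44, 0.42]
positive_share = [0.71, 0.66, 0.60, 0.63, 0.55]

r = pearson(fg_pct, positive_share)
print(f"Pearson r = {r:.2f}")
```

A value near +1 or -1 suggests a strong linear relationship; correlation alone does not show that sentiment predicts wins.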

Hmm…that sounds like a perfect complement to our March Madness Sentiment Tracker demo. Stay tuned for more details or share your thoughts on how you could marry sentiment data with statistical data to ultimately predict this year’s winner.

The Disruptive Power of Big Data

Aside from the sheer quantity of digital data created every day (about 2.5 exabytes [1]), there’s more to Big Data than volume. Big Data offers enterprise leaders the opportunity to dramatically change the way their organizations operate to gain competitive advantage and find new revenue opportunities. But realizing the value Big Data promises requires a new approach. Traditional data warehouses and business intelligence tools weren’t built for the scale of Big Data; they can’t keep up with it, let alone provide insight quickly enough to be useful.

But this isn’t just a case of data growth outstripping technology growth. Big Data embodies fundamental differences that necessitate new approaches and new technologies. Big Data takes many forms; we’ll discuss three in particular:

  • Transactional data
  • Sentiment and perceptual data based on conversations taking place in social media
  • Data from networked sensors—the so-called “Internet of Things”

Transactional Data

As businesses have expanded, including onto the Internet, the volume of business transactions has grown. The Economist reported in 2010 that Wal-Mart processes more than 1 million customer transactions every hour and maintains databases exceeding 2.5 petabytes (2.5 million gigabytes) [2]. Imagine how those numbers have grown since then.

What’s even more critical is that companies can now capture not just sales transactions, but the detailed histories and clickstreams that lead to the sale. From web-based clickstream analysis to call data records, pre- and post-transaction histories are more robust than ever—and our ability to collect, analyze and act on that data must adjust accordingly.

The social media explosion

Today’s online customers have progressed well beyond simply accessing information. They’re not only interacting and collaborating with each other, but also talking about and interacting with your brand. Facebook has more than 1 billion active users [3], and it’s estimated they share almost 700,000 individual pieces of content every minute. On Twitter, more than a billion tweets go out every two to three days [4]. (You can watch them mapped geographically in real time at tweetping.net.)

Product reviews, user communities, forums and blogs allow consumers to generate content that contains critical insight for the business. The proliferation of user-generated content in these social channels has led to new techniques and tools for “sentiment analysis,” the ability to measure emotion to determine how your company and brand are perceived.

The Internet of Things

The amount of information generated by devices rather than people is also growing explosively. Mobile devices, and the apps people use on them, regularly broadcast individuals’ location, performance and other factors to the network. Retailers and distributors are using radio frequency identification (RFID), bar codes and QR codes to track inventory and improve supply chain performance. The healthcare industry seeks to improve care and reduce costs through remote patient monitoring. The automotive industry is embedding sensors in vehicles. And utilities are beginning to rely on smart meters to track usage. McKinsey Global Institute reports that more than 30 million networked sensors are in use in the transportation, automotive, industrial, utilities and retail sectors, and that the number is growing by 30 percent every year [5].
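To put a 30 percent annual growth rate in perspective, the small Python calculation below compounds the McKinsey figure forward. It assumes, purely for illustration, that the rate stays constant, which the report does not claim; the constant and function names are hypothetical.

```python
import math

BASE_SENSORS = 30_000_000  # "more than 30 million" networked sensors (McKinsey, 2011)
ANNUAL_GROWTH = 0.30       # "growing by 30 percent every year"


def projected_sensors(years: int) -> float:
    """Compound the base count forward, assuming the growth rate stays constant."""
    return BASE_SENSORS * (1 + ANNUAL_GROWTH) ** years


# Years for the count to double at a constant 30 percent annual growth rate.
doubling_time = math.log(2) / math.log(1 + ANNUAL_GROWTH)
print(f"Doubling time: {doubling_time:.1f} years")  # Doubling time: 2.6 years

for years in (1, 3, 5):
    print(f"After {years} year(s): {projected_sensors(years) / 1e6:.1f} million sensors")
```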

We recently presented a webinar on the Internet of Things and the Power of Sensor Data, which delves into this exciting area in much more detail.

Disrupting conventional analytics – developing a ‘conversational relationship with data’

Using Big Data to make operations more efficient, improve competitiveness and increase revenue is not about generating traditional statistics or producing standard reports.

Just as important as systems to collect and store data are systems to analyze and extract insight from that data. Without insight, you can’t gain new knowledge of your markets, your products and your operations.

When you have this insight at your disposal, you can act faster and with greater probability of success.

Extracting business value from Big Data requires a new approach. We believe that Big Data analytics is an iterative process. We describe it as developing a conversational relationship with your data. Analytics becomes a continuous improvement loop: the results of each analysis are used to frame better, more meaningful analyses, which in turn produce more definitive results. When results are available in minutes, analysts can ask, “What if?”

When properly applied, Big Data analytics enables business leaders to:

  • Understand market reaction and brand perception
  • Identify key buying factors
  • Segment populations to customize actions
  • Enable experimentation
  • Accurately predict outcomes
  • Reinvent and enhance inventory and supply chain systems and processes
  • Disrupt their industries, gain an edge over competitors and enable new business models

Big Data proved its game-changing power during the 2012 U.S. presidential election. Obama campaign chairman Jim Messina said: “We were going to demand data on everything, we were going to measure everything…We were going to put an analytics team inside of us to study us the entire time to make sure we were being smart about things.”
And, in fact, Big Data analytics helped the Obama campaign ratchet up the three key levers in any election: voter registration, persuasion and turnout. Rolling Stone magazine singled out Messina and the campaign’s CTO, Harper Reed, as two among a handful of unsung heroes in Obama’s victory.

You can hear more about how HP Vertica contributed to the high-tech strategy behind Obama’s reelection in a recent webinar featuring Chris Wegrzyn, director of data architecture for the Democratic National Committee.

The traditional data warehouse won’t get it done

The concept of the data warehouse evolved in the 1980s. Back then, data warehouses were simply databases into which data from multiple sources was consolidated for the purpose of query and reporting. But today, these systems fall short when confronted with the volume, velocity and variety of Big Data. Why? They fail to enable the conversational approach to data required by Big Data analytics.

Traditional databases and data warehouses don’t easily scale to the hundreds of terabytes or even petabytes needed for many Big Data applications. Data is often not compressed, so huge amounts of storage and I/O bandwidth are needed to load, store and retrieve it. Data is also stored in tables by row, so reading a single column across many rows (a common operation in business analytics) requires retrieving practically all of the data in a dataset to extract the specific element(s) needed. That strains I/O bandwidth and extends processing time. We have seen cases where incoming data arrives faster than the system can load it into the database, and where queries produce answers in hours rather than the seconds or minutes needed for iterative business analytics. As a result, these systems cost too much to maintain, and they fail to deliver the insight business leaders seek.
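To make the row-versus-column point concrete, here is a simplified Python sketch that counts how many bytes must be read to sum a single column under each layout. The table dimensions, the 8-byte value size and the function names are hypothetical, and the model assumes a row store reads every full row, whereas real systems read pages or blocks; it illustrates the idea, not how HP Vertica or any particular database works internally.

```python
# Simplified model: hypothetical table of NUM_ROWS rows x NUM_COLS columns,
# with each value assumed to occupy BYTES_PER_VALUE bytes on disk.
NUM_ROWS = 1_000
NUM_COLS = 10
BYTES_PER_VALUE = 8

# Row-oriented layout: each inner list holds all columns of one row.
row_store = [[r * NUM_COLS + c for c in range(NUM_COLS)] for r in range(NUM_ROWS)]

# Column-oriented layout: each inner list holds all values of one column.
column_store = [[row[c] for row in row_store] for c in range(NUM_COLS)]


def sum_column_row_store(rows, col):
    """Sum one column from a row store; every full row has to be read."""
    total = 0
    bytes_read = 0
    for row in rows:
        bytes_read += len(row) * BYTES_PER_VALUE  # the whole row is fetched
        total += row[col]
    return total, bytes_read


def sum_column_column_store(columns, col):
    """Sum one column from a column store; only that column is read."""
    values = columns[col]
    return sum(values), len(values) * BYTES_PER_VALUE


row_total, row_bytes = sum_column_row_store(row_store, 3)
col_total, col_bytes = sum_column_column_store(column_store, 3)
print(row_total == col_total)  # True: same answer either way
print(row_bytes, col_bytes)    # 80000 8000
```

With ten columns, the row layout reads ten times as many bytes for the same one-column aggregate, and the gap grows with table width.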

Take sentiment analysis, for example. The goal is to extract meaningful information from unstructured data so results can be stored in databases and analyzed. But the formats of resulting data are less predictable, more varied and subject to change during iterative analytics. This requires frequent changes to relational database structure and to processes that load data into them. For IT, it means the iterative approach to extracting business insight from Big Data requires new approaches, new tools and new skills.

Challenges for business leaders

Big Data is not just a technical challenge. Gaining and applying business insight compels business leaders to adopt new and disruptive ways of thinking and working.
Successful leaders we have known in data-driven organizations become more familiar with the sources of data available to them. Rather than asking IT what information is available in the database, they view information as a key competitive asset and explore how insights might be extracted from it to offer immediate and sustainable competitive advantage.

A solution for Big Data analytics

HP Vertica Analytics Platform is a new kind of database designed from the ground up for business analytics at the scale of Big Data. Compared to traditional databases and data warehouses, it drives down the cost of capturing, storing and analyzing data. And it produces answers 50 to 1,000 times faster to enable the iterative, conversational analytics approach needed.

  • HP Vertica Analytics Platform compresses data, reducing storage costs by up to 90 percent and speeding access.
  • It stores data by columns rather than rows and caches data in memory to make analytic queries 50 to 1,000 times faster.
  • It uses massively parallel processing (MPP) to spread huge data volumes over any hardware, including low-cost commodity servers.
  • It uses data replication, failover and recovery to achieve automatic high availability.
  • It includes a pre-packaged in-database analytics library and a development framework for handling complex analytics.
  • It supports the R statistical programming language so analysts can create user-defined analytics inside the database.
  • It dynamically integrates with Hadoop to analyze large sets of structured, semi-structured and unstructured data.
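Compression and columnar storage reinforce each other: values within one column tend to repeat, especially when the column is sorted. Run-length encoding (RLE) is one well-known encoding that exploits this. The Python sketch below only illustrates the general technique; the column contents and function names are hypothetical, and it does not reflect HP Vertica’s actual storage format.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encode a sequence into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical 'state' column; sorting groups equal values into long runs.
state_column = sorted(["MA", "CA", "MA", "NY", "CA", "MA", "NY", "MA"])
encoded = rle_encode(state_column)
print(encoded)  # [('CA', 2), ('MA', 4), ('NY', 2)]
print(rle_decode(encoded) == state_column)  # True: the encoding is lossless
```

Eight stored values shrink to three pairs here; on a sorted column with millions of rows and few distinct values, the savings are far larger.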

HP Vertica Analytics Platform means better, faster business insight at less cost.


Test drive the HP Vertica Analytics Platform at www.vertica.com/evaluate.


[1] “Big Data: The Management Revolution,” Andrew McAfee and Erik Brynjolfsson, Harvard Business Review, October 2012.

[2] “Data, data everywhere,” The Economist, February 25, 2010.

[3] Facebook key facts.

[4] http://www.mediabistro.com/alltwitter/tweetping_b35247

[5] “Big data: The next frontier for innovation, competition, and productivity,” McKinsey Global Institute, June 2011.

Big Data Value at Mobile World Congress 2013

Barcelona, Spain, is known for its tapas, futbol, and Gaudi-inspired architecture. However, as host to the world’s largest annual mobile industry event, Mobile World Congress, the city has also become synonymous with all things mobile.

More than 70,000 attendees (to be precise, 72,000 from 200 countries, an all-time high) were blanketed with announcements and presentations about the latest gadgets and devices, the wireless enablement of nearly everything, Machine to Machine (M2M) communication, and, in large part, the growth and value of managing and analyzing Big Data.

It’s no surprise to us, given that 7 of the top 10 communications and service providers (CSPs) trust the HP Vertica Analytics Platform to manage and analyze terabytes to petabytes of data (i.e., Big Data) in near-real time.

But why are CSPs managing and analyzing all of this Big Data — in other words, where is the value?

Miguel Carrero, GM of Actionable Customer Intelligence at HP, and I covered this very question in a short video interview at the show. Miguel also covered this topic in more detail in the day three editorial recap of Mobile World Daily 2013.

At the HP booth (as well as the transformation workshops), CSPs met with HP CMS (Communications and Media Solutions) to learn how the HP Smart Profile Server Solution — powered by the HP Vertica Analytics Platform — helps them realize a range of real-world use cases with real business value:

  • Targeted product and marketing offers – Gain complete contextual insight into your customers’ needs then take action to improve customer satisfaction and achieve better retention rates.
  • Network optimization – Improve your network engineering and planning and user experience via optimized network utilization and real-time response to traffic congestion situations.
  • “Bill shock” mandate – Provide pre-paid and subscriber mobile customers with visibility into their mobile usage, including voice, data, and roaming.
  • New business model enablement – Capture the real-time business value of each of your customers and leverage it via new collaborative business models — increasing upsell opportunities and delivering prioritized resolutions.

What is your organization’s most prevalent use case for managing and analyzing Big Data?

BDOC – Big Data on Campus

I had a great time speaking at the MIT Sloan Sports Analytics Conference yesterday, and perhaps the most gratifying part of doing a panel in front of a packed house was how many students were in the audience. Having been a bit of a ‘stats geek’ during my college years, I can assure you that such an event, even with a sports theme, would never have drawn such an audience back then.

It was even more gratifying to read this weekend’s Wall Street Journal article titled “Data Crunchers Now the Cool Kids on Campus.” Clearly this is a terrific time to be studying, and teaching, statistics and Big Data. To quote the article:

The explosive growth in data available to businesses and researchers has brought a surge in demand for people able to interpret and apply the vast new swaths of information, from the analysis of high-resolution medical images to improving the results of Internet search engines.

Schools have rushed to keep pace, offering college-level courses to high-school students, while colleges are teaching intro stats in packed lecture halls and expanding statistics departments when the budget allows.


Of course, Big Data training is not just for college students, and at HP Vertica we are working on programs to train both professionals and students in conjunction with our colleagues in the HP ExpertOne program. We invite those interested in learning more to contact us, including educational institutions that are interested in adding Big Data training to their curriculum.

Sensor Data and the Internet of Things: When Big Data Gets Really Big

I remember back in the 1990s when Sun Microsystems claimed that “Java anywhere” would even make refrigerators intelligent enough to know when you were out of milk, triggering a series of events that ultimately resulted in a grocery delivery chain bringing milk to your doorstep the very next day.

Fast forward to today. There are millions (and soon billions) of devices that are connected to the Internet — cars, medical equipment, buildings, meters, power grids, and, yes, even refrigerators. These connected devices comprise the Internet of Things (also known as Machine to Machine or M2M).

But why is this important to your world of Big Data analytics?

The Internet of Things is generating an unfathomable amount of sensor data: data that product manufacturers, in particular, need to manage and analyze to build better products, predict failures to reduce costs, and understand customer behavior to differentiate and improve loyalty.

In fact, IDC’s recent report The Digital Universe 2020 forecasts that machine-generated data will grow to 42 percent of all data by 2020, up from 11 percent in 2005.

The use cases are here, and they are proven. Some are even mainstream: think of the Progressive Insurance Snapshot pay-as-you-drive commercials that have taken over our airwaves. Others are all around us, and you may not even know it. Over your next workday, think about how many devices are connected and distributing information just waiting for analysis: your car, train, flight, or bus; traffic lights and roadside signs; the elevator and escalator; an ATM; the checkout system.

But, more importantly, join us for our upcoming Webcast: Unlocking the Massive Potential of Sensor Data and the Internet of Things on Thursday, February 14th at noon EST (9:00AM PST).

We look forward to continuing the conversation and sharing these and other emerging use cases, real-world case studies, and a technology perspective to help you prepare for the massive opportunity ushered in by sensor data and the Internet of Things!

Startup Rink

For years, I’ve enjoyed working at Vertica, part of a culture where developers aren’t encumbered by bureaucracy, there is a true meritocracy, and we focus on efficiently delivering meaningful features to customers. I’ve been impressed through the years by the commitment, hard work, and truly impressive accomplishments of my colleagues. It takes an incredible team to build a product, like the original Vertica Analytics Database (now known as the HP Vertica Analytics Platform), from scratch, and tackle complex distributed systems and scalability challenges — it is also a lot of fun, especially with this group.

After HP acquired Vertica over a year and a half ago, I was glad to see the startup culture continue to thrive. The acquisition did bring about some change, which has overall been very positive. The engineering group has benefited from a wealth of resources at HP, including new toys, mostly in the form of hardware, and newfound relationships with the talented folks at HP Labs and in other business units.

It is my great fortune to work with truly talented developers, who have greatly impacted my personal and career growth. The challenges we’ve faced have worked to strengthen their influence. During a recent holiday project, I leaned on lessons learned from my colleagues. Interestingly, the project had nothing to do with my profession.

What does building a backyard, or, in my case front yard, skating rink have to do with a startup experience?

For starters, you hear lots of reasons why you shouldn’t do it. Building a rink is an impractical project, especially in my geographical location. It is relatively expensive compared to skating at a public rink — the cost is roughly what many pay for a few months of cable, but for something that you don’t mind your kids doing for hours each day. It is a lot of work. I call it exercise, something I need more of this time of year. At best, temperatures will remain cold enough to sustain five or six weeks of skating. As I got started, I heard all about how the ground didn’t freeze at all last winter.

To complete a project like this, you must filter criticism appropriately. The folks at my local big-box store were very helpful in improving my rink design, while others contributed only negative comments. I’m certain a good many of my neighbors think I am crazy. I was a little concerned when two fire engines came down my street while I was flooding the rink. It turns out that they were carrying Santa Claus on display for the kids; his sleigh must have been getting tuned up for his big day.

Perhaps most importantly, you have to be able to rebound when things don’t go as planned. I broke my back (at least it felt that way) framing the rink. What I didn’t count on was a lot of rain, followed by a fair amount of snow. These conditions added weight to the rink and made the ground extremely soggy (it was mush to a depth of more than one foot in some areas). Consequently, the deep end of the rink (the ground isn’t perfectly level) burst at one corner.

I’m certain that I looked crazed as I hurried to mend the damage before the rink fell apart completely. Once things stabilized, I could see that the ground wasn’t holding. The stakes were leaning and the rink was in great jeopardy. I felt defeated. I thought about giving up. I’d invested a lot of time and energy and wasted some money on this foolish project. Comments from the naysayers filled my head. But, as I said earlier, I’ve had the good fortune of working on challenging projects with colleagues who know how to make things work in the face of adversity. I didn’t need to consult them. I knew how they’d react. I’ve seen the same scenario play out dozens of times at work. After I cleared my head and got a pep talk from my wife, I doubled down on my efforts and made a serious attempt to salvage the rink. There was no guarantee of success; things looked bleak.

Thankfully, hard work paid off. It usually does, but there are times when, despite good intentions and best efforts, things don’t work out as intended. When that happens you’re left with valuable lessons learned. And, in that case, next year’s rink will be a success.

A few days after the rink was repaired, Mother Nature did her part. The rink has been in operation for a couple of days now, and the work has already been worthwhile. My family has had some very memorable times out there, like the time my three-year-old daughter amazed us with her on-ice impression of Professor Hinkle chasing Frosty down a hill as she laughed hysterically, or watching my five-year-old son give my wife a celebratory hug after imagining winning the Stanley Cup for the 1,000th time with another amazing goal.

With any luck, we’ve got a few more weeks to enjoy the cold weather. Now I’ve got to head out to resurface the ice with the homeboni I built (see image) so there’s a fresh sheet for the kids to skate on tomorrow.

Homeboni