Vertica

Archive for March, 2013

Join HP Vertica’s User-Driven Community!

We at HP Vertica are very excited to announce our new user-driven community that will help us better serve our users, partners, and anyone generally interested in learning more about the HP Vertica Analytics Platform. In partnership with GetSatisfaction, community members can now:

  • Start engaging with other customers to establish new and valuable relationships
  • Add context to your past issues and get the best possible answers to your questions
  • Wield your influence as a subject matter expert in the community

Now, HP Vertica newbies as well as seasoned database/data analytics veterans and everyone in between can post questions, share ideas, report issues, and give praise. To make this information readily available for your convenient access, all questions are cataloged and searchable.

The new-and-improved community interface will enable you to easily access more information and better communicate with HP Vertica and other users of the HP Vertica Analytics Platform.

We welcome you to join our community by visiting http://community.vertica.com or by accessing the community tab on the side of the www.vertica.com homepage.

Big Data Analytics without Big Data Complexity

New analytics deployments can be complex, taking up to 18 months to implement and optimize. The complexity of maintaining and integrating these environments often results in missed deadlines, incomplete projects, increased costs, and lost opportunities. In fact, only 32 percent* of application deployments are rated as “successful” by organizations.

To remove this Big Data complexity, we are pleased to announce the general availability of the HP AppSystem for Vertica. Following through on the initial announcement at HP Discover as part of the HP AppSystems portfolio, the HP AppSystem for Vertica ensures system performance and reduces implementation time from months to a matter of hours.

But what is an AppSystem and is it right for you?

Built on the HP Converged Infrastructure, the new HP AppSystem for Vertica is a fully pre-integrated technology stack that includes a specifically optimized hardware configuration, factory pre-loaded OS, and the HP Vertica Analytics Platform environment.

HP AppSystem for Vertica is ideal for organizations interested in accelerating time-to-business value with high-performance, massively scalable analytics at each layer of IT infrastructure — server, storage, network, and management. As a result, you can scale seamlessly, while adding capacity as your analytics needs for Big Data evolve.
We encourage you to learn more about the HP AppSystem for Vertica — and get started removing complexity to capitalize on your big data analytics initiatives.

* = CHAOS Summary 2009, Jim Johnson, Standish Group, April 2009

A Method to the March Madness?

The NCAA 2013 Men’s Basketball March Madness Tournament officially tips off on Thursday, March 21st. For those of you unfamiliar with the tournament, 64 teams from colleges and universities across the United States compete for the championship, awarded to just one winner in early April. Buzzer-beating upsets are as common as fan face paint and schools from parts unknown, making it challenging to choose the winner in your office tournament bracket.

To give you a sense of the tournament’s popularity and appeal, according to USA Today “Last year’s championship game alone had about 20 million TV viewers. The overall tournament had 52 million visits across March Madness on Demand’s broadband and mobile platforms.”

So, what is the buzz on this year’s tournament on Twitter, and can social sentiment foreshadow ultimate success? A small team of us here — representing Autonomy, HP Vertica, and HP Information Management & Analytics (IM&A) — set out to answer that very question by building a March Madness Sentiment Tracker Demo to track the “sentiment of the crowd.”

The Technology Behind the March Madness Sentiment Tracker

Using HP Labs’ Academy Awards Meter demo as our guide, we created a framework in roughly a week based on Autonomy, HP Vertica, and Tibco Spotfire.

We unveiled the demonstration at the MIT Sloan Sports Analytics Conference. See Chris Selland’s blog post from that event and his participation on the Big Data in Sports panel.

Since the MIT Sloan Sports Analytics Conference was held weeks before the tourney began, we first collected roughly half a million Tweets from February 20th to March 1st using Autonomy’s data aggregator. The Tweets included anything related to the teams ranked in the Top 25 at the time, as well as the top scorers. Our colleagues at Autonomy also used Autonomy IDOL to add structure and sentiment to the data. For example, a Tweet like “I am excited to watch my Jayhawks win #MarchMadness!” would carry a positive sentiment. However, a Tweet like “I hate #MarchMadness – it interrupts my favorite TV shows!” would carry a negative sentiment.
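To make the positive/negative distinction concrete, here is a minimal sketch of a keyword-based tagger in Python. It is not how Autonomy IDOL works (that engine is far more sophisticated); the word lists and the `tag_sentiment` function are hypothetical and exist only to illustrate the idea of attaching a sentiment label to each Tweet.

```python
import re

# Hypothetical word lists, chosen only for this illustration.
POSITIVE = {"excited", "love", "win", "great"}
NEGATIVE = {"hate", "interrupts", "awful", "lose"}


def tag_sentiment(tweet: str) -> str:
    """Label a tweet positive, negative, or neutral by counting keywords."""
    words = re.findall(r"[a-z']+", tweet.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(tag_sentiment("I am excited to watch my Jayhawks win #MarchMadness!"))
    print(tag_sentiment("I hate #MarchMadness - it interrupts my favorite TV shows!"))
```

A real system would also need to handle negation, sarcasm, and multiple languages, which a simple word count cannot do.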

Our very own Will Cairns, who presented on the main stage of the MIT Sloan Sports Analytics Conference, loaded the data into the HP Vertica Analytics Platform, ran some analytical queries, and provided an output file for HP IM&A to create the visualization front-end with Tibco Spotfire. That is where the insight (and conversation with the data) began to happen.

Visualizing the Sentiment and Lessons Learned

HP IM&A created impressive visualizations that helped us (and attendees) to explore:

  • Volume of tweets by team
  • Volume of tweets by player
  • Positive, negative, and neutral sentiment groupings
  • Volume of tweets by U.S. city and by worldwide country
  • Volume of tweets by language (English, French, Spanish, etc.)

They say that a picture is worth a thousand words, and the visualizations certainly made for great conversation. Some results were not surprising: perennial NCAA teams steeped in history, such as Kansas and Duke, led the total volume of tweets. Some players ranked higher than others in volume of tweets, leading attendees to observe, “Well, Trey Burke had a monster game the other night, so that makes sense.”

But why did Chicago rank as the U.S. city with the highest number of tweets, despite no college or university from Illinois being ranked in the top 25 at the time? Well, the Big Ten is one of the more competitive conferences in the country this season, and Chicago area schools (such as the University of Illinois) play Wisconsin, Indiana, Michigan, and Michigan State. Chicago is also one of the top five major media hubs in the country.

Spirited debates and conversations aside, most importantly, this exercise clearly demonstrated the power of sentiment for a range of use cases in nearly every industry with a major product, brand, or service. In the telecommunications industry, network providers are actively tracking social media channels to measure customer satisfaction. If there is an issue with the service, say in a certain region of the country, you better believe that customer service will soon receive calls to that very point. Using sentiment analysis to quickly address issues by, say, adding more network bandwidth and improving service can help reduce service costs, improve customer satisfaction, and minimize churn.

But can sentiment foreshadow success? I guess you will have to tune into the games to find out, while tracking your favorite social media channel. Better yet, why not use HP Vertica’s tight integration with R to develop a statistical model based on hard basketball statistics available from ESPN and the like, such as field goal percentage, points allowed, head-to-head scoring, and more? You could correlate that statistical data with sentiment data trending from Twitter.
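As a rough idea of what such a correlation might look like, here is a small sketch that computes a Pearson correlation coefficient between a basketball statistic and a sentiment measure. It is written in plain Python rather than R to keep it self-contained, and every number in it is made up for illustration; the variable names `field_goal_pct` and `positive_share` are assumptions, not fields from our demo.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up per-team values, one entry per team, purely illustrative.
    field_goal_pct = [0.48, 0.45, 0.51, 0.43, 0.47]
    positive_share = [0.62, 0.55, 0.70, 0.50, 0.58]  # share of positive tweets
    print(f"correlation: {pearson(field_goal_pct, positive_share):.3f}")
```

A coefficient near 1 or -1 would suggest the two measures move together; a value near 0 would suggest sentiment tells you little about that statistic. Correlation alone would not establish that sentiment predicts wins.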

Hmm…that sounds like a perfect complement to our March Madness Sentiment Tracker demo. Stay tuned for more details or share your thoughts on how you could marry sentiment data with statistical data to ultimately predict this year’s winner.

The Disruptive Power of Big Data

Aside from the sheer quantity of digital data created every day (about 2.5 exabytes [1]), there’s more to Big Data than volume. Big Data offers enterprise leaders the opportunity to dramatically change the way their organizations operate to gain competitive advantage and find new revenue opportunities. But realizing the value Big Data promises requires a new approach. Traditional data warehouses and business intelligence tools weren’t built for the scale of Big Data, and can’t provide insight quickly enough to be useful or even keep up.

But this isn’t just a case of data growth outstripping technology growth. Big Data embodies fundamental differences that necessitate new approaches and new technologies. Big Data takes many forms, three in particular we’ll discuss here:

  • Transactional data
  • Sentiment and perceptual data based on conversations taking place in social media
  • Data from networked sensors—the so-called “Internet of Things”

Transactional Data

As businesses have expanded, and expanded onto the Internet, the volume of business transactions has grown. The Economist reported in 2010 that Wal-Mart processes more than 1 million customer transactions every hour and maintains databases exceeding 2.5 petabytes (2.5 million gigabytes) [2]. Imagine how those numbers have grown since then.

What’s even more critical is that companies can now capture not just sales transactions, but the detailed histories and clickstreams that lead to the sale. From web-based clickstream analysis to call data records, pre- and post-transaction histories are more robust than ever—and our ability to collect, analyze and act on that data must adjust accordingly.

The social media explosion

Today’s online customer has progressed well beyond accessing information. Today’s consumers are not only interacting and collaborating with each other, but they’re talking about and interacting with your brand. Facebook has more than 1 billion active subscribers [3], and it’s estimated they share almost 700,000 individual pieces of content every minute. On Twitter, more than a billion tweets go out every two to three days [4]. (You can watch them mapped geographically in real-time at tweetping.net.)

Product reviews, user communities, forums and blogs allow consumers to generate content that contains critical insight for the business. The proliferation of user-generated content in these social channels has led to new techniques and tools for “sentiment analysis”: the ability to measure emotion to determine how your company and brand are perceived.

The Internet of Things

The amount of information generated by devices rather than people is also growing explosively.
Mobile devices, and the apps people use on them, regularly broadcast individuals’ location, performance and other factors to the network. Retailers and distributors are using radio frequency identification (RFID), bar and QR codes to track inventory and enhance their supply chain and inventory performance. The healthcare industry seeks to improve care and reduce costs through remote patient monitoring. The automotive industry is embedding sensors in vehicles. And utilities are beginning to rely on smart meters to track usage. McKinsey Global Institute reports that more than 30 million networked sensors are in use in the transportation, automotive, industrial, utilities and retail sectors, and the number is growing by 30 percent every year [5].

We recently presented a webinar on the Internet of Things and the Power of Sensor Data, which delves into this exciting area in much more detail.

Disrupting conventional analytics – developing a ‘conversational relationship with data’

Using Big Data to make operations more efficient, improve competitiveness and increase revenue is not about generating traditional statistics or producing standard reports.

Just as important as systems to collect and store data are systems to analyze and extract insight from that data. Without insight, you can’t gain new knowledge into your markets, your products and your operations.

When you have this insight at your disposal, you can act faster and with greater probability of success.

Extracting business value from Big Data requires a new approach. We believe that Big Data analytics is an iterative process. We describe it as developing a conversational relationship with your data. Analytics becomes a continuous improvement loop, which uses the results of analyses to frame better, more meaningful analyses, which, in turn, produce more definitive results. When results are available in minutes, analysts can ask, “What if?”

When properly applied, Big Data analytics enables business leaders to:

  • Understand market reaction and brand perception
  • Identify key buying factors
  • Segment populations to customize actions
  • Enable experimentation
  • Accurately predict outcomes
  • Reinvent and enhance inventory and supply chain systems and processes
  • Disrupt their industries, gain an edge over competitors and enable new business models

Big Data already proved its game-changing power during the 2012 U.S. presidential election. Obama campaign chairman Jim Messina said: “We were going to demand data on everything, we were going to measure everything…We were going to put an analytics team inside of us to study us the entire time to make sure we were being smart about things.”
And, in fact, Big Data analytics helped the Obama campaign ratchet up the three key levers in any election: voter registration, persuasion and turnout. Rolling Stone magazine singled out Messina and the campaign’s CTO, Harper Reed, as two among a handful of unsung heroes in Obama’s victory.

You can hear more about how HP Vertica contributed to the high-tech strategy behind Obama’s reelection in a recent webinar featuring Chris Wegrzyn, director of data architecture for the Democratic National Committee.

The traditional data warehouse won’t get it done

The concept of the data warehouse evolved in the 1980s. Then, data warehouses were simply databases into which data from multiple sources was consolidated for the purpose of query and reporting. But today, these systems fall short when confronted with the volume, velocity and variety of Big Data. Why? They fail to enable the conversational approach to data required by Big Data analytics.

Traditional databases and data warehouses don’t easily scale to the hundreds of terabytes or even petabytes needed for many Big Data applications. Data is often not compressed, so huge amounts of storage and I/O bandwidth are needed to load, store and retrieve data. Data is still stored in tables by row, so access to a single data element through many rows—a common operation in business analytics—requires retrieving practically all of the data in a dataset to extract the specific element(s) needed. That strains I/O bandwidth and extends processing time. We have seen cases where the velocity of incoming data exceeds the capacity of the system to load it into the database, and queries produce answers in hours rather than the seconds or minutes needed for iterative business analytics. As a result, systems cost too much to maintain, and they fail to deliver the insight business leaders seek.
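A back-of-the-envelope calculation shows why row-oriented storage strains I/O for this kind of query. The sketch below compares the bytes scanned to aggregate a single column under each layout. The table, column names, and byte widths are hypothetical, and the model deliberately ignores compression, indexes, and page overhead.

```python
def bytes_scanned_row_store(num_rows, column_widths):
    """Without a suitable index, every row is read in full."""
    return num_rows * sum(column_widths.values())


def bytes_scanned_column_store(num_rows, column_widths, columns_needed):
    """Only the columns the query refers to are read."""
    return num_rows * sum(column_widths[c] for c in columns_needed)


# Hypothetical table: column name -> width in bytes.
widths = {
    "order_id": 8,
    "customer_id": 8,
    "order_date": 4,
    "amount": 8,
    "notes": 100,
}
rows = 1_000_000

# Query: SUM(amount) over the whole table.
row_bytes = bytes_scanned_row_store(rows, widths)
col_bytes = bytes_scanned_column_store(rows, widths, ["amount"])

if __name__ == "__main__":
    print(f"row layout:    {row_bytes:,} bytes")
    print(f"column layout: {col_bytes:,} bytes")
    print(f"ratio:         {row_bytes // col_bytes}x")
```

In this made-up table a row is 128 bytes wide and the query needs only 8 of them, so the row layout reads 16 times as much data. Wider tables make the gap larger.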

Take sentiment analysis, for example. The goal is to extract meaningful information from unstructured data so results can be stored in databases and analyzed. But the formats of resulting data are less predictable, more varied and subject to change during iterative analytics. This requires frequent changes to relational database structure and to processes that load data into them. For IT, it means the iterative approach to extracting business insight from Big Data requires new approaches, new tools and new skills.

Challenges for business leaders

Big Data is not just a technical challenge. Gaining and applying business insight compels business leaders to adopt new and disruptive ways of thinking and working.
Successful leaders we have known in data-driven organizations become more familiar with the sources of data available to them. Rather than asking IT what information is available in the database, they view information as a key competitive asset and explore how insights might be extracted from it to offer immediate and sustainable competitive advantage.

A solution for Big Data analytics

HP Vertica Analytics Platform is a new kind of database designed from the ground up for business analytics at the scale of Big Data. Compared to traditional databases and data warehouses, it drives down the cost of capturing, storing and analyzing data. And it produces answers 50 to 1,000 times faster to enable the iterative, conversational analytics approach needed.

  • HP Vertica Analytics Platform compresses data to reduce storage costs by up to 90 percent and to speed access.
  • It stores data by columns rather than rows and caches data in memory to make analytic queries 50 to 1,000 times faster.
  • It uses massively parallel processing (MPP) to spread huge data volumes over any hardware, including low-cost commodity servers.
  • It uses data replication, failover and recovery to achieve automatic high availability.
  • It includes a pre-packaged, in-database analytics library and a development framework to handle complex analytics.
  • It supports the R statistical programming language so analysts can create user-defined analytics inside the database.
  • It dynamically integrates with Hadoop to analyze large sets of structured, semi-structured and unstructured data.
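To illustrate why storing data by column helps compression, here is a small sketch of run-length encoding (RLE), one common technique for compressing a sorted column with many repeated values. This post does not describe which encodings the platform applies in a given case, so treat this as a generic example; `rle_encode` and `rle_decode` are hypothetical helper names.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]


# A sorted column of state codes: 12 stored values collapse to 3 runs.
column = ["CA"] * 4 + ["MA"] * 3 + ["NY"] * 5
runs = rle_encode(column)

if __name__ == "__main__":
    print(runs)
    print(rle_decode(runs) == column)
```

In a row layout the same values would be interleaved with the other fields of each row, so runs like these would not be adjacent on disk.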

HP Vertica Analytics Platform means better, faster business insight at less cost.


Test drive the HP Vertica Analytics Platform at www.vertica.com/evaluate.


[1] “Big Data: The Management Revolution,” Andrew McAfee and Erik Brynjolfsson, Harvard Business Review, October 2012.

[2] “Data, data everywhere,” The Economist, Feb 25, 2010.

[3] Facebook key facts.

[4] http://www.mediabistro.com/alltwitter/tweetping_b35247

[5] “Big data: The next frontier for innovation, competition, and productivity,” The McKinsey Global Institute, June 2011.

Big Data Value at Mobile World Congress 2013

Barcelona, Spain is known for its tapas, futbol, and Gaudi-inspired architecture. However, as host to the world’s largest annual mobile industry event — Mobile World Congress  — the city has also become synonymous with all things mobile.

More than 70,000 attendees (72,000 from 200 countries, to be precise, an all-time high) were blanketed with announcements and presentations about the latest gadgets and devices, the wireless enablement of nearly everything, Machine to Machine (M2M), and, largely, the growth and value of managing and analyzing Big Data.

It’s no surprise to us, given that 7 of the top 10 communications service providers (CSPs) trust the HP Vertica Analytics Platform to manage and analyze terabytes to petabytes of data (i.e., Big Data) in near-real time.

But why are CSPs managing and analyzing all of this Big Data — in other words, where is the value?

Miguel Carrero, GM, Actionable Customer Intelligence, HP, and I covered this very question at the show during a short video interview. Miguel also covered this topic in more detail in the day three editorial recap of Mobile World Daily 2013.

At the HP booth (as well as the transformation workshops), CSPs met with HP CMS (Communications and Media Solutions) to learn how the HP Smart Profile Server Solution — powered by the HP Vertica Analytics Platform — helps them realize a range of real-world use cases with real business value:

  • Targeted product and marketing offers – Gain complete contextual insight into your customers’ needs, then take action to improve customer satisfaction and achieve better retention rates.
  • Network optimization – Improve your network engineering and planning and user experience via optimized network utilization and real-time response to traffic congestion situations.
  • “Bill shock” mandate – Provide pre-paid and subscriber mobile customers with visibility into their mobile usage, including voice, data, and roaming.
  • New business model enablement – Capture the real-time business value of each of your customers and leverage it via new collaborative business models — increasing upsell opportunities and delivering prioritized resolutions.

What is your organization’s most prevalent use case for managing and analyzing Big Data?

BDOC – Big Data on Campus

I had a great time speaking at the MIT Sloan Sports Analytics Conference yesterday, and perhaps the most gratifying part of doing a panel in front of a packed house was how many students were in the audience. Having been a bit of a ‘stats geek’ during my college years, I can assure you that such an event, even with a sports theme, would never have drawn such an audience back then.

It was even more gratifying to read this weekend’s Wall Street Journal, which ran an article titled Data Crunchers Now The Cool Kids on Campus. Clearly this is a terrific time to be studying – and teaching – statistics and Big Data. To quote the article:

The explosive growth in data available to businesses and researchers has brought a surge in demand for people able to interpret and apply the vast new swaths of information, from the analysis of high-resolution medical images to improving the results of Internet search engines.

Schools have rushed to keep pace, offering college-level courses to high-school students, while colleges are teaching intro stats in packed lecture halls and expanding statistics departments when the budget allows.

 

Of course, Big Data training is not just for college students, and at HP Vertica we are working on programs to train both professionals and students in conjunction with our colleagues in the HP ExpertOne program. We invite those interested in learning more to contact us – including educational institutions that are interested in adding Big Data training to their curriculum.

No, You Do Not Need One Projection Per Query in Vertica

Projections are the Vertica Analytic Database’s only data structure. Every row from every table stored in Vertica is stored in a projection. There is no such thing as a query which “does not hit the projections.” If someone says those words, you should immediately suspect their motives and/or competence. We are quite open about projections (see previous posts such as this, this and this) and we think they are part of Vertica’s fundamental technical advantage. For those wishing for a more detailed description of projections, please see our VLDB paper from last year.

The idea that Vertica needs a special projection for every query in order to get good performance is just wrong. This rumor is spread as a FUD sales tactic by one of our more unscrupulous competitors, who knows it is not true and yet promulgates it anyway. We have typically assumed that people see through this transparent tactic, but after reading the same idea repeated by some otherwise credible articles and posts on the internet recently, I feel the need to set the record straight. The reason our competitor’s FUD-inducing claim is semi-believable is that it plays on a classic DBA nightmare of full table scans in Row Store systems.

There is a fundamental technical difference between a native Column Store (e.g. Vertica) and a Row Store for ad hoc analysis when you do not have a specialized data structure for that query. In a Row Store, if you don’t have an appropriate index, the system must fall back to a full table scan to retrieve the data. Scanning an entire table’s worth of data is almost always a disastrous amount of I/O for large tables. However, in a Column Store, even if you don’t have an optimal physical structure for a specific query (for example, the optimal projection in Vertica), you simply end up with column scans for those columns referred to in the query.

Furthermore, because we built our column storage and execution engine from the ground up with this kind of scenario in mind, our specialized storage format can often avoid reading all the column data from the disk even when a “full column scan” is needed. Along with the other well-discussed benefits such as better compression, the fundamental I/O benefit for ad hoc queries is why a Column Store architecture is so much better suited to many data warehouse scenarios than a Row Store.
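One generic way a storage format can skip data during a column scan is to keep the minimum and maximum value of each block and read only the blocks whose range overlaps the query predicate. The sketch below illustrates that general technique; it is not a description of Vertica’s actual on-disk format, and `build_blocks` and `count_in_range` are hypothetical names.

```python
def build_blocks(sorted_values, block_size):
    """Split a sorted column into blocks, recording each block's min and max."""
    blocks = []
    for start in range(0, len(sorted_values), block_size):
        chunk = sorted_values[start:start + block_size]
        blocks.append({"min": min(chunk), "max": max(chunk), "values": chunk})
    return blocks


def count_in_range(blocks, low, high):
    """Count values in [low, high], reading only blocks that can contain matches."""
    matches = 0
    blocks_read = 0
    for block in blocks:
        if block["max"] < low or block["min"] > high:
            continue  # the block's range cannot overlap the predicate: skip it
        blocks_read += 1
        matches += sum(low <= v <= high for v in block["values"])
    return matches, blocks_read


# A sorted column of 1,000 values stored in 10 blocks of 100 values each.
blocks = build_blocks(list(range(1000)), 100)
matches, blocks_read = count_in_range(blocks, 250, 349)

if __name__ == "__main__":
    print(f"{matches} matching values, {blocks_read} of {len(blocks)} blocks read")
```

Here the predicate touches only two of the ten blocks, so most of the column is never read. The benefit depends on the data being sorted or clustered on the filtered column.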
