Big Data Analytics: What’s Old is New Again

Many consider Big Data analytics to be a new paradigm. In reality, the analytics of massive amounts of data has been in practice for years, particularly in the financial services, communications, and manufacturing industries. Interestingly, one of the early pioneers was UPS, which used analytics in the 1950s to improve operations.

At the turn of the century, the Internet kicked off the second wave of Big Data analytics. Communications companies needed to better understand network traffic to plan for growth and capacity management. The biggest consumers of analytics in this era, though, were marketing departments, as consumer-focused organizations sought to leverage every behavioral nugget about their customers.

The explosion of intelligent mobile devices and the current online social phenomenon have taken data volumes to unprecedented levels. To keep up, organizations experimented with sampling their data and limiting the historical depth of their data sets. Many companies discovered that they were losing revenue due to the inaccuracy these shortcuts introduced. They needed to find ways to include as much data as possible, incorporate web and mobile interactions in real time, and deliver analytic results fast enough to create offers at the moment of engagement.

The Evolution of Big Data Analytics
Technology has evolved to address this. The analytics market has matured, creating more choices than existed a decade ago. SAS still holds the predominant share but sees competitors encroaching on the market it created. Customers have deployed many database options to improve the performance of these analytics. These options did a great job of improving query times, but they did not improve the performance of the overall process.

Leading firms today have realized that a large portion of the work required is in the preparation of data. There are many tools to help with data preparation, but breaking the process up increases expense and transfer time. The secret sauce to Big Data performance is doing the preparation in the database. This provides the flexibility to create data sets optimized for the required analytics as the requests are executed. Some refer to this as ELT (Extract, Load, & Transform) versus ETL (Extract, Transform, & Load): the data is loaded first, then transformed where it lives. ELT reduces the preparation time as well as the analytic processing time. Applying analytics to data sets optimized for the task at hand allows you to focus processing on only the relevant data and, thus, include more of it. Key clients have realized competitive advantage by implementing this process, achieving greater accuracy and faster results while simultaneously saving millions in operational expenses.
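
As a minimal sketch of the ELT pattern in SQL (the table, column, and file names here are hypothetical, not from any customer deployment), the raw data is loaded as-is and then shaped inside the database:

```sql
-- Hypothetical raw-events table, loaded as-is (the extract and load steps).
CREATE TABLE web_events_raw (
    user_id    INT,
    event_time TIMESTAMP,
    url        VARCHAR(2048)
);

COPY web_events_raw FROM '/data/web_events.csv' DELIMITER ',' DIRECT;

-- Transform inside the database (the "T" in ELT), producing a data set
-- optimized for the analytics about to run against it.
CREATE TABLE offer_candidates AS
SELECT user_id,
       COUNT(*)        AS visits_30d,
       MAX(event_time) AS last_seen
FROM   web_events_raw
WHERE  event_time > NOW() - INTERVAL '30 days'
GROUP BY user_id;
```

Because the transformation runs where the data already lives, there is no intermediate extract to a separate transformation server and back.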

Big Data Value in Japan, No Translation Necessary

Last week, I had the opportunity to present at the Gartner BI Summit in Tokyo. Knowing merely a handful of Japanese terms, I arrived in this beautiful country with mild heartburn that my presentation would somehow be misinterpreted and fall flat. The sessions were teeming with representatives from organizations across Japan eager to understand whether Big Data was valuable or simply another passing technology fad.

Having recently celebrated its 50th anniversary in the country, HP Japan was well represented at the event. My HP Vertica counterpart on the ground reinforced the need to emphasize business value, noting the growing demand for Big Data solutions from nearly every industry, particularly automotive, telecommunications, and railways/transportation. However, before technology decisions are made, Japanese businesses want concrete evidence that they can save money, make money, or differentiate from their competitors — not unlike businesses here in the States.

The title of my presentation was The New Economics of Enterprise Data Warehousing, based on a recently published research report from GigaOM. The general message is that traditional enterprise data warehouses cannot, and were never built to, handle the variety, volume, and velocity of Big Data — mainly because Big Data in its truest sense didn’t exist back in the 1980s and 1990s when those systems were architected. Therefore, a new breed of Big Data analytics platforms (led by the HP Vertica Analytics Platform) has emerged in the past few years that can handle these demands with extreme performance at massive scale, while enabling organizations to achieve true value at an overall lower TCO.

Heads nodded, followed by hushed side conversations in Japanese as attendees heard story after story on how leading organizations — Cardlytics, Guess, KDDI, HP.com, and even the Democratic National Committee — are deriving measurable business value and accomplishing the previously unimaginable with the HP Vertica Analytics Platform (including re-electing an American president).

I didn’t need the two translators (or my colleague) on hand to explain to me that the conference attendees were overall convinced that there is indeed value in all of the Big Data generated around them in Tokyo and other regions of Japan. I left the conference satisfied and amazed by these incredibly polite, organized, and astute people, with an understanding that business value is universally understood, despite the language.

Big Data Analytics without Big Data Complexity

New analytics deployments can be complex, taking up to 18 months to implement and optimize. The complexity of maintaining and integrating these environments often results in missed deadlines, incomplete projects, increased costs, and lost opportunities. In fact, only 32 percent* of application deployments are rated as “successful” by organizations.

To remove this Big Data complexity, we are pleased to announce the general availability of the HP AppSystem for Vertica. Following through on the initial announcement at HP Discover as part of the HP AppSystems portfolio, the HP AppSystem for Vertica ensures system performance and reduces implementation time from months to a matter of hours.

But what is an AppSystem, and is it right for you?

Built on the HP Converged Infrastructure, the new HP AppSystem for Vertica is a fully pre-integrated technology stack that includes a specifically optimized hardware configuration, factory pre-loaded OS, and the HP Vertica Analytics Platform environment.

HP AppSystem for Vertica is ideal for organizations interested in accelerating time-to-business value with high-performance, massively scalable analytics at each layer of IT infrastructure — server, storage, network, and management. As a result, you can scale seamlessly, adding capacity as your Big Data analytics needs evolve.
We encourage you to learn more about the HP AppSystem for Vertica — and get started removing complexity to capitalize on your Big Data analytics initiatives.

* CHAOS Summary 2009, Jim Johnson, Standish Group, April 2009

A Method to the March Madness?

The NCAA 2013 Men’s Basketball March Madness Tournament officially tips off on Thursday, March 21st. For those of you unfamiliar with the tournament, 64 teams from colleges and universities across the United States compete for the championship, awarded to just one winner in early April. Buzzer-beating upsets are as common as fan face paint and schools from parts unknown, making it challenging to choose the winner in your office tournament bracket.

To give you a sense of the tournament’s popularity and appeal, according to USA Today “Last year’s championship game alone had about 20 million TV viewers. The overall tournament had 52 million visits across March Madness on Demand’s broadband and mobile platforms.”

So, what is the buzz on this year’s tournament on Twitter, and can social sentiment foreshadow ultimate success? A small team of us here — representing Autonomy, HP Vertica, and HP Information Management & Analytics (IM&A) — set out to answer that very question by building a March Madness Sentiment Tracker Demo to track the “sentiment of the crowd.”

The Technology Behind the March Madness Sentiment Tracker

Using HP Labs’ Academy Awards Meter demo as our guide, we created a framework in roughly a week based on Autonomy, HP Vertica, and Tibco Spotfire.

We unveiled the demonstration at the MIT Sloan Sports Analytics Conference. See Chris Selland’s blog post from that event and his participation on the Big Data in Sports panel.

Since the MIT Sloan Sports Analytics Conference was held weeks before the tourney began, we first collected roughly half a million Tweets using Autonomy’s data aggregator from February 20th to March 1st. The Tweets included anything related to the Top 25 ranked teams at the time, as well as the top scorers. Our colleagues at Autonomy then used Autonomy IDOL to add structure and sentiment to the data. For example, a Tweet like “I am excited to watch my Jayhawks win #MarchMadness!” would carry a positive sentiment, while a Tweet like “I hate #MarchMadness – it interrupts my favorite TV shows!” would carry a negative sentiment.

Our very own Will Cairns, who presented on the main stage of the MIT Sloan Sports Analytics Conference, loaded the data into the HP Vertica Analytics Platform, ran some analytical queries, and provided an output file for HP IM&A to create the visualization front end with Tibco Spotfire. That is where the insight (and conversation with the data) began to happen.
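
As a rough sketch of the kind of analytical query involved (assuming each tagged Tweet lands as one row in a hypothetical tweets table), the team-by-sentiment rollup reduces to a simple aggregation:

```sql
-- Hypothetical schema: one row per tweet after IDOL sentiment tagging.
-- tweets(tweet_id, team, player, city, language, sentiment)
SELECT team,
       sentiment,               -- 'positive', 'negative', or 'neutral'
       COUNT(*) AS tweet_count
FROM   tweets
GROUP BY team, sentiment
ORDER BY tweet_count DESC;
```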

Visualizing the Sentiment and Lessons Learned

HP IM&A created impressive visualizations that helped us (and attendees) to explore:

  • Volume of tweets by team
  • Volume of tweets by player
  • Positive, negative, and neutral sentiment groupings
  • Volume of tweets by U.S. city and by country worldwide
  • Volume of tweets by language (English, French, Spanish, etc.)

They say a picture is worth a thousand words. Well, the visualizations provided for great conversation – some results were unsurprising, such as perennial NCAA powers steeped in rich history, like Kansas and Duke, leading the total volume of tweets. Some players ranked higher than others in tweet volume, leading attendees to observe, “Well, Trey Burke had a monster game the other night, so that makes sense.”

But why did Chicago rank as the U.S. city with the highest number of tweets, despite no team from an Illinois college or university being ranked in the top 25 at the time? Well, the Big Ten is one of the more competitive conferences in the country this season, and Chicago-area schools (such as the University of Illinois) play Wisconsin, Indiana, Michigan, and Michigan State. Chicago is also one of the top five media hubs in the country.

Spirited debates and conversations aside, most importantly, this exercise clearly demonstrated the power of sentiment for a range of use cases in nearly every industry with a major product, brand, or service. In the telecommunications industry, network providers are actively tracking social media channels to measure customer satisfaction. If there is an issue with the service, say in a certain region of the country, you had better believe that customer service will soon receive calls about that very issue. Using sentiment analysis to quickly address issues, say by adding more network bandwidth and improving service, can help reduce service costs, improve customer satisfaction, and minimize churn.

But can sentiment foreshadow success? I guess you will have to tune into the games to find out, while tracking your favorite social media channel. Better yet, why not use HP Vertica’s tight integration with R to develop a statistical model based on data available from ESPN and the like on hard basketball statistics, such as field goal percentage, points allowed, head-to-head scoring, and more? You could correlate that statistical data with sentiment data trending from Twitter.
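
In fact, a first cut at that correlation can be expressed in plain SQL before reaching for R. The sketch below computes a Pearson correlation from basic aggregates; the team_stats and tweets tables and their columns are hypothetical:

```sql
-- Pearson correlation between each team's field goal percentage and its
-- share of positive tweets, built from standard aggregates.
SELECT (AVG(s.fg_pct * t.pos_share) - AVG(s.fg_pct) * AVG(t.pos_share))
       / (STDDEV_POP(s.fg_pct) * STDDEV_POP(t.pos_share)) AS fg_sentiment_corr
FROM   team_stats s
JOIN  (SELECT team,
              SUM(CASE WHEN sentiment = 'positive' THEN 1 ELSE 0 END)
                / COUNT(*)::FLOAT AS pos_share
       FROM   tweets
       GROUP BY team) t
ON     s.team = t.team;
```

A correlation near zero would suggest the crowd’s mood tracks something other than on-court efficiency, which would itself be an interesting finding.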

Hmm…that sounds like a perfect complement to our March Madness Sentiment Tracker demo. Stay tuned for more details or share your thoughts on how you could marry sentiment data with statistical data to ultimately predict this year’s winner.

Big Data Value at Mobile World Congress 2013

Barcelona, Spain is known for its tapas, futbol, and Gaudi-inspired architecture. However, as host to the world’s largest annual mobile industry event — Mobile World Congress — the city has also become synonymous with all things mobile.

More than 72,000 attendees from 200 countries — an all-time high — were blanketed with announcements and presentations about the latest gadgets and devices, the wireless enablement of nearly everything, Machine to Machine (M2M), and, largely, the growth and value of managing and analyzing Big Data.

It’s no surprise to us, given that 7 of the top 10 communications service providers (CSPs) trust the HP Vertica Analytics Platform to manage and analyze terabytes to petabytes of data (i.e., Big Data) in near-real time.

But why are CSPs managing and analyzing all of this Big Data — in other words, where is the value?

Miguel Carrero, GM of Actionable Customer Intelligence at HP, and I covered this very question at the show during a short video interview. Miguel also covered this topic in more detail in the day-three editorial recap of Mobile World Daily 2013.

At the HP booth (as well as the transformation workshops), CSPs met with HP CMS (Communications and Media Solutions) to learn how the HP Smart Profile Server Solution — powered by the HP Vertica Analytics Platform — helps them realize a range of real-world use cases with real business value:

  • Targeted product and marketing offers – Gain complete contextual insight into your customers’ needs, then take action to improve customer satisfaction and achieve better retention rates.
  • Network optimization – Improve your network engineering and planning and user experience via optimized network utilization and real-time response to traffic congestion situations.
  • “Bill shock” mandate – Provide pre-paid and subscriber mobile customers with visibility into their mobile usage, including voice, data, and roaming.
  • New business model enablement – Capture the real-time business value of each of your customers and leverage it via new collaborative business models — increasing upsell opportunities and delivering prioritized resolutions.

What is your organization’s most prevalent use case for managing and analyzing Big Data?

Sensor Data and the Internet of Things: When Big Data Gets Really Big

I remember back in the 1990s when Sun Microsystems claimed that “Java anywhere” would make even refrigerators intelligent enough to know when you were out of milk, triggering a series of events that ultimately resulted in a grocery delivery chain bringing milk to your doorstep the very next day.

Fast forward to today. There are millions (and soon billions) of devices that are connected to the Internet — cars, medical equipment, buildings, meters, power grids, and, yes, even refrigerators. These connected devices comprise the Internet of Things (also known as Machine to Machine or M2M).

But why is this important to your world of Big Data analytics?

The Internet of Things is generating an unfathomable amount of sensor data — data that product manufacturers, in particular, need to manage and analyze to build better products, predict failures to reduce costs, and understand customer behavior to differentiate and improve loyalty.

In fact, IDC’s recent report The Digital Universe in 2020 forecasts that machine-generated data will grow to 42 percent of all data by 2020, up from 11 percent in 2005.

The use cases are here and proven. Some are even mainstream. Think Progressive Insurance’s Snapshot pay-as-you-drive insurance commercials that have taken over our airwaves. Others are all around us, and you may not even know it. Over your next work day, think about how many connected devices are distributing information just waiting for analysis: your car, train, flight, or bus; traffic lights, roadside signs, the elevator and escalator, an ATM, your check-out system.

But, more importantly, join us for our upcoming Webcast: Unlocking the Massive Potential of Sensor Data and the Internet of Things on Thursday, February 14th at noon EST (9:00AM PST).

We look forward to continuing the conversation and sharing these and other emerging use cases, real-world case studies, and a technology perspective to help you prepare for this massive opportunity ushered in by sensor data and the Internet of Things!

Top 4 Considerations When Evaluating a Data Analytics Platform

From fraud detection to clickstream analytics to simply building better products or delivering a more optimal customer experience, Big Data use cases abound, with analytics at the core.

With a solid business or use case in place, the next step organizations typically take is to investigate and evaluate the appropriate analytics technology with which to accomplish their analysis, often starting with a data analytics platform. But what requirements should you base your evaluation on?

The Winter Corporation, the large-scale data experts, recently finalized an in-depth white paper (The HP Vertica Analytics Platform: Large Scale Use and Advanced Analytics) that reflects its findings from hands-on evaluation, independent research, customer and employee interviews, and documentation review.

Intended for a more technical audience, this white paper focuses on key evaluation criteria that your organization can use as a guide as you conduct your own evaluation.
Winter Corporation identified these key feature areas as critical for any data analytics platform (a short SQL sketch follows the lists to illustrate a few of them):

1. Architecture
• Column store architecture
• Shared nothing parallelism
• Cluster size and elasticity
• Smart K-Safety based availability
• Hybrid storage model
• Multiple database isolation modes
• Both bulk load and trickle feed

2. Performance
• Extensive data compression and data encoding
• Read-optimized storage
• Highly parallel operation
• Storage of multiple projections
• Automatic physical database design

3. Generally Useful and Noteworthy Features for Large-Scale Use
• Export-import
• Backup/restore
• Workload analyzer
• Workload management
• Role-based security

4. Extensions for Advanced Analytics
• SQL extensions
• Built-in functions
• User-defined extensions
• Flexibility in accessing and analyzing all data (structured, semistructured, or unstructured)
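
To make a few of these items concrete (projections, K-safety, and bulk loading), here is a minimal sketch in Vertica SQL; the table, column, and file names are illustrative rather than drawn from the white paper:

```sql
-- A simple fact table plus a read-optimized projection, segmented across
-- all nodes with one redundant copy of the data (K-safety = 1).
CREATE TABLE game_events (
    user_id    INT,
    event_time TIMESTAMP,
    action     VARCHAR(64)
);

CREATE PROJECTION game_events_p AS
SELECT user_id, event_time, action
FROM   game_events
ORDER BY user_id, event_time
SEGMENTED BY HASH(user_id) ALL NODES KSAFE 1;

-- Bulk load; a trickle feed would use the same COPY statement on
-- smaller, more frequent batches.
COPY game_events FROM '/data/game_events.csv' DELIMITER ',' DIRECT;
```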

Finally, once you have evaluated and confirmed that the data analytics platform meets your feature and technology requirements, you will want to hear from other organizations that have deployed large-scale analytics initiatives in real-world environments.

The white paper concludes with a write-up on how Zynga, a social game services company with more than 240 million users of its online games, stores the actions of every player in every game — about 6 TB per day of data — in near-real time in the HP Vertica Analytics Platform. No matter where in the world a game event occurs, the data can be retrieved via a report or query from the central HP Vertica database no more than five minutes later.
