Vertica

Big Data Analytics without Big Data Complexity

New analytics deployments can be complex, taking up to 18 months to implement and optimize. The complexity of maintaining and integrating these environments often results in missed deadlines, incomplete projects, increased costs, and lost opportunities. In fact, only 32 percent* of application deployments are rated as “successful” by organizations.

To remove this Big Data complexity, we are pleased to announce the general availability of the HP AppSystem for Vertica. Following through on the initial announcement at HP Discover as part of the HP AppSystems portfolio, the HP AppSystem for Vertica ensures system performance and reduces implementation time from months to a matter of hours.

But what is an AppSystem and is it right for you?

Built on the HP Converged Infrastructure, the new HP AppSystem for Vertica is a fully pre-integrated technology stack that includes a specifically optimized hardware configuration, factory pre-loaded OS, and the HP Vertica Analytics Platform environment.

HP AppSystem for Vertica is ideal for organizations interested in accelerating time-to-business value with high-performance, massively scalable analytics at each layer of IT infrastructure: server, storage, network, and management. As a result, you can scale seamlessly, adding capacity as your Big Data analytics needs evolve.

We encourage you to learn more about the HP AppSystem for Vertica and get started removing complexity to capitalize on your Big Data analytics initiatives.

* = CHAOS Summary 2009, Jim Johnson, Standish Group, April 2009

A Method to the March Madness?

The NCAA 2013 Men’s Basketball March Madness Tournament officially tips off on Thursday, March 21st. For those of you unfamiliar with the tournament, 64 teams from colleges and universities across the United States compete for the championship, awarded to just one winner in early April. Buzzer-beating upsets are as common as fan face paint and schools from parts unknown, making it challenging to choose the winner in your office tournament bracket.

To give you a sense of the tournament’s popularity and appeal, according to USA Today “Last year’s championship game alone had about 20 million TV viewers. The overall tournament had 52 million visits across March Madness on Demand’s broadband and mobile platforms.”

So, what is the buzz on this year’s tournament on Twitter, and can social sentiment foreshadow ultimate success? A small team of us here — representing Autonomy, HP Vertica, and HP Information Management & Analytics (IM&A) — set out to answer that very question by building a March Madness Sentiment Tracker Demo to track the “sentiment of the crowd.”

The Technology Behind the March Madness Sentiment Tracker

Using HP Labs’ Academy Awards Meter demo as our guide, we created a framework in roughly a week based on Autonomy, HP Vertica, and Tibco Spotfire.

We unveiled the demonstration at the MIT Sloan Sports Analytics Conference. See Chris Selland’s blog post from that event and his participation on the Big Data in Sports panel.

Since the MIT Sloan Sports Analytics Conference was held weeks before the tourney began, we first collected roughly half a million Tweets using Autonomy’s data aggregator from February 20th to March 1st. The Tweets included anything related to the Top 25 ranked teams at the time as well as the top scorers. Our colleagues at Autonomy also used Autonomy IDOL to add structure and sentiment to the data. For example, a Tweet like “I am excited to watch my Jayhawks win #MarchMadness!” would carry a positive sentiment. However, a Tweet like “I hate #MarchMadness – it interrupts my favorite TV shows!” would carry a negative sentiment.
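To make the idea of tagging a Tweet with sentiment concrete, here is a minimal sketch that scores the two example Tweets above with a tiny word list. It is only an illustration: the word lists and the `classify_sentiment` function are hypothetical, and this is not how Autonomy IDOL derives sentiment.

```python
# Toy lexicon-based sentiment tagger. This is NOT how Autonomy IDOL works;
# the word lists below are hypothetical and chosen only for illustration.
import re

POSITIVE_WORDS = {"excited", "love", "win", "great"}
NEGATIVE_WORDS = {"hate", "interrupts", "awful", "boring"}


def classify_sentiment(tweet: str) -> str:
    """Return 'positive', 'negative', or 'neutral' from simple word counts."""
    words = re.findall(r"[a-z']+", tweet.lower())
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for tweet in (
        "I am excited to watch my Jayhawks win #MarchMadness!",
        "I hate #MarchMadness – it interrupts my favorite TV shows!",
    ):
        print(classify_sentiment(tweet), "<-", tweet)
```

A real sentiment engine has to cope with sarcasm, slang, and context, which a fixed word list cannot do; the sketch only shows the shape of the output that the later analysis steps consume.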

Our very own Will Cairns, who presented on the main stage of the MIT Sloan Sports Analytics Conference, loaded the data into the HP Vertica Analytics Platform, ran some analytical queries, and provided an output file for HP IM&A to create the visualization front-end with Tibco Spotfire. That is where the insight (and conversation with the data) began to happen.

Visualizing the Sentiment and Lessons Learned

HP IM&A created impressive visualizations that helped us (and attendees) to explore:

  • Volume of tweets by team
  • Volume of tweets by player
  • Positive, negative, and neutral sentiment groupings
  • Volume of tweets by U.S. city and by worldwide country
  • Volume of tweets by language (English, French, Spanish, etc.)
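The groupings in the list above boil down to simple counts over tagged Tweets. The following sketch shows that kind of aggregation in plain Python; the records are made-up placeholders (not data from the demo), and the function names are hypothetical. In the demo itself this work was done with analytical queries in the HP Vertica Analytics Platform.

```python
# Count tweet volume and sentiment breakdown per team.
# The records below are invented placeholders, not results from the demo.
from collections import Counter, defaultdict

tweets = [
    ("Kansas", "positive"),
    ("Kansas", "neutral"),
    ("Duke", "negative"),
    ("Kansas", "positive"),
    ("Duke", "positive"),
]


def volume_by_team(records):
    """Return a Counter mapping each team to its number of tweets."""
    return Counter(team for team, _ in records)


def sentiment_by_team(records):
    """Return {team: Counter({sentiment: count})} for the given records."""
    breakdown = defaultdict(Counter)
    for team, sentiment in records:
        breakdown[team][sentiment] += 1
    return breakdown


if __name__ == "__main__":
    print(volume_by_team(tweets).most_common())
    for team, counts in sentiment_by_team(tweets).items():
        print(team, dict(counts))
```

The same pattern extends to the other groupings (player, city, country, language) by swapping the field used as the key.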

They say that a picture is worth 1,000 words. Well, the visualizations made for great conversation. Some results were not surprising: perennial NCAA teams steeped in rich history, such as Kansas and Duke, led the total volume of tweets. Some players ranked higher than others in volume of tweets, leading attendees to observe, “Well, Trey Burke had a monster game the other night, so that makes sense.”

But why did Chicago rank as the U.S. city with the highest number of tweets, despite no college or university from Illinois being ranked in the top 25 at the time? Well, the Big Ten is one of the more competitive conferences in the country this season, and Chicago area schools (such as the University of Illinois) play Wisconsin, Indiana, Michigan, and Michigan State. Chicago is also one of the top five major media hubs in the country.

Spirited debates and conversations aside, most importantly, this exercise clearly demonstrated the power of sentiment for a range of use cases in nearly every industry with a major product, brand, or service. In the telecommunications industry, network providers are actively tracking social media channels to measure customer satisfaction. If there is an issue with the service, say in a certain region of the country, you better believe that customer service will soon receive calls to that very point. Using sentiment analysis to quickly address issues by, say, adding more network bandwidth and improving service can help reduce service costs, improve customer satisfaction, and minimize churn.

But can sentiment foreshadow success? I guess you will have to tune into the games to find out, while tracking your favorite social media channel. Better yet, why not use HP Vertica’s tight integration with R to develop a statistical model based on data available from ESPN and the like on hard basketball statistics, such as field goal percentage, points allowed, head-to-head scoring, and more? You could correlate that statistical data with sentiment data trending from Twitter.
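As a rough sketch of what correlating the two data sets could look like, the snippet below computes a Pearson correlation coefficient between a basketball statistic and a sentiment measure. It is written in plain Python rather than R, the `pearson` helper is hypothetical, and the numbers are invented placeholders that happen to be perfectly linear; they are not real team statistics or real Twitter results.

```python
# Pearson correlation between a basketball statistic and a sentiment measure.
# All values are invented placeholders used only to exercise the function.
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / math.sqrt(spread_x * spread_y)


if __name__ == "__main__":
    field_goal_pct = [0.42, 0.45, 0.48, 0.51]          # placeholder per-team values
    positive_tweet_share = [0.50, 0.55, 0.60, 0.65]    # placeholder per-team values
    print(round(pearson(field_goal_pct, positive_tweet_share), 3))
```

A coefficient near 1 or -1 would suggest the two measures move together; it would not by itself show that sentiment predicts wins, which is where a proper statistical model comes in.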

Hmm…that sounds like a perfect complement to our March Madness Sentiment Tracker demo. Stay tuned for more details or share your thoughts on how you could marry sentiment data with statistical data to ultimately predict this year’s winner.

Big Data Value at Mobile World Congress 2013

Barcelona, Spain is known for its tapas, futbol, and Gaudi-inspired architecture. However, as host to the world’s largest annual mobile industry event, Mobile World Congress, the city has also become synonymous with all things mobile.

More than 70,000 attendees (72,000 from 200 countries to be precise, an all-time high) were blanketed with announcements and presentations about the latest gadgets and devices, the wireless enablement of nearly everything, Machine to Machine (M2M), and, above all, the growth and value of managing and analyzing Big Data.

It’s no surprise to us, given that 7 of the top 10 communications service providers (CSPs) trust the HP Vertica Analytics Platform to manage and analyze terabytes to petabytes of data (i.e., Big Data) in near-real time.

But why are CSPs managing and analyzing all of this Big Data — in other words, where is the value?

Miguel Carrero, GM, Actionable Customer Intelligence, HP, and I covered this very question at the show during a short video interview. Miguel also covered this topic in more detail in the day three editorial recap of Mobile World Daily 2013.

At the HP booth (as well as the transformation workshops), CSPs met with HP CMS (Communications and Media Solutions) to learn how the HP Smart Profile Server Solution — powered by the HP Vertica Analytics Platform — helps them realize a range of real-world use cases with real business value:

  • Targeted product and marketing offers – Gain complete contextual insight into your customers’ needs then take action to improve customer satisfaction and achieve better retention rates.
  • Network optimization – Improve your network engineering and planning and user experience via optimized network utilization and real-time response to traffic congestion situations.
  • “Bill shock” mandate – Provide pre-paid and subscriber mobile customers with visibility into their mobile usage, including voice, data, and roaming.
  • New business model enablement – Capture the real-time business value of each of your customers and leverage it via new collaborative business models — increasing upsell opportunities and delivering prioritized resolutions.

What is your organization’s most prevalent use case for managing and analyzing Big Data?

Sensor Data and the Internet of Things: When Big Data Gets Really Big

I remember back in the 1990s when Sun Microsystems claimed that “Java anywhere” would even make refrigerators intelligent enough to know when you were out of milk, triggering a series of events that ultimately resulted in a grocery delivery chain bringing your milk to your doorstep the very next day.

Fast forward to today. There are millions (and soon billions) of devices that are connected to the Internet — cars, medical equipment, buildings, meters, power grids, and, yes, even refrigerators. These connected devices comprise the Internet of Things (also known as Machine to Machine or M2M).

But why is this important to your world of Big Data analytics?

The Internet of Things is generating an unfathomable amount of sensor data: data that product manufacturers, in particular, need to manage and analyze to build better products, predict failures to reduce costs, and understand customer behavior to differentiate and improve loyalty.

In fact, IDC’s recent report, The Digital Universe 2020, forecasts that machine-generated data will increase to 42 percent of all data by 2020, up from 11 percent in 2005.

The use cases are proven and here. Some are even mainstream. Think of the commercials for Progressive Insurance’s Snapshot pay-as-you-drive insurance that have taken over our airwaves. Others are around us, and you may not even know it. Over your next work day, think about how many devices are connected and distributing information just waiting for analysis: your car, train, flight, or bus; traffic lights, roadside signs, the elevator and escalator, an ATM, your check-out system.

But, more importantly, join us for our upcoming Webcast: Unlocking the Massive Potential of Sensor Data and the Internet of Things on Thursday, February 14th at noon EST (9:00 AM PST).

We look forward to continuing the conversation and sharing these and other emerging use cases, real-world case studies, and a technology perspective to help you prepare for this massive opportunity ushered in by sensor data and the Internet of Things!

Top 4 Considerations When Evaluating a Data Analytics Platform

From fraud detection to clickstream analytics to simply building better products or delivering a more optimal customer experience, Big Data use cases abound, with analytics at the core.

With a solid business or use case in place, the next step that organizations typically take is to investigate and evaluate the appropriate set of analytics technology with which to accomplish their analysis, often starting with a data analytics platform. But what are the requirements on which to base your evaluation?

The Winter Corporation, the large-scale data experts, just finalized an in-depth white paper (The HP Vertica Analytics Platform: Large Scale Use and Advanced Analytics) that presents its results and findings from evaluation, independent research, customer and employee interviews, and documentation review.

Intended for a more technical audience, this white paper focuses on key evaluation criteria that your organization can use as a guide as you conduct your own evaluation.

Winter Corporation identified these key feature areas as critical for any data analytics platform:

1. Architecture
• Column store architecture
• Shared nothing parallelism
• Cluster size and elasticity
• Smart K-Safety based availability
• Hybrid storage model
• Multiple database isolation modes
• Both bulk load and trickle feed

2. Performance
• Extensive data compression and data encoding
• Read-optimized storage
• Highly parallel operation
• Storage of multiple projections
• Automatic physical database design

3. General Useful and Noteworthy Features for Large-Scale Use
• Export-import
• Backup/restore
• Workload analyzer
• Workload management
• Role-based security

4. Extensions for Advanced Analytics
• SQL extensions
• Built-in functions
• User-defined extensions
• Flexibility in accessing and analyzing all data (structured, semistructured, or unstructured)
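Two of the items above, column store architecture and data encoding, are easier to picture with a small example. The sketch below pivots a few rows into columns and applies run-length encoding (RLE), a common encoding in column stores generally, to a sorted column. It is an illustration of the general idea only: the rows are invented, the function names are hypothetical, and nothing here reflects how the HP Vertica Analytics Platform stores or encodes data internally.

```python
# Illustrative only: pivot rows into columns, then run-length encode a column.
# The rows are invented; this does not reflect any product's internals.
from itertools import groupby

rows = [
    ("Duke", "negative"),
    ("Duke", "positive"),
    ("Kansas", "neutral"),
    ("Kansas", "positive"),
    ("Kansas", "positive"),
]


def to_columns(records, names):
    """Turn a list of row tuples into {column_name: [values...]}."""
    return {name: [row[i] for row in records] for i, name in enumerate(names)}


def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original list."""
    return [value for value, length in runs for _ in range(length)]


if __name__ == "__main__":
    columns = to_columns(rows, ("team", "sentiment"))
    encoded_team = rle_encode(columns["team"])
    print(encoded_team)  # five values stored as two runs
    assert rle_decode(encoded_team) == columns["team"]
```

Because a column holds values of one type and, when sorted, long runs of repeats, storing data by column tends to compress far better than storing it by row, and a query that touches only one column does not need to read the others.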

Finally, once you have evaluated and confirmed that the data analytics platform meets your feature and technology requirements, you want to hear from other organizations that have deployed large-scale analytics initiatives in real-world environments.

The white paper concludes with a write-up on how Zynga, a social game services company with more than 240 million users of its online games, stores the actions of every player in every game — about 6 TB per day of data — in near-real time in the HP Vertica Analytics Platform. No matter where in the world a game event occurs, the data can be retrieved via a report or query from the central HP Vertica database no more than five minutes later.
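To put the 6 TB per day figure in perspective, a quick back-of-the-envelope calculation converts it to an average ingest rate. This sketch assumes decimal terabytes (10^12 bytes) and a perfectly even load across the day, neither of which the white paper states; real traffic is likely to peak well above the average. The function names are hypothetical.

```python
# Back-of-the-envelope ingest rate for a daily data volume.
# Assumes decimal units (1 TB = 10**12 bytes) and a uniform load over the day.
SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_TB = 10**12
BYTES_PER_MB = 10**6
BYTES_PER_GB = 10**9


def average_rate_mb_per_s(tb_per_day):
    """Average throughput in MB/s needed to absorb tb_per_day."""
    return tb_per_day * BYTES_PER_TB / SECONDS_PER_DAY / BYTES_PER_MB


def volume_in_window_gb(tb_per_day, window_seconds):
    """Average data volume in GB arriving within a window of the given length."""
    return tb_per_day * BYTES_PER_TB * window_seconds / SECONDS_PER_DAY / BYTES_PER_GB


if __name__ == "__main__":
    print(f"{average_rate_mb_per_s(6):.1f} MB/s on average")
    print(f"{volume_in_window_gb(6, 5 * 60):.1f} GB per five-minute window")
```

Under those assumptions, 6 TB per day works out to roughly 69 MB/s sustained, or about 21 GB arriving in each five-minute window.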

Big Data, Information Optimization, and Bulldozers at HP Discover

Nearly 10,000 HP customers, partners, prospects, and employees met in Frankfurt, Germany for HP Discover, which was abuzz with major announcements (including “Bulldozer” or HP Vertica Analytics Platform 6.1) and spirited discussions around Big Data and Information Optimization.

The blogosphere, news feeds, and headlines are dominated by the challenges and perceived virtues of Big Data. But is it all just hype? How can companies really monetize all this Big Data and avoid getting swallowed up by it? And what does Information Optimization mean, and how are Big Data and Information Optimization related, if at all?

The HP Vertica team played a central role in answering these very questions throughout the conference in sessions, press and media briefings, CIO tours, blogging sessions, news video interviews, and even livestream Twitter chats.

Opening Session on Information Optimization

To kick off the conference, Colin Mahony joined panelists Professor Andrew McAfee of the MIT Sloan School of Management, John Sontag of HP Labs, and Paul Miller of HP Enterprise Group to offer their perspectives on the potential impact of Big Data on every organization, even going so far as to propose that businesses that avoid Big Data (or, at a minimum, do not implement a strategy) risk being driven to extinction by more nimble competitors.

June Manley of HP Software also shared some interesting results from an independent study conducted by Coleman Parkes in October 2012 with senior business executives and senior technology executives:

  • 84% of executives said that they DO NOT have the right information at the point of need that enables them to obtain actionable insight that drives the right business outcome
  • Only 10% of executives said their organization currently incorporates unstructured data into their enterprise insights, processes, and strategy

Enter Information Optimization, enabling enterprises to harness the power of Big Data by storing, managing, understanding and acting upon the variety, velocity, and volume of organizational data to drive maximum Return on Information. In summary, it’s the HP solution that enables enterprises to monetize all of their Big Data.

Professor McAfee concluded this session by challenging the audience to avoid listening to HIPPOs (Highest Paid Person’s Opinion) and instead learn from how companies and organizations are using Big Data to accelerate drug discovery, optimize airline seating and scheduling, and even accurately predict the 2012 presidential election.

Livestream Twitter Chat on Big Data

HP also piloted its first livestream Twitter chat on Big Data. Led by Paul Muller of HP Software, the leading question was: How can you harness the true power of business data to stay competitive while remaining compliant?

I joined Randy Cairns and Brian Weiss of HP Autonomy in providing our insights and answering a steady flow of questions from bloggers and Twitter followers (#infoopt). Topics covered how organizations can stay compliant and derive value from the unfathomable amount of unstructured data, the security concerns introduced with such broad access to Big Data, and which data (structured, unstructured, semistructured) to retain and use and which data to confidently discard.

For our responses and perspectives, see the archived video chat.

Announcing “Bulldozer” HP Vertica Analytics Platform 6.1

We rolled out our “Bulldozer” HP Vertica Analytics Platform 6.1 at the show to much fanfare and interest from companies in need of maximum speed, performance, and scale to power their Big Data analytics initiatives. For an overview, see Luis Maldonado’s blog post as well as Colin Mahony’s video interview with Yahoo!

Those are our perspectives, but we always learn the most from you, our community. So, we welcome your thoughts and feedback.

Vertica Inside

Nearly everyone inside of high tech is familiar with Intel Inside. Even those outside of high tech are familiar with its jingle, heard in living rooms around the world during televised events.

But did you know that the HP Vertica Analytics Platform can also be found “inside” or embedded into a growing number of software solutions as the real-time analytics engine?

At only 80 megabytes and with a standard SQL engine, the HP Vertica Analytics Platform takes only two minutes to install and just a day or two for you to compare its advantages over OLTP databases. The HP Vertica Analytics Platform scales up and down with ease, running on anything from shared single-node appliances to clusters of hundreds of servers, both on-premises and in the Cloud. With standard drivers to help it fit right in with your application and reference architectures with all the major ETL and BI vendors, it’s all about the flexibility to align with any deployment, licensing, and pricing model.

But what kind of results can you expect?

On the infrastructure side, OEM partners have seen as much as 1,000x query and 100x load performance improvements. They have been able to store more detailed data in the same hardware footprint, take on customers with higher data volumes and rates than ever before, and give their customers more real-time, interactive, and ad-hoc access to the data. On the analytics end, they can extend HP Vertica’s built-in analytics with their own algorithms or take advantage of our platform’s SQL99 extensions or integration points into R and SAS.

And what about the business benefits?

OEM partners ultimately choose the HP Vertica Analytics Platform so that they can:

  • Address the needs of larger customers, particularly when they run into scalability issues with their current database
  • Improve the customer experience with much more interactive response times
  • Offer new capabilities to their customers, such as ad-hoc query access (which is often restricted because good performance could not be guaranteed with their former database)
  • Lower administration costs around the database for both the OEM partner and their customers (HP Vertica is essentially zero administration after deployment – end customers don’t even need to know that it’s there!)

Get started today

Sign up for a 30-day evaluation license of HP Vertica Analytics Platform today, and let’s talk about how the HP Vertica Analytics Platform has everything you need to enhance your software solution with real-time analytics—except the jingle.