Vertica

Archive for October, 2012

Optimizing Value – Creating a Conversational Relationship with Your Big Data

I spent most of the past week on the road, attending Gartner Symposium in Orlando and, later in the week, Strata + Hadoop World in NYC. (For more, see my colleague Jeff Healey’s excellent recap of Hadoop World here.)

While delivering the session ‘Big Data: Turning the Information Overload into an Information Advantage’ with my colleague Jerome Levadoux of our sister company Autonomy, and in just walking the events in general, I spoke to many people and, unsurprisingly, found the interest level in Big Data continuing to skyrocket.

Some of the most notable comments came from those who had already begun to tackle the Big Data challenge, since so many are trying to uncover the fourth ‘V’ of Big Data: Value.

What I continue to hear is that the Value of effectively leveraging Big Data (or, as we across HP like to call it, ‘Information Optimization’) lies in fundamentally changing the relationship between the organization and its data. In particular, moving from static queries that take minutes, hours, or sometimes days to run to nearly instantaneous answers enables more interactive ‘conversations’ with the data. That shift completely changes how business executives perceive their data and allows them to extract significantly more meaning and value from it.

Suddenly, it is no longer “specify the reports, set up the queries, run the reports, deliver to the business users” daily or weekly (rinse and repeat), but “I have a question, I need an answer.” Delivered in near real time via a platform such as Vertica, that answer quickly leads to follow-on questions, what-if scenarios, and a virtuous cycle that puts the data – and the analysts and data scientists who provide access to it – in a much more strategic and business-critical role.
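To make that concrete, here is a hypothetical slice of such a conversation in SQL; the orders table and its columns are invented purely for illustration:

    -- First question: which products saw the weakest sales last week?
    SELECT product, SUM(amount) AS sales
    FROM orders
    WHERE order_date >= CURRENT_DATE - 7
    GROUP BY product
    ORDER BY sales ASC
    LIMIT 10;

    -- Follow-on question, moments later: is the weakness concentrated
    -- in one region, or spread evenly?
    SELECT product, region, SUM(amount) AS sales
    FROM orders
    WHERE order_date >= CURRENT_DATE - 7
    GROUP BY product, region
    ORDER BY sales ASC
    LIMIT 10;

When each answer comes back in seconds rather than overnight, the second question actually gets asked – and that is the conversational loop.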

My colleague Jim Campbell discussed this during his visit to Cloudera’s booth at Hadoop World.

Live from Strata + Hadoop World 2012: Jim Campbell, Vertica (video courtesy of Cloudera).

If you want a live look at how Vertica can add game-changing Velocity to your organization’s conversations with its Big Data, sign up for an Evaluation today.

Observations from Hadoop World 2012


More than 3,000 attendees converged on the sold-out O’Reilly Strata Conference and Hadoop World 2012 in New York City to gain some clarity on arguably the biggest high-tech megatrend in recent years: Big Data.

From a 100,000-foot view, the majority of attendees—from press to developers to exhibitors to event staff—understood that we are generating a nearly incomprehensible amount of data, truly Big Data. And there is every reason to believe that this Big Data will continue to grow by orders of magnitude, given the ongoing proliferation of new data sources.

But from my conversations, attendees came to the show to understand how their organization could manage, analyze, and ultimately monetize this Big Data, and, specifically, how Hadoop could help with that effort.

As a newbie to this space, I could relate to the quizzical faces of attendees, barraged with messages from vendors each claiming to be the next Big Data solution, but with very different offerings—everything from search engines to hosted solutions to ETL tools to even staffing resources.

Hadoop itself comprises a uniquely named set of technologies: Hive, Sqoop, Pig, Flume, etc. Despite the unusual terminology, the Hadoop-focused sessions proved educational and featured a range of real-world case studies, including large companies (such as Facebook) using Hadoop to store and analyze an impressive amount of Big Data.

But the question still remains: is Hadoop the answer, or are there other technologies that can either complement it or serve as a better path?

As is often the case when choosing technology, the answer is “It depends on your business need.”

At HP, many of our customers used Hadoop for batch processing before ultimately adopting the HP Vertica Data Analytics Platform to manage and analyze their Big Data for sub-second query response times.

Other customers, particularly those using the Hadoop Connector released with HP Vertica Version 6, use the two technologies together, moving data seamlessly back and forth between Hadoop and HP Vertica.
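As one hedged sketch of what the Vertica side of such a transfer can look like – assuming the separate HDFS connector package is installed, and with a made-up host, path, table, and user – a single COPY statement can pull files written by a Hadoop job straight into an HP Vertica table:

    -- Load delimited files that a Hadoop job wrote to HDFS into Vertica.
    -- 'hadoop-nn', the WebHDFS path, 'clicks', and 'etl' are illustrative
    -- placeholders; substitute the values for a real cluster.
    COPY clicks
    SOURCE Hdfs(url='http://hadoop-nn:50070/webhdfs/v1/user/etl/clicks/*',
                username='etl')
    DELIMITER ',';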

Which use cases do you feel are a good fit for Hadoop and how can we provide better integration with our platform? Let us know.

We’re passionate about providing the data analytics platform that helps you obtain answers to your Big Data questions and adds some clarity in the process.

Big Data is Changing Software and (Product) Development as We Know It

I am often asked about “Big Data”: its use cases, its real-world business value, and how it will transform various products, services, and markets. This is one of my favorite topics, and I am fortunate to spend significant amounts of time with our amazing customers and partners, who teach me a lot. I am actually writing this from a plane after a few recent customer meetings that inspired me to share a point of view.

“Big Data” is already having, and will continue to have, the most impact on products and services where there is an ability to capture information about usage, experience, and behavior in a manner that is accepted by, yet not disruptive to, the consumer of that product or service. Data warehousing has been around for a long time with regard to retail transactions and purchasing behavior, but usage and experience measurement has not had an equivalent repository. It now does, and I believe this will lead to an exponential jump in the quality and variety of products and services delivered to consumers. In fact, this will not only improve existing solutions, but will spawn entirely new products and services in industries ranging from entertainment to medical treatment.

While the notion of experience analysis has been around for a long time through various manual observation efforts, focus groups, and survey methods, the results have been fragmented, small, and analyzed in what I’ll call a “basic” manner. Thanks to technology advancements and the resulting cost shifts, massive near-real-time “feedback” collection can now be done through automation and sensor technology. While the prospect of having this information delights any product manager and merchandiser, the challenge of capturing, storing, and analyzing information at this scale is still foreign to many.

 

There is one community that is embracing this feedback fire hose with greater ease and speed than most: software developers. Vertica has several ISV customers who are leaving “breadcrumbs” in their code to collect usage information that can be anonymously transferred back to headquarters for very specific feedback on how users of the software are interfacing with it. Their users agree to this data collection and sharing, and the ISVs ensure that it has no impact on the operational performance of their software.

These “breadcrumbs” can measure how long someone spends on a screen, which buttons they clicked to get there, how successful they were, and so on. For instance, good development organizations analyze the time it should take a user to get from one place to another – that is, navigation within and between screens. If an ISV’s software is the track, this is the laser measurement for precise timing.
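To make this concrete, here is a minimal sketch of the kind of table such breadcrumb events might land in; the table and column names are hypothetical:

    -- Hypothetical landing table for breadcrumb events.
    CREATE TABLE usage_events (
        user_id     INTEGER,
        session_id  INTEGER,
        event_time  TIMESTAMP,
        screen      VARCHAR(64),  -- which screen the user was on
        action      VARCHAR(64),  -- e.g. 'enter', 'button_click', 'quit'
        dwell_sec   INTEGER       -- seconds spent before the next action
    );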

Vertica is an ideal platform for storing and analyzing this information. Using Vertica’s advanced analytic and pattern matching capabilities, correlations in usage patterns can be identified, and developers can patch, redesign, or document accordingly to deliver a better experience to end users. For example, you could quite easily determine that users who spent 3 minutes on one screen, clicked a certain button, spent less than 1 minute on the resulting screen, and then quit might not be happy with their experience, compared with users who started in the same place but stayed online longer. Further analysis could then determine “why” through more traditional interview techniques, to improve the experience.
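As a sketch of that kind of analysis, Vertica’s event series pattern matching (the MATCH clause) can pick out exactly such a dwell-click-quit sequence against the hypothetical usage_events table above; all names remain illustrative:

    -- Flag sessions that dwell on a screen, click, then quit quickly:
    -- a possible sign of frustration.
    SELECT user_id, session_id, event_time, action, MATCH_ID()
    FROM usage_events
    MATCH (
        PARTITION BY session_id ORDER BY event_time
        DEFINE
            long_dwell AS action = 'enter' AND dwell_sec >= 180,
            clicked    AS action = 'button_click',
            quick_quit AS action = 'quit' AND dwell_sec < 60
        PATTERN frustrated AS (long_dwell clicked quick_quit)
    );

The traditional interview techniques can then focus on just the sessions this query flags.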

Why are software developers so eager to embrace this as early adopters? Well, one reason is that it gives them direct feedback on their work, without having to get the sometimes editorialized version from sales, support, management, and, yes, even product managers! Traditionally, most feedback to this community is sparse at best, with highly anecdotal sentiment mixed in. This method can augment that sentiment (which should still be captured through sales, support, and product management, by the way) with very complete data sets. The product managers at these customers actually love this capability, and many of them are directly interacting with and analyzing the raw data collected.

Software developers also have the ability to make and control their own sensors – pretty cool when you think about it. The savvy developer is able to create these listening points at various places in the code. Savvy developers and product managers are spending time on these breadcrumbs because, while they know breadcrumbs require more work (just as good quality assurance does), the payback is huge and can ultimately save them a lot of time. Recently I visited one of our customers, an enterprise software developer piloting a project in this area that already has 8 billion rows of this type of information – now that’s bigger than a breadbox!

This capability is not limited to SaaS vendors (although they certainly have more control and an easier time collecting the data). Our online gaming customers are at the forefront, but we see all ISVs getting into this. There is so much we can learn from software developers. What is especially exciting is seeing how physical sensors are being used in everything from automobiles to jet engines and even refrigerators to deliver the same type of feedback. There is no question: the sensor economy is upon us. In the end, this will lead to better products and services for you and me, the consumer, which is a good thing.

When Customers Buy You Beer, You Are on to Something

A few weeks ago, Shilpa, our VP of Engineering, was in New York City visiting prospective customers. While there, she also held an informal meetup with new and existing customers. One of our new customers liked Vertica so much that he literally handed Shilpa money to buy the Vertica Engineering team beer.

So, she did what all good managers do – delegate the acquisition to Sumeet. Thanks to his efforts we had a very special addition to one of our recent engineering lunches.

Nick, cheers from the entire engineering team! Thank you for your gift – we will all keep working hard to ensure your experience with Vertica continues to be a pleasure.

 

Vertica Lunch

If you are intrigued, don’t take my anecdotal customer stories for why Vertica is great – try it yourself with the Vertica Community Edition.

P.S. If you are interested in working somewhere customers like your product so much they send you tasty beverages, we are hiring in all areas. Within engineering specifically, we are looking for hackers at every level, from the lowest depths of the database server up through the client interfaces, the management console, and third-party integration programs. Consider coming in to talk with us: marcia.langdon@hp.com.

Avro parser UDx – Using Apache Avro to enable easier data transfer from Hadoop to Vertica

After careful research and brainstorming of different ideas for the intern UDx competition, we decided to implement an Avro parser UDx. Our team, “The Avro-rian Revolutionaries,” wanted to build something useful, ready to use, and near the top of customers’ wish lists. And what better than an Avro parser to help users easily transfer data from Hadoop to Vertica! (This Avro parser UDx package is now available on GitHub [6], and Vertica users are encouraged to try it out!)

Apache Avro [1] is a data serialization format widely used in the Hadoop world, a newer framework following in the footsteps of Thrift [2] and Protocol Buffers [3]. According to some technologists, Avro is the best data serialization framework out there [4]. This was good motivation for us to implement an Avro parser for the intern competition, hoping to make importing Avro data into Vertica feasible.

Figure 1. Hadoop, Avro, Avro UDx and Vertica workflow

With this motivation, we began day 1 of the 5-day intern competition. The first milestone was to get a standalone Avro parser working: a basic parser (still no Vertica in the picture) that would simply read an Avro file and print out the header and data in text format. The Avro APIs were our means to do it, and by referring to the basic documentation [5] we quickly came up with a parser that could dump the contents of a sample Avro file in text format, as in Figure 2.

Figure 2: weather.avro sample file in text format.

We spent day 2 of the competition learning the Vertica SDK, the next tool of the trade. There were some great examples already out there on GitHub. We picked a simple example UDx and began using and playing with it. Once we got our hands on loading, testing, and running this UDx, we started learning the SDK interfaces required for loading data into Vertica. One important interface, called UDParser, parses a stream of bytes into Vertica in parallel. Very quickly we were able to use this to develop a UDx skeleton, ready to be integrated with the module developed on day 1.

On day 3, midway through the competition, we had the most important milestone to achieve: integrating the standalone Avro parser developed on day 1 with the parser UDx skeleton developed on day 2. This was where we got stuck and hit an unexpected setback. After talking to our mentors, we discovered that there is an interface gap between the Avro file reader API and Vertica’s UDParser interface. To fill this gap, we developed a couple of modules, CRReader and CRStream, which successfully addressed the issue.

On day 4, we began integrating the modules, and finally the moment of judgment arrived: our first test of loading a weather.avro file into Vertica, which exercised most of the code we had written. We did not have to hold our breath for long. Within a fraction of a second, the data was loaded into Vertica. We could hardly believe that the three modules we had written over three days were working like parts of a single engine. The magic of UDx was happening, and the Avro file was successfully loaded into Vertica (Figure 3).

Figure 3: Demo screenshot
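For anyone who wants to try the published package [6], end-to-end usage looks roughly like the following; the library path and the factory name below are placeholders, so check the README on GitHub for the exact values:

    -- Register the compiled parser library, create the parser,
    -- and load an Avro file with it. The .so path and the factory
    -- name 'AvroParserFactory' are illustrative placeholders.
    CREATE LIBRARY AvroParserLib AS '/home/dbadmin/AvroParser.so';
    CREATE PARSER AvroParser AS LANGUAGE 'C++'
        NAME 'AvroParserFactory' LIBRARY AvroParserLib;
    COPY weather FROM '/home/dbadmin/weather.avro' WITH PARSER AvroParser();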

On day 5, the last day of the competition, we spent all our effort testing and packaging the UDx. We wanted a quality product that customers could use right away at the end of the competition.

Finally, we presented our work, alongside the other interns, in front of a fully packed room with an audience from all departments of Vertica. This was a unique experience in itself, because we had to present the work in a format appealing to an audience of varied perspectives, beyond just the technical dimension. At the end of the day, we were happy that we had learned lots of new things, collaborated with senior mentors, and received great feedback and comments on our work, which made the competition a great success! And now, seeing our UDx parser available on GitHub [6] and ready to use gives us great satisfaction at taking our first step toward the Avro-rian revolution!

References:
[1] http://avro.apache.org/docs/1.7.1/
[2] http://wiki.apache.org/thrift/FrontPage
[3] http://code.google.com/p/protobuf/
[4] http://www.cloudera.com/blog/2011/05/three-reasons-why-apache-avro-data-serialization-is-a-good-choice-for-openrtb/
[5] http://avro.apache.org/docs/1.6.1/api/cpp/html/index.html
[6] https://github.com/vertica/Vertica-Extension-Packages/tree/master/avro_parser

The First Boston Area Vertica User Group

We recently held the first Boston-area Vertica meetup/user group, and it was a huge success! The crowd consisted of a few members from Vertica, representatives from 7 area Vertica customers, a Vertica partner, Vertica consultants and experts, and also a few (hopefully) future Vertica users! For the first hour, it was all about Vertica users meeting each other for the first time and learning how each of them uses the platform, why they use it, what they like about it, tips and tricks, and so on. It was pretty cool to take a back seat and listen to them talk about our database!

We had a few speakers. Up first was Syncsort, a Vertica partner on the ETL side, who spoke about how Vertica connects to Syncsort and the benefits of using the two products together. Next, Compete spoke about how they use Vertica and the benefits it brings to their business.

Seth from Compete talking about the fastest database in the world!

We had two special guests – Colin Mahony, Vertica’s CEO, and Shilpa Lawande, Vertica’s VP of Engineering – say a few things and answer questions from the crowd! I thought it was awesome not only for our customers to meet them, but also for Colin and Shilpa to meet the Vertica community!

That was the play-by-play for the first Boston Vertica User Meetup Group. It was a success, and I am hoping we grow this group with more and more Vertica enthusiasts! Special thanks to Compete for hosting the first event. If you have not already, make sure you sign up right here, and look for the next Meetup to be announced soon! Don’t miss out!

Cardlytics Powers More Ads with HP Vertica

Cardlytics is the pioneer and leader in the cutting-edge field of Transaction-Driven Marketing™, expected to grow into a multi-billion-dollar industry in the US over the next four years (according to the Aite Group). The company’s unique advertising platform enables banks to deliver rich, relevant rewards to their customers based on purchasing history while fully protecting customer privacy. Since its founding in 2008, Cardlytics has been a leader in innovation, with recent accolades including being named one of the 100 most innovative private companies in the world by Red Herring.

In this short video, Jon Wren, Director of Data Innovation at Cardlytics, discusses how Cardlytics uses HP Vertica to enable better personalization of the ads it serves to online banking customers. In addition, Jon talks about gaining back so much time with HP Vertica that his analysts can finally do what they were hired to do – analyze data – instead of wrangling it.

With HP Vertica, Cardlytics believes it has the power to scale to handle any large banking customer without having to worry about the database.
