Archive for September, 2012

Back to School with Big Data

In case you missed it, we’ve spent much of this past week featuring some of the awesome projects our brilliant interns worked on this summer at Vertica. Their projects ranged from improving video game design to building an application for sorting through the ever-growing email in our inboxes. These projects are great examples of how students are learning to pose and solve new problems using big data and real-time analytics – thank you Matt & Mark Fay and Alex Leblang for your excellent posts.

As we grow Vertica by adding top-notch talent, we are excited to see how universities continue to innovate to train the workforce we depend on. For example, Stanford University is offering introductory online courses on databases, natural language processing, and machine learning. We are also seeing increased interest from universities in updating their curricula to include classes in analytical thinking, helping train leaders to ask better questions of the newly available data. Other universities are leading by example, implementing Big Data solutions of their own to improve retention and graduation rates with sophisticated data analytics.

The website Online recently posted an infographic that provides a great summary of how Big Data is impacting campus life and the college experience, which we’ve posted below. There’s no doubt that Big Data is having a tremendous impact on the college experience – and that our college interns have had a hugely important impact on us here at Vertica. We wish them – and all students – well in the academic year ahead. If you’re interested in learning more about Vertica’s internship programs or current job opportunities, we would love to hear from you!

To see the infographic, click the “Read More >>” link below!


Data Quest – Sifting through terabytes of diagnostic system data

All of the interns at Vertica have found interesting and useful projects to work on during their stay. The focus of my efforts has been the Data Quest project.

At Vertica, an important goal is to keep customers happy. To do this effectively, it’s necessary to know what a customer wants, even if they aren’t sure themselves. A potential source of this information is diagnostic data, which can tell us how Vertica is actually used in the wild. Internally, the Vertica Analytics Platform tracks a wealth of information about its state and usage in over 300 system tables. If something goes wrong, customers have the option of creating a diagnostic dump file capturing the current state of these system tables so that this information can be used to resolve problems and improve future experiences. As of this writing, there are over 515 dump files available, containing over 100 GB of compressed data.

The Data Quest goal has been to take all this Vertica metadata, pump it into a database, and get cracking on the analysis. Unfortunately, getting this data ready for analysis is a nontrivial task; a number of hurdles had to be crossed to get the data ready for loading and analysis.
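The exact layout of a diagnostics dump isn’t described here, but one of the first hurdles – splitting a single dump into per-table chunks ready for bulk loading – can be sketched roughly as follows. This is a minimal illustration, not the project’s actual code, and the `== TABLE ... ==` section-header convention is an assumption made up for the example:

```python
def split_dump(dump_text):
    """Split a diagnostics dump into per-table sections.

    Assumes (hypothetically) that each system-table section begins
    with a header line of the form '== TABLE vs_table_name =='.
    Returns a dict mapping table name -> that table's raw rows.
    """
    tables = {}
    current = None
    for line in dump_text.splitlines():
        if line.startswith("== TABLE ") and line.endswith(" =="):
            # Start of a new system-table section
            current = line[len("== TABLE "):-len(" ==")]
            tables[current] = []
        elif current is not None:
            tables[current].append(line)
    # Re-join each section's rows so it can be written out for bulk load
    return {name: "\n".join(rows) for name, rows in tables.items()}
```

Each resulting chunk could then be written to its own file and bulk-loaded into a staging table, one per system table, before the real analysis starts.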


Fun with Vertica UDX

During a week in July, Vertica held a competition for all of us, the 2012 summer interns. We had a week to write a user-defined function for Vertica. We could work individually or in teams of two, and the function could do whatever we wanted it to do. We were, however, told that we would be judged by a range of people from across the company on criteria that were never quite explained to us.

Like most people, and probably many Vertica customers, I had never written a UDX for Vertica before. I was already in the middle of my normal summer work, and did not necessarily relish the idea of putting off my other tasks, which I was enjoying, to learn a new skill that seemed very specific. I was still not sure what exactly a UDX was, but writing one sounded difficult and unpleasant.

So, being the natural procrastinator that I am, I put the competition off. Unfortunately for me, this summer Vertica has an odd number of engineering interns, and my fellow interns did not seem as daunted as I was. In the time that I spent pretending that the competition was not actually occurring, they had already divided into teams and started working on their ideas. The final presentation was a mere couple of days away, and I needed an idea quickly. Because I was now apparently working alone, I needed that idea to be easy to implement.


Heating up Data in Vertica

A couple of weeks ago, Vertica saw the interns teaming up and working on their own UDx projects. These projects were then presented before a panel of judges, who scored each group on various criteria. One hot project was our Heat Map UDT. After some additional work, the Heat Map is becoming a promising addition to the Vertica UDx family!  Let’s get this show on the road!

Columns of data in need of some quick and dirty analysis?  Enter Vertica and the Heat Map Transform, a highly extensible and parameterized analysis tool.

Imagine you had a client who came to you for advice on how to improve their popular first-person shooter.  After some discussion you conclude that the maps may not be well balanced and players simply die too often.  If only there were some way to keep them alive longer, and still have an exciting and fast-paced level!  A good first step might be to identify the dangerous zones in each map, so that the client can see where balance changes are needed.  Perhaps by balancing out the map, players will start to utilize the level in its entirety, allowing for more tactical, strategic play.  Conveniently, the client has been logging all of the kill and death locations into his Vertica database. Now, how to quickly process this information to get some useful visual results?  Easy! Let’s try out the Heat Map UDT from the Vertica Extension Packages GitHub repository.
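The real Heat Map UDT lives in the Vertica Extension Packages repository, so its actual interface is defined there. As a rough standalone illustration of what such a transform computes, here is a sketch that bins (x, y) event locations – say, death locations from the game logs – into a 2-D grid of counts (the function name and parameters here are hypothetical, not the UDT’s API):

```python
def heat_map(points, x_bins, y_bins, x_range, y_range):
    """Bin (x, y) event locations into a y_bins-by-x_bins grid of counts.

    points  -- iterable of (x, y) coordinates
    x_range -- (x_min, x_max) bounds of the map along x
    y_range -- (y_min, y_max) bounds of the map along y
    """
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    grid = [[0] * x_bins for _ in range(y_bins)]
    for x, y in points:
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue  # ignore events outside the map bounds
        # Map each coordinate to its bin index
        i = int((y - y_min) / (y_max - y_min) * y_bins)
        j = int((x - x_min) / (x_max - x_min) * x_bins)
        grid[i][j] += 1
    return grid
```

High counts in a cell mark the map’s hot zones; rendering the grid as colors is then a straightforward visualization step.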


Get Started With Vertica Today

Subscribe to Vertica