Vertica

Archive for the ‘interns’ Category

HP Vertica Intern Program – No Coffee Runs Required

“No one asked me to fetch coffee for them.” —HP Vertica 2013 summer intern

Does anyone really ask an intern to go for coffee these days? Lucky for the intern I mentored this summer, I don’t drink coffee.

When I arrived at HP Vertica in July 2012, one of the first things I noticed was the great summer intern program. The interns were fully integrated into life at HP Vertica, developing features for the product, participating in design discussions, and giving presentations about their work. As a group, they had a great camaraderie, working hard while having fun.

When the opportunity presented itself, I volunteered to be a mentor for the summer of 2013. The documentation team has an intern every summer, and luckily for us, last year’s summer intern is now a permanent member of our team. I was helping out the team by volunteering... but really I wanted to join in the fun that the interns and mentors have.

Adam Seering did a great job organizing the intern program this summer, managing the recruitment of mentors and the development interns, and letting us know what was expected of us mentors: good communication, meaningful work assignments, and a positive attitude. We want HP Vertica interns to go back to their universities and tell people what a great workplace we have. Some of our best hires have been former interns and friends of former interns!

Of course, we want our interns to work hard, but with this summer’s interns hailing from Georgia, California, North Carolina, Pennsylvania, New York, and, of course, Massachusetts, we really wanted them to experience Boston. Our first outing was a trip to see the movie Star Trek Into Darkness. As the summer progressed, we climbed the Great Blue Hill in Milton, visited the MIT Museum, hosted an ice cream party on one of the hottest summer days, went horseback riding, and celebrated the summer with a lakeside party at Shilpa’s house. Most of the interns rode on inflated tubes behind a power boat on Forge Pond, with CSE’s Alex Jackson steering the boat in an effort to knock them off!

Oh, and the interns did some real work as well. They worked on interesting projects that are important to the company, and they shared their work with the rest of the organization. The most common theme I heard was, “Everyone at HP Vertica is happy to take the time to answer my questions.” Near the end of the summer, Sumeet Keswani organized a coding competition to see who could design the coolest app for HP Vertica’s AMI. First prize was a toy light saber, which Tech Support director Amy Miller had to try out before passing it on to the winner, Wei, an RPI grad student who developed a cool app called “Tweetica,” which collected and analyzed tweets. (Did you know that the most trending topic in the Boston area is the Red Sox? If not, you should have guessed that!)

In August, Boston was the site of HP Vertica’s first annual Big Data Conference. The marketing interns focused on this event—planning, recruiting volunteers, registering attendees, handling communications, talking to customers at the event, and in general, making sure everything ran smoothly. Their contributions were a big reason why the conference was so successful.

Vicki, our documentation intern, arrived from southern California to daily rain followed by two weeks of heat and humidity. What she learned about Boston weather: “I have to bring an umbrella every day.” In addition to writing traditional documentation, she produced a great video about the intern program itself, which you can see for yourself below. She had a great time filming and editing it, and the interns enjoyed participating.

We will start recruiting for the 2014 summer interns this fall. Keep an eye out at www.vertica.com/careers.

InternVideo2013 from Vertica Systems on Vimeo.

Hiking with Interns!

HP Vertica’s Summer Intern program is up to 9 interns this year. HP Vertica has experienced tremendous growth recently with 76 new employees hired this year alone. Several of the interns are working on projects to be released with the next version of Vertica, while others are helping with the organization and marketing of the first ever HP Vertica Big Data Conference from August 5th-7th.

Sunday, June 23rd, was the second annual intern hiking trip. This year’s HP Vertica summer interns, their mentors, managers and some family members ventured out to Blue Hills Reservation, a Massachusetts State Park in Norfolk County. With bags of trail mix, water bottles and plenty of sunscreen, the group trekked a mile up to the peak for a beautiful viewpoint.

The group finished off a fun, but hot, day with food at the Hillside Pub in Canton. Individuals from various teams across HP Vertica had the chance to catch up, share what they are working on and enjoy each other’s company!

Avro parser UDx – Using Apache Avro to enable easier data transfer from Hadoop to Vertica

After careful research and brainstorming for the intern UDx competition, we decided to implement an Avro parser UDx. Our team, “The Avro-rian Revolutionaries,” wanted to build something useful, ready to use, and in the top 3 of customers’ wish lists. And what better than an Avro parser, which would help users easily transfer data from Hadoop to Vertica! (This Avro parser UDx package is now available on github [6], and Vertica users are encouraged to try it out!)

Apache Avro [1] is a data serialization format widely used in the Hadoop world. It is a newer format that follows Thrift [2] and Protocol Buffers [3]. According to some technologists, Avro is the best data serialization framework out there [4]. This was good motivation for us to implement an Avro parser for the intern competition, in the hope of making it feasible to import Avro data into Vertica.

Figure 1. Hadoop, Avro, Avro UDx and Vertica workflow

With this motivation, we began day 1 of the 5-day intern competition. The first milestone was to get a standalone Avro parser working: a basic parser (still no Vertica in the picture) that just reads an Avro file and prints out the header and data in text format. The Avro APIs were our means to do it, and by referring to the basic documentation [5] we quickly came up with a parser that could dump the contents of a sample Avro file in text format, as in Figure 2.
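A minimal sketch of what "reading the header" of an Avro file involves, written in pure Python rather than with the Avro C++ API the team used. It is not the competition code. It follows the object container file layout from the Avro specification: the magic bytes `Obj\x01`, a metadata map holding `avro.schema` and `avro.codec`, and a 16-byte sync marker. The helper names, the demo schema, and the all-zero sync marker are assumptions made up for illustration; the header is built in memory so the example runs without a real weather.avro file.

```python
import io
import json

MAGIC = b"Obj\x01"  # container-file magic bytes from the Avro specification


def read_long(stream):
    """Decode one zigzag, variable-length Avro long from a binary stream."""
    shift = 0
    raw = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("stream ended inside a variable-length long")
        raw |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:  # high bit clear marks the last byte
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1)  # undo zigzag encoding


def read_bytes(stream):
    """Read a length-prefixed Avro bytes or string value."""
    length = read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("stream ended inside a length-prefixed value")
    return data


def read_header(stream):
    """Return (metadata dict, sync marker) from a container-file header."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:  # a zero count ends the map
            break
        if count < 0:  # negative count is followed by a byte size
            read_long(stream)
            count = -count
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)
    return meta, stream.read(16)


def write_long(value):
    """Encode an int as a zigzag varint (only used to build the demo)."""
    raw = (value << 1) ^ (value >> 63)
    out = bytearray()
    while raw & ~0x7F:
        out.append((raw & 0x7F) | 0x80)
        raw >>= 7
    out.append(raw)
    return bytes(out)


def build_demo_header(schema):
    """Build an in-memory header; the zero sync marker is a placeholder."""
    pairs = [(b"avro.schema", json.dumps(schema).encode("utf-8")),
             (b"avro.codec", b"null")]
    body = bytearray(MAGIC) + write_long(len(pairs))
    for key, value in pairs:
        body += write_long(len(key)) + key
        body += write_long(len(value)) + value
    body += write_long(0) + bytes(16)
    return bytes(body)


# Illustrative schema only; not taken from the file shown in Figure 2.
DEMO_SCHEMA = {"type": "record", "name": "Weather", "fields": [
    {"name": "station", "type": "string"},
    {"name": "time", "type": "long"},
    {"name": "temp", "type": "int"}]}

meta, sync = read_header(io.BytesIO(build_demo_header(DEMO_SCHEMA)))
schema = json.loads(meta["avro.schema"])
print(schema["name"], [field["name"] for field in schema["fields"]])
```

A real parser would go on to read the data blocks that follow the header, each one a record count, a byte size, the encoded records, and the sync marker again.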

Figure 2: weather.avro sample file in text format.

We spent day 2 of the competition learning the Vertica SDK, the next tool of the trade. There were some great examples already out there on github. We picked a simple example UDx and began playing with it. Once we had our hands on loading, testing, and running this UDx, we started learning the SDK interfaces required for loading data into Vertica. One important interface is UDParser, which parses a stream of bytes into Vertica in parallel. Very quickly we were able to use it to develop a UDx skeleton, ready to be integrated with the module developed on day 1.

On day 3, midway through the competition, we had the most important milestone to achieve: integrating the standalone Avro parser developed on day 1 with the parser UDx skeleton developed on day 2. This was the point where we got stuck and had an unexpected setback. After talking to our mentors, we discovered that there is an interface gap between the Avro file reader API and the Vertica UDParser interface. To fill this gap we developed a couple of modules, CRReader and CRStream, which successfully addressed the issue.
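The post does not describe how CRReader and CRStream work, so the following is only a generic illustration of one kind of gap that can arise between a reader that pulls bytes on demand and a parser interface that is handed input in chunks. It is not the actual module code. Everything in it is hypothetical: the `ChunkStream` class, the `NeedMoreInput` exception, and the toy record format (a 1-byte length followed by UTF-8 text, not Avro encoding).

```python
class NeedMoreInput(Exception):
    """Raised when a read asks for bytes that have not arrived yet."""


class ChunkStream:
    """Hypothetical adapter: accepts pushed chunks, serves pull-style reads."""

    def __init__(self):
        self._buf = bytearray()
        self._pos = 0

    def feed(self, chunk):
        # Drop bytes that were already consumed, then append the new chunk.
        del self._buf[:self._pos]
        self._pos = 0
        self._buf += chunk

    def read(self, n):
        if self._pos + n > len(self._buf):
            raise NeedMoreInput
        data = bytes(self._buf[self._pos:self._pos + n])
        self._pos += n
        return data

    def mark(self):
        return self._pos

    def rewind(self, mark):
        self._pos = mark


def parse_available(stream):
    """Decode every toy record that is fully buffered; leave the rest."""
    records = []
    while True:
        mark = stream.mark()
        try:
            length = stream.read(1)[0]
            records.append(stream.read(length).decode("utf-8"))
        except NeedMoreInput:
            stream.rewind(mark)  # retry this record after the next chunk
            return records


# Three toy records, delivered in 4-byte chunks that cut across records.
payload = b"\x05BOS01\x03NYC\x04SFO1"
stream = ChunkStream()
loaded = []
for start in range(0, len(payload), 4):
    stream.feed(payload[start:start + 4])
    loaded.extend(parse_available(stream))
print(loaded)
```

The design choice worth noting is the mark and rewind pair: a record that straddles two chunks is abandoned halfway and retried from its first byte once more input has arrived, so the record decoder itself never needs to know about chunk boundaries.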

On day 4 we began integrating the modules, and finally the moment of judgement arrived: our first test of loading a weather.avro file into Vertica, which exercised most of the code we had written. We did not have to hold our breath for long. Within a fraction of a second the data was loaded into Vertica. We really couldn’t believe our eyes: all 3 modules we had written in 3 days were working together like parts of an engine. The magic of UDx was happening, and the Avro file was successfully loaded into Vertica (Figure 3).

Figure 3: Demo screenshot

On day 5, the last day of the competition, we put all our effort into testing and packaging the UDx. We wanted a quality product that would be ready for customers to use by the end of the competition.

Finally, we presented our work alongside the other interns in a fully packed room, with an audience from every department at Vertica. This was a unique experience in itself, because we had to present the work in a way that appealed to people with perspectives well beyond the technical. At the end of the day we were happy: we had learned lots of new things, collaborated with senior mentors, and received great feedback and comments on our work, which made the competition a great success! And now, seeing our UDx parser available on github [6] and ready to use, we have the great satisfaction of having taken our first step toward the Avro-rian revolution!

References:
[1] http://avro.apache.org/docs/1.7.1/
[2] http://wiki.apache.org/thrift/FrontPage
[3] http://code.google.com/p/protobuf/
[4] http://www.cloudera.com/blog/2011/05/three-reasons-why-apache-avro-data-serialization-is-a-good-choice-for-openrtb/
[5] http://avro.apache.org/docs/1.6.1/api/cpp/html/index.html
[6] https://github.com/vertica/Vertica-Extension-Packages/tree/master/avro_parser

Big Interns For Big Data

“[My wife] won’t let me talk about work anymore.” — Intern overheard talking at lunch

Quotes can be forged, but casual lunchtime conversation tends to be very candid. Indeed, it’s the indirect signals that mean the most to me as I coordinate the intern program for the second year.

Another intern expressed delighted surprise at how well the interns are integrated into our teams. I take serious pride in this trait of the Vertica Summer Intern Program, as we ensure our interns each have at least one personal mentor and a project that matters to us and to them. With eight interns this year, we have them doing everything from releasing features to customers and researching ways to improve performance to analyzing Vertica usage patterns and improving our testing framework. Our interns represent some geographic diversity, hailing from MIT, UVa, UMass Amherst, University of Houston, Brown and Purdue. The program has doubled, and with good reason – in the last six months, two of our interns from 2011 have started full-time, as did a fellow intern of mine from back in 2009.

We encourage our interns to work their 40 hours and then enjoy Boston. Still, during the week-long Intern User-Defined-Function Contest, one of the eventual winners told me at 10 PM he wanted to skip school and come work at Vertica, while another pair of interns extended their internships. Though all will be returning to school in the fall, I’m thrilled that we can inspire the interns this deeply and grateful to all my coworkers who helped choose them from the candidate pool.

Vertica intern party

Annual intern party at Shilpa's, complete with (brand new) traditions of single-ski water skiing and watermelon carving. Photos taken by Ramachandra CN

But it’s not all work at Vertica. Along with individual lunches with Vertica’s top brass, we managed hiking trips, poker nights (intern-organized!), creative four-player bocce matches, horse riding, and water skiing. Trust Vertica interns to even take the weekly Counter Strike game and turn it into a data-collection event, loading in-game kill locations into a Vertica database. I leave you with a level heatmap produced by our interns’ very own Vertica User-Defined-Function.

Heat Map

Here we see the deadliest locations of the Counter Strike map Italy. Though the concentration of carnage while attempting to rescue the hostages in the upper left is unsurprising, we can also understand how dangerous each of the access paths to the hostages is. For the same contest, the other interns created an Avro parser, a JSON parser, and an automatic email-sending function for their contest entries. Heat map from Mark Fay and Matt Fay
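For a rough idea of the computation behind a heat map like this one, here is a minimal sketch in Python. It is not the interns' Vertica User-Defined-Function, whose implementation is not described here. It bins (x, y) kill locations into square grid cells and shades each cell by its count. The coordinates, the cell size, and the function names are all made up for illustration.

```python
from collections import Counter


def bin_kills(points, cell_size):
    """Count kill locations per square grid cell of side cell_size."""
    counts = Counter()
    for x, y in points:
        counts[(int(x // cell_size), int(y // cell_size))] += 1
    return counts


def render(counts, width, height):
    """Draw the grid as text, using darker characters for busier cells."""
    shades = " .:*#"
    peak = max(counts.values(), default=1)
    rows = []
    for row in range(height):
        line = ""
        for col in range(width):
            n = counts.get((col, row), 0)
            # Scale the count to 0..4, rounding up so any kill is visible.
            line += shades[(n * 4 + peak - 1) // peak]
        rows.append(line)
    return rows


# Hypothetical kill coordinates in map units; not real game data.
kills = [(12, 8), (14, 9), (13, 7), (88, 40), (15, 5), (52, 52)]
counts = bin_kills(kills, cell_size=10)
rows = render(counts, width=10, height=6)
print("\n".join(rows))
```

With the points stored in a database table, the binning step is a natural fit for a grouped count over the cell coordinates; the sketch keeps everything in memory to stay self-contained.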

We’ll be keeping in touch with this year’s crop of interns as they finish here and return to their respective academic programs. Many people have helped with the intern program this year, but I feel Adam Seering deserves special mention for all his work in making this summer a success. I also appreciate the support our coworkers have given the intern program from 1-on-1 help to attending the intern presentations in numbers.

Thank you Vertica 2012 interns for all your hard work this summer. You have no idea how much positive feedback I’ve heard about you all!

Our interns ending a successful summer by riding off into the sunset. Literally. Photo taken by Ramachandra CN.

Back to School with Big Data

In case you missed it, we’ve spent much of this past week featuring some of the awesome projects our brilliant interns worked on this summer at Vertica, ranging from improving video game design to building an application for sorting through the ever-growing pile of email in our inboxes. These projects are great examples of how students are training to solve and pose new problems using big data and real-time analytics - thank you Matt & Mark Fay and Alex Leblang for your excellent posts.

As we grow Vertica by adding top-notch talent, we are excited to see how universities continue to innovate to train the workforce we depend on. For example, Stanford University is offering online introductory courses in databases, natural language processing, and machine learning. We are also seeing increased interest from universities in updating their curricula to include classes in analytical thinking, to help train leaders to ask better questions of the newly available data. Other universities are leading by example, implementing Big Data solutions of their own to improve retention and graduation rates with sophisticated data analytics.

The website Online Degrees.org recently posted an infographic that provides a great summary of how Big Data is impacting campus life and the college experience, which we’ve posted below. There’s no doubt that Big Data is having a tremendous impact on the college experience – and that our college interns have had a hugely important impact on us here at Vertica. We wish them, and all students, well in the academic year ahead, and if you’re interested in learning more about Vertica’s internship programs or current job opportunities, we would love to hear from you!

To see the infographic, click the “Read More >>” link below!

(more…)

Data Quest – Sifting through terabytes of diagnostic system data

All of the interns at Vertica have found interesting and useful projects to work on during their stay. The focus of my efforts has been the Data Quest project.

At Vertica, an important goal is to keep customers happy. To do this effectively, it’s necessary to know what a customer wants, even if they aren’t sure themselves. A potential source of this information is diagnostic data, which can tell us how Vertica is actually used in the wild. Internally, the Vertica Analytics Platform tracks a wealth of information about its state and usage in over 300 system tables. If something goes wrong, customers have the option of creating a diagnostics dump file of the current state of these system tables, so that the information can be used to resolve problems and improve future experiences. As of this writing there are over 515 dump files available, holding over 100 GB of compressed data.

The Data Quest goal has been to take all this Vertica metadata, pump it into a database and get cracking on the analysis. Unfortunately, getting this data ready for analysis is a nontrivial task. A number of hurdles had to be cleared to get the data ready for loading and analysis.

(more…)

Fun with Vertica UDX

During a week in July, Vertica held a competition for all of us, the 2012 summer interns. We had a week to write a user defined function for Vertica. We could work individually or in teams of two, and the function could do whatever we wanted it to do. We were, however, told that we would be judged by a range of people from across the company on criteria that were never quite explained to us.

Like most people, and probably many Vertica customers, I had never written a UDX for Vertica before. I was already in the middle of my normal summer work, and did not necessarily relish the idea of putting off my other tasks, which I was enjoying, to learn a new skill that seemed very specific. I was still not sure what exactly a UDX was, but writing one sounded difficult and unpleasant.

So, being the natural procrastinator that I am, I put the competition off. Unfortunately for me, this summer Vertica has an odd number of engineering interns and my fellow interns did not seem as daunted as I was. In the time that I spent pretending that the competition was not actually occurring, they had already divided into teams and started working on their ideas. The final presentation was a mere couple of days away, and I needed an idea quickly. Because I also was now apparently working alone, I needed that idea to be easy to implement.

(more…)
