All of the interns at Vertica have found interesting and useful projects to work on during their stay. The focus of my efforts has been the Data Quest project.
At Vertica, an important goal is to keep customers happy. In order to do this effectively it’s necessary to know what it is a customer wants even if they aren’t sure themselves. A potential source of this information comes in the form of diagnostic data which can tell us how Vertica is actually used in the wild. Internally, the Vertica Analytics Platform tracks a wealth of information about its state and usage in over 300 system tables. If something goes wrong, customers have the option of creating a diagnostics data dump file of the current states of these system tables so that this information can be used to resolve problems and improve future experiences. As of this writing there is over 515 dump files available having over 100Gb of compressed data.
The Data Quest goal has been to take all this Vertica meta-data, pump it into a database and get cracking on the analysis. Unfortunately, getting this data ready for analysis is a nontrivial task. A number of hurdles had to be crossed to get the data ready for loading an analysis.