On the Trail of a Red-Tailed Hawk! – Part 2

Part 2:  Why User-Defined-Loads is music to my ears!

Webcams and hawks – really?  I heard a healthy dose of skepticism expressed about what User-defined Loads can accomplish.  So how about I show something a bit more practical?

I’m a fan of music of all flavors. Among the various devices I’ve had connected to my stereo at home are an LP player and an open-reel tape deck.  While it’s fun to get out the old vinyl records from time to time, they’re a bit cumbersome and don’t travel very well; more often I just want to pull up iTunes and hit “Play”.  So if I have a recording that I particularly enjoy, I try to digitize it.  But it is a huge pain to split it up into tracks.  I thought – why not automate it? You guessed right – with UDL, of course!

Vertica’s not much for music; natively it couldn’t tell the difference between The Beach Boys and Van Halen.  In fact, it can’t make heads or tails of most any music format.  So I need one (or both) of Vertica’s two other UDL tools.  One is a User-Defined Parser:  I could write a parser that directly parsed the WAV format, for example.  This would give me the best performance, but would take some time to code.

The other is a User-Defined Filter:  A Filter takes in data and processes it and emits it in a new form.  They’re intended for things like decompression and transcoding.  It’d be nice to support FLAC files and the like (c’mon, you can’t use lossy MP3 files for this!; this is the good stuff!), and my computer has more free time than I do, so a simple Filter seems like the way to go.

We conveniently have an example external-Filter example on Github as well; it opens up an external process and passes data through it via pipes.  So let’s go install it:

$ cd Vertica-Extension-Packages/shell_load_package
$ make install

Now, let’s once again make a table and try loading some data:

>>> CREATE TABLE music (filename varchar(64), time_index float, L float, R float)
     ORDER BY filename, time_index;
>>> COPY music (filename AS ‘Sample’, time_index, L, R)
     FROM ‘/home/aseering/Music/Recordings/Sample.flac’
     FIXEDWIDTH COLSIZES (17, 17, 17)
     WITH FILTER ExternalFilter(cmd=’sox –type flac – –type dat -‘);

The FIXEDWIDTH parser takes advantage of the format of the ‘.dat’ file:  It’s three floating-point columns; each floating-point number can be up to 16 characters wide (and is padded with spaces to that length if it’s smaller), and there’s one extra leading space as a separator.

Of course, using ExampleSource, I can load directly from my audio input as well, assuming a properly-configured sound system on the computer:

>>> COPY music (filename AS ‘Sample’, time_index, data filler int, L AS data, R AS data)
     WITH SOURCE ExternalSource(cmd=’arecord -d 10′)
     FILTER ExternalFilter(cmd=’sox –type wav – –type dat -‘);

Once I’ve loaded the data, I have to split up tracks.  On recordings of old records, the only real way I know to split tracks is to look for areas where the audio is quiet for a while.  Vertica can do this with window analytics.  The following is a good start, though there is of course a ton of flexibility if you want to tune the algorithm:

>>> SELECT sqrt(avg(L*L + R*R)
         over (order by time_index rows between 100 preceding and 100 following))
     AS avg_volume
     FROM music WHERE filename = ‘Sample';

And there we go!  Now I can enjoy my Carpenters on the move.

Most people think of databases as tools for managing payroll or user accounts, or diving through sales records.  Vertica can do all of that.  But Vertica 6 is so much cooler than that:  It gives you a platform for running your own custom code in a robust, distributed way.  You can mix-and-match UDLs (and all the other UDx types) however you want; develop libraries of UDL tools; even share those tools on Github if you want, or download tools that others have shared.  (We take pull requests!)  You can also use UDLs with external tables to provide a real queryable table that’s backed by any data store that you want, so you can run any task through Vertica’s parallel execution engine.

Are you convinced now? Whether you are or not, I’ve spend the past few hours digitizing several of my favorite LPs.  Now I can sit by my window with my headphones on listening to music and working away until the hawk shows up. Life is good!

Get Started With Vertica Today

Subscribe to Vertica