by Adam Seering
A couple weeks ago, Vertica moved into our cool new space in Cambridge, MA. I’d been eagerly awaiting the commute; a scenic bike ride along the Charles River. I didn’t expect nature to follow me all the way up to the 9th floor, though. So imagine my excitement when, on our first day, a magnificent red-tailed hawk swoops down and lands right on the window ledge! We’re at the top of the tallest building around, so I guess the ledge works well as a cliff. The hawk has become a regular visitor, and the unofficial Vertica mascot. After all, it’s “Simply Fast”. As much as we’d like to sit there waiting for it to show up all day, we all do have work to get done. So I figured – why not put a webcam next to the window to analyze its habits and and be alerted when it shows up.
And with Vertica 6, that’s easy! As it turns out, I’ve been working with the Vertica SDK team on User-Defined Load (UDL), a collection of SDK APIs that lets you ingest data in any arbitrary format from any arbitrary source. Loading files and tables is pretty dull, so instead, I’ll use Vertica to track our mascot! In this blog, I’d like to show you how I did it.
The first step is to get Vertica to read data not from a file, but from a webcam. For this, we will need a User-Defined Source. UDSources are a part of the UDL SDK. As far as Vertica is concerned, all they do is emit data. They could get that data from anywhere: A file, a remote server, a webcam (!), an API call into a special service; they could even make the data up.
We could write a custom User-Defined Source in C++, that linked in OpenCV or a similar imaging library and interacted with the webcam directly. But that takes work. The command-line program “streamer” already provides a simple way to grab a snapshot from a webcam. Our Github repository contains an example UDSource that runs a program and passes its output to Vertica. We can install it as follows:
$ git clone https://github.com/vertica/Vertica-Extension-Packages.git
$ cd Vertica-Extension-Packages/shell_load_package
$ make install
Now we have a UDSource that can load our data. So we need a table to load it into. At a Vertica prompt:
CREATE TABLE frames (frame_name varchar(64), X int, Y int, R int, G int, B int)
ORDER BY frame_name, X, Y;
Note the “ORDER BY” clause. This is a Vertica extension that tells Vertica how to keep the data sorted. This is the order that the data will come sorted in; we might as well keep it that way so that Vertica can search through it faster.
Now, let’s take a quick look at one line of the data that our command outputs:
0,0: ( 29, 19, 17) #1D1311 rgb(29,19,17)
It’s a little bit unusual, but Vertica’s built-in Delimited parser can handle it. So, we can go ahead and run the following statement to load the data:
>>> COPY frames (frame_name AS ‘frame 1′,
X delimiter ‘,’,
Y delimiter ‘:’,
blank_space filler varchar(10) delimiter ‘(‘,
R delimiter ‘,’,
G delimiter ‘,’,
B delimiter ‘)’,
rest_of_line filler varchar(50))
WITH SOURCE ExternalSource(cmd=’streamer -f jpeg -o /dev/stdout | convert – -compress none txt:-‘);
Now we have our hawk in Vertica! So we can do some analytics. For example, we can run a simple query to get the brightness of each frame that I’ve loaded:
>>> SELECT frame_name, avg((R+G+B)/3) AS brightness FROM frames GROUP BY frame_name;
And if I have a frame where the image suddenly gets darker, I know the bird has flown in.
If I want, I could easily do analytics on a window of the data; look for images where one part of the image changes more than others; even write a UDF that does proper object recognition. Pick out if the hawk is eyeing its prey or snacking on a rodent (which it has done more than once). But this turns out to be good enough for us here in Cambridge!
Do you have an idea for a UDSource? Have your own local hawk, or any other project you might be working on? Download Vertica’s Community Edition and try our UDSource, or create your own and post it to Github. (We take pull requests!)