Vertica

Archive for June, 2012

On the Trail of a Red-Tailed Hawk! – Part 1

by Adam Seering

A couple of weeks ago, Vertica moved into our cool new space in Cambridge, MA.  I’d been eagerly awaiting the commute: a scenic bike ride along the Charles River.  I didn’t expect nature to follow me all the way up to the 9th floor, though.  So imagine my excitement when, on our first day, a magnificent red-tailed hawk swoops down and lands right on the window ledge!  We’re at the top of the tallest building around, so I guess the ledge works well as a cliff.  The hawk has become a regular visitor, and the unofficial Vertica mascot.  After all, it’s “Simply Fast”.  As much as we’d like to sit there waiting for it to show up all day, we all do have work to get done.  So I figured: why not put a webcam next to the window to analyze its habits and get alerted when it shows up?

And with Vertica 6, that’s easy!  As it turns out, I’ve been working with the Vertica SDK team on User-Defined Load (UDL), a collection of SDK APIs that lets you ingest data in any arbitrary format from any arbitrary source.  Loading files and tables is pretty dull, so instead, I’ll use Vertica to track our mascot!  In this blog, I’d like to show you how I did it.

The first step is to get Vertica to read data not from a file, but from a webcam.  For this, we will need a User-Defined Source.  UDSources are a part of the UDL SDK.  As far as Vertica is concerned, all they do is emit data.  They could get that data from anywhere: a file, a remote server, a webcam (!), an API call into a special service; they could even make the data up.

We could write a custom User-Defined Source in C++ that links in OpenCV or a similar imaging library and talks to the webcam directly.  But that takes work.  The command-line program “streamer” already provides a simple way to grab a snapshot from a webcam, and our GitHub repository contains an example UDSource that runs a program and passes its output to Vertica.  We can install it as follows:

$ git clone https://github.com/vertica/Vertica-Extension-Packages.git
$ cd Vertica-Extension-Packages/shell_load_package
$ make install
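Conceptually, a source like this does little more than start a command and hand its standard output to the loader in chunks. The Python sketch below is only an analogy for that idea; it is not the Vertica SDK API (real UDSources are written against the C++ SDK), and the name `stream_command` and the default chunk size are illustrative assumptions.

```python
import subprocess
import sys
from typing import Iterator, List


def stream_command(cmd: List[str], chunk_size: int = 4096) -> Iterator[bytes]:
    """Run cmd and yield its stdout in chunks of up to chunk_size bytes.

    Hypothetical stand-in for a "run a program" source; not the Vertica SDK.
    """
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        assert proc.stdout is not None
        while True:
            # A buffered read blocks until chunk_size bytes arrive or EOF.
            chunk = proc.stdout.read(chunk_size)
            if not chunk:
                break
            yield chunk
        # Surface a failing command instead of silently emitting partial data.
        if proc.wait() != 0:
            raise RuntimeError(f"command failed: {cmd!r}")


if __name__ == "__main__":
    # Stand-in for the webcam pipeline: a child process printing one pixel line.
    demo = [sys.executable, "-c", "print('0,0: ( 29, 19, 17)  #1D1311  rgb(29,19,17)')"]
    print(b"".join(stream_command(demo)).decode().strip())
```

Streaming in chunks rather than reading everything at once mirrors the idea that a source just emits data as it becomes available, whatever produces it.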

Now we have a UDSource that can load our data.  So we need a table to load it into.  At a Vertica prompt:

CREATE TABLE frames (frame_name varchar(64), X int, Y int, R int, G int, B int)
             ORDER BY frame_name, X, Y;

Note the “ORDER BY” clause.  This is a Vertica extension that tells Vertica how to keep the data sorted.  This is the order our data will arrive in, so we might as well keep it that way; sorted data lets Vertica search through it faster.
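To see why sort order helps, here is a small Python illustration (not how Vertica's columnar storage actually works): with rows kept sorted by (frame_name, X, Y), finding one pixel becomes a binary search instead of a full scan. The tuple layout and the `find_pixel` helper are hypothetical.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

# Hypothetical row layout mirroring the frames table: (frame_name, X, Y, R, G, B)
Row = Tuple[str, int, int, int, int, int]


def find_pixel(rows: List[Row], frame: str, x: int, y: int) -> Optional[Row]:
    """Binary-search rows that are already sorted by (frame_name, X, Y)."""
    target = (frame, x, y)
    # A 3-tuple compares less than any longer tuple that starts with it,
    # so bisect_left lands on the first row whose key is >= target.
    i = bisect_left(rows, target)
    if i < len(rows) and rows[i][:3] == target:
        return rows[i]
    return None
```

The lookup touches O(log n) rows, which is the payoff of keeping the data in a known sorted order.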

Now, let’s take a quick look at one line of the data that our command outputs:

0,0: ( 29, 19, 17)  #1D1311  rgb(29,19,17)

It’s a little bit unusual, but Vertica’s built-in Delimited parser can handle it.  So, we can go ahead and run the following statement to load the data:

>>> COPY frames (frame_name AS 'frame 1',
                X delimiter ',',
                Y delimiter ':',
                blank_space filler varchar(10) delimiter '(',
                R delimiter ',',
                G delimiter ',',
                B delimiter ')',
                rest_of_line filler varchar(50))
WITH SOURCE ExternalSource(cmd='streamer -f jpeg -o /dev/stdout | convert - -compress none txt:-');
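To make the per-column delimiters concrete, here is a hypothetical Python function that splits the sample line in the same order the COPY statement does: X up to ',', Y up to ':', a throwaway filler up to '(', then R, G and B, with the rest of the line discarded. It only mirrors the idea; Vertica's Delimited parser does the real work.

```python
from typing import Tuple


def parse_pixel_line(line: str) -> Tuple[int, int, int, int, int]:
    """Split one ImageMagick txt: pixel line the way the COPY column delimiters do."""
    x, rest = line.split(",", 1)           # X delimiter ','
    y, rest = rest.split(":", 1)           # Y delimiter ':'
    _blank, rest = rest.split("(", 1)      # blank_space filler, delimiter '('
    r, rest = rest.split(",", 1)           # R delimiter ','
    g, rest = rest.split(",", 1)           # G delimiter ','
    b, _rest_of_line = rest.split(")", 1)  # B delimiter ')'; rest_of_line filler
    # int() ignores the surrounding spaces, e.g. int(" 29") == 29.
    return int(x), int(y), int(r), int(g), int(b)
```

Applied to the sample line shown earlier, this yields X=0, Y=0, R=29, G=19, B=17.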

Now we have our hawk in Vertica!  So we can do some analytics.  For example, we can run a simple query to get the brightness of each frame that I’ve loaded:

>>> SELECT frame_name, avg((R+G+B)/3) AS brightness FROM frames GROUP BY frame_name;
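The same aggregation in plain Python, as a sketch for sanity-checking results on a small sample outside the database. The tuple layout mirrors the frames table; this version uses floating-point division for (R+G+B)/3, so check how your Vertica version treats integer division before comparing numbers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Pixel = Tuple[str, int, int, int, int, int]  # (frame_name, X, Y, R, G, B)


def brightness_by_frame(pixels: Iterable[Pixel]) -> Dict[str, float]:
    """Per-frame mean of (R+G+B)/3, like SELECT ... GROUP BY frame_name."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for frame, _x, _y, r, g, b in pixels:
        totals[frame] += (r + g + b) / 3
        counts[frame] += 1
    return {frame: totals[frame] / counts[frame] for frame in totals}
```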

And if I have a frame where the image suddenly gets darker, I know the bird has flown in.
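One way to turn “the image suddenly gets darker” into a rule is to compare each frame's brightness with the previous frame's and flag large drops. The function name and the 20-point threshold below are arbitrary assumptions you would tune for your own window and lighting. Frames must be in capture order; note that names such as 'frame 10' sort before 'frame 2' as strings, so ordering by frame_name alone may not give that order.

```python
from typing import List, Sequence, Tuple


def darkening_events(brightness: Sequence[Tuple[str, float]],
                     drop: float = 20.0) -> List[str]:
    """Return frames whose brightness fell by more than `drop` vs. the previous frame.

    `brightness` is (frame_name, brightness) in capture order; `drop` is a
    hypothetical threshold, not a value from the original setup.
    """
    events = []
    for (_, prev), (name, cur) in zip(brightness, brightness[1:]):
        if prev - cur > drop:
            events.append(name)
    return events
```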

If I wanted to, I could easily do analytics on a window of the data, look for images where one part of the image changes more than the others, or even write a UDF that does proper object recognition to tell whether the hawk is eyeing its prey or snacking on a rodent (which it has done more than once).  But this turns out to be good enough for us here in Cambridge!
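As a hypothetical starting point for the idea of finding where one part of the image changes more than the others: split two frames into square blocks, average the absolute brightness change within each block, and report the block that changed most. The block size, function names and the (X, Y) to brightness dictionaries are assumptions for illustration; this is far from real object recognition.

```python
from typing import Dict, Tuple

Grid = Dict[Tuple[int, int], float]  # (X, Y) -> brightness of one pixel


def region_changes(before: Grid, after: Grid, cell: int = 16) -> Dict[Tuple[int, int], float]:
    """Mean absolute brightness change per cell x cell block of pixels."""
    sums: Dict[Tuple[int, int], float] = {}
    counts: Dict[Tuple[int, int], int] = {}
    for (x, y), b0 in before.items():
        if (x, y) not in after:
            continue  # skip pixels missing from the second frame
        key = (x // cell, y // cell)
        sums[key] = sums.get(key, 0.0) + abs(after[(x, y)] - b0)
        counts[key] = counts.get(key, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}


def most_changed_region(before: Grid, after: Grid, cell: int = 16) -> Tuple[int, int]:
    """Block coordinates with the largest mean change (raises ValueError if no overlap)."""
    changes = region_changes(before, after, cell)
    return max(changes, key=changes.get)
```

Averaging per block rather than over the whole frame keeps a small, localized change (a bird on one corner of the ledge) from being diluted by the unchanged background.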

Do you have an idea for a UDSource?  Have your own local hawk, or some other project you’re working on?  Download Vertica’s Community Edition and try our UDSource, or create your own and post it to GitHub.  (We take pull requests!)

Happy Hawking!


Introducing Vertica 6

by Shilpa Lawande & Luis Maldonado

Big data is all the rage these days.  It is incredible to watch this segment mature, even in just the past few years.  Of course, Vertica continues to evolve and innovate to keep up with the challenges and demands of this movement, especially when it comes to analytics.  The three greatest challenges customers face with regard to Big Data are 1) the volume and velocity of data, 2) the rapidly growing variety of disparate data sources, and 3) the complex analytics required to maximize the value of information derived from data.  In Vertica 6, we address all of these and more!

It has been just over two and a half years since Vertica first released FlexStore™, our patented architecture for flexible storage and placement of data based on usage patterns. During this time, we have been hard at work to evolve our flagship database product from a blazingly fast database into a rich and flexible analytic platform.  Our high-performance MPP columnar engine, combined with our suite of built-in analytic functions (including time series and event-series pattern matching), our C++ SDK, and more, has enabled over 600 customers to transform their businesses by harnessing the power of large-scale real-time analytics.  Last week at HP Discover, we raised the bar once again with the release of Vertica 6, including the industry’s first and only open analytics architecture that enables broad analytics over any data source: structured, semi-structured, or unstructured.

Let’s review some of the highlights of Vertica 6.

  • Vertica 6 FlexStore™ has been expanded to allow access to any data, stored at any location, via any interface, providing a comprehensive architecture to handle the requirements of Big Data: open, extensible, and flexible.  Using our patent-pending User-Defined Load (UDL) feature, you can now connect natively to popular storage systems like the Hadoop Distributed File System (HDFS), existing databases and data warehouses, as well as unstructured analysis platforms such as HP/Autonomy IDOL™.  Further, by combining our External Tables capability with UDL, you can access those data sources in a federated manner at query time, without pre-loading data into Vertica. Now, whether you are just exploring your data in HDFS to find the right data model or building a production application, you can use the same robust Vertica SQL analytics you know and love, and use your favorite reporting and visualization tools to slice and dice the data!
  • Vertica 6 includes high performance and massively scalable in-database analytics for the R statistical tool.  Data scientists and analysts can now run their favorite statistical algorithms written in R natively and in parallel on Vertica, without the in-memory and single-threading limitations of R!  We’ve also expanded our C++ SDK to add secure sandboxing of user-defined code, and additional APIs for user-defined aggregates, analytics and multi-phase transform functions that can enable Map-Reduce style computations to be done in Vertica.
  • Vertica 6 simplifies the out-of-the-box user experience and enables more diverse workloads with our improvements to Workload Management.  An early adopter of Vertica 6 saw a 40x speedup on some of their queries.  Whether you are running complex analytics or short operational BI workloads, Vertica’s workload management capabilities can effectively balance all system resources to meet your SLAs.
  • In 2007, Vertica was the first analytics platform to run on the Cloud. With Vertica 6 we extended this lead in significant ways. First, we announced a private beta of Vertica on HP’s new public Cloud. Further, we extended the core platform with a number of enhancements such as Object-level Backup & Restore and Schema evolution that make it much easier to develop and deploy multi-tenant applications using Vertica.
  • Last but not least, we continue to invest in our core platform with a number of enhancements to overall query and load performance, elastic cluster, monitoring views, database designer, security and more.

For a quick overview of Vertica 6, we highly recommend watching this video interview of Luis Maldonado, Director of Product Management, from last week’s HP Discover show, or reading What’s new in Vertica 6 Enterprise Edition.  Watch this space over the next few days for more articles that drill down into specific features of Vertica 6!

We at Vertica are very excited about Vertica 6 and the role it plays in HP’s vision for Cloud, Security and Information Optimization.  We hope you will try it for yourselves and tell us your stories about how you used it to Make it Matter for your business!
