Advanced In-Database Analytics

Vertica offers a robust and ever-growing set of advanced in-database analytics functions, so that customers can run analytic computations close to the data and get immediate answers from a single place, without extracting information to a separate environment for processing. Keeping the data in the database is especially critical as data sets grow from terabytes to petabytes and beyond.

Importantly, Vertica’s native analytic functions and UDFs were purpose-built to take full advantage of our unique MPP architecture, columnar storage and columnar execution, so that they execute faster than on any other platform. Vertica offers full ANSI-standard SQL support, SQL analytic functions, regular expressions, native SQL extensions and a User-Defined Function (UDF) framework that is rapidly expanding beyond the SQL Macro UDFs we offer today. Together, these give business analysts, developers, and administrators the flexibility and simplicity to leverage the power of massive information without needing a PhD in parallel processing. They can simply ask questions using their favorite tools.

Some of the Vertica Advanced Analytics features include:

  • Native gap filling
  • Interpolation
  • Event window functions
  • Graph traversal
  • Sessionization

Vertica’s concise SQL syntax extensions for these features are straightforward, so there is no need to learn and use another language.
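To make the first two features concrete, here is a minimal Python sketch of what gap filling with interpolation computes: an irregular series of readings is resampled onto a regular time grid, either by carrying the last value forward or by interpolating linearly between neighbors. It only illustrates the semantics; it is not Vertica's syntax or implementation, and the function name `gap_fill`, its parameters, and the sample data are assumptions made up for this example.

```python
from bisect import bisect_right


def gap_fill(points, start, stop, step, method="const"):
    """Resample sorted (time, value) points onto a regular grid.

    method="const" carries the last observed value forward;
    method="linear" interpolates between the surrounding observations.
    """
    times = [t for t, _ in points]
    filled = []
    for t in range(start, stop + 1, step):
        i = bisect_right(times, t)  # number of observations at or before t
        if i == 0:
            filled.append((t, None))  # nothing observed yet
            continue
        t0, v0 = points[i - 1]
        if method == "linear" and i < len(points) and t0 != t:
            t1, v1 = points[i]  # next observation, strictly after t
            value = v0 + (v1 - v0) * (t - t0) / (t1 - t0)
        else:
            value = v0
        filled.append((t, value))
    return filled


# Hypothetical price ticks at seconds 0, 3 and 5, with gaps in between.
ticks = [(0, 10.0), (3, 10.6), (5, 10.2)]
const_fill = gap_fill(ticks, 0, 5, 1, "const")
linear_fill = gap_fill(ticks, 0, 5, 1, "linear")
```

Both calls return one row per second. The constant fill repeats 10.0 until the tick at second 3 arrives, while the linear fill ramps between the observed values.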

There is a lot of talk about NoSQL, but the real problem is usually not the SQL syntax; it is the poor performance of the traditional RDBMS beneath it. Take, for example, graph traversal for social network end-point analysis. A traditional RDBMS will use CONNECT BY and painful many-way self-joins to get an answer, which often overload the system and fail to return a result. Alternatively, this can be done using custom data structures and procedural languages, but these are often not enterprise class and lack simplicity, scalability and efficient parallelism. Vertica, on the other hand, delivers this elegantly and simply today, using our MPP operations and efficient columnar join execution on sorted data with standard SQL syntax. Best of all, the Vertica Optimizer and Execution Engine handle all of the planning and parallelism for you.
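As a rough illustration of why graph traversal maps onto joins, the Python sketch below expands a frontier of nodes one hop at a time, where each hop is logically one join of the current frontier against the edge table. This is an assumption-laden toy, not how Vertica executes the query; the function `reachable_within` and the sample edges are hypothetical.

```python
from collections import defaultdict


def reachable_within(edges, source, max_hops):
    """Return {node: hop distance} for nodes within max_hops of source."""
    # Index the edge list by source node, standing in for an edge table
    # that is sorted on its source column.
    outgoing = defaultdict(list)
    for src, dst in edges:
        outgoing[src].append(dst)

    distance = {source: 0}
    frontier = {source}
    for hop in range(1, max_hops + 1):
        # One logical join per hop: frontier nodes matched to their edges,
        # keeping only destinations that have not been seen before.
        next_frontier = {
            dst
            for node in frontier
            for dst in outgoing.get(node, ())
            if dst not in distance
        }
        if not next_frontier:
            break
        for node in next_frontier:
            distance[node] = hop
        frontier = next_frontier
    return distance


# Hypothetical "follows" edges in a small social network.
edges = [("ann", "bob"), ("bob", "cat"), ("cat", "dan"),
         ("ann", "eve"), ("eve", "cat")]
two_hops = reachable_within(edges, "ann", 2)
```

Here `two_hops` contains ann, bob, eve and cat, but not dan, who is three hops away. Each loop iteration is independent per frontier node, which is the part a parallel engine can spread across nodes.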

Event-Based Windows

You can also use Vertica’s event-based windows to break time series data into windows on the fly. This is especially relevant for financial data, where analysis might focus on specific events as triggers for other activity. Sessionization, a special case of event-based windows, is often used to analyze click streams, for example to identify web browsing sessions from the web clicks recorded during a specific period of time. Unlike brute-force procedural methods that can achieve the same result, Vertica is simple, efficient, and massively parallel, so web sessionization can be done in an ad hoc manner with window parameters chosen on the fly. Suppose, for example, that 30 seconds turns out not to be a typical web session length. Vertica can analyze the intervals between sessions from the same IP address to determine what the average session time truly is, and then sessionize the data based on that parameter.
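The sessionization logic described above can be sketched in a few lines of Python: clicks are grouped by IP address, ordered by time, and a new session starts whenever the gap since the previous click exceeds a timeout. This shows the semantics only, under the assumption that timestamps are plain seconds; the function `sessionize` and the sample clicks are invented for the example and are not Vertica's API.

```python
from itertools import groupby
from operator import itemgetter


def sessionize(clicks, timeout):
    """Return (ip, timestamp, session_id) rows.

    Within each IP address, the session id increases by one every time
    the gap since the previous click is longer than `timeout` seconds.
    """
    rows = []
    ordered = sorted(clicks, key=itemgetter(0, 1))  # by IP, then time
    for ip, group in groupby(ordered, key=itemgetter(0)):
        session_id, previous = 0, None
        for _, ts in group:
            if previous is not None and ts - previous > timeout:
                session_id += 1  # the event that opens a new window
            rows.append((ip, ts, session_id))
            previous = ts
    return rows


# Hypothetical click log: (IP address, seconds since start of capture).
clicks = [("10.0.0.1", 0), ("10.0.0.1", 20), ("10.0.0.1", 65),
          ("10.0.0.2", 5), ("10.0.0.1", 70), ("10.0.0.2", 50)]
sessions_30s = sessionize(clicks, 30)
sessions_60s = sessionize(clicks, 60)
```

Because the timeout is just a parameter, the same click log can be re-sessionized with a different value without reshaping the data: with a 30-second timeout each address splits into two sessions, and with 60 seconds each address has a single session.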

Vertica is investing a significant amount of R&D in more robust in-database analytics, so that our customers will be able to perform additional statistical, geospatial, decision tree and other advanced analyses of their data in more ways and faster, using SQL or their favorite procedural programming or scripting language.

Further Reading

For more on Vertica’s Advanced In-Database Analytics, check out these posts on our blog: