What’s New in Vertica 9.0.1: Machine Learning

This blog post was authored by Soniya Shah.

Vertica 9.0.1 continues our fast-paced development and enhancement of machine learning in Vertica. This release introduces random forest for regression, a new statistical summary function, expanded support for cross validation, and enhancements for data evaluation.

Summary of Enhancements

New Feature: Description
Random Forest for Regression: The new RF_REGRESSOR function allows you to predict numerical values on large data sets using both numerical and categorical predictors.
SUMMARIZE_CATCOL function: This statistical summary function enables users to get important statistical information on categorical columns.
UPGRADE_MODEL function: This function automatically upgrades the model format to the latest version during a Vertica upgrade and when you import models. The database administrator can upgrade all eligible models.
Cross validation enhancements: The CROSS_VALIDATE function now also supports Naïve Bayes.
L1 for Logistic Regression: Run logistic regression with L1 regularization, leading to sparser solutions.


Random Forest for Regression

A random forest model is an ensemble of decision trees. The algorithm constructs the trees during training and then uses them for prediction. For regression, the output is the mean of the individual trees' predictions.

Random forest is a robust regression algorithm that works well on many different types of data sets. Function parameters control how the ensemble model is built, including the number of trees, tree depth, and sampling size, among others.
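To make the idea concrete, here is a minimal Python sketch of the technique described above. It is not Vertica's implementation and does not use RF_REGRESSOR. It is a simplified, bagged-trees version of random forest regression: each tree is trained on a random sample of the rows, and the forest's prediction is the mean of the tree predictions. The function names and the parameters n_trees, max_depth, and sample_fraction are hypothetical, chosen only to mirror the kinds of controls mentioned above. Sampling with replacement is an assumption, and per-split feature subsampling is omitted for brevity.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, max_depth, depth=0):
    """Grow a regression tree; each leaf stores the mean target value."""
    if depth >= max_depth or len(set(y)) <= 1:
        return mean(y)
    best = None  # (error, feature index, threshold)
    for f in range(len(X[0])):
        # Skip the smallest value so both sides of the split are non-empty.
        for t in sorted({row[f] for row in X})[1:]:
            left = [yi for row, yi in zip(X, y) if row[f] < t]
            right = [yi for row, yi in zip(X, y) if row[f] >= t]
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, f, t)
    if best is None:  # every row is identical, so no split is possible
        return mean(y)
    _, f, t = best
    lo = [i for i, row in enumerate(X) if row[f] < t]
    hi = [i for i, row in enumerate(X) if row[f] >= t]
    return (
        f,
        t,
        build_tree([X[i] for i in lo], [y[i] for i in lo], max_depth, depth + 1),
        build_tree([X[i] for i in hi], [y[i] for i in hi], max_depth, depth + 1),
    )


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's stored mean."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] < t else right
    return node


def train_forest(X, y, n_trees=20, max_depth=3, sample_fraction=0.7, seed=0):
    """Train n_trees trees, each on a random sample of the rows.

    Assumption: rows are drawn with replacement (bootstrap sampling).
    """
    rng = random.Random(seed)
    n = max(1, int(sample_fraction * len(X)))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(n)]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], max_depth))
    return forest


def predict_forest(forest, row):
    """The forest's output is the mean prediction of the individual trees."""
    return mean(predict_tree(tree, row) for tree in forest)


# Toy data: one numeric predictor with a linear response.
X = [[float(i)] for i in range(20)]
y = [2.0 * row[0] + 1.0 for row in X]

forest = train_forest(X, y)
prediction = predict_forest(forest, [10.0])
```

In this sketch, raising max_depth lets each tree fit finer detail, and raising n_trees averages over more trees, which tends to smooth the prediction. Because every leaf holds a mean of training targets, the forest's prediction always falls within the range of the training response.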

For More Information

For more information, see the following in the Vertica documentation:

Machine Learning Functions in the SQL Reference Manual

Machine Learning for Predictive Analytics in the Analyzing Data guide

We are constantly expanding machine learning features in Vertica. You can expect to see expanded functionality in future releases.