This blog post was authored by Soniya Shah.
Vertica 9.0.1 introduces new functionality that continues our fast-paced development and enhancement of machine learning in Vertica. In this release, we introduce support for random forest for regression, a new statistical summary function, expanded support for cross validation, and enhancements for model management.
Summary of Enhancements
• Random Forest for Regression: The new RF_REGRESSOR function lets you predict numerical values on large data sets using both numerical and categorical predictors.
• Statistical summary for categorical columns: A new statistical summary function returns important statistical information about categorical columns.
• Automatic model format upgrade: Models are automatically upgraded to the latest format during a Vertica upgrade and when you import models. The database administrator can upgrade all eligible models.
• Cross validation enhancements: The CROSS_VALIDATE function now also supports Naïve Bayes.
• L1 regularization for logistic regression: You can now run logistic regression with an L1 penalty, which leads to sparser solutions.
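As a rough sketch of how these features are invoked (the table and column names here are hypothetical, and parameter names such as ntree, max_depth, and cv_fold_count are assumptions you should verify against the Vertica 9.0.1 documentation):

```sql
-- Train a random forest regression model on a hypothetical 'houses' table,
-- predicting the numerical column 'price' from mixed-type predictors.
SELECT RF_REGRESSOR('rf_price_model', 'houses', 'price',
                    'sqft, bedrooms, neighborhood'
                    USING PARAMETERS ntree=100, max_depth=5);

-- Cross validate a Naive Bayes classifier on the same hypothetical data.
SELECT CROSS_VALIDATE('naive_bayes', 'houses', 'price_band',
                      'sqft, bedrooms, neighborhood'
                      USING PARAMETERS cv_fold_count=5);
```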
Random Forest for Regression
The random forest model is an ensemble of decision trees. The algorithm constructs the decision trees during training and then uses them for prediction: the output is the mean of the predictions of the individual trees.
Random forest is a robust regression algorithm that works well on many different types of data sets. A set of function parameters provides good control over how the ensemble model is built, including the number of trees, tree depth, sampling size, and more.
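Scoring new rows with a trained model might look like the following sketch; the PREDICT_RF_REGRESSOR function name, its parameters, and the model and table names are assumptions to check against the documentation:

```sql
-- Apply a hypothetical trained model 'rf_price_model' to new rows.
-- For each row, the mean prediction of the individual trees is returned.
SELECT id,
       PREDICT_RF_REGRESSOR(sqft, bedrooms, neighborhood
                            USING PARAMETERS model_name='rf_price_model')
       AS predicted_price
FROM new_houses;
```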
For More Information
For more information, see the following in the Vertica documentation:
• Machine Learning Functions in the SQL Reference Manual
• Machine Learning for Predictive Analytics in the Analyzing Data guide
We are constantly expanding machine learning features in Vertica. You can expect to see expanded functionality in future releases.