Machine Learning

Vertica’s In-database Random Forest, Highly Scalable and Easy to Use

Arash Jalal Zadeh Fard authored this post. Vertica has a set of machine learning tools covering a wide spectrum of advanced data analysis functionalities like data discovery, data preparation, model training, model evaluation, and model management. The goal of this blog post is to provide a hands-on example demonstrating how the built-in distributed Random Forest […]

Make data analysis easier with dimensionality reduction

This blog post was authored by Anh Le. Introduction: As the number of features in your data set grows, it becomes harder to work with. Visualizing 2D or 3D data is straightforward, but for higher dimensions you can only select a subset of two or three features to plot at a time, or turn to […]

Machine Learning Key Terms

This blog post was authored by Soniya Shah. Machine learning seems to be everywhere these days – in the online recommendations you get on Netflix, the self-driving cars that are hyped in the media, and in serious cases, like fraud detection. Data is a huge part of machine learning, and so are the key terms. Unless […]

What’s New in Vertica 9.1: Precision-Recall Curve and F1-Score Machine Learning Evaluation Functions

This blog post was authored by Ginger Ni. The precision-recall curve is a measure for evaluating binary classifiers. It is a basic measure derived from the confusion matrix. In Vertica 9.1, we provide a new machine learning evaluation function PRC() for calculating precision and recall values from the results of binary classifiers. Along with the […]

Using Vertica Machine Learning to Analyze Smart Meter Data

This blog post was authored by Soniya Shah. Machine learning and data science have the potential to transform businesses because of their ability to deliver non-obvious, valuable insights from massive amounts of data. However, many data scientists’ workflows are hindered by computational constraints, especially when working with very large data sets. While most real-world data […]

What’s New in Vertica 9.0.1: Machine Learning

This blog post was authored by Soniya Shah. Vertica 9.0.1 introduces new functionality that continues to match our goals for fast-paced development and enhancement of machine learning in Vertica. In this release, we introduce support for random forest for regression, a new statistical summary function, increased support for cross validation, and enhancements for data evaluation. […]

Estimate the Price of Diamonds Using Vertica Machine Learning

This blog post was authored by Vincent Xu. In this blog post, I’ll take you through the exercise I did to estimate the price of a diamond based on its characteristics, using the linear regression algorithm in Vertica. Besides Vertica 9.0, I used Tableau for charting and DbVisualizer as the SQL editor. From this exercise, […]

Machine Learning Mondays: Vertica 9.0 Cheat Sheet

This blog post was authored by Vincent Xu. Vertica 9.0 is out and here is the updated Vertica machine learning cheat sheet. Vertica 9.0 introduces a slew of new machine learning features including one-hot encoding, Lasso regression, cross validation, model import/export, and many more. See the cheat sheet for examples of how to use the […]

What’s New in Vertica 9.0: Machine Learning Enhancements

This blog post was authored by Soniya Shah. Vertica 9.0 introduces new functionality that continues to match our goals for fast-paced development of the existing machine learning functions. In this release, we introduce two new summary functions, support for cross validation, support for one-hot encoding, and the ability to import and export your models […]

Compute Engine or Analytical Data Mart for Distributed Machine Learning? Vertica Explains How to Choose

This blog post was authored by Sarah Lemaire. On Tuesday, August 22, the Boston Vertica User Group hosted a late-summer Meetup to talk to attendees about compute engines and data mart applications, and the advantages and disadvantages of both solutions. In the cozy rustic-industrial atmosphere of Commonwealth Market and Restaurant, decorated with recycled wood pallets, […]

What’s New in Vertica 8.1.1: Machine Learning

This blog post was authored by Soniya Shah. Vertica 8.1.1 continues with the fast-paced development for machine learning. In this release, we introduce the highly requested random forest algorithm. We extended SVM support to include SVM for regression, in addition to the existing SVM for classification algorithm. L2 regularization was added to both the linear […]

Machine Learning Mondays: How Vertica Implements Efficient and Scalable Machine Learning

This blog post was authored by Vincent Xu. As of Vertica 8.1, Vertica has introduced a set of popular machine learning algorithms, including Linear Regression, Logistic Regression, Kmeans, Naïve Bayes, and SVM. Based on our recent benchmarks, they run faster than MLlib on Apache Spark. The following chart shows the performance difference between Vertica 8.1.0 […]

Machine Learning Mondays: Data Preparation for Machine Learning in Vertica

This blog post was authored by Vincent Xu. This post is part of our Machine Learning Mondays series. Stay tuned for more! Introduction: Machine learning (ML) is an iterative process. From understanding data, preparing data, building models, testing models to deploying models, every step of the way requires careful examination and manipulation of the data. […]

What’s New in Vertica 8.1: Machine Learning

This blog post was authored by Soniya Shah. Overall, you will notice that Machine Learning for Predictive Analytics, introduced in Vertica 7.2.2, is more accessible to use in Vertica 8.1, with the addition of several important functions. There are improvements to model management with access control, ability to save and re-apply normalization parameters, missing value […]

Vertica Machine Learning Series: Logistic Regression

This blog post is based on a white paper authored by Maurizio Felici. What is Logistic Regression? Logistic regression is a popular machine learning algorithm used for binary classification. Logistic regression labels a sample with one of two possible classes, given a set of predictors in the sample. Optionally, the output can be the probability […]

Vertica Machine Learning Series: k-means

The content of this blog is based on a white paper that was authored by Maurizio Felici. What is k-means Clustering? K-means clustering is an unsupervised learning algorithm that clusters data into groups based on their similarity. Using k-means, you can find k clusters of data, represented by centroids. As the user, you select the […]

Machine Learning Series: Linear Regression

The content of this blog is based on a white paper that was authored by Maurizio Felici. This blog post is just one in a series of blog posts about the machine learning algorithms in Vertica. Stay tuned for more! What is Linear Regression? Let’s start with the basics. Linear regression is one of the […]

Watch Machine Learning for Predictive Analytics in Action

Watch this video to learn more about the Vertica Machine Learning for Predictive Analytics features new in 7.2.

Learn More From Your Data with Machine Learning Algorithms

New in Vertica 7.2.2 is the Machine Learning for Predictive Analytics package. This analytics package allows you to use built-in machine learning algorithms on data in your Vertica database. Machine learning algorithms are extremely valuable in data analytics because, as their name suggests, they can learn from your data and provide information about deductive and […]