Keep up with the new speed and scale of business

In today’s data-driven world, creating a competitive advantage depends on your ability to transform massive volumes of data into meaningful insights.

Machine learning is gaining popularity as an essential way of not only identifying patterns and relationships, but also predicting outcomes. This is creating a fundamental shift in the way businesses operate, from being reactive to being proactive. Unfortunately, the growing velocity, volume, and variety of data have increased the complexity of building predictive models, since few tools are capable of processing these massive data sets at the speed of business. HPE Vertica’s in-database machine learning allows you to embrace the power of Big Data and accelerate business outcomes with no limits and no compromises.

Built into Vertica’s core, with no need to download and install separate packages, in-database machine learning algorithms deliver:


Most external tools, such as R and Python, have limitations on the size of the data set they can handle. This forces users to down-sample for analysis and reduces the benefits of analyzing large volumes of data. Vertica’s in-database machine learning instead takes advantage of a scale-out architecture to analyze massive volumes of data.


Vertica’s native ingest, data preparation, and model management features cover the entire data mining lifecycle, eliminating the need to export and load data into another tool for analysis. Users can train, test, and deploy machine learning models using a familiar, SQL-like interface, without having to learn new techniques.


Vertica’s in-database machine learning accelerates time-to-insight by reducing the amount of time spent preparing, normalizing, and moving data across tools and platforms. Users can also leverage Vertica’s massively parallel processing (MPP) architecture, advanced compression, and fast query speed.

Let’s look at just some of the in-database machine learning algorithms built into the core of Vertica.

The K-Means algorithm is an unsupervised learning algorithm, meaning the input data is unlabeled. The algorithm groups the unlabeled data points into a chosen number of clusters (k) based on how similar the points are to one another.
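To make the clustering idea concrete, here is a minimal sketch of K-Means in plain Python. It is an illustration of the general technique only, not Vertica's implementation or its SQL interface; the function name `kmeans`, the sample points, and the choice to seed the centroids with the first k points are all assumptions made for this example.

```python
import math

def kmeans(points, k, iterations=20):
    """Cluster points into k groups by alternating assignment and update steps."""
    # Assumption for this sketch: seed the centroids with the first k points
    # so the result is deterministic. Real implementations choose seeds more carefully.
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return centroids, assignments

# Hypothetical unlabeled data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.1), (0.9, 1.1), (9.2, 8.9)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

No labels are supplied at any point; the grouping emerges purely from the distances between points.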

The Logistic Regression algorithm is used to classify data into groups based on the relationship between independent variables, or features, and a dependent variable, or outcome. The model estimates the probability of the outcome, and the prediction is a binary value such as true/false, pass/fail, yes/no, or 1/0.
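The following sketch shows the idea behind logistic regression with a single feature, trained by simple gradient descent in plain Python. It is not how Vertica implements the algorithm; the function names, the learning rate, the epoch count, and the hours-studied data are hypothetical choices for illustration.

```python
import math

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit a weight and bias by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every sample under the current parameters.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return the binary outcome: 1 if the estimated probability is at least 0.5."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0

# Hypothetical data: hours studied (feature) and pass/fail (outcome).
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
passed = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(hours, passed)
print(predict(w, b, 1.0), predict(w, b, 6.0))
```

The model itself produces a probability; the binary outcome comes from applying a threshold, assumed here to be 0.5.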

The Linear Regression algorithm is used to predict continuous numerical outcomes. Using linear regression, you can model the linear relationship between independent variables, or features, and a dependent variable, or outcome.
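As a small illustration of linear regression, the sketch below fits a straight line to one feature using the ordinary least squares formulas in plain Python. It is an example of the general technique, not Vertica's implementation; the function name `fit_line` and the sample data are assumptions.

```python
def fit_line(xs, ys):
    """Return the least-squares slope and intercept for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data that follows y = 2x + 1 exactly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)

# Predict a continuous outcome for an unseen feature value.
prediction = slope * 5.0 + intercept
print(slope, intercept, prediction)
```

Unlike the binary result of logistic regression, the prediction here is a number on a continuous scale.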