Vertica Blog

Under the Hood

Exploring VerticaPyLab: A Quick Start to ML, Data Analytics, and Vertica

Authored by Badr Ouali and Umar Farooq Ghumman. Welcome to VerticaPyLab, a transformative solution that paves the way for effortless Machine Learning and Data Analytics. If the world of Python's ML libraries has intrigued you but appeared complex to navigate, VerticaPyLab is here to redefine your journey. It is designed to make ML accessible and...

How to Group Timeseries Data at Any Granularity

You might have come across this: you have obtained a huge set of timestamped log data or sensor data that you would like to understand. Millions of rows are too many for a human to read, and far too many to plot on a monitor that is only a few thousand pixels across - why...

How to Use UDx to Extend Vertica Live Aggregate Projections

Thanks to its sophisticated optimizer and extremely efficient query engine, Vertica can process data aggregations orders of magnitude faster than traditional database management systems. To further boost data aggregation performance, we can use Vertica’s Live Aggregate Projections (LAP from now on). The concept is very simple: pre-aggregate data once during load operations rather than doing...

How To Filter a Behavioral Pattern in a Time Series

Many analysts are keen to find behavioral patterns in time series. A clickstream analyst wants to find the series of clicks that happened between arriving at the website, browsing one or more articles on the site, and finally filling the basket and checking out; a financial analyst wants...

How to Use Database Designer to Optimize Vertica Physical Data Model in Six Steps

Vertica provides a powerful out-of-the-box tool for tuning its physical data model: Database Designer (DBD from now on). You can run DBD using the Management Console GUI, but in this article we will focus on what they call the “programmatic interface” (read... plain SQL) and – as practical examples are often more enlightening than “theoretical” documentation...
Kubernetes and containers

Vertica on Amazon Elastic Kubernetes Service (EKS)

Introduction: In my previous blog, I showed you the steps to run Vertica on Kubernetes (K8s) on your laptop. That’s fine if you want to try things out on a small scale, but what if you want to run it on a larger system? In this blog, I will teach you how to deploy Vertica...
Kubernetes and containers

Vertica on Kubernetes

This blog post has been updated from an earlier one to use new features that make deploying on your own system easier. Vertica released the VerticaDB operator in August 2021, which began Vertica’s integration with Kubernetes. The operator automates many Vertica administrator tasks, such as restarting Vertica if any of the nodes go down, upgrading...

High Availability for a Vertica in Eon Mode Database with Read-Only Mode

When Vertica is deployed in Eon Mode, there are two major requirements for high availability: maintaining quorum and shard coverage. If either of them is lost, the whole cluster goes down. Before going into details, let us understand what quorum and shard coverage mean. Quorum: In Eon Mode, more than half of the primary nodes...

VerticaPy reaches a milestone at 100 stars

The Vertica team is happy to share a milestone in our “VerticaPy journey”: We just reached 100 stars in our GitHub repo, and it’s growing every day. (Repo: That’s “repository” for those of you unfamiliar with GitHub.) Repos accumulate stars as an indication of user interest – think of them as bookmarks in a user’s...
SQL Query Optimization

Improving COUNT DISTINCT Performance with Approximate Functions

A common analytic use case is to find the number of distinct items in a data set. Vertica performs well at solving COUNT DISTINCT in a few ways. Since Vertica stores all data in columns, it is possible to optimize for COUNT DISTINCT by building a projection that is tuned for this use case. Vertica...

Break the bias – and predict brake bias

The theme for this year’s International Women’s Day (IWD) was #BreakTheBias, chosen to get us to imagine a gender-equal world: a world free of bias, stereotypes, and discrimination; a world that is diverse, equitable, and inclusive, where difference is valued and celebrated. Together, we can protect women’s equality. At Micro Focus, as...

Unleash the Power of Vertica and Apache Spark Using the Upgraded Spark Connector

This post is authored by Alex Le. What is Apache Spark? Apache Spark is a distributed compute engine that provides a robust API for data science, machine learning, and working with big data. It is fast, scalable, simple, and supports multiple languages, including Python, SQL, Scala, Java, and R. Backed by the Apache 2.0...