Vertica Blog

Under the Hood

Kubernetes and containers

Vertica on Amazon Elastic Kubernetes Service (EKS)

Introduction: In my previous blog, I showed you the steps to run Vertica on Kubernetes (K8s) on your laptop. That’s fine if you want to try things out on a small scale, but what if you want to run it on a larger system? In this blog, I will teach you how to deploy Vertica...

Kubernetes and containers

Vertica on Kubernetes

This blog post has been updated from an earlier one to use new features that make deploying on your own system easier. Vertica released the VerticaDB operator in August 2021, which began Vertica’s integration with Kubernetes. The operator automates many Vertica administrator tasks, such as restarting Vertica if any of the nodes go down, upgrading...

High Availability for a Vertica in Eon Mode Database with Read-Only Mode

When Vertica is deployed in Eon Mode, there are two major requirements for high availability: maintaining quorum and shard coverage. If either of them is lost, the whole cluster goes down. Before going into details, let us understand what quorum and shard coverage mean. Quorum: In Eon Mode, more than half of the primary nodes...
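The two requirements named in this excerpt can be illustrated with a minimal Python sketch. It assumes that quorum means strictly more than half of the primary nodes are up, and that shard coverage means every shard has at least one up primary node subscribed to it. The function names and the node and shard labels are hypothetical and are not part of any Vertica API.

```python
from typing import Dict, Set


def has_quorum(primary_nodes: Set[str], up_nodes: Set[str]) -> bool:
    """Assumed rule: strictly more than half of the primary nodes are up."""
    up_primaries = primary_nodes & up_nodes
    return len(up_primaries) * 2 > len(primary_nodes)


def has_shard_coverage(
    subscriptions: Dict[str, Set[str]],
    primary_nodes: Set[str],
    up_nodes: Set[str],
) -> bool:
    """Assumed rule: each shard has at least one up primary subscriber."""
    live_primaries = primary_nodes & up_nodes
    return all(subscribers & live_primaries for subscribers in subscriptions.values())


def cluster_stays_up(
    subscriptions: Dict[str, Set[str]],
    primary_nodes: Set[str],
    up_nodes: Set[str],
) -> bool:
    """The cluster needs both conditions; losing either one takes it down."""
    return has_quorum(primary_nodes, up_nodes) and has_shard_coverage(
        subscriptions, primary_nodes, up_nodes
    )


# Hypothetical four-node primary subcluster with two shards.
primaries = {"n1", "n2", "n3", "n4"}
shards = {"s1": {"n1", "n2"}, "s2": {"n3", "n4"}}

print(cluster_stays_up(shards, primaries, {"n1", "n2", "n3"}))  # quorum and coverage hold
print(cluster_stays_up(shards, primaries, {"n1", "n2"}))        # no quorum, s2 uncovered
```

In this sketch, losing one node out of four keeps the cluster up, while losing both subscribers of a shard fails the check even if other nodes remain.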

VerticaPy reaches a milestone at 100 stars

The Vertica team is happy to share a milestone in our “VerticaPy journey”: We just reached 100 stars in our GitHub repo, and it’s growing every day. (Repo: That’s “repository” for those of you unfamiliar with GitHub.) Repos accumulate stars as an indication of user interest – think of them as bookmarks in a user’s...

SQL Query Optimization

Improving COUNT DISTINCT Performance with Approximate Functions

A common analytic use case is to find the number of distinct items in a data set. Vertica performs well at solving COUNT DISTINCT in a few ways. Since Vertica stores all data in columns, it is possible to optimize for COUNT DISTINCT by building a projection that is tuned for this use case. Vertica...

Break the bias – and predict brake bias

The theme for this year’s International Women’s Day (IWD) was given the name #BreakTheBias to get us to imagine a gender equal world. A world free of bias, stereotypes, and discrimination, a world that is diverse, equitable, and inclusive, where difference is valued and celebrated. Together, we can protect women’s equality. At Micro Focus, as...

Unleash the Power of Vertica and Apache Spark Using the Upgraded Spark Connector

This post is authored by Alex Le. What is Apache Spark? Apache Spark is a distributed compute engine that provides a robust API for data science, machine learning, and big data work. It is fast, scalable, simple, and supports multiple languages, including Python, SQL, Scala, Java, and R. Backed by the Apache 2.0...

Detect Outliers Using Isolation Forest

The Isolation Forest (IForest) algorithm is a powerful and scalable algorithm for identifying outliers in your data. It uses a distinctive approach that characterizes outliers directly instead of modeling normal data points. Specifically, IForest focuses on two quantitative properties of outliers: they are few and different. A scalable implementation of the Isolation Forest algorithm is available...

When the typewriter is broken: Vertica’s Event-Based Windows to the rescue

Those of you who read my blog post “Take nothing but memories. Leave nothing but footprints” may recall my passion for using Raspberry Pi computers to capture IoT data from aeroplane transponders and GPS satellites, then feeding that streaming data into Apache Kafka to integrate it with Vertica and perform near real-time analytics and...

What’s needed for a happy software dev team in data analytics?

For an organization to excel at data analytics, the IT team needs to coordinate a number of different disciplines and personnel with experience in those disciplines. This usually includes data analysts, data engineers, and, increasingly, data scientists. The data engineering discipline is sometimes thought of as the plumbing that, like pipes in a house, delivers...

VerticaPy Unify 2022 Sessions

Vertica Unify 2022 is a great time to learn about Vertica, its new features, and best practices. To complement the many great presentations at Vertica Unify 2022 both in Boston and Paris, I’m very excited to present two sessions: one on VerticaPy best practices, and another general session on VerticaPy and its features. VerticaPy is...

No need to extract data from your database to do your analytics!

While taking a long-awaited and, IMHO, well-deserved R&R in the sun, I heard that ping notification coming from my backpack and, not being one who can ignore such things, reached for my iPad. With my OOTO email response already set, I should be able to ignore most incoming messages, texts and otherwise, safe in...