Vertica Blog
Bryan Herger

Vertica Big Data Solution Architect at Micro Focus

My background is analytical chemistry and current interests and work are in finance, adtech, health information, and real-time streaming analytics.

Optimizing Vertica Data With Partitions

How do partitions affect overall Vertica operations? Let’s look at a snapshot of the Flight Data Tracker dataset and compare performance ...

Loading ZIP Code Boundaries from a Shapefile in Three Easy Steps!

Are you working on a project where you'd like to classify position data into a geographic region? The US Census Bureau provides shapefiles for ZIP code tracts that you can ingest into Vertica and use to identify the ZIP code for given coordinates. In this exercise, we'll load the shapefile into Vertica, examine the contents,...

Finding Patterns in Stock Prices With Vertica

Vertica can ingest data from many sources and enable SQL-based preparation and analytics. Here is another real-world example: Let's use Vertica to find patterns in a public data set of stock and Exchange-Traded Fund (ETF) prices from Kaggle. I’ll use the following data source in this post: This data source provides several thousand files for...

Vertica on the Road! Using Vertica to Prepare and Analyze Vehicle Telematics Data

Vehicles generate a lot of data from onboard electronics and there is increasing interest in monitoring vehicle handling to help assess insurance risk and premiums. At Vertica, we're working with a few auto manufacturers, so I built this demo to show a relatively simple example of data collection, aggregation and preparation, and analysis of the...

Are Your Columns too Wide?

This tip expands on the earlier post on encoding and compression. When you have millions to billions of rows, data type becomes a bit more important: even an extra 10 bytes per row across a huge data set will impact storage or performance (or both!). When I created the big_fact_table, I included some VARCHAR...
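
As a rough illustration of the arithmetic in that teaser, the sketch below estimates the raw, uncompressed bytes added by wider columns. The function name and the row counts are hypothetical, and Vertica's encoding and compression would change the on-disk figure.

```python
def extra_raw_bytes(extra_bytes_per_row: int, row_count: int) -> int:
    """Raw (pre-compression) bytes added by widening every row."""
    return extra_bytes_per_row * row_count

# Hypothetical table sizes: one million and one billion rows.
for rows in (1_000_000, 1_000_000_000):
    added = extra_raw_bytes(10, rows)
    print(f"{rows:>13,} rows x 10 extra bytes = {added / 1e9:.2f} GB raw")
```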

Checking and Improving Column Compression and Encoding

When working with terabytes of data, storage and transfer become major time and cost sinks. Vertica can help minimize storage cost and transfer time with column compression and encoding. How can we identify Vertica tables that might benefit from compression? Information about column size and current compression is stored across the column_storage and projection_columns tables. The...

Parallel Processing Using Partitions With Vertica UDx

You can add functionality to Vertica using UDx, but what if you need to process more data than can be efficiently processed in a single thread or a single node? Vertica can divide data into partitions defined with the OVER() clause and distribute computing across nodes. This partition processing is "shared-nothing" similar to the Map...
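
A rough sketch of the idea in plain Python, not Vertica's UDx API: rows are grouped by a partition key, and each group is handled by an independent worker that sees nothing outside its own partition. The sample rows, `process_partition`, and the thread pool are illustrative assumptions.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical (partition_key, value) rows.
rows = [("a", 1), ("b", 10), ("a", 2), ("c", 100), ("b", 20)]

def partition_rows(rows):
    """Group values by key, much as a PARTITION BY column would."""
    partitions = defaultdict(list)
    for key, value in rows:
        partitions[key].append(value)
    return dict(partitions)

def process_partition(item):
    """Work on one partition only; no state is shared between calls."""
    key, values = item
    return key, sum(values)

def run(rows):
    """Partition the rows, then process every partition independently."""
    partitions = partition_rows(rows)
    with ThreadPoolExecutor() as pool:
        return dict(pool.map(process_partition, partitions.items()))

result = run(rows)
print(result)
```

Because `process_partition` only reads its own argument, the same function could run in separate processes or on separate machines without coordination, which is the property the "shared-nothing" label describes.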

Extending Vertica with Python functions: Adding NumPy FFT as a UDx

User-Defined Transform Function (UDTF) support for Python UDx was added back in Vertica 9.1, allowing you to add a much greater range of existing libraries and functions to Vertica. In this example, I'll add Fast Fourier Transform (FFT) from the NumPy package. FFT is a way to transform time-domain data into frequency-domain data. My test...
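
To illustrate what a time-domain to frequency-domain transform does, here is a minimal sketch using a naive discrete Fourier transform in pure Python. It is not the NumPy FFT that the post wraps as a UDx (an FFT computes the same transform much faster); the sample signal and the function name `dft` are assumptions made for the example.

```python
import cmath
import math

def dft(samples):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]

# Hypothetical signal: a 5 Hz sine wave sampled at 64 Hz for one second.
sample_rate = 64
signal = [math.sin(2 * math.pi * 5 * t / sample_rate) for t in range(sample_rate)]

spectrum = dft(signal)
# For real-valued input only the first half of the bins is informative.
magnitudes = [abs(c) for c in spectrum[: sample_rate // 2]]
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
peak_hz = peak_bin * sample_rate / len(signal)
print(f"dominant frequency: {peak_hz} Hz")
```

The time-domain samples say nothing obvious about periodicity, while the frequency-domain magnitudes concentrate in the single bin that matches the sine wave.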

Finding the “K” in K-means Clustering With a UDx

You can apply k-means clustering to partition data points into k different groups. Along with the data, the number of clusters "k" is an input to the algorithm. Common examples like the Iris data set tell you upfront how many different groups exist, so you set k=3. What if you don't know how many clusters...

Diving into Disk Usage

Would you like to know how much disk space Vertica is using as it runs? This could be useful for capacity planning, monitoring trends, or debugging. Here are some ways to follow disk usage trends and also look at temporary events like Tuple Mover and Join Spills. Have fun!

Find and Fix Issues from Vertica Query Events

Vertica offers tools like the Workload Analyzer in Management Console (MC) to tune up a Vertica Cluster, but there's a simple way to find and fix issues that Vertica observes and records if you aren't using MC. The query_events table captures optimization issues and suggests fixes. Let's take a look at my demo cluster, checking...

Streaming Data in One Line!

Remember that game show, "Name That Tune", where contestants were challenged to name a tune in as few notes as possible? Today's tip converts that for Vertica Big Data, showing how we can ingest streaming data in just one line! A simple way to stream data is to write CSV rows to a network socket....
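
The sketch below shows only the producer side of that idea in Python: CSV rows written to a socket as newline-terminated text. It uses a local `socket.socketpair()` so it runs without a server; the column values and the helper `rows_to_csv_bytes` are made up for illustration, and the Vertica side that reads from the socket is not shown because the teaser is truncated.

```python
import csv
import io
import socket

def rows_to_csv_bytes(rows):
    """Serialize rows as CSV text, one line per row, encoded as UTF-8."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue().encode("utf-8")

# Hypothetical sensor readings: (id, timestamp, value).
rows = [(1, "2020-01-01 00:00:00", 20.5), (2, "2020-01-01 00:00:01", 21.0)]

# socketpair() stands in for a real network connection to a listener.
producer, consumer = socket.socketpair()
with producer, consumer:
    producer.sendall(rows_to_csv_bytes(rows))
    producer.shutdown(socket.SHUT_WR)  # signal end of stream to the reader
    received = b""
    while chunk := consumer.recv(4096):
        received += chunk

print(received.decode("utf-8"), end="")
```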