Vertica Blog
Bryan Herger

Vertica Big Data Solution Architect at Micro Focus

My background is in analytical chemistry, and my current interests and work are in finance, adtech, health information, and real-time streaming analytics.

SQL Query Optimization

Improving COUNT DISTINCT Performance with Approximate Functions

A common analytic use case is finding the number of distinct items in a data set. Vertica can solve COUNT DISTINCT efficiently in a few ways. Since Vertica stores all data in columns, you can optimize for COUNT DISTINCT by building a projection tuned for this use case. Vertica...
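As a quick illustration, Vertica's built-in APPROXIMATE_COUNT_DISTINCT aggregate trades a small, bounded error for a much cheaper computation; the table and column names below are placeholders:

    -- Exact count: accurate but potentially expensive on very large tables
    SELECT COUNT(DISTINCT user_id) FROM clicks;

    -- Approximate count: much faster, with a small configurable error tolerance
    SELECT APPROXIMATE_COUNT_DISTINCT(user_id) FROM clicks;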

Extracting Log Data into Vertica With Regex Parsing

Formatted text such as system logs holds a huge amount of actionable data if you can extract and discover the content. If there is a pattern to the lines, then you can build a Vertica regex parser to extract fields and contents into Vertica for query and analysis at scale. Vertica’s regex parser uses Perl-like...
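A minimal sketch of the idea, assuming a syslog-style format; the flex table name, file path, and named capture groups below are illustrative placeholders:

    CREATE FLEX TABLE syslog();

    -- Named capture groups in the pattern become queryable keys in the flex table
    COPY syslog FROM '/var/log/messages'
        PARSER fregexparser(pattern='(?<ts>\S+ \S+ \S+) (?<host>\S+) (?<proc>[^:]+): (?<msg>.*)');

    SELECT ts, host, msg FROM syslog LIMIT 10;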

Vertica in Good Health: Ingesting, Securing, and Analyzing Healthcare Data

Electronic Medical Records and new personal healthcare technology, particularly medical devices, generate a huge amount of data. Organizations including Cerner and Philips Healthcare rely on Vertica for data analysis. How can Vertica help you analyze this data? Vertica excels at data ingest and provides hundreds of built-in SQL and analytic functions with broad integration with...

Optimizing Vertica Data With Partitions

How do partitions affect overall Vertica operations? Let’s look at a snapshot of the Flight Data Tracker dataset and compare performance ...
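For context, a partition scheme like the sketch below lets Vertica prune partitions at query time and retire old data as a metadata operation; the table definition is illustrative, not the post's exact schema:

    CREATE TABLE flights (
        flight_id INT,
        scheduled_departure DATE,
        origin VARCHAR(3),
        dest VARCHAR(3)
    )
    PARTITION BY EXTRACT(YEAR FROM scheduled_departure);

    -- Drop a whole year without a bulk DELETE (DROP_PARTITIONS in recent Vertica versions)
    SELECT DROP_PARTITIONS('public.flights', '2018', '2018');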

Loading ZIP Code Boundaries from a Shapefile in Three Easy Steps!

Are you working on a project where you'd like to classify position data into a geographic region? The US Census Bureau provides shapefiles for ZIP code tracts that you can ingest into Vertica and use to identify the ZIP code for given coordinates. In this exercise, we'll load the shapefile into Vertica, examine the contents,...
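The three steps look roughly like this; the file path, table name, and column names are placeholders and depend on the shapefile's attributes:

    -- 1. Generate a CREATE TABLE statement matching the shapefile
    SELECT STV_ShpCreateTable(USING PARAMETERS file='/data/zcta.shp') OVER();

    -- 2. Load the shapefile into the table created from that statement
    COPY zcta WITH SOURCE STV_ShpSource(file='/data/zcta.shp') PARSER STV_ShpParser();

    -- 3. Look up the ZIP code polygon that contains a point
    SELECT * FROM zcta
    WHERE ST_Contains(geom, ST_GeomFromText('POINT(-71.06 42.36)'));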

Finding Patterns in Stock Prices With Vertica

Vertica can ingest data from many sources and enable SQL-based preparation and analytics. Here is another real-world example: Let's use Vertica to find patterns in a public data set of stock and exchange-traded fund (ETF) prices from Kaggle. I’ll use the following data source in this post: https://www.kaggle.com/borismarjanovic/price-volume-data-for-all-us-stocks-etfs This data source provides several thousand files...
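As one hedged example of the kind of SQL-based preparation involved (not necessarily the exact queries in the post), per-ticker moving averages are a one-statement job with analytic window functions; the table and column names are placeholders:

    SELECT ticker, trade_date, close_price,
           AVG(close_price) OVER (PARTITION BY ticker ORDER BY trade_date
                                  ROWS BETWEEN 49 PRECEDING AND CURRENT ROW)  AS ma50,
           AVG(close_price) OVER (PARTITION BY ticker ORDER BY trade_date
                                  ROWS BETWEEN 199 PRECEDING AND CURRENT ROW) AS ma200
    FROM daily_prices;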

Vertica on the Road! Using Vertica to Prepare and Analyze Vehicle Telematics Data

Vehicles generate a lot of data from onboard electronics and there is increasing interest in monitoring vehicle handling to help assess insurance risk and premiums. At Vertica, we're working with a few auto manufacturers, so I built this demo to show a relatively simple example of data collection, aggregation and preparation, and analysis of the...
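A hedged sketch of the aggregation step, rolling raw telemetry samples up to per-trip features; the table, columns, and thresholds are illustrative:

    SELECT vehicle_id, trip_id,
           COUNT(*)            AS samples,
           MAX(speed_kph)      AS max_speed,
           AVG(ABS(accel_ms2)) AS avg_abs_accel,
           SUM(CASE WHEN accel_ms2 < -3 THEN 1 ELSE 0 END) AS hard_brake_events
    FROM telemetry
    GROUP BY vehicle_id, trip_id;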

Are Your Columns too Wide?

This tip expands on the earlier post on encoding and compression, Checking and Improving Column Compression and Encoding. When you have millions to billions of rows, data type becomes a bit more important: even an extra 10 bytes per row across a huge data set will impact storage or performance (or both!). When I...
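A quick way to check, sketched with placeholder names: compare the widest value actually stored against the declared width, then narrow the column if there is a lot of slack:

    -- How wide does this VARCHAR really need to be?
    SELECT MAX(LENGTH(description)) AS widest_value FROM events;

    -- Narrow the declared type if all existing values fit the new width
    ALTER TABLE events ALTER COLUMN description SET DATA TYPE VARCHAR(256);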

Checking and Improving Column Compression and Encoding

When working with terabytes of data, storage and transfer become major time and cost sinks. Vertica can help minimize storage cost and transfer time with column compression and encoding. How can we identify Vertica tables that might benefit from compression? Information about column size and current compression is stored across the column_storage and projection_columns tables. The...
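A rough starting query along those lines (system table columns can vary a bit by Vertica version, so treat this as a sketch):

    SELECT pc.projection_name, pc.table_column_name, pc.encoding_type,
           SUM(cs.used_bytes) AS used_bytes
    FROM projection_columns pc
    JOIN column_storage cs ON cs.column_id = pc.column_id
    GROUP BY 1, 2, 3
    ORDER BY used_bytes DESC;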

Parallel Processing Using Partitions With Vertica UDx

You can add functionality to Vertica using UDx, but what if you need to process more data than can be efficiently processed in a single thread or a single node? Vertica can divide data into partitions defined with the OVER() clause and distribute computing across nodes. This partition processing is "shared-nothing" similar to the Map...
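In SQL terms, the caller controls the parallelism with the OVER() clause; the transform function, table, and partition key below are placeholders:

    -- Each sensor_id partition can be processed independently on any node or thread
    SELECT my_udtf(reading) OVER (PARTITION BY sensor_id)
    FROM readings;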

Extending Vertica with Python functions: Adding NumPy FFT as a UDx

User-Defined Transform Function (UDTF) support for Python UDx was added back in Vertica 9.1, allowing you to add a much greater range of existing libraries and functions to Vertica. In this example, I'll add the Fast Fourier Transform (FFT) from the NumPy package. FFT is a way to transform time-domain data into frequency-domain data. My test...
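The SQL side of wiring up a Python UDTF looks roughly like this; the file path, factory name, and query are illustrative, not the post's exact code:

    CREATE OR REPLACE LIBRARY pyfft AS '/home/dbadmin/fft_udtf.py' LANGUAGE 'Python';
    CREATE OR REPLACE TRANSFORM FUNCTION fft AS LANGUAGE 'Python' NAME 'fft_factory' LIBRARY pyfft;

    -- Run the transform per signal, ordered by time
    SELECT fft(ts, reading) OVER (PARTITION BY signal_id ORDER BY ts) FROM samples;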

Finding the “K” in K-means Clustering With a UDx

You can apply k-means clustering to partition data points into k different groups. Along with the data, the number of clusters "k" is an input to the algorithm. Common examples like the Iris data set tell you upfront how many different groups exist, so you set k=3. What if you don't know how many clusters...
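The post builds this as a UDx; purely as a sketch of the underlying "elbow" idea, you could also fit Vertica's built-in KMEANS for a few candidate values of k and compare the model summaries (table and column names are placeholders):

    SELECT KMEANS('km_3', 'points', 'x, y', 3);
    SELECT KMEANS('km_5', 'points', 'x, y', 5);

    -- Compare within-cluster dispersion across models to look for the elbow
    SELECT GET_MODEL_SUMMARY(USING PARAMETERS model_name='km_3');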