Vertica Blog

Quick Tips

Are Your Columns Too Wide?

This tip expands on the earlier post on encoding and compression. When you have millions to billions of rows, data type becomes a bit more important: even an extra 10 bytes per row across a huge data set will impact storage or performance (or both!). When I created the big_fact_table, I included some VARCHAR...
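
A minimal sketch of the kind of check involved (the table and column names here are just placeholders): compare the longest value actually stored against the declared VARCHAR widths in the catalog.

-- Longest value actually stored in a wide VARCHAR column.
SELECT MAX(OCTET_LENGTH(product_description)) AS max_used_bytes
FROM big_fact_table;

-- Declared widths of the table's VARCHAR columns.
SELECT column_name, data_type
FROM v_catalog.columns
WHERE table_name = 'big_fact_table'
  AND data_type ILIKE 'varchar%';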

Checking and Improving Column Compression and Encoding

When working with terabytes of data, storage and transfer become major time and cost sinks. Vertica can help minimize storage cost and transfer time with column compression and encoding. How can we identify Vertica tables that might benefit from compression? Information about column size and current compression is stored across the column_storage and projection_columns tables. The...
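
A rough sketch of where to look (the system table column names shown here are from memory and may differ slightly between Vertica versions):

-- Largest columns on disk, per projection.
SELECT projection_name, column_name, SUM(used_bytes) AS used_bytes
FROM v_monitor.column_storage
GROUP BY 1, 2
ORDER BY used_bytes DESC
LIMIT 20;

-- Current encoding chosen for each projection column.
SELECT projection_name, table_column_name, encoding_type
FROM v_catalog.projection_columns
WHERE table_name = 'big_fact_table';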

Get the Row Count from an Outer Table Join

Aggregate functions summarize data over groups of rows from a query result set. When using an aggregate function like COUNT with an "*" or "1" parameter value, you may get a different result when the query implements a LEFT join versus an INNER join. If you want a row count of just the rows from...
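
For example, with two hypothetical tables, COUNT(*) counts every row the join produces (including unmatched left-side rows), while counting a column from the right-hand table counts only the rows that actually matched:

SELECT c.customer_id,
       COUNT(*)          AS all_join_rows,
       COUNT(o.order_id) AS matched_orders
FROM customers c
LEFT JOIN orders o
  ON o.customer_id = c.customer_id
GROUP BY c.customer_id;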

DIRECT Is Now the Default Load Type

Write Optimized Store (WOS) is a memory-resident data structure for short-term data storage, while Read Optimized Store (ROS) is a highly optimized, read-oriented, disk storage structure organized by projection. Prior to Vertica 9.3, Vertica initially loaded data into WOS by default. For databases created in version 9.3 and later, Vertica now uses a default load...
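
As a quick illustration (the table name and file path are placeholders), a plain COPY on 9.3+ now behaves like the DIRECT load you had to request explicitly on earlier versions:

-- On pre-9.3 databases, the DIRECT keyword sends the load straight to ROS instead of WOS.
COPY public.big_fact_table
FROM '/data/big_fact_table.csv'
DELIMITER ','
DIRECT;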

Vertica 9.3 Supports UPDATE and DELETE Operations on Tables Having Live Aggregate Projections!

A live aggregate projection contains columns with values that are aggregated from columns in its anchor table. When you load data into the table, Vertica aggregates the data before loading it into the live aggregate projection. On subsequent loads (for example, through INSERT or COPY), Vertica recalculates aggregations with the new data and updates the...
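
A small sketch of the setup (the names are placeholders): a live aggregate projection over a fact table, followed by the UPDATE and DELETE statements that 9.3 now allows against its anchor table.

CREATE TABLE public.sales (store_id INT, sale_date DATE, amount NUMERIC(10,2));

-- Live aggregate projection: pre-aggregated totals per store.
CREATE PROJECTION public.sales_totals_lap AS
  SELECT store_id, SUM(amount) AS total_amount
  FROM public.sales
  GROUP BY store_id;

-- Before 9.3, these statements were rejected on a table with a live aggregate projection.
DELETE FROM public.sales WHERE sale_date < '2019-01-01';
UPDATE public.sales SET amount = amount * 1.1 WHERE store_id = 42;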

Parallel Processing Using Partitions With Vertica UDx

You can add functionality to Vertica using UDx, but what if you need to process more data than can be efficiently processed in a single thread or a single node? Vertica can divide data into partitions defined with the OVER() clause and distribute computing across nodes. This partition processing is "shared-nothing," similar to the Map...
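
For instance, assuming a hypothetical UDTF called my_udtf, each partition is processed independently and the work is spread across the cluster:

-- Each device's rows form one partition; partitions run in parallel across nodes and threads.
SELECT my_udtf(reading_time, reading_value)
       OVER (PARTITION BY device_id)
FROM sensor_readings;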

Extending Vertica with Python functions: Adding NumPy FFT as a UDx

User-Defined Transform Function (UDTF) support for Python UDx was added back in Vertica 9.1, allowing you to add a much greater range of existing libraries and functions to Vertica. In this example, I'll add Fast Fourier Transform (FFT) from the NumPy package. FFT is a way to transform time-domain data into frequency-domain data. My test...
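
The SQL side of wiring up such a function looks roughly like this (the file path and the FFTFactory class name are assumptions about the Python source file):

-- Register the Python source as a library, then expose its factory class as a UDTF.
CREATE LIBRARY pyfft_lib AS '/home/dbadmin/fft_udtf.py' LANGUAGE 'Python';
CREATE TRANSFORM FUNCTION fft AS NAME 'FFTFactory' LIBRARY pyfft_lib;

-- Apply the transform per signal, using the OVER() clause from the previous tip.
SELECT fft(sample_index, sample_value) OVER (PARTITION BY signal_id)
FROM samples;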

Finding the “K” in K-means Clustering With a UDx

You can apply k-means clustering to partition data points into k different groups. Along with the data, the number of clusters "k" is an input to the algorithm. Common examples like the Iris data set tell you upfront how many different groups exist, so you set k=3. What if you don't know how many clusters...
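
One rough way to explore this without a custom UDx is Vertica's built-in KMEANS (not the UDx the post builds), fit at a few candidate values of k and compared via the model summaries; the table and column names here are placeholders:

-- Fit models at two candidate values of k.
SELECT KMEANS('km_k2', 'points', 'x, y', 2);
SELECT KMEANS('km_k3', 'points', 'x, y', 3);

-- Compare the summaries (e.g., within-cluster dispersion) to look for an "elbow".
SELECT GET_MODEL_SUMMARY(USING PARAMETERS model_name='km_k2');
SELECT GET_MODEL_SUMMARY(USING PARAMETERS model_name='km_k3');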

Tracking Raw Schema Size Over Time

There is no way to go back in time to see how large a schema was a day/month/year ago. But you can use the AUDIT function to get a “raw” size of a schema and track growth over time by storing the results from a daily audit. Using cron, you can run this daily. Helpful...
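
A minimal sketch of the tracking table and the daily statement (the schema and table names are placeholders):

-- One row per audit run; AUDIT() returns the estimated raw (uncompressed) size in bytes.
CREATE TABLE IF NOT EXISTS public.schema_size_history (
    audit_time  TIMESTAMP,
    schema_name VARCHAR(128),
    raw_bytes   INT
);

-- Run once a day, for example from cron via vsql.
INSERT INTO public.schema_size_history
SELECT NOW(), 'online_sales', AUDIT('online_sales');
COMMIT;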

Have VSQL Beep Upon Command Completion

There are a lot of cool command line options available to VSQL. One of my favorites is the -b option which causes Vertica to “beep” when a command completes. This is very useful if you are running a very long sequence of SQL commands via a script. You can go off and do other things...

Easy Development Schema Cleanup (i.e., Have Vertica Drop a Bunch of Tables for You)

On a development database, I have a lot of tables in the PUBLIC schema having similar names starting with “test_”. I’d like to drop all of these tables quickly. To do that, I can have Vertica generate the DROP commands and then execute them for me! P.S. Be careful with this command! Make sure when...
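
For example (the schema name and prefix are placeholders), generate one DROP statement per matching table, then review the output before running it:

-- Note: "_" is itself a single-character LIKE wildcard, so this pattern also matches
-- names like "testXfoo"; escape it as 'test\_%' to match the literal underscore only.
SELECT 'DROP TABLE public.' || table_name || ' CASCADE;'
FROM v_catalog.tables
WHERE table_schema = 'public'
  AND table_name LIKE 'test_%';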

Investigate Data Collector Table Data Produced by Scrutinize

The diagnostics tool scrutinize collects a broad range of information from a Vertica cluster. Part of the output from scrutinize includes data collector table data. Once extracted, we can easily load the data from a data collector table file into a Vertica Flex Table for analysis! Let’s take a look at the data from the...
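
The load itself can be as simple as this sketch (the file path is a placeholder, and the right flex parser depends on the format of the extracted data collector file):

-- Flex table with no predefined columns, then load the extracted data collector file.
CREATE FLEX TABLE dc_requests_issued();
COPY dc_requests_issued
FROM '/home/dbadmin/scrutinize_output/dc_requests_issued_node0001.log'
PARSER fdelimitedparser();

-- Optionally materialize the discovered keys into a view for easier querying.
SELECT COMPUTE_FLEXTABLE_KEYS_AND_BUILD_VIEW('dc_requests_issued');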