Under the Hood

Do you have experience using Vertica?

The Vertica Product Management team is collecting feedback on what new features and functionality our customers would like to see added to the product. We’ve been actively gathering product-related feedback for nearly a year and have been using it to make more informed decisions about how we manage the Vertica product roadmap. Thank you for […]

Elastic Throughput Scaling: Overview

Manvendra Panwar authored this post. Elastic Throughput Scaling (ETS) enables Vertica to increase concurrency through which we can achieve higher throughput. ETS can take advantage of redundant shard subscriptions to automatically identify the nodes in the cluster that can execute a query. More Nodes (N) than Shards (S): Achieving higher throughput • S nodes can […]

Multi-Factor Authentication for Vertica with Kerberos and Duo

This blog post was authored by Poojan Khanpara, Tom Wall, and Elizabeth Michaud. Multi-factor authentication (MFA) provides a secure method for validating users on login. And as we mentioned in another recent post, GDPR Ready Recipe, the full implementation of GDPR may soon make multi-factor authentication a legal necessity for vendors doing business in the […]

Try Vertica in a Virtual Machine

Curious about Vertica? Check out the Vertica Community Edition VM! It’s a free, quick, and an easy hands-on introduction. In the VM, you’ll find everything up and running. You won’t have any setup to do. You’ll find a single-node installation of Vertica Community Edition with the VMart example database, the Vertica client tools, and a […]

Plotting data with vsql and gnuplot

Maurizio Felici authored this post. We all know how plotting data is a very useful – and widely adopted – technique. Linux (and MacOS) offer a wonderful command line tool to plot data called gnuplot. You can install and run gnuplot on one of your Vertica nodes so you don’t have to move data out […]

Encode Projection Columns with Zstandard Compression: Quick Tip

Vertica 9.2 now supports the Zstandard Compression Encoding Type which offers higher compression than the BZIP and GZIP Encoding Types! Example: dbadmin=> CREATE TABLE test (a VARCHAR(10000), b VARCHAR(10000), c VARCHAR(10000), d VARCHAR(10000), e VARCHAR(10000)); CREATE TABLE dbadmin=> CREATE PROJECTION test_pr dbadmin-> (a ENCODING BZIP_COMP, dbadmin(> b ENCODING GZIP_COMP, dbadmin(> c ENCODING ZSTD_COMP, dbadmin(> d […]

Vertica’s In-database Random Forest, Highly Scalable and Easy to Use

Arash Jalal Zadeh Fard authored this post. Vertica has a set of machine learning tools covering a wide spectrum of advanced data analysis functionalities like data discovery, data preparation, model training, model evaluation, and model management. The goal of this blog post is to provide a hands-on example demonstrating how the built-in distributed Random Forest […]

Pure Vertica SQL Graph Flavors

Moshe Goldberg authored this post. The art and science of the best sales conversation is “Visual story telling”. This document shows how Vertica VSQL alone can visualize query results in color and graphics. The idea presented is to provide dynamic visuals, based only on Vertica VSQL built-in capabilities. All the graph examples below were drawn […]

How to recognize strings containing nbsp

Maurizio Felici authored this post. Word Processors and Web forms often use the non-breaking space character (nbsp) to prevent line breaks. This character looks the same as a normal space, but uses a different Unicode character with its own UTF-8 encoding. If you inadvertently cut and paste an nbsp character in your database strings, any […]

Fast Data Loading with Vertica

Curtis Bennett authored this post. Vertica is well known for its blinding query performance at big data scale, but it can also insert data at very high rates of speed. It can even load data non-stop while being queried, thus enabling real-time analysis of data. Basic Loading Methods There are many ways of loading data […]

Data Preparation Tools – Technical Brief

Curtis Bennett authored this blog Vertica supports a number of industry standard data preparation tools for use in the data science life-cycle. In addition to the functions described here, Vertica has a wide array of analytic capabilities which can be leveraged for additional data preparations including time-series analytics (with missing value imputation), analytic windowing and […]

Master Blog Series: Getting Started with Vertica

This post was authored by Soniya Shah. Are you a new Vertica user? If so, you’re probably wondering where to start. We’re here to help you on your big data analytics journey, from understanding Vertica terminology to making the most of your resources. If you find yourself asking questions like What does the Tuple Mover […]

Converting CHAR and VARCHAR to BINARY or VARBINARY

Jim Knicely authored this tip. As of Vertica 9.1.1, explicit coercion (casting) from CHAR and VARCHAR data types to either BINARY or VARBINARY data types is supported! Example: Prior to Vertica 9.1.1: dbadmin=> CREATE TABLE test_coercion (c1 VARCHAR(100)); CREATE TABLE dbadmin=> INSERT INTO test_coercion SELECT ‘Convert me!’; OUTPUT ——– 1 (1 row) dbadmin=> SELECT c1::VARBINARY […]

Getting Help with Command-Line Tools

Jim Knicely authored this post. Vertica comes with some nifty command-line tools like vsql, admintools, and vbr. If you need assistance using any of these, you can run them with the –-help parameter to view the options available for each tool. Example: [dbadmin@s18384357 ~]$ /opt/vertica/bin/vbr –help vertica backup and restore script help -h –help print […]

Change the Owner of a Schema

Jim Knicely authored this tip. As of Vertica 9.1.1, you can now transfer the ownership of a schema to another user! Example: dbadmin=> SELECT schema_name, schema_owner FROM schemata WHERE schema_name = ‘my_etl_schema’; schema_name | schema_owner —————+————– my_etl_schema | dbadmin (1 row) dbadmin=> CREATE USER etl_user; CREATE USER dbadmin=> ALTER SCHEMA my_etl_schema OWNER TO etl_user; ALTER […]

Referencing Multiple Related LONG VARCHAR Columns

Jim Knicely authored this post. Table columns having the LONG VARCHAR data type can store up to 32,000,000 octets. Since there is a table row limit size of 32,768,000 bytes, how do I reference more than one related LONG VARCHAR, each having the maximum length? Simple. Use more than one table! Example: dbadmin=> CREATE TABLE […]

Generate Random Integers, Including Negative Numbers

Jim Knicely authored this tip. The RANDOMINT(n) function returns one of the n integers from 0 through n – 1. Those are all positive integers. What if I want to include negative integers? That’s easy with a simple multiplication. Example: dbadmin=> SELECT DECODE(randomint(2), 1, 1, -1) * randomint(11) “Random INT from -10 to 10”; Random […]

Summarize a Table’s Projections: Quick Tip

Jim Knicely authored this tip. Use the built-in Vertica function GET_TABLE_PROJECTIONS to quickly summarize the details of all of the projections of a Vertica table. Example: dbadmin=> \a Output format is unaligned. dbadmin=> \t Showing only tuples. dbadmin=> SELECT get_table_projections(‘public.jim’); Current system K is 1. # of Nodes: 3. Table public.jim has 3 projections. Projection […]

What are your Data Loading Preferences?

We’re back with our newest product management survey for this summer! This time we’re asking about how you load your data – everything from the ETL tools you use to how you manage your information. The answers you provide will help us fit Vertica into your infrastructure in a way that is ideal for your […]

Sending Emails from the Vertica Database

Maurizio Felici authored this post. You can create an external procedure to send alert emails from the Vertica database. For more information on external procedures see Using External Procedures in the Vertica documentation. Follow the procedure below to create an external procedure for sending emails. NOTE: These procedures were performed on the Linux OpenSUSE distribution. […]

Master Blog Series : Vertica in Eon Mode

This post was authored by Soniya Shah. In Vertica 9.0 we introduced Eon Mode. Since then, there have been many improvements in recent releases. Vertica in Eon Mode is a new architecture that separates compute and storage, allowing users to take advantage of cloud economics that enable rapid scaling and shrinking of clusters in response […]

How many seconds since midnight?

Jim Knicely authored this post. There are many reasons you might want to know the number of seconds that have passed since midnight (i.e. Event Logging). You could combine several Vertica built-in date manipulation functions to calculate the number of seconds since midnight, but the easiest way is to use the MIDNIGHT_SECONDS function! Example: dbadmin=> […]

Skipping Records with Unspecified JSON Fields

Serge Bonte and Jim Knicely authored this post. Vertica provides a built-in file parser named FJSONPARSER that parses and loads a JSON file. This file can contain either repeated JSON data objects (including nested maps) or an outer list of JSON elements. For a flex table, the parser stores the JSON data in a single-value […]

Be Careful with the Sequence CACHE Value

Jim Knicely authored this tip. The default session cache for a sequence is 250,000. Although you can change the cache value of a sequence, setting the value too low can adversely affect performance! Example: dbadmin=> SELECT COUNT(*) FROM ten_thousand_records; COUNT ——- 10000 (1 row) dbadmin=> CREATE SEQUENCE default_cache; CREATE SEQUENCE dbadmin=> CREATE SEQUENCE non_default_cache CACHE […]

Query Tuning with Vertica: Dos and Don’ts

This blog post was authored by Eugenia Moreno. Query tuning in Vertica is not an exact science. Recommendations differ based on your database. This document assumes that all nodes in the cluster are UP, your Vertica configuration is ok, and that v*perf tools have been executed. The following diagram shows the query flow in Vertica: […]

Rebalance Taking a Long Time

After you add a node to your Vertica cluster or remove a node from your cluster, Vertica rebalances the data across all the nodes. If rebalancing is taking a long time, review these steps to find out the probable cause. Pre-Requisites To ensure a successful rebalance of your cluster, before you start the rebalance, take […]

Storage not Accessible and Vertica Fails to Start on Host

When the database host is still up and available and the power is on, Vertica storage may be inaccessible. Vertica may be down and the disk volumes /data and /catalog are unavailable. To troubleshoot, follow this checklist: Step Task Results 1 When the host rebooted, was there a problem auto-mounting the file systems? If yes, […]

Why is the Vertica Host Slow?

Symptoms: Sluggish response time Server is CPU-bound with a high load and too many processes. The host is possibly in a hung state, unable to make connections, or connections are hanging. Possible causes: Low available memory Too many concurrent active processes running Server actively using swap space To troubleshoot, follow this checklist: Step Task Results […]

What Should I do When Vertica Host is Down?

Each member of a Vertica cluster is referred to as a host throughout this checklist. The Vertica process communicates to other Vertica nodes in a cluster. Node refers to the Vertica database software. If your host is down, follow this checklist: Step Task Results 1 The Vertica host is physically powered off. No status lights […]

Debug Vertica Host Network Connectivity

If your host is physically available and the operating system is up and running, but you cannot access your network to the Vertica cluster, then external connections might not be possible. Client software connections are failing, and users are not able to ssh to the host. From observing a hardware console connection, it looks like […]

Upgrading Vertica

Vertica adds new features and enhances existing features with every release. To access the new and improved features, upgrade to the latest release of Vertica. Prerequisites Before you upgrade, perform the following: Perform a full database backup. If upgrade is unsuccessful, a full back up allows you to roll back to your current version. You […]

Expanding Your Cluster with New Nodes

Does your business need more storage? Consider adding nodes to your database if you need more storage. Vertica recommends that you add all new nodes at the same time. You can use this checklist to add nodes. These are basic steps. There are additional options discussed in the Vertica documentation. Step Task Results 1 Perform […]

What Should I do if the Ancient History Mark is not Advancing?

If the AHM is not advancing, troubleshoot using this checklist: Step Task Results 1 Check whether the Last Good Epoch (LGE) is advancing. => SELECT CURRENT_EPOCH, LAST_GOOD_EPOCH, AHM_EPOCH FROM SYSTEM; If the LGE is advancing, go to Step 2. If the LGE is not advancing, go to Step 5. 2 Check whether all nodes are […]

What Should I do to Shut Down Vertica Node for Maintenance?

If you need to shut down a Vertica node for maintenance, follow this checklist. Step Task Results 1 Verify that all cluster nodes are UP. $ /opt/vertica/bin/admintools -t view_cluster To avoid a long node recovery time after shutdown, if one or more nodes is DOWN, identify and restart them using the instructions in Restarting Vertica […]

What Should I do if my Node Recovery is Slow?

If you are running Vertica 7.2.x or later, perform recovery by table. For details, see Recovery By Table in the Vertica documentation. If you are running a Vertica version prior to 7.1.x, stop the ETL jobs and restart node recovery. Step Task Results 1 Monitor progress of recovery: => SELECT node_name, is_running FROM RECOVERY_STATUS;If is_running […]

Why Does My Node Not Connect to Spread?

If your node is not connected to spread, follow the below steps to troubleshoot the connection issue. Step Task Results 1 Check whether the spread.conf file in the catalog folder is identical across all the nodes in the cluster. $ cat spread.conf If the spread.conf flies are identical across all nodes, go to Step 2. […]

Investigate Sudden Performance Degradation of a Query

Have you ever had a formerly fast-running query begin to run slowly? Use the following checklist to investigate the sudden slowdown of a formerly fast-running query. Step Task Results 1 Check for host error messages using the following commands: $ cat /var/log/messages $ dmesg If the cluster is in good health, go to Step 2. […]

What Should I do if the Database Performance is Slow?

Troubleshootusing the following checklist if your database performance is slow. Check if any of the following problems exist: Step Task Results 1 Is the query performance slow? If the query performance is slow, review the Query Performance checklist. If the query performance is not slow, go to Step 2. 2 Is the entire database slow? […]

Catalog Size Debugging

You have a large catalog if your catalog size is more than 10 GB or your catalog is changing more than 5% per day. This checklist gives you some suggestions and recommendations to monitor and reduce your database catalog size. The database catalog contains tractable metadata such as tables, projections, users, nodes, ROSes, and so […]

How Do I Manage Delete Vectors?

If you want to remove delete vectors manually or troubleshoot why they are not being removed automatically, follow this checklist. Step Task Results 1 Check if you have too many delete vectors (more than 100) in a projection. =>SELECT node_name, schema_name, projection_name, COUNT(*) num_dv, SUM(deleted_row_count) del_cnt, SUM(used_bytes) ubytes, MIN(start_epoch) min_epoch, MAX(start_epoch) max_epoch FROM delete_vectors GROUP […]

What Should I do When the Database Process is not Starting?

If you want to troubleshoot why the database process is not starting, follow this checklist. Step Task Results 1 Ensure Vertica is not already running on any node. $ ps –ef | grep vertica The Vertica process displays as follows: /opt/vertica/bin/vertica -D <catalog directory> -C <dbname> -n <node name> -h <host IP> -p <port> Prior […]

What Should I do When the Database Node is DOWN?

When database node is DOWN, troubleshoot using the following checklist. Step Task Results 1 Check whether your database is UP. $ admintools -t db_status -s UP If the database is UP, go to Step 2. If the database if not UP, restart your database. $ admintools -t start_db -d <Database_name> -p <Database_password> If the database […]

What Version of Vertica am I Running?

Jim Knicely authored this tip. The built-in VERSION function returns a VARCHAR that contains your Vertica node’s version information. Example: dbadmin=> SELECT version(); version ———————————— Vertica Analytic Database v9.1.0-2 (1 row) The Vertica version is formatted as X.Y.Z-R, where… • X.Y is the two-digit Vertica major release number, e.g., 8.1, 9.0 and 9.1 • Z […]

Understanding Vertica Query Budgets

This blog post was authored by Shrirang Kamat. The purpose of this document is to explain how the query budget of a resource pool used by the query can influence the initial memory acquisition for a query and how it impacts query performance. For more details about how we compute the query budget, see the […]

How to Code Vertica UDx

This blog post was authored by Ding-Qiang Liu. In analytic businesses supported by Vertica, complex processing logic is sometimes unavoidable. Using ANSI SQL might cause query strings to be much longer, and will slow the query with a huge volume data to query. If using Vertica SDKs, you can encapsulate that general computing logic in […]

Time Series Analysis with Vertica

This blog post was authored by Maurizio Felici. Time series analytics evaluate the values of a given set of variables over time and group those values into a window, based on a time interval for analysis and aggregation. There are a few types of functions that help perform time series analytics: • Event based functions […]

Load Balancing Options

This blog post was authored by Soniya Shah. Connection load balancing automatically spreads the overhead of client connections across the cluster by redirecting connections. Each client connection a host in your Vertica cluster requires memory and processor time. If a lot of clients connect to a single host, this can affect database performance. The initiator […]

Can you tell us about your data lake?

It’s the fourth round of Vertica product management surveys and we have really appreciated getting your feedback! In this survey, we want to know all about your data lake. We want to know what tools you use, how much data is in your lake, and the types of workloads you are running. We are hoping […]

Make data analysis easier with dimensionality reduction

This blog post was authored by Anh Le. Introduction As the number of features in your data set grows, it becomes harder to work with. Visualizing 2D or 3D data is straightforward, but for higher dimensions you can only select a subset of two or three features to plot at a time, or turn to […]

Reusing EBS Volumes

This blog post was authored by Mark Hayden. You can deploy a Vertica cluster running in Eon or Enterprise mode using EC2 instances that support EBS volumes. This document provides the steps you need to use EBS volumes that were preserved from a Vertica cluster that was terminated, and how to reattach them to revive […]

Vertica in Eon Mode: Revive

This blog post was authored by Soniya Shah. Overview An Eon Mode database keeps an updated version of its data and metadata in a communal storage location. After you shut down the database, the data continues to reside in communal storage. When you are ready to use the storage again, you can revive the database […]

UPDATE: Vertica Test Results with Microcode Patches for the Meltdown and Spectre Security Flaws

This blog post was authored by Michael Leuchtenburg. Introduction Vertica engineers tested database performance with and without Meltdown and Spectre mitigations on both Haswell and Skylake microprocessors. As compared to no mitigation, with the fastest settings, we found a minimal, average 1% slowdown on Haswell, and an average 25% slowdown on Skylake. These results are […]

Understanding Vertica Eon Mode

This blog post was authored by Soniya Shah. This blog post is updated as of July 2018. With Eon mode, Vertica is a database that takes advantage of all elastic compute and scalable storage capabilities in the cloud. In Eon mode, Vertica can take advantage of cloud economics – it can be made to scale […]

Sizing Your Vertica Cluster for an Eon Mode Database

This blog post was authored by Shrirang Kamat. Vertica in Eon Mode is a new architecture that separates compute and storage, allowing users to take advantage of cloud economics that enable rapid scaling and shrinking of clusters in response to a variable workload. Eon Mode decouples the cluster size from the data volume and lets […]

Vertica in Eon Mode: Tables, Parameters, Functions, and Useful Queries

This blog post was authored by Rakesh Bankula. The purpose of this document is to list new system tables, configuration parameters and meta-functions that were added for Vertica Eon mode. This document also has sample queries you can run on the new system tables to get up to speed with Eon mode. This document assumes […]

Running an Eon Mode Database: Live in Vertica 9.1

This blog post was authored by Sarah Lemaire. Before now, you could only operate your Vertica database in Enterprise Mode, the traditional Vertica architecture where your data is distributed across the local nodes. Now, Vertica 9.1, released in April, 2018, allows you to operate your database in Eon Mode, which was previously released as beta […]

Understanding the Vertica Query Optimizer

This blog post was authored by Soniya Shah. The Vertica query optimizer uses statistics about the data to create a query plan, which contains a set of operations to compute the requested result. Depending on the properties of the projections defined in your database, the optimizer can choose faster and more efficient operations. This is […]

Using Vertica Machine Learning to Analyze Smart Meter Data

This blog post was authored by Soniya Shah. Machine learning and data science have the potential to transform businesses because of their ability to deliver non-obvious, valuable insights from massive amounts of data. However, many data scientist’s workflows are hindered by computational constraints, especially when working with very large data sets. While most real-world data […]

Handling Duplicate Records in Input Data Streams

This blog post was authored by Ravi Gupta. We have often found that sources or operational systems that provide data for further analysis have duplicate records and these are sent to a downstream application or EDW for processing. This post shows a few scenarios of how to handle these duplicate records using various SQL options, […]

Hierarchical Partitioning

This blog post was authored by Shrirang Kamat. This document demonstrates how you can use hierarchical partitions, introduced in Vertica 9.0, to solve a few example use cases. For more information, see Hierarchical Partitioning. Partitioning a table helps with managing the information life cycle. Since data for partitions is segregated into different ROS containers, predicates […]

Resource Management

This blog post was authored by Soniya Shah. A Vertica database runs on a cluster of hardware. All loads and queries running against the database take up system resources, such as CPU, memory, disk I/O, bandwidth, file handles, and more. Query performance depends on how many resources are allocated to it. In a single-user environment, […]

Ntile, and why you should love it

This blog post was authored by Curtis Bennett. According to a quick internet search, the hottest recorded temperature in Los Angeles was 109 degrees Fahrenheit, recorded on July 8th, 2017. If you had access to all kinds of weather data (spoiler alert: you do!) you could query the data and figure out the highs and […]

Dynamic Row and Column Access Policies

This blog post was authored by Serge Bonte. Vertica’s row and column access policies can be used to provide extra security on data in your tables. These policies are well covered in Best Practices for Creating Access Policies in Vertica and Dynamic Row and Column Access Policies. In this blog, we will explore how dynamic […]

Improving Performance and Memory Acquisitions for Vertica Queries

This blog post was authored by Shrirang Kamat. The following design considerations will help you improve the performance and memory of your Vertica queries. When creating table definitions, you should carefully choose the size of the lookup column based on your data. Properly sizing your column based on your data will help to improve performance. […]

Blog Post Series: Using Vertica to Track Commercial Aircraft in near Real-Time – Part 6

This blog post was authored by Mark Whalley. I have discussed in previous blog posts the continuous stream of messages from aircraft transponders, captured and decoded using the DUMP1090 application, which we are planning on feeding into a series of Kafka topics, prior to loading into their corresponding tables in a Vertica database. This blog […]

Projection Delete Concerns

This blog post was authored by Curtis Bennett. Deletes in Vertica are a complicated topic. I’ve had many people say to me that they’d heard that one should never do deletes in Vertica, or that deletes in Vertica are slow. Nothing could be further from the truth. But like anything in a database, they should […]

Monitoring Login Failures

This blog post was authored by Curtis Bennett. Security is an ever-present topic on the minds of database administrators. Certainly Vertica has a number of security features like Kerberos, and SSL. This blog isn’t about those! Instead, let’s take a look at a much more simplistic aspect of security which Vertica tracks – login failures. […]

Blog Post Series: Using Vertica to Track Commercial Aircraft in near Real-Time

This blog post was authored by Mark Whalley. As discussed in an earlier blog post, we have a continuous stream of messages from aircraft transponders captured and decoded using the DUMP1090 application. We now want to load this data into a series of Vertica tables. As with the streaming nature of the incoming data, with […]

Operational Considerations for Database Administrators in Eon Mode

This blog post was authored by Shrirang Kamat. The purpose of this document is to familiarize advanced Vertica Enterprise mode users about some of the behavioral similarities and differences between Vertica Enterprise mode and Vertica Eon mode. This document assumes you have a basic understanding about Eon mode architecture. You can find more details about […]

Vertica Eon Mode: Caching

This blog post was co-authored by Pratibha Rana and Ben Vandiver. Running every query directly against the data in shared storage would result in poor performance and subject the shared storage to heavy load. Eon mode in Vertica introduces a cache called the Depot that avoids reading from the shared storage for frequently used data […]