Best Practices

Single Sign-On for Support: New Login for Passport Users

As part of the Micro Focus ongoing IT transformation, we have consolidated single sign-on systems to improve your experience of interacting with Micro Focus. All Passport users need to create a new account. To maintain your current access rights, you need to register with the same e-mail address you used with passport. If you use […]

Data Day Texas: Keep Your Architecture Open and Avoid Mindset Lock-in

Data Day Texas is an event in Austin that was started about nine years ago by an old acquaintance of mine, Lynn Bender, who founded Global DataGeeks. The one big theme that struck me as running through the whole conference was the highly cooperative landscape that has developed between proprietary and open source software, and […]

Finding All Columns Storing Some Value: Quick Tip

A long time ago a database developer asked me how she could produce a list of all table columns in the database which store a given value. Note that she was in the process of normalizing tables and wanted to find every fact table that referenced a particular string value to replace it with a […]

Viewing External Table Definitions: Quick Tip

An alternative to importing data into Vertica is to query it in place. For that we use external tables. Once an external table is created you can view its definition via the TABLES system table. Example: dbadmin=> \! cat /home/dbadmin/ext.txt 1|TEST1 2|TEST2 dbadmin=> CREATE EXTERNAL TABLE ext (c1 INT, c2 VARCHAR(10)) AS COPY FROM ‘/home/dbadmin/ext.txt’; […]

Aggregate Projections

This blog post was authored by Curtis Bennett. Vertica stores physical data for tables in objects known as projections. Unlike traditional RDBMS’s, Vertica does not rely on indexes for performance. Instead, Vertica stores the physical data (either all or some of the columns) in whatever sort order is required for optimal query processing. This can […]

Improve the Efficiency of Mergeout on Wide Tables

This blog post was co-authored by Xiao Ling and Jim Kelley. Introduction When resource pools were first introduced to Vertica, the average computer had a lot less memory than it does today. The default memory size for the Tuple Mover resource pool, 200 MB, reflects the more limited resources of that period. As hardware and […]

Identifying Projection Skew

This blog post was authored by Curtis Bennett. In Vertica, projections can either be replicated (unsegmented), or segmented. A segmented projection divides the data up across all the nodes in your cluster. Segmentation works by hashing a key value, and then using some simple math, figuring out which node that piece of data will live […]

What Projections are not Being Used

This blog post was authored by Eugenia Moreno. It is common to create new projections to improve performance in Vertica. However, you might forget about the old projections. Vertica still loads data in projections that you might not be using. A projection that is loaded but not picked up by the Vertica optimizer consumes storage […]

Beware of Segmentation Islands

This blog post was authored by Curtis Bennett. Many clients who are new to Vertica are also new to big data. While Vertica’s reliance on industry-standard SQL can make the transition very easy, often the introduction of multiple nodes used in support of a database platform can take some getting used to. It is the […]

Authentication Methods for dbadmin

This blog post was authored by Sumeet Keswani. In Vertica, when you create a new database, there are no configured authentication methods. In this case, Vertica assumes that all users, including the dbadmin, have an implicit password authentication. Users can use this authentication method both for authenticating over a network interface and for over a […]

Phrase Search with Vertica Text Search

This blog post was authored by Serge Bonte. Vertica Text Search Vertica already provides Text Search. Text Search allows you to quickly search the contents of a single CHAR, VARCHAR, LONG VARCHAR, VARBINARY, or LONG VARBINARY field within a table to locate a specific token. Vertica implements that capability using a dedicated Text Index to […]

CPU and Memory Starvation in SPREAD

This blog post was authored by Sumeet Keswani. What is Spread? Vertica uses an open source toolkit, Spread, to provide a high-performance control message service. Spread daemons start automatically when your database starts up for the first time. The spread daemons run on control nodes in your cluster. The control nodes manage message communication. On […]

Adding Nodes to Fault Groups

This blog post was authored by Sarah Lemaire. Suppose you are adding new cluster nodes to your Vertica database. You want to add those nodes to particular fault groups without having to restart your Vertica database. The following steps use the example of a database with five racks and fault groups, with 9 Vertica nodes […]

Building a Secure Vertica Environment

This blog post was authored by Soniya Shah. Vertica has a client-server architecture system, where applications that reside on the client access the Vertica cluster through drivers including ODBC, JDBC, OLEDB and ADO.NET. This post discusses secure client to server communications, authenticating access to Vertica, and administrator access. Method Vertica Options Authentication: Validate user credentials […]

MERGE Statement with Filters

This blog post was authored by Soniya Shah. Vertica 8.1 introduced new functionality for the MERGE statement. In this post, we discuss new functionality for MERGE that allows users to filter conditions on INSERT and UPDATE clauses in a MERGE statement. The MERGE operation allows users to join the target table on another table, a […]

Configuring tcp Idle Settings for Long Running Idle Sessions

This blog post was authored by Soniya Shah. Important: For all recommendations to changing setting values, you must change the settings on all nodes in the cluster. It is not advisable to have different settings on different nodes. Have you ever encountered one of the following types of errors? ==> VSQL vsql => select sleep(3600); […]

Understanding Backup Space Utilization

This blog post was authored by Soniya Shah. Creating regular database backups is an important part of database maintenance. The vbr utility lets you back up, restore, and copy your database to another cluster. You can create full and incremental backups, and even back up objects, such as tables. Ideally, backups should match what is […]

Concurrency and Workload Management

This blog post was authored by Soniya Shah. Vertica workloads range from simple primary key lookups to analytical queries that include several large tables and joins. Different types of load jobs must keep the data updated. Vertica has a mixed-workload management capability that is easy to use. Vertica can process queries both concurrently and in […]

How to Set Vertica in Read-Only

This blog post was authored by Soniya Shah. You probably know that you can create READ ONLY users in Vertica. These users can view everything within a schema, but don’t have the proper permissions to change anything within the database. This is useful for sets of users that don’t need as many permissions or for […]

Getting Rid of Range Joins

This blog post was authored by Soniya Shah. You can use range joins to categorize data into buckets. Vertica provides performance optimizations for =, and BETWEEN predicates. These optimizations are particularly useful when a column from one table is restricted to be in a range specified by two columns of another table. Range joins can […]

Query Optimization Using Projections

In Vertica, tables are logical representations of the data. Vertica stores the actual data in projections. When data is loaded into a Vertica table, Vertica creates or updates a column-store projection. Vertica also compresses and/or encodes projection data, optimizing data access and storage. If you experience performance issues, your best first step is to run […]

Filtering Data While Loading into Vertica

Suppose you have a CSV file and you want to copy some, but not all, of its contents into a Vertica table. There are two ways you can to do this: • Use the SKIP keyword with COPY. • Use the head or tail Linux command. Let’s see how this works. The Data Here’s a […]

Best Practices for Using LDAP Link with Vertica

There are a few best practices that you should follow to make sure that you don’t accidentally lose any users or data. This blog explains how to keep your LDAP Link service working smoothly.

Troubleshooting Vertica Query Performance with System Tables

Do you want to learn how to troubleshoot your query performance issues?  We’ve got you covered. Just attend the Query Performance Tuning and Troubleshooting Issues session at Vertica’s Big Data Conference.

Do You Need to Put Your Query on a Budget?

Before we scare you away with the word “budget”, rest assured that after reading this blog, you won?t have to give up your favorite activities or sell your car. What you will be able to do is understand how Vertica resource pool parameters affect query budget.

Restoring Objects from a Full Backup

Before Vertica version 7.2, you had two options: perform full backups and restore full backups (remove everything from the storage unit) or perform object-level backups and restore those individually (maintain separate storage units). But as of Vertica 7.2, you can have the best of both worlds. You can perform full backups and restore individual objects from that backup! This means you don?t have to waste time and resources on restoring data that you do not need at the moment.

Best Practices for Refreshing Large Projections

More than once I have worked with customers who need to update a superprojection or create a new projection for a large fact table. It seems like a simple and easy process: just create a projection and perform a refresh. However, refreshing projections for large fact tables can produce unwanted complications. In this blog, we?ll discuss these complications and how they can be remedied.

Optimizing for Merge Join

In an earlier post, join operations were introduced followed by hash join operations. The other operator, merge join, may sometimes be needed in situations when a spill to disk occurs. In these situations, resources are being wasted. One approach may be optimizing for merge join. In this post, a query was optimized and tested for merge join using subqueries and specific projections.

Workload Management Metrics ? A Golden Triangle

Modern databases are often required to process many different kinds of workloads, ranging from short/tactical queries, to medium complexity ad-hoc queries, to long-running batch ETL jobs to extremely complex data mining jobs (See my previous blog on workload classification for more information.) DBAs must ensure that all concurrent workload, along with their respective Service Level Agreements (SLAs), can co-exist well with each other while maximizing a system?s overall performance.

Optimizing Big Data Storage in Vertica

With the explosion of data volumes all enterprises are capturing, new technological solutions, such as Vertica, offer a solution to non-expert users who need to analyze and monetize their Big Data. If you are a non-expert user, the Database Designer (DBD) module in Vertica can help you choose a physical database design that minimizes storage footprint while optimizing the performance of the input query workload. The DBD can recommend good physical designs as quickly as possible using minimal computing resources.