Vertica 8.1 introduced new functionality for the MERGE statement. In this post, we discuss new functionality for MERGE that allows users to filter conditions on INSERT and UPDATE clauses in a MERGE statement.
The MERGE operation allows users to join the target table on another table, a view, or a subquery result set. The new syntax allows users to specify an UPDATE filter and an INSERT filter as follow. The MERGE will INSERT new rows, if an ID does not match, and the INSERT filter conditions are met or UPDATE rows if there is an ID match, the UPDATE filter conditions are met, and at least one column value is different. The syntax is efficient because there is better control of when you can update or insert.
The goal of this new syntax is to limit the number of rows affected by the MERGE statement on a target table. For example, if you are merging source table rows that differ from the matching destination table rows, only the rows with a change are updated. This saves time because identical rows are never updated.
The new syntax is depicted as follows:
MERGE [ /*+ [, ] */ ]
INTO [[.].] [ [AS] ]
You might wonder when to use MERGE with filters and when to use MERGE with subqueries. Often, it depends on what you are looking to merge. To filter rows from a table using just its attributes, use MERGE with subqueries. Both options work in every scenario when you must filter a predicate from the table, such as merging all records after a certain timestamp.
Use this new syntax on wide tables or on tables with many projections. In both cases, you might want to skip identical rows, and use this syntax to save time and resources.
You should NOT use the new MERGE syntax in place of the optimized MERGE statement.
For more information, see Update and Insert Filters in the Vertica documentation and check out our previous post about how to use MERGE to demonstrate Vertica’s performance at scale here.
Currently, a first year law student with a background in science and technology. Experienced technical writer, with specializations in software documentation, big data, blog development, and website development. I build user-centered content to communicate complex and technical information more easily.
I used to work for Vertica full time for about 3 years. I still work at Vertica part time while going to law school.
Update: Soniya is now doing her law internship, and no longer working at Vertica. Good luck, Soniya!
This blog post was authored by Curtis Bennett. Vertica stores physical data for tables in objects known as projections. Unlike traditional RDBMS's, Vertica does not rely on indexes for performance. Instead, Vertica stores the physical data (either all or some of the columns) in whatever sort order is required for optimal query processing. This can...
This blog post was co-authored by Xiao Ling and Jim Kelley. Introduction When resource pools were first introduced to Vertica, the average computer had a lot less memory than it does today. The default memory size for the Tuple Mover resource pool, 200 MB, reflects the more limited resources of that period. As hardware and...
This blog post was authored by Curtis Bennett. In Vertica, projections can either be replicated (unsegmented), or segmented. A segmented projection divides the data up across all the nodes in your cluster. Segmentation works by hashing a key value, and then using some simple math, figuring out which node that piece of data will live...
This blog post was authored by Eugenia Moreno. It is common to create new projections to improve performance in Vertica. However, you might forget about the old projections. Vertica still loads data in projections that you might not be using. A projection that is loaded but not picked up by the Vertica optimizer consumes storage...
You may not use more than 1TB (including Parquet and ORC External Tables) and 3 nodes.
You may not use software to provide services to third parties.
You may not distribute, resell, share or sublicense software to third parties.
You may not download and use patches, enhancements, bug fixes, or similar updates unless you have a license to the underlying software. Community Edition license does not give you a right to receive such updates.
You may not copy the Software or make it available on a public or external distributed network.
You may copy the Software for archival purposes or when it is an essential step in authorized use so long as You retain any product identification, trademark, copyright or other notices in the Software.
You may not modify, reverse engineer, disassemble, decrypt, decompile or make derivative works of the Software. If you have a right to do so under law, you must first inform Microfocus in writing about such modifications.
You may not disclose to any third-party performance information or analysis (including, without limitation, benchmarks and performance tests) from any source relating to the Software.
Support is not included. Additional information regarding the software may be available from the Vertica Community at https://forum.vertica.com. Vertica has no obligation to provide You with any bug fixes, upgrades, patches, new versions, new releases, or support.
You consent to the collection of anonymous analytics.
What are we collecting? We collect anonymized data including the date and timestamp, number of nodes, data size, storage size, version #, OS and other data. We’re collecting this information to learn how we can make the product better for you in the future.
How is this information stored and processed? We store and process all information on our secure servers. No third-party has access to any of this data.
What about my personal information? We do not track or store any personal data. All data collected is anonymized, including your IP address.