Vertica Quick Tip: Getting Better Performance via External Table Row Counts

Posted May 21, 2018 by Soniya Shah, Information Developer

This blog post was authored by Jim Knicely. The Vertica optimizer uses its own internal logic to determine whether to join one table to another as an inner or outer input. When joining external tables, you can help out the optimizer immensely by letting it know the exact row counts of those external tables! For this purpose, Vertica provides the ANALYZE_EXTERNAL_ROW_COUNT function which calculates the exact number of rows in an external table.
Example:
dbadmin=> CREATE EXTERNAL TABLE small_table_ext (c1 int, c2 VARCHAR(100)) AS COPY FROM '/home/dbadmin/small_table.txt';
CREATE TABLE

dbadmin=> SELECT COUNT(*) FROM small_table_ext;
COUNT
-------
10
(1 row)

dbadmin=> CREATE EXTERNAL TABLE big_table_ext (c1 int, c2 VARCHAR(100)) AS COPY FROM '/home/dbadmin/big_table.txt';
CREATE TABLE

dbadmin=> SELECT COUNT(*) FROM big_table_ext;
COUNT
----------
10000000
(1 row)

dbadmin=> \timing on
Timing is on.

dbadmin=> SELECT COUNT(*) FROM big_table_ext JOIN small_table_ext USING (c1);
COUNT
-------
10
(1 row)

Time: First fetch (1 row): 9152.170 ms. All rows formatted: 9152.234 ms

dbadmin=> SELECT analyze_external_row_count('small_table_ext');
analyze_external_row_count
----------------------------
0
(1 row)

Time: First fetch (1 row): 20.781 ms. All rows formatted: 20.824 ms

dbadmin=> SELECT analyze_external_row_count('big_table_ext');
analyze_external_row_count
----------------------------
0
(1 row)

Time: First fetch (1 row): 1499.519 ms. All rows formatted: 1499.563 ms

dbadmin=> SELECT COUNT(*) FROM big_table_ext JOIN small_table_ext USING (c1);
COUNT
-------
10
(1 row)

Time: First fetch (1 row): 1538.365 ms. All rows formatted: 1538.413 ms
In the example, we dropped the query run time from 9.2 seconds down to 1.5 seconds by simply letting Vertica know the external table row counts! Have Fun!