Vertica Quick Tip: Avoid Using Functions on Very Large Data Sets

Posted January 17, 2018 by Soniya Shah, Information Developer

This blog post was authored by Jim Knicely.

You can store billions and billions and billions (i.e. a lot) of records in your Vertica tables. When querying these large data sets, try to avoid using database functions like TO_DATE, TO_CHAR, NVL, etc. when unnecessary.

Example:

A table named BIG_DATE_TABLE has 1 billion rows and a column named THE_DATE, which stores a date value as an integer in the format YYYYMMDD. dbadmin=> SELECT COUNT(*) FROM big_date_table; COUNT ------------ 1000000000 (1 row) I want to know how many records there are in the table where THE_DATE column is equal to the current SYSDATE.

My first attempt at getting that answer is the following SQL statement: dbadmin=> SELECT COUNT(*) FROM big_date_table WHERE TO_DATE(the_date::VARCHAR, 'YYYYMMDD') = TRUNC(SYSDATE); COUNT -------- 134976 (1 row) Time: First fetch (1 row): 78354.856 ms. All rows formatted: 78354.900 ms That query took a whopping 80 seconds to run!

Fortunately I realized a better solution is to avoid using the TO_DATE function directly against the billion records! dbadmin=> SELECT COUNT(*) FROM big_date_table WHERE the_date = TO_CHAR(SYSDATE, 'YYYYMMDD')::INT; COUNT -------- 134976 (1 row) Time: First fetch (1 row): 782.399 ms. All rows formatted: 782.440 ms That’s better! Have Fun!