Vertica Quick Tip: Keeping Track of Data Load File Sources

Posted March 6, 2018 by Soniya Shah, Information Developer

Modern Database Analytics
This blog post was authored by Jim Knicely. If you are like most companies you will be loading data into a Vertica table from many different data files. And if you are like me, you’ll forget later which file a table record arrived in. Fortunately the CURRENT_LOAD_SOURCE function returns the file name used when executing a COPY statement and I can use it to keep a record of the data file used. Example: dbadmin=> CREATE TABLE daily_load (id INT, some_data1 VARCHAR(100), some_data2 VARCHAR(100), file_loaded VARCHAR(200)); CREATE TABLE dbadmin=> COPY daily_load (id, some_data1, some_data2, file_loaded AS CURRENT_LOAD_SOURCE()) FROM '/home/dbadmin/test*_load.txt'; Rows Loaded ------------- 6 (1 row) dbadmin=> SELECT * FROM daily_load; id | some_data1 | some_data2 | file_loaded ----+------------+------------+------------------------------ 1 | TEST1 | TEST2 | /home/dbadmin/test1_load.txt 2 | TEST10 | TEST20 | /home/dbadmin/test1_load.txt 3 | TEST100 | TEST200 | /home/dbadmin/test1_load.txt 4 | TEST3 | TEST4 | /home/dbadmin/test2_load.txt 5 | TEST30 | TEST40 | /home/dbadmin/test2_load.txt 6 | TEST300 | TEST400 | /home/dbadmin/test2_load.txt (6 rows) Have Fun!