Using COPY FILLER to Handle Empty String NULL Values Enclosed in Control Characters

Posted March 18, 2019 by Jim Knicely, Big Data Solutions Architect, Vertica

High angle view of Beijing Guomao.
You can load data files into a Vertica table using the COPY command. Your data can be delimited and separated by control characters. Unfortunately if you also enclose NULL values as an empty string, you might run into a data cast issue.

Example: dbadmin=> \d test1 List of Fields by Tables Schema | Table | Column | Type | Size | Default | Not Null | Primary Key | Foreign Key --------+-------+--------+-----------+------+---------+----------+-------------+------------- public | test1 | c1 | int | 8 | | f | f | public | test1 | c2 | timestamp | 8 | | f | f | public | test1 | c3 | int | 8 | | f | f | (3 rows) dbadmin=> \! cat -v /home/dbadmin/test1.csv ^V1^V^G^V2019-03-13 18:55:58.073597^V^G^V1^V ^V2^V^G^V2019-03-13 18:55:58.073597^V^G^V2^V ^V3^V^G^V^V^G^V3^V ^V4^V^G^V2019-03-13 18:55:58.073597^V^G^V4^V dbadmin=> COPY test1 FROM '/home/dbadmin/test1.csv' DELIMITER e'\x07' ENCLOSED e'\x16' NULL AS '' DIRECT REJECTED DATA TABLE test1_bad; Rows Loaded ------------- 3 (1 row) dbadmin=> SELECT * FROM test1; c1 | c2 | c3 ----+----------------------------+---- 1 | 2019-03-13 18:55:58.073597 | 1 2 | 2019-03-13 18:55:58.073597 | 2 4 | 2019-03-13 18:55:58.073597 | 4 (3 rows) dbadmin=> SELECT rejected_reason FROM test1_bad; rejected_reason -------------------------------------------------------------------------------------- Invalid timestamp format '' for column 2 (c2).Invalid input syntax for timestamp: "" (1 row) One work around is to use a COPY FILLER. dbadmin=> TRUNCATE TABLE test1; TRUNCATE TABLE dbadmin=> COPY test1 (c1, c2_f FILLER VARCHAR, c2 AS c2_f::TIMESTAMP, c3) FROM '/home/dbadmin/test1.csv' DELIMITER e'\x07' ENCLOSED e'\x16' NULL AS '' DIRECT REJECTED DATA TABLE test1_bad; Rows Loaded ------------- 4 (1 row) dbadmin=> SELECT * FROM test1; c1 | c2 | c3 ----+----------------------------+---- 1 | 2019-03-13 18:55:58.073597 | 1 2 | 2019-03-13 18:55:58.073597 | 2 3 | | 3 4 | 2019-03-13 18:55:58.073597 | 4 (4 rows) Helpful Links:

https://www.vertica.com/docs/latest/HTML/Content/Authoring/AdministratorsGuide/BulkLoadCOPY/BulkLoadingData.htm https://www.vertica.com/docs/latest/HTML/Content/Authoring/AdministratorsGuide/BulkLoadCOPY/IgnoringColumnsAndFieldsInTheLoadFile.htm https://www.vertica.com/docs/latest/HTML/Content/Authoring/SQLReferenceManual/LanguageElements/Literals/ExtendedStringLiterals.htm

Have fun!