Loading Delimited Data

When bulk-loading data using the default (DELIMITED) parser, you can specify the following parameters at either a statement or column level:

  • ENCLOSED BY
  • ESCAPE AS
  • NULL
  • DELIMITER

Vertica assumes that data is in the UTF-8 encoding.

Loading Special Characters As Literals

The default COPY statement escape key is a backslash (\). By preceding any special character with an escape character, COPY interprets the character that follows literally, and copies it into the database. These are the special characters that you escape to load them as literals:

Special Character COPY Statement Usage

Vertical bar (|)

Default COPY ... DELIMITER character

Empty string ('')

Default COPY ... NULL string.

Backslash (\)

Default COPY ... ESC character.

Newline and other control characters

Various

To use a special character as a literal, prefix it with an escape character. For example, to include a literal backslash (\) in the loaded data (such as when including a file path), use two backslashes (\\). COPY removes the escape character from the input when it loads escaped characters.

Using a Custom Column Separator (DELIMITER)

The default COPY delimiter is a vertical bar (|). The DELIMITER is a single ASCII character used to separate columns within each record of a file. Between two delimiters, COPY interprets all string data in load files as characters. Do not enclose character strings in quotes, since quote characters are also treated as literals between delimiters.

You can define a different delimiter using any ASCII value in the range E'\000' to E'\177' inclusive. For instance, if you are loading CSV data files, and the files use a comma (,) character as a delimiter, you can change the default delimiter to a comma. You cannot use the same character for both the DELIMITER and NULL options.

If the delimiter character is among a string of data values, use the ESCAPE AS character (\ by default) to indicate that the delimiter should be treated as a literal.

The COPY statement accepts empty values (two consecutive delimiters) as valid input data for CHAR and VARCHAR data types. COPY stores empty columns as an empty string (''). An empty string is not equivalent to a NULL string.

To indicate a non-printing delimiter character (such as a tab), specify the character in extended string syntax (E'...'). If your database has StandardConformingStrings enabled, use a Unicode string literal (U&'...'). For example, use either E'\t' or U&'\0009' to specify tab as the delimiter.

Using a Custom Column Option DELIMITER

This example, redefines the default delimiter through the COLUMN OPTION parameter.

  1. Create a simple table.

    => CREATE TABLE t(     pk INT, 
         col1 VARCHAR(10), 
         col2 VARCHAR(10), 
         col3 VARCHAR(10), 
         col4 TIMESTAMP);
    
  2. Use the COLUMN OPTION parameter to change the col1 default delimiter to a tilde (~).

    => COPY t COLUMN OPTION(col1 DELIMITER '~') FROM STDIN NO COMMIT;
    >> 1|ee~gg|yy|1999-12-12
    >> \. 
    => SELECT * FROM t;
     pk | col1 | col2 | col3 |        col4
    ----+------+------+------+---------------------
      1 | ee   | gg   | yy   | 1999-12-12 00:00:00
    (1 row)
    

Defining a Null Value (NULL)

The default NULL value for COPY is an empty string (''). You can specify a NULL as any ASCII value in the range E'\001' to E'\177' inclusive (any ASCII character except NUL: E'\000'). You cannot use the same character for both the DELIMITER and NULL options.

When NULL is an empty string (''), use quotes to insert an empty string instead of a NULL. For example, using NULL " ENCLOSED BY '"':

  • 1||3 — Inserts a NULL in the second column.
  • 1|""|3 — Inserts an empty string instead of a NULL in the second columns.

To input an empty or literal string, use quotes (ENCLOSED BY); for example:

NULL ''NULL 'literal'

A NULL is case-insensitive and must be the only value between the data field delimiters. For example, if the null string is NULL and the delimiter is the default vertical bar (|):

|NULL| indicates a null value.

| NULL | does not indicate a null value.

When you use the COPY command in a script, you must substitute a double-backslash for each null string that includes a backslash. For example, the scripts used to load the example database contain:

COPY ... NULL E'\\n' ...

Loading NULL Values

You can specify NULL by entering fields without content into a data file, using a field delimiter.

For example, given the default delimiter (|) and default NULL (empty string) definition, COPY inserts the following input data:

|  |  1|  2  | 3
4  |  | 5
6  |  | 

into the table as follows:

(null, null, 1)(null, 2, 3)
(4, null, 5)
(6, null, null)

If NULL is set as a literal ('null'), COPY inserts the following inputs:

null | null |    1null |    2 |    3
   4 | null |    5
   6 | null | null

as follows:

(null, null, 1)(null, 2, 3)
(4, null, 5)
(6, null, null)

Filling Columns with Trailing Nulls (TRAILING NULLCOLS)

Loading data using the TRAILING NULLCOLS option inserts NULL values into any columns without data. Before inserting TRAILING NULLCOLS, Vertica verifies that the column does not have a NOT NULL constraint.

To use the TRAILING NULLCOLS parameter to handle inserts with fewer values than data columns:

  1. Create a table:

    => CREATE TABLE z (a INT, 
         b INT, 
         c INT );
    
  2. Insert some values into the table:

    => INSERT INTO z VALUES (1, 2, 3);
  3. Query table z to see the inputs:

    => SELECT * FROM z;
     a | b | c
    ---+---+---
     1 | 2 | 3
    (1 row)
    
  4. Insert two rows of data from STDIN, using TRAILING NULLCOLS:

    => COPY z FROM STDIN TRAILING NULLCOLS;
    >> 4 | 5 | 6
    >> 7 | 8
    >> \. 
    
  5. Query table z again to see the results. Using TRAILING NULLCOLS, the COPY statement correctly handled the third row of column c, which had no value:

    => SELECT * FROM z;
     a | b | c
    ---+---+---
     1 | 2 | 3
     4 | 5 | 6
     7 | 8 |
    (3 rows)
    

Attempting to Fill a NOT NULL Column with TRAILING NULLCOLS

You cannot use TRAILING NULLCOLS on a column that has a NOT NULL constraint. For instance:

  1. Create a table n, declaring column b with a NOT NULL constraint:

    => CREATE TABLE n ( a INT, 
              b INT NOT NULL, 
              c INT );
    
  2. Insert some table values:

    => INSERT INTO n VALUES (1, 2, 3);
    => SELECT * FROM n;
     a | b | c
    ---+---+---
     1 | 2 | 3
    (1 row)
    
  3. Use COPY with TRAILING NULLCOLS on table n to see the COPY error due to the column constraint:

    => COPY n FROM STDIN trailing nullcols abort on error;
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> 4 | 5 | 6
    >> 7 | 8
    >> 9
    >> \. 
    ERROR:  COPY: Input record 3 has been rejected (Cannot set trailing column to NULL as column 2 (b) is NOT NULL)
    
  4. Query the table to see that the COPY statement values were rejected:

    => SELECT * FROM n;
     a | b | c
    ---+---+---
     1 | 2 | 3
    (1 row)
    

Changing the Default Escape Character (ESCAPE AS)

The default escape character is a backslash (\). To change the default to a different character, use the ESCAPE AS option. To use an alternative escape character:

=> COPY mytable FROM '/data/input.txt' ESCAPE AS E('\001');

You can set the escape character to be any ASCII value value in the range E'\001' to E'\177' inclusive.

Eliminating Escape Character Handling

If you do not want any escape character and want to prevent any characters from being interpreted as escape sequences, use the NO ESCAPE option as part of the COPY statement.

Delimiting Characters (ENCLOSED BY)

The COPY ENCLOSED BY parameter lets you set an ASCII character to delimit characters to embed in string values. You can use any ASCII value in the range E'\001' to E'\177' inclusive (any ASCII character except NULL: E'\000') for the ENCLOSED BY value. Using double quotation marks (") is the most commonly used quotation character. For instance, the following parameter specifies that input data to the COPY statement is enclosed within double quotes:

ENCLOSED BY '"'

With the following input (using the default DELIMITER (|) character), specifying:

"vertica | value"

Results in:

  • Column 1 containing "vertica
  • Column 2 containing value"

Notice the double quotes (") before vertica and after value.

Using the following sample input data as follows, columns are distributed as shown:

"1", "vertica,value", ",", "'"
 col1 | col2          | col3 | col4------+---------------+------+-----
 1    | vertica,value | ,    | '
(1 row)

                                            

Alternatively, write the above example using any ASCII character of your choosing:

~1~, ~vertica,value~, ~,~, ~'~

If you use a single quote ('), rather than double quotes (") as the ENCLOSED BY character, you must escape it using extended string syntax, a Unicode literal string (if StandardConformingStrings is enabled), or by using four single quotes:

ENCLOSED BY E'\''ENCLOSED BY U&'\0027' 
ENCLOSED BY '''' 

Using any of the definitions means the following input is properly parsed:

'1', 'vertica,value', ',', '\''

See String Literals (Character) for an explanation of the string literal formats you can use to specify the ENCLOSED BY parameter.

Use the ESCAPE AS character to embed the ENCLOSED BY delimiter within character string values. For example, using the default ESCAPE AS character (\) and double quote as the ENCLOSED BY character, the following input returns "vertica":

"\"vertica\""

Using ENCLOSED BY for a Single Column

The following example uses double quotes to enclose a single column (rather than the entire row). The COPY statement also specifies a comma (,) as the delimiter.

=> COPY Retail.Dim (Dno, Dname ENCLOSED BY '"', Dstore) FROM '/home/dbadmin/dim3.txt' 
   DELIMITER ','
   EXCEPTIONS '/home/dbadmin/exp.txt';

This example correctly loads data such as:

123,"Smith, John",9832

Specifying a Custom End of Record String (RECORD TERMINATOR)

To specify the literal character string that indicates the end of a data file record, use the RECORD TERMINATOR parameter, followed by the string to use. If you do not specify a value, then Vertica attempts to determine the correct line ending, accepting either just a linefeed (E'\n') common on UNIX systems, or a carriage return and linefeed (E'\r\n') common on Windows platforms.

For example, if your file contains comma-separated values terminated by line feeds that you want to maintain, use the RECORD TERMINATOR option to specify an alternative value:

=>  COPY mytable FROM STDIN DELIMITER ',' RECORD TERMINATOR E'\n';

To specify the RECORD TERMINATOR as non-printing characters, use either the extended string syntax or Unicode string literals. The following table lists some common record terminator characters. See String Literals for an explanation of the literal string formats.

Extended String Syntax

Unicode Literal String


Description


ASCII Decimal

E'\b'
U&'\0008'
Backspace
8
E'\t'
U&'\0009'
Horizontal tab
9
E'\n'
U&'\000a'
Linefeed
10
E'\f'
U&'\000c'
Formfeed
12
E'\r'
U&'\000d'
Carriage return
13
E'\\'
U&'\005c'
Backslash
92

If you use the RECORD TERMINATOR option to specify a custom value, be sure the input file matches the value. Otherwise, you may get inconsistent data loads.

The record terminator cannot be the same as DELIMITER, NULL, ESCAPE,or ENCLOSED BY.

If using JDBC, Vertica recommends that you use the following value for the RECORD TERMINATOR:

System.getProperty("line.separator")

Examples

The following examples use a comma (,) as the DELIMITER for readability.

,1,2,3,,1,2,3
1,2,3,

Leading (,1) and trailing (3,) delimiters are ignored. Thus, the rows all have three columns.

123,'\\n',\\n,456

Using a non-default null string, the row is interpreted as:

123newline
\n 
456
123,this\, that\, or the other,something else,456

The row would be interpreted as:

123this, that, or the other
something else
456