Loading Delimited Data

If you do not specify another parser, Vertica defaults to the DELIMITED parser. You can specify the delimiter, escape characters, how to handle null values, and other parameters in the COPY statement.

The following example shows the default behavior, in which the delimiter character is '|'

=> CREATE TABLE employees (id INT, name VARCHAR(50), department VARCHAR(50));
CREATE TABLE
				
=> COPY employees FROM STDIN;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 42|Sheldon Cooper|Physics
>> 17|Howard Wolowitz|Astronomy
>> \.

=> SELECT * FROM employees;
 id |      name       |  department
----+-----------------+--------------
 17 | Howard Wolowitz | Astrophysics
 42 | Sheldon Cooper  | Physics
(2 rows)

By default, collection values are delimited by brackets and elements are delimited by commas. Collections must be one-dimensional arrays or sets of scalar types.

=> CREATE TABLE researchers (id INT, name VARCHAR, grants ARRAY[VARCHAR], values ARRAY[INT]);
CREATE TABLE

=> COPY researchers FROM STDIN;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 42|Sheldon Cooper|[US-7376,DARPA-1567]|[65000,135000]
>> 17|Howard Wolowitz|[NASA-1683,NASA-7867,SPX-76]|[16700,85000,45000]
>> \.

=> SELECT * FROM researchers;
 id |      name       |               grants               |       values       
----+-----------------+------------------------------------+---------------------
 17 | Howard Wolowitz | ["NASA-1683","NASA-7867","SPX-76"] | [16700,85000,45000]
 42 | Sheldon Cooper  | ["US-7376","DARPA-1567"]           | [65000,135000]
(2 rows)		

To use a special character as a literal, prefix it with an escape character. For example, to include a literal backslash (\) in the loaded data (such as when including a file path), use two backslashes (\\). COPY removes the escape character from the input when it loads escaped characters.

When loading delimited data, two consecutive delimiters indicate a null value, unless the NULL parameter is set otherwise. The final delimiter is optional. For example, the following input is valid for the previous table:

=> COPY employees FROM STDIN;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 45|Raj|
>> 21|Leonard||
>> \.

=> SELECT * FROM employees;
 id |  name   |  department
----+---------+--------------
 21 | Leonard | 
 42 | Raj     | 
(2 rows)

By default, if the data has too few values, the load fails. You can use the TRAILING NULLCOLS option to accept any number of missing columns and treat their values as null.

Vertica assumes that data is in the UTF-8 encoding.

The options specific to the DELIMITED parser and their default values are:

Option Default
DELIMITER |
ENCLOSED BY "
ESCAPE \
NULL '' (empty string)
COLLECTIONOPEN [
COLLECTIONCLOSE ]
COLLECTIONDELIMITER ,
COLLECTIONNULLELEMENT null
COLLECTIONENCLOSE " (double quote)
TRAILING NULLCOLS (none)

To load delimited data into a Flex table, use the FDELIMITEDPARSER parser.

Changing the Column Separator (DELIMITER)

The default COPY delimiter is a vertical bar ('|'). The DELIMITER is a single ASCII character used to separate columns within each record of an input source. Between two delimiters, COPY interprets all string data in the input as characters. Do not enclose character strings in quotes, because quote characters are also treated as literals between delimiters.

You can define a different delimiter using any ASCII value in the range E'\000' to E'\177' inclusive. For instance, if you are loading CSV data files, and the files use a comma (',') character as a delimiter, you can change the default delimiter to a comma. You cannot use the same character for both the DELIMITER and NULL options.

If the delimiter character is among a string of data values, use the ESCAPE AS character ('\' by default) to indicate that the delimiter should be treated as a literal.

The COPY statement accepts empty values (two consecutive delimiters) as valid input data for CHAR and VARCHAR data types. COPY stores empty columns as an empty string (''). An empty string is not equivalent to a NULL string.

To indicate a non-printing delimiter character (such as a tab), specify the character in extended string syntax (E'...'). If your database has StandardConformingStrings enabled, use a Unicode string literal (U&'...'). For example, use either E'\t' or U&'\0009' to specify tab as the delimiter.

The following example loads data from a comma-separated file:

=> COPY employees FROM ... DELIMITER ',';

In the following example, the first column has a column-specific delimiter:

=> COPY employees(id DELIMITER ':', name, department) FROM ... DELIMITER ',';

Changing Collection Delimiters (COLLECTIONDELIMITER, COLLECTIONOPEN, COLLECTIONCLOSE)

The DELIMITER option specifies the value that separates columns in the input. For a column with a collection type (ARRAY or SET), a delimiter is also needed between elements of the collection. In addition, the collection itself has start and end markers. By default, collections are enclosed in brackets and elements are delimited by commas, but you can change these values.

In the following example, collections are enclosed in braces and delimited by periods.

=> COPY researchers FROM STDIN COLLECTIONOPEN '{' COLLECTIONCLOSE '}' COLLECTIONDELIMITER '.';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 19|Leonard|{"us-1672"."darpa-1963"}|{16200.16700}
>> \.

=> SELECT * FROM researchers;
 id |      name       |               grants               |       values       
----+-----------------+------------------------------------+---------------------
 17 | Howard Wolowitz | ["NASA-1683","NASA-7867","SPX-76"] | [16700,85000,45000]
 42 | Sheldon Cooper  | ["US-7376","DARPA-1567"]           | [65000,135000]
 19 | Leonard         | ["us-1672","darpa-1963"]           | [16200,16700]
(3 rows)				

Changing the Character Enclosing Column or Collection Values (ENCLOSED BY, COLLECTIONENCLOSE)

The ENCLOSED BY parameter lets you set an ASCII character to delimit characters to embed in string values. The enclosing character is not considered to be part of the data if and only if it is the first and last character of the input. You can use any ASCII value in the range E'\001' to E'\177' inclusive (any ASCII character except NULL: E'\000') for the ENCLOSED BY value. Using double quotation marks (") is common, as shown in the following example.

=> COPY employees FROM STDIN ENCLOSED BY '"';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 21|Leonard|Physics
>> 42|"Sheldon"|"Physics"
>> 17|Rajesh "Raj" K|Astronomy
>> \.

=> SELECT * FROM employees;
 id |     name       |  department
----+----------------+--------------
 17 | Rajesh "Raj" K | Astronomy
 21 | Leonard        | Physics
 42 | Sheldon        | Physics
(3 rows)

Notice that while ENCLOSED BY is a double quote, the embedded quotes in Rajesh's name are treated as part of the data because they are not the first and last characters in the column. The quotes that enclose "Sheldon" and "Physics" are dropped because of their positions.

Within a collection value, the COLLECTIONENCLOSE parameter is like ENCLOSED BY for individual elements of the collection.

Changing the Null Indicator (NULL)

By default, an empty string ('') for a column value means NULL. You can specify a different ASCII value in the range E'\001' to E'\177' inclusive (any ASCII character except NUL: E'\000') as the NULL indicator. You cannot use the same character for both the DELIMITER and NULL options.

A column containing one or more whitespace characters is not NULL unless the sequence of whitespace exactly matches the NULL string.

A NULL is case-insensitive and must be the only value between the data field delimiters. For example, if the null string is NULL and the delimiter is the default vertical bar (|):

|NULL| indicates a null value.

| NULL | does not indicate a null value.

When you use the COPY statement in a script, you must substitute a double-backslash for each null string that includes a backslash. For example, the scripts used to load the example database contain:

COPY ... NULL E'\\n' ...

Changing the Null Indicator for Collection Values (COLLECTIONNULLELEMENT)

The NULL option specifies the value to be treated as null for a column value. For a column with a collection type (ARRAY or SET), a separate option specifies how to interpret null elements. By default, "null" indicates a null value. An empty value, meaning two consecutive element delimiters, does not indicate null:

=> COPY researchers FROM STDIN;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 17|Howard|["nasa-143",,"nasa-6262"]|[10000,1650,15367]
>> 19|Leonard|["us-177",null,"us-6327"]|[16200,64000,26500]
>> \.
				
=> SELECT * FROM researchers;
 id |  name   |           grants            |       values
----+---------+-----------------------------+---------------------
 17 | Howard  | ["nasa-143","","nasa-6262"] | [10000,1650,15367]
 19 | Leonard | ["us-177",null,"us-6327"]   | [16200,64000,26500]
(2 rows)	

Use COLLECTIONNULLELEMENT to specify a different value, as in the following example.

=> COPY researchers from STDIN COLLECTIONNULLELEMENT 'x';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 42|Sheldon|[x,"us-1672"]|[x,165000]
>> \.
				
=> SELECT * FROM researchers;
 id |  name   |           grants            |       values
----+---------+-----------------------------+---------------------
 17 | Howard  | ["nasa-143","","nasa-6262"] | [10000,1650,15367]
 19 | Leonard | ["us-177",null,"us-6327"]   | [16200,64000,26500]
 42 | Sheldon | [null, "us-1672"]           | [null,165000]
(3 rows)

Filling Missing Columns (TRAILING NULLCOLS)

By default, COPY fails if the input does not contain enough columns. Use the TRAILING NULLCOLS option to instead insert NULL values for any columns that lack data. This option cannot be used with columns that have a NOT NULL constraint.

The following example demonstrates use of this option.

=> CREATE TABLE z (a INT, b INT, c INT );
				
--- insert with enough data:
=> INSERT INTO z VALUES (1, 2, 3);
				
=> SELECT * FROM z;
 a | b | c
---+---+---
 1 | 2 | 3
(1 row)

--- insert deficient data:
=> COPY z FROM STDIN TRAILING NULLCOLS;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 4 | 5 | 6
>> 7 | 8
>> \.

=> SELECT * FROM z;
 a | b | c
---+---+---
 1 | 2 | 3
 4 | 5 | 6
 7 | 8 |
(3 rows)	

Changing the Escape Character (ESCAPE AS, NO ESCAPE)

You can specify an escape character, which enables any special characters to be treated as part of the data. For example, if an element from a CSV file should contain a comma, you can indicate that by pre-pending the escape character to the comma in the data. The default escape character is a backslash (\).

To change the default to a different character, use the ESCAPE AS option. You can set the escape character to be any ASCII value in the range E'\001' to E'\177' inclusive.

If you do not want any escape character and want to prevent any characters from being interpreted as escape sequences, use the NO ESCAPE option.

ESCAPE AS and NO ESCAPE can be set at both the column and global levels.

Changing the End-of-Line Character (RECORD TERMINATOR)

To specify the literal character string that indicates the end of a data file record, use the RECORD TERMINATOR parameter, followed by the string to use. If you do not specify a value, then Vertica attempts to determine the correct line ending, accepting either just a linefeed (E'\n') common on UNIX systems, or a carriage return and linefeed (E'\r\n') common on Windows platforms.

For example, if your file contains comma-separated values terminated by line feeds that you want to maintain, use the RECORD TERMINATOR option to specify an alternative value:

=>  COPY mytable FROM STDIN DELIMITER ',' RECORD TERMINATOR E'\n';

To specify the RECORD TERMINATOR as non-printing characters, use either the extended string syntax or Unicode string literals. The following table lists some common record terminator characters. See String Literals for an explanation of the literal string formats.

Extended String Syntax

Unicode Literal String


Description


ASCII Decimal

E'\b'
U&'\0008'
Backspace
8
E'\t'
U&'\0009'
Horizontal tab
9
E'\n'
U&'\000a'
Linefeed
10
E'\f'
U&'\000c'
Formfeed
12
E'\r'
U&'\000d'
Carriage return
13
E'\\'
U&'\005c'
Backslash
92

If you use the RECORD TERMINATOR option to specify a custom value, be sure the input file matches the value. Otherwise, you may get inconsistent data loads.

The record terminator cannot be the same as DELIMITER, NULL, ESCAPE,or ENCLOSED BY.

If using JDBC, Vertica recommends that you use the following value for the RECORD TERMINATOR:

System.getProperty("line.separator")