Capturing Load Rejections and Exceptions
Loading data with COPY
has two main phases, parsing and loading. Rejected data is created whenever COPY
cannot parse a row of data. Following are some parser errors that can cause a rejected row:
- Unsupported parser options
- Incorrect data types for the table into which data is being loaded
- Malformed context for the parser in use
- Missing delimiters
Other problems can occur during the load phase, but such problems are not rejected data from parser errors.
Several optional parameters let you determine how strictly COPY
handles rejections when parsing data. For example, you can have COPY
fail when it rejects a single row, or allow a specific number of parsing rejections before the load fails. This section presents the parameters to determine how COPY
handles rejected data.
Save Rejected Rows (REJECTED DATA and EXCEPTIONS)
The COPY
statement automatically saves a copy of each rejected row in a rejected-data file. COPY
also saves a corresponding explanation of what caused the rejection in an exceptions file. By default, Vertica saves both files in a database catalog subdirectory, called CopyErrorLogs
, as listed in this example:
v_mart_node003_catalog\CopyErrorLogs\trans-STDIN-copy-from-rejected-data.1
v_mart_node003_catalog\CopyErrorLogs\trans-STDIN-copy-from-exceptions.1
You can optionally save COPY
rejections and exceptions in two other ways:
- Use the
REJECTED DATA reject_file
andEXCEPTIONS except_file
parameters to save both files to a location of your choice. The reject_file includes rejected rows, while the exceptions file contains a description of why each row was rejected. - Use the
REJECTED DATA AS TABLE reject_table
clause. Saving rejected rows to a reject_table also retains the exception descriptions.
Note: Vertica recommends saving rejected data to a table. However, saving to a table excludes saving to a default or specific rejected data file.
If you save COPY
rejected data to a table, the table files are stored in the data subdirectory. For example, in a VMart database
installation, rejected data table records are stored in the RejectionTableData
directory as follows:
=> cd v_mart_node003_data\RejectionTableData\ => ls TABLE_REJECTED_RECORDS_"bg"_mytest01.example.-25441:0x6361_45035996273805099_1.1 TABLE_REJECTED_RECORDS_"bg"_mytest01.example.-25441:0x6361_45035996273805113_2.2 . . . TABLE_REJECTED_RECORDS_"delimr"_mytest01.example.-5958:0x3d47_45035996273815749_1.1 TABLE_REJECTED_RECORDS_"delimr"_mytest01.example.-5958:0x3d47_45035996273815749_1.2
COPY LOCAL Rejected Data
For COPY LOCAL
operations, you must use the REJECTED DATA reject_file
and EXCEPTIONS except_file
parameters explicitly. The reject_file and except_file paths must reside on the client. If path resolves to a storage location and the user invoking COPY
is not a superuser, the following permissions are required:
- The storage location must have been created with the
USER
usage type (seeCREATE LOCATION). - The user must already have been granted access to the storage location where the files exist, as described in GRANT (Storage Location)
Enforce Truncating or Rejecting Rows (ENFORCELENGTH)
When parsing data of type CHAR
, VARCHAR
, BINARY
, or VARBINARY
, rows may exceed the target table length. By default, COPY
truncates such rows without rejecting them.
Use the ENFORCELENGTH
parameter to reject rows that exceed the target table.
For example, loading 'abc'
into a table column specified as VARCHAR(2)
results in COPY
truncating the value to 'ab'
and loading it. Loading the same row with the ENFORCELENGTH
parameter causes COPY
to reject the row.
Note: Vertica supports NATIVE
and NATIVE VARCHAR
values up to 65K. If any value exceeds this limit, COPY
rejects the row, even when ENFORCELENGTH
is not in use.
Specify a Maximum Number of Rejections (REJECTMAX)
The REJECTMAX
parameter specifies the maximum number of logical records that can be rejected before a load fails. A rejected row consists of the data that could not be parsed into the corresponding data type during a bulk load. Rejected data does not indicate referential constraints. For information about using constraints, and the option of enforcing constraints during bulk loading, see About Constraints.
When the number of rejected records becomes equal to the REJECTMAX
value, the load fails. If you do not specify a value for REJECTMAX
, or if the value is 0, COPY
allows an unlimited number of exceptions to occur.
Abort Data Loads for Any Error (ABORT ON ERROR)
Using the ABORT ON ERROR
argument is the most restrictive way to load data, because no exceptions or rejections are allowed. A COPY
operation stops if any row is rejected. No data is loaded and Vertica rolls back the command.
If you use the ABORT ON ERROR
as part of a CREATE EXTERNAL TABLE AS COPY FROM
statement, the option is used whenever a query references the external table. The offending error is saved in the COPY
exceptions or rejected data file.
Understanding Row Rejections and Roll Back Errors
Depending on the type of error that COPY
encounters, Vertica does one of the following:
- Rejects the offending row and loads other rows into a table
- Rolls back the entire
COPY
statement without loading any data
Note: If you specify ABORT ON ERROR
with the COPY
statement, the load automatically rolls back if COPY
cannot parse any row.
The following table summarizes the difference between a rejected row or rollback.
Rejected Rows | Load Roll back |
---|---|
|
|
This example illustrates what happens when Vertica cannot coerce a row to the requested data type. For example, in the following COPY
statement, "a::INT + b::INT"
is a SQL expression in which a
and b
are derived values:
=> CREATE TABLE t (i INT); => COPY t (a FILLER VARCHAR, b FILLER VARCHAR, i AS a::INT + b::INT) FROM STDIN; Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself. >> cat|dog >> \.
Vertica cannot parse the row to the requested data type and rejects the row:
ERROR 2827: Could not convert "cat" from column "*FILLER*".a to an int8
If a
resolved to 'cat'
and b
to 'dog'
, the next expression 'cat'::INT + 'dog'::INT
would return an expression evaluator error:
=> SELECT 'cat'::INT + 'dog'::INT; ERROR 3681: Invalid input syntax for integer: "cat"
The following COPY
statement would also roll back because Vertica cannot parse the row to the requested data type:
=> COPY t (a FILLER VARCHAR, i AS a::INT) FROM STDIN;
In the following COPY
statement, Vertica rejects only the offending row without rolling back the statement. Instead of evaluating the 'cat'
row as a VARCHAR type, COPY parses 'cat'
directly as an INTEGER.
=> COPY t (a FILLER INT, i AS a) FROM STDIN;