Saving Rejected Data To a Table

Use the REJECTED DATA parameter with the AS TABLE clause to specify a table in which to save rejected data. Saving rejected data to a file is mutually exclusive with using the AS TABLE clause.

When you use the AS TABLE clause, Vertica creates a new table if one does not exist, or appends to an existing table. If no parsing rejections occur during a load, the table exists but is empty. The next time you load data, Vertica inserts any rejected rows to the existing table.

The load rejection tables are a special type of table with the following capabilities and limitations:

To make the data in a rejected table K-safe, you can do one of the following: 

Using COPY NO COMMIT

If the COPY statement includes options NO COMMIT and REJECTED DATA AS TABLE, and the reject-table does not already exist, Vertica Analytic Database saves the rejected data table as a LOCAL TEMP table and returns a message that a LOCAL TEMP table is being created.

Rejected-data tables are useful for Extract-Load-Transform workflows, where you will likely use temporary tables more frequently. The rejected-data tables let you quickly load data and identify which records failed to load. If you load data into a temporary table that you created using the ON COMMIT DELETE clause, the COPY operation will not commit.

Location of Rejected Data Table Records

When you save rejected records to a table, using the REJECTED DATA AS TABLE table_name option, the data for the table is saved in a database data subdirectory, RejectionTableData. For example, for a VMart database, table data files reside here:

/home/dbadmin/VMart/v_vmart_node0001_data/RejectionTableData

Rejected data tables include both rejected data and the reason for the rejection (exceptions), along with other data columns, described next. Vertica suggests that you periodically drop any rejected data tables that you no longer require.

Querying a Rejected Data Table

When you specify a rejected data table when loading data with COPY, you can query that table for information about rejected data after the load operation is complete. For example:

  1. Create the loader table: 
  2. => CREATE TABLE loader(a INT)
    CREATE TABLE
  3. Use COPY to load values, saving rejected data to a table, loader_rejects:
  4. => COPY loader FROM STDIN REJECTED DATA AS TABLE loader_rejects;
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> 1
    >> 2
    >> 3
    >> a
    >> \.
  5. Query the loader table after loading data: 
  6. => SELECT * FROM loader;
     x
    ---
     1
     2
     3
    (3 rows)
    
  7. Query the loader_rejects table to see its column rows:
  8. => SELECT * FROM loader_rejects;
    -[ RECORD 1 ]-------------+--------------------------------------------
    node_name                 | v_vmart_node0001
    file_name                 | STDIN
    session_id                | v_vmart_node0001.example.-24016:0x3439
    transaction_id            | 45035996274080923
    statement_id              | 1
    batch_number              | 0
    row_number                | 4
    rejected_data             | a
    rejected_data_orig_length | 1
    rejected_reason           | Invalid integer format 'a' for column 1 (x)
    

The rejected data table has the following columns: 

Column Data Type Description
node_name
VARCHAR The name of the Vertica node on which the input load file was located.
file_name
VARCHAR The name of the file being loaded, which applies if you loaded a file (as opposed to using STDIN).
session_id
VARCHAR The session ID number in which the COPY statement occurred.
transaction_id
INTEGER Identifier for the transaction within the session, if any; otherwise NULL.
statement_id 
INTEGER

The unique identification number of the statement within the transaction that included the rejected data.

Tip: You can use the session_id, transaction_id, and statement_id columns to create joins with many system tables. For example, if you join against the QUERY_REQUESTS table using those three columns, the QUERY_REQUESTS.REQUEST column contains the actual COPY statement (as a string) used to load this data.

batch_number
INTEGER

INTERNAL USE. Represents which batch (chunk) the data comes from.

row_number
INTEGER The rejected row number from the input file.
rejected_data
LONG VARCHAR The data that was not loaded.
rejected_data_orig_length
INTEGER The length of the rejected data.
rejected_reason
VARCHAR The error that caused the rejected row. This column returns the same message that exists in a load exceptions file when you do not save to a table.

Exporting the Rejected Records Table

You can export the contents of the column rejected_data to a file to capture only the data rejected during the first COPY statement. Then, correct the data in the file, save it, and load the updated file.

To export rejected records: 

  1. Create a sample table:

    => CREATE TABLE t (i int);
    CREATE TABLE
  2. Copy data directly into the table, using a table to store rejected data: 

    => COPY t FROM STDIN REJECTED DATA AS TABLE t_rejects;
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> 1
    >> 2
    >> 3
    >> 4
    >> a
    >> b
    >> c
    >> \.
  3. Show only tuples and set the output format:

    => \t
    Showing only tuples.
    => \a Output format is unaligned.
  4. Output to a file:

    => \o rejected.txt
    => select rejected_data from t_rejects; => \o
  5. Use the catcommand on the saved file: 

    => \! cat rejected.txt
    a
    b
    c
    

After a file exists, you can fix load errors and use the corrected file as load input to the COPY statement.