Loading Delimited Data

You can load flex tables with one of two delimited parsers, fdelimitedparser or fdelimitedpairparser.

  • Use fdelimitedpairparser when the data specifies column names with the data in each row.
  • Use fdelimitedparser when the data does not specify column names or has a header row for column names.

This section describes using some options that fdelimitedpairparser and fdelimitedparser support.

Rejecting Duplicate Values

You can reject duplicate values using the reject_on_duplicate=true option with the fdelimitedparser. The load continues after it rejects a duplicate value. The next example shows how to use this parameter and then displays the specified exception and rejected data files. Saving rejected data to a table, rather than a file, includes both the data and its exception.

=> CREATE FLEX TABLE delim_dupes();
CREATE TABLE
=> COPY delim_dupes FROM stdin PARSER fdelimitedparser(reject_on_duplicate=true) 
exceptions '/home/dbadmin/load_errors/except.out' rejected data '/home/dbadmin/load_errors/reject.out';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> A|A
>> 1|2
>> \.
=> \! cat /home/dbadmin/load_errors/reject.out
A|A
=> \! cat /home/dbadmin/load_errors/except.out
COPY: Input record 1 has been rejected (Processed a header row with duplicate keys with 
reject_on_duplicate specified; rejecting.).  Please see /home/dbadmin/load_errors/reject.out, 
record 1 for the rejected record.
COPY: Loaded 0 rows, rejected 1 rows.

Rejecting Materialized Column Type Errors

Both the fjsonparser and fdelimitedparser parsers have a boolean parameter, reject_on_materialized_type_error. Setting this parameter to true causes rows to be rejected if both the following conditions exist in the input data:

  • Includes keys matching an existing materialized column
  • Has a value that cannot be coerced into the materialized column's data type

Suppose the flex table has a materialized column, OwnerPercent, declared as a FLOAT. Trying to load a row with an OwnerPercent key that has a VARCHAR value causes fdelimitedparser to reject the data row.

The following examples illustrate setting this parameter.

  1. Create a table, reject_true_false, with two real columns:
  2. => CREATE FLEX TABLE reject_true_false(one VARCHAR, two INT);
    CREATE TABLE
  3. Load JSON data into the table (from STDIN), using the fjsonparser with reject_on_materialized_type_error=false. While false is the default value, the following example specifies it explicitly for illustration: 
  4. => COPY reject_true_false FROM stdin PARSER fjsonparser(reject_on_materialized_type_error=false);
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> {"one": 1, "two": 2}
    >> {"one": "one", "two": "two"}
    >> {"one": "one", "two": 2}
    >> \.
  5. Invoke maptostring to display the table values after loading data:
  6. =>  SELECT maptostring(__raw__), one, two FROM reject_true_false;          
    maptostring | one | two
    ----------------------------------+-----+-----
    { "one" : "one", "two" : "2" }
    | one | 2 { "one" : "1", "two" : "2" }
    | 1 | 2
    { "one" : "one",
    "two" : "two" }
    | one |
    (3 rows)
  7. Truncate the table: 
  8. => TRUNCATE TABLE reject_true_false;
  9. Reload the same data again, but this time, set reject_on_materialized_type_error=true:
  10. => COPY reject_true_false FROM stdin PARSER fjsonparser(reject_on_materialized_type_error=true);
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> {"one": 1, "two": 2}
    >> {"one": "one", "two": "two"}
    >> {"one": "one", "two": 2}
    >> \.
  11. Call maptostring to display the table contents. Only two rows were loaded, whereas the previous results had three rows: 
  12. => SELECT maptostring(__raw__), one, two FROM reject_true_false;
                  maptostring              | one | two
    ---------------------------------------+-----+-----
     {
       "one" : "1",
       "two" : "2"
    }
       | 1   |   2
     {
       "one" : "one",
       "two" : "2"
    }
     | one |   2
    (2 rows)