Loading JSON Data

You can load JSON data into flex or columnar tables. This section presents examples of using the fjsonparser with several of its parser options.

Checking JSON Integrity

Before loading any JSON data, verify that the data is valid. You can verify JSON data integrity using a web tool such as JSONLint. Copy your JSON data into the tool. If any data is invalid, the tool returns a message similar to the one in this example:

Parse error on line 170:...257914002502451200}{    
"id_str": "257
----------------------^
Expecting 'EOF', '}', ',', ']'
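If you prefer to run the same check programmatically, most languages can validate JSON from their standard library. The following is a minimal Python sketch (the sample strings are illustrative); like JSONLint, the json module reports the line and column of the first problem:

```python
import json

def check_json(text):
    """Return None if text is valid JSON, else a short error message."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as err:
        # err.lineno and err.colno locate the problem, much like JSONLint
        return "Parse error on line %d, column %d: %s" % (err.lineno, err.colno, err.msg)

print(check_json('{"id_str": "257914002502451200"}'))  # valid record
print(check_json('{"id_str": "257'))                   # truncated record
```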

Using flatten_maps and flatten_arrays Parameters

When loading JSON data, the fjsonparser uses the flatten_maps and flatten_arrays parameters to control how it handles the data being loaded. These are the default settings for the two parameters:

Parameter        Default                          Change Default
flatten_maps     TRUE: Flatten all maps.          flatten_maps=FALSE
flatten_arrays   FALSE: Do not flatten arrays.    flatten_arrays=TRUE

You change the default behavior by setting one or both flatten_ parameters.

For JSON maps, the parser flattens all submaps, separating the levels with a period (.). Consider the following input data with a submap:

 { "grade": { "level": 4 } }

The default parser behavior results in the following map:

{ "grade.level" -> "4" } 

To use the bracket operators ([]) to access deeply nested JSON in VMap data, you must load the data with flatten_maps=FALSE, as described in Querying Nested Data.

For JSON arrays, the parser maintains the array. Consider the following input data containing a two-element array, with values 1 and 2:

{ "grade": [ 1, 2 ] }

The default parser behavior results in the following array:

{ "grade": { "0" -> "1", "1" -> "2" } }

The flatten_maps and flatten_arrays parameters apply recursively, affecting all of the data you load.
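The flattening rules can be illustrated outside the database. The following Python function is a sketch of the behavior described above, not Vertica code; it reproduces the default map flattening, the index-keyed array maps, and the effect of each flatten_ parameter:

```python
def to_vmap(value, flatten_maps=True, flatten_arrays=False, sep='.'):
    """Illustrative sketch of how fjsonparser builds VMap keys (not Vertica code)."""
    def walk(node, prefix):
        if isinstance(node, dict):
            if prefix == '' or flatten_maps:
                out = {}
                for key, val in node.items():
                    out.update(walk(val, key if prefix == '' else prefix + sep + key))
                return out
            # flatten_maps=FALSE: keep the submap as a nested map
            return {prefix: to_vmap(node, flatten_maps, flatten_arrays, sep)}
        if isinstance(node, list):
            indexed = {str(i): item for i, item in enumerate(node)}
            if flatten_arrays:
                out = {}
                for key, val in indexed.items():
                    out.update(walk(val, prefix + sep + key if prefix else key))
                return out
            # default: maintain the array as a nested map keyed by element index
            return {prefix: to_vmap(indexed, flatten_maps, flatten_arrays, sep)}
        return {prefix: str(node)}
    return walk(value, '')

print(to_vmap({"grade": {"level": 4}}))                 # {'grade.level': '4'}
print(to_vmap({"grade": [1, 2]}))                       # {'grade': {'0': '1', '1': '2'}}
print(to_vmap({"grade": [1, 2]}, flatten_arrays=True))  # {'grade.0': '1', 'grade.1': '2'}
```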

Loading from a Specific Start Point

You can use the fjsonparser start_point parameter to load JSON data beginning at a specific key, rather than at the beginning of a file. Data is parsed from after the start_point key to the end of the file, or to the end of the first start_point value, whichever comes first. The fjsonparser ignores any subsequent instance of the start_point key, even if that key appears multiple times in the input file. If the input data contains only one copy of the start_point key, and its value is a list of JSON elements, the parser loads each element in the list as a row.
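As a way to reason about these semantics, here is a Python sketch (illustrative only, not the parser's implementation) that mimics start_point: it takes the value at the first occurrence of the key and, when that value is a list, yields one row per element. The sample data mirrors the alphanums.json file used below:

```python
import json

def rows_from_start_point(text, start_point):
    """Sketch of start_point semantics: parse from the first occurrence of
    the key; if its value is a list, each element becomes one row.
    Later occurrences are ignored (sketch also ignores null values)."""
    def find(node):
        if isinstance(node, dict):
            for key, val in node.items():
                if key == start_point:
                    return val          # first occurrence only
                found = find(val)
                if found is not None:
                    return found
        elif isinstance(node, list):
            for item in node:
                found = find(item)
                if found is not None:
                    return found
        return None
    value = find(json.loads(text))
    return value if isinstance(value, list) else [value]

data = '{ "A": { "B": { "C": [ {"d": 1}, {"g": 4}, {"j": 7} ] } } }'
print(rows_from_start_point(data, "C"))  # three rows, one per list element
print(rows_from_start_point(data, "B"))  # one row: the value of B is a map
```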

This section uses the following sample JSON data, saved to a file (alphanums.json):

 { "A": { "B": { "C": [ { "d": 1, "e": 2, "f": 3 }, { "g": 4, "h": 5, "i": 6 }, 
{ "j": 7, "k": 8, "l": 9 } ] } } }
  1. Create a flex table, start_json:
  2. => CREATE FLEX TABLE start_json();
    CREATE TABLE
    
  3. Load alphanums.json into start_json using the fjsonparser without any parameters:  
  4. => COPY start_json FROM '/home/dbadmin/data/flex/alphanums.json' PARSER fjsonparser();
     Rows Loaded
    -------------
               1
    (1 row)
    
  5. Use maptostring to see the results of loading all of alphanums.json:
  6. => SELECT maptostring(__raw__) FROM start_json;
                            maptostring        
    -------------------------------------------------------------------------------
     {
       "A.B.C" : {
          "0.d" : "1",
          "0.e" : "2",
          "0.f" : "3",
          "1.g" : "4",
          "1.h" : "5",
          "1.i" : "6",
          "2.j" : "7",
          "2.k" : "8",
          "2.l" : "9"
       }
    }
    
    (1 row)
    
  7. Truncate start_json and load alphanums.json with the start_point parameter: 
  8. => TRUNCATE TABLE start_json;
    TRUNCATE TABLE
    => COPY start_json FROM '/home/dbadmin/data/flex/alphanums.json' PARSER 
    -> fjsonparser(start_point='B');
     Rows Loaded
    -------------
               1
    (1 row)
  9. Next, call maptostring again to compare the results of loading alphanums.json from start_point='B':
  10. => SELECT maptostring(__raw__) FROM start_json;
                             maptostring          
    --------------------------------------------------------------------------------
     {
       "C" : {
          "0.d" : "1",
          "0.e" : "2",
          "0.f" : "3",
          "1.g" : "4",
          "1.h" : "5",
          "1.i" : "6",
          "2.j" : "7",
          "2.k" : "8",
          "2.l" : "9"
       }
    }
    
    (1 row)
    

Parsing From a Start Point Occurrence

If a start_point value occurs in multiple locations in your JSON data, you can use the start_point_occurrence integer parameter to specify the occurrence at which to start parsing. By defining start_point_occurrence, fjsonparser begins parsing at the nth occurrence of start_point.
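A Python sketch of that rule (illustrative only, not the parser's implementation): walk the document in order, counting occurrences of the key, and take the value at the nth one:

```python
import json

def value_at_nth_occurrence(document, key, n):
    """Sketch of start_point_occurrence: return the value at the nth
    occurrence (1-based) of `key`, scanning the data in document order."""
    seen = 0
    def walk(node):
        nonlocal seen
        if isinstance(node, dict):
            for k, v in node.items():
                if k == key:
                    seen += 1
                    if seen == n:
                        return v
                found = walk(v)
                if found is not None:
                    return found
        elif isinstance(node, list):
            for item in node:
                found = walk(item)
                if found is not None:
                    return found
        return None
    return walk(document)

doc = json.loads('{"data": {"x": 1, "nested": {"data": {"y": 2}}}}')
print(value_at_nth_occurrence(doc, "data", 2))  # value at the second "data" key
```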

Controlling Column Name Separators

By default, fjsonparser produces column names by concatenating JSON field names with a period (.). You can change the default separator by specifying a different character with the key_separator parameter.
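As a trivial illustration, the resulting column name is just the nested JSON field names joined with the separator character:

```python
# Hypothetical illustration: a VMap column name is the nested JSON
# field names joined by the separator character.
fields = ["A", "B", "C"]
default_name = ".".join(fields)   # as produced with the default separator
custom_name = "_".join(fields)    # as key_separator='_' would produce
print(default_name, custom_name)  # A.B.C A_B_C
```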

Handling Special Characters

Some input JSON data has special characters in field names. You can replace these characters by setting the suppress_nonalphanumeric_key_chars parameter to TRUE. With this setting, all non-alphanumeric characters in key names are converted to an underscore (_) character.
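The substitution amounts to replacing every non-alphanumeric character in a key name with an underscore; a Python sketch of the idea (the sample key is illustrative):

```python
import re

def suppress_key_chars(key):
    """Sketch of suppress_nonalphanumeric_key_chars=TRUE: replace every
    non-alphanumeric character in a field name with an underscore."""
    return re.sub(r'[^0-9A-Za-z]', '_', key)

print(suppress_key_chars('user name!'))  # user_name_
```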

Dealing with Invalid JSON Records

If your JSON data is not perfectly formatted, your load may fail due to invalid records. If your JSON records are consistently delimited by a character such as a line break, you can use the RECORD_TERMINATOR parameter to skip these invalid records. Setting a record terminator allows the FJSONPARSER to skip over invalid records and continue parsing the rest of the data.

If your records are not consistently marked by a character, you can use the COPY parameter ERROR TOLERANCE. ERROR TOLERANCE skips entire source files with invalid JSON records, while RECORD_TERMINATOR skips individual malformed JSON records. Using both ERROR TOLERANCE and RECORD_TERMINATOR within the statement will work, but if your records are consistently marked, RECORD_TERMINATOR should sufficiently deal with imperfect records.
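The line-by-line skipping that RECORD_TERMINATOR enables can be sketched in Python (illustrative only, not the parser's implementation): treat each line as one record, keep the lines that parse as JSON, and skip the rest:

```python
import json

def load_newline_delimited(text):
    """Sketch of RECORD_TERMINATOR=E'\\n' behavior: each line is one record;
    invalid records are skipped and the load continues."""
    loaded, rejected = [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        try:
            loaded.append(json.loads(line))
        except json.JSONDecodeError:
            rejected.append(line)      # skip the bad record, keep going
    return loaded, rejected

records = '{"name": "orange"}\n"type": "fruit" }\n{"name": "banana"}'
loaded, rejected = load_newline_delimited(records)
print(len(loaded), len(rejected))
```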

  1. Create a flex table named "fruits".
    => CREATE FLEX TABLE fruits();
    CREATE TABLE
  2. Use the FJSONPARSER and call the RECORD_TERMINATOR parameter with a termination key of E'\n' (which denotes a new line). Insert records, including an invalid record.
    => COPY fruits FROM STDIN PARSER FJSONPARSER(RECORD_TERMINATOR=E'\n');
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself
    >> {"name": "orange", "type": "fruit", "color": "orange", "rating": 5 }
    >> {"name": "apple", "type": "fruit", "color": "green" }
    >> {"name": "blueberry", "type": "fruit", "color": "blue", "rating": 10 }
    >> "type": "fruit", "rating": 7 }
    >> {"name": "banana", "type" : "fruit", "color": "yellow", "rating": 3 }
    >> \.
    
  3. View the flex table using MAPTOSTRING to confirm that the invalid record was skipped while the rest of the records were successfully loaded.
    => SELECT MAPTOSTRING(__raw__) FROM fruits;
                                     maptostring
    --------------------------------------------------------------------------------------------
     {
       "color" : "orange",
       "name" : "orange",
       "rating" : "5",
       "type" : "fruit"
    }
     {
       "color" : "green",
       "name" : "apple",
       "type" : "fruit"
    }
     {
       "color" : "blue",
       "name" : "blueberry",
       "rating" : "10",
       "type" : "fruit"
    }
     {
       "color" : "yellow",
       "name" : "banana",
       "rating" : "3",
       "type" : "fruit"
    }
    (4 rows)

Rejecting Duplicate Values

You can reject duplicate values by using the reject_on_duplicate=true option with the fjsonparser. The next example uses this option while loading data and then displays the specified exception and rejected data files. Saving rejected data to a table, rather than a file, includes both the data and its exception.

=> CREATE FLEX TABLE json_dupes();
CREATE TABLE
=> COPY json_dupes FROM stdin PARSER fjsonparser(reject_on_duplicate=true) 
exceptions '/home/dbadmin/load_errors/json_e.out' 
rejected data '/home/dbadmin/load_errors/json_r.out';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> {"a":"1","a":"2","b":"3"}
>> \.
=>  \!cat /home/dbadmin/load_errors/json_e.out
COPY: Input record 1 has been rejected (Rejected by user-defined parser).  
Please see /home/dbadmin/load_errors/json_r.out, record 1 for the rejected record.
COPY: Loaded 0 rows, rejected 1 rows.
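Python's json module silently keeps only the last duplicate by default, but an object_pairs_hook can reproduce the reject-on-duplicate behavior shown above; a sketch (illustrative, not Vertica code):

```python
import json

def reject_duplicate_keys(pairs):
    """Sketch of reject_on_duplicate=true: raise if a record repeats a key."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError("duplicate key: %r" % key)
        seen[key] = value
    return seen

print(json.loads('{"a":"1","b":"3"}', object_pairs_hook=reject_duplicate_keys))
# json.loads('{"a":"1","a":"2","b":"3"}', object_pairs_hook=reject_duplicate_keys)
# would raise ValueError, analogous to the parser rejecting record 1 above
```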

Rejecting Data on Materialized Column Type Errors

Both the fjsonparser and fdelimitedparser parsers have a Boolean parameter, reject_on_materialized_type_error. Setting this parameter to true causes rows to be rejected if the input data:

  • Includes keys matching an existing materialized column, and
  • Has a key value that cannot be coerced into the materialized column's data type.

The following examples illustrate setting this parameter.

  1. Create a flex table, reject_true_false, with two real columns:
  2. => CREATE FLEX TABLE reject_true_false(one VARCHAR, two INT);
    CREATE TABLE
  3. Load JSON data into the table (from STDIN), using the fjsonparser with reject_on_materialized_type_error=false. While false is the default value, the following example specifies it explicitly for illustration: 
  4. => COPY reject_true_false FROM stdin PARSER 
    -> fjsonparser(reject_on_materialized_type_error=false);
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> {"one": 1, "two": 2}
    >> {"one": "one", "two": "two"}
    >> {"one": "one", "two": 2}
    >> \.
  5. Invoke maptostring to display the table values after loading data:
  6. =>  SELECT maptostring(__raw__), one, two FROM reject_true_false;          
               maptostring            | one | two
    ------------------------------------+-----+-----
     { "one" : "one", "two" : "2" }     | one |   2
     { "one" : "1", "two" : "2" }       | 1   |   2
     { "one" : "one", "two" : "two" }   | one |
    (3 rows)
  7. Truncate the table: 
  8. => TRUNCATE TABLE reject_true_false;
  9. Reload the same data again, but this time, set reject_on_materialized_type_error=true:
  10. => COPY reject_true_false FROM stdin PARSER fjsonparser(reject_on_materialized_type_error=true);
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> {"one": 1, "two": 2}
    >> {"one": "one", "two": "two"}
    >> {"one": "one", "two": 2}
    >> \.
  11. Call maptostring to display the table contents. Only two rows were loaded, whereas the previous results had three rows: 
  12. => SELECT maptostring(__raw__), one, two FROM reject_true_false;
                  maptostring              | one | two
    ---------------------------------------+-----+-----
     {
       "one" : "1",
       "two" : "2"
    }
       | 1   |   2
     {
       "one" : "one",
       "two" : "2"
    }
     | one |   2
    (2 rows)
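The coercion logic behind this parameter can be sketched in Python (illustrative only; the column names and coercion functions mirror the example above, and only the materialized-column side of the load is modeled):

```python
def load_row(record, column_types, reject_on_type_error=False):
    """Sketch of reject_on_materialized_type_error: try to coerce each key
    that matches a materialized column; on failure either reject the whole
    row (True) or leave that column NULL and keep the row (False)."""
    row = {}
    for col, coerce in column_types.items():
        raw = record.get(col)
        try:
            row[col] = coerce(raw) if raw is not None else None
        except (TypeError, ValueError):
            if reject_on_type_error:
                return None            # whole row rejected
            row[col] = None            # column left NULL, row kept
    return row

columns = {"one": str, "two": int}
print(load_row({"one": "one", "two": "two"}, columns))        # row kept, two is None
print(load_row({"one": "one", "two": "two"}, columns, True))  # row rejected: None
```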
    

Rejecting or Omitting Empty Rows

Valid JSON files can include empty key and value pairs, such as this one:

{"": 1}

Such rows are invalid for SQL. To prevent this situation, you can control the behavior for empty rows, either rejecting or omitting them, using two Boolean parameters for the FDELIMITEDPARSER or FJSONPARSER parsers:

  • reject_on_empty_key
  • omit_empty_keys
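A Python sketch of the two behaviors (illustrative only, not the parser's implementation):

```python
import json

def handle_empty_keys(record, reject_on_empty_key=False, omit_empty_keys=False):
    """Sketch of the two empty-key parameters: reject the whole row, drop
    just the empty-key pairs, or (default) keep the record as-is."""
    if reject_on_empty_key and "" in record:
        return None                          # row rejected
    if omit_empty_keys:
        return {k: v for k, v in record.items() if k != ""}
    return record

rec = json.loads('{"": 1, "name": "x"}')
print(handle_empty_keys(rec, omit_empty_keys=True))      # {'name': 'x'}
print(handle_empty_keys(rec, reject_on_empty_key=True))  # None
```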

See Also