Bulk Loading Data into Flex Tables

You bulk load data into a flex table with a COPY statement, specifying one of the flex parsers:

  • FAVROPARSER
  • FCEFPARSER
  • FCSVPARSER
  • FDELIMITEDPAIRPARSER
  • FDELIMITEDPARSER
  • FJSONPARSER
  • FREGEXPARSER

All flex parsers store the data as a single-value VMap. They reside in the VARBINARY __raw__ column, which is a real column with a NOT NULL constraint. The VMap is encoded into a single binary value for storage in the __raw__ column. The encoding places the value strings in a contiguous block, followed by the key strings. Vertica supports null values within the VMap for keys with NULL-specified columns. The key and value strings represent the virtual columns and their values in your flex table.

If a flex table data row is too large to fit in the VARBINARY __raw__ column, it is rejected. By default, the rejected data and exceptions files are stored in the standard CopyErrorLogs location, a subdirectory of the catalog directory: 

v_mart_node003_catalog/CopyErrorLogs/trans-STDIN-copy-from-exceptions.1 
v_mart_node003_catalog/CopyErrorLogs/trans-STDIN-copy-rejections.1

Flex tables do not copy any rejected data, due to disk space considerations. The rejected data file exists, but it contains only a new line character for every rejected record. The corresponding exceptions file lists the reason why each record was rejected.

You can specify a different path and file for the rejected data and exceptions files. To do so, use the COPY parameters REJECTED DATA and EXCEPTIONS, respectively. You can also save load rejections and exceptions in a table. For more information, see Getting Data into Vertica.

Basic Flex Table Load and Query

Loading data into your flex table is similar to loading data into a regular columnar table. The difference is that you must use the parser argument with one of the flex parsers:

=> COPY darkdata FROM '/home/dbadmin/data/tweets_12.json' PARSER fjsonparser();
 Rows Loaded
------------- 12
(1 row)

You can use many additional COPY parameters as required but not all are supported.

Loading Data into Flex Table Real Columns

If you create a hybrid flex table with one or more real column definitions, COPY evaluates each virtual column key name during data load. For each real column with a name that is identical to a virtual column key name, COPY does the following: 

  • Loads the keys and values as part of the VMap data in the __raw__ column
  • Automatically populates real columns with the values from their virtual column counterparts

Subsequent data loads continue loading same-name key-value pairs into both the __raw__ column and the real column.

Over time, storing values in both column types can impact your licensed data limits. For more information about Vertica licenses, see Managing Licenses in the Administrator's Guide.

For example, continuing with the JSON data:

  1. Create a flex table, darkdata1, with a column definition of one of the keys in the data you will load:

    => CREATE FLEX TABLE darkdata1 ("user.lang" VARCHAR);
    CREATE TABLE
    
  2. Load data into darkdata1:

    => COPY darkdata1 FROM '/test/vertica/flextable/DATA/tweets_12.json' PARSER fjsonparser();
     Rows Loaded
    -------------
              12
    (1 row)
    
  3. Query the user.lang column of darkdata1. Loading the JSON data file populated the column you defined:

    => SELECT "user.lang" FROM darkdata1;
     user.lang
    -----------
     es
     es
     tr
     it
     en
     en
     en
     en
    (12 rows)
    

    Empty column rows indicate NULL values. For more information about how NULLs are handled in flex tables, see NULL Value.

  4. You can query for other virtual columns (such as "user.name" in darkdata1), with similar results as for "user.lang":

    => SELECT "user.name" FROM darkdata1;
          user.name
    ---------------------
     I'm Toasterâ¥
     Flu Beach
     seydo shi
     The End
     Uptown gentleman.
     ~G A B R I E L A â¿
     Frederick Danjou
     laughing at clouds.
    (12 rows)
    

    While the results for these two queries are similar, the difference in accessing the keys and their values is significant. Data for "user.lang" has been materialized into a real table column, while "user.name" remains a virtual column. For production-level data usage (rather than test data sets), materializing flex table data improves query performance significantly.

Handling Default Values During Loading

You can create your flex table with a real column, named for a virtual column that exists in your incoming data. For example, if the data you load has a user.lang virtual column, define the flex table with that column. You can also specify a default column value when creating the flex table with real columns. The next example shows how to define a real column (user.lang), which has a default value from a virtual column (user.name): 

=> CREATE FLEX TABLE darkdata1 ("user.lang" LONG VARCHAR default "user.name");

When you load data into your flex table, COPY uses values from the flex table data, ignoring the default column definition. Loading data into a flex table requires MAPLOOKUP to find keys that match any real column names. A match exists when the incoming data has a virtual column with the same name as a real column. When COPY detects a match, it populates the column with values. COPY returns either a value or NULL for each row, so real columns always have values.

For example, after creating the darkdata1 flex table, described in the previous example, load data with COPY:

=> COPY darkdata1 FROM '/test/vertica/flextable/DATA/tweets_12.json' PARSER fjsonparser();
 Rows Loaded
-------------
          12
(1 row)

If you query the darkdata1 table after loading, the data shows that the values for the user.lang column were extracted:

  • From the data being loaded — values for the user.lang virtual column
  • With NULL — rows without values

In this case, the table column default value for user.lang was ignored:

=> SELECT "user.lang" FROM darkdata1;
 user.lang
-----------
 it
 en
 es
 en
 en
 es
 tr
 en
(12 rows)

Using COPY to Specify Default Column Values

You can add an expression to a COPY statement to specify default column values when loading data. For flex tables, specifying any column information requires that you list the __raw__ column explicitly. The following example shows how to use an expression for the default column value. In this case, loading populates the defined user.lang column with data from the input data user.name values:

=> COPY darkdata1(__raw__, "user.lang" as "user.name"::VARCHAR)
   FROM '/test/vertica/flextable/DATA/tweets_12.json' PARSER fjsonparser();
 Rows Loaded
-------------
          12
(1 row)
=> SELECT "user.lang" FROM darkdata1;
      user.lang
---------------------
 laughing at clouds.
 Avita Desai
 I'm Toasterâ¥
 Uptown gentleman.
 ~G A B R I E L A â¿
 Flu Beach
 seydo shi
 The End
(12 rows)

You can specify default values when adding columns, as described in Altering Flex Tables. When you do so, a different behavior results. For more information about using COPY, its expressions and parameters, see Getting Data into Vertica in the Administrator's Guide and COPY in the SQL Reference Manual.