Understanding Flex Tables

You can create flex tables and then manage them with their associated helper, data, and map functions. Flex tables:

  • Do not require schema definitions
  • Do not need column definitions
  • Have full Unicode support
  • Support SQL queries

You can use flex tables to promote data directly from exploration to analytic operations. Flex table features include:

  • Ability to load different formats into one flex table, which lets you handle changing structure over time
  • Full support of delimited and JSON data
  • Extensive SQL queries and built-in analytics for the data you load
  • Usability functions, which let you explore your unstructured data and then use built-in functions to materialize the data

Exploration to Promotion

After you create a flex table, you can quickly load data without specifying its schema. This allows you to load arbitrary JSON data, log files, and other semi-structured data and immediately begin querying it.

Creating flex tables is similar to creating other tables, except column definitions are optional. When you create flex tables, with or without column definitions, Vertica implicitly adds a real column to your table, called __raw__. This column stores loaded data. The __raw__ column is a LONG VARBINARY column with a NOT NULL constraint. It contains the documented limits for its data type (see Long Data Types). The __raw__ column's default maximum width is 130,000 bytes (with an absolute maximum of 32,000,000 bytes). You can change the default width with the FlexTableRawSize configuration parameter.

If you create a flex table without other column definitions, the table includes a second default column, __identity__, declared as an auto-incrementing IDENTITY (1,1) column. When no other columns are defined, flex tables use the __identity__ column for segmentation and sort order.

Loading data into a flex table encodes the record into a VMap type and populates the __raw__ column. The VMap is a standard dictionary type, pairing keys with string values as virtual columns.

Flex Table Terms

This guide uses the following terms when describing how you work with flex tables to explore and analyze flexible data:

  • VMap: An internal map data format.
  • Virtual Columns: Key-value pairs contained in a flex table __raw__ column.
  • Real Columns: Fully featured columns in flex or columnar tables.
  • Promoted Columns: Virtual columns that have been materialized to real columns.
  • Map Keys: Map keys are the virtual column names within VMap data.

Is There Structure in a Flex Table?

The term unstructured data (sometimes called semi-structured or dark data) does not indicate that the data you load into flex tables is entirely without structure. However, you may not know the data's composition or the inconsistencies of its design. In some cases, the data might not be relational.

Your data might have some structure (like JSON and delimited data). Data might be semi-structured or stringently structured, but in ways that you either do not know about or do not expect. The term flexible data encompasses these and other kinds of data. You can load your flexible data directly into a flex table and query its contents.

To summarize, you can load data first, without knowing its structure, and then query its content after a few simple transformations. In some cases, you already know the data structure, such as some keys in the data. If so, you can query these values explicitly as soon as you load the data.

Storing Flex Table Data

While you can store unstructured data in a flex table __raw__ column, that column is implemented as a real column.

Vertica compresses __raw__ column data by about one half (1/2). While this factor is less than the compression rate for real columns, the reduction is significant for large amounts (more than 1TB) of unstructured data. After compression is complete, Vertica writes the data to disk. This approach maintains K-safety in your cluster and supports standard recovery processes should node failures occur. Flex tables are included in full backups (or, if you choose, in object-level backups).

What Happens When You Create Flex Tables?

Whenever you execute a CREATE FLEX TABLE statement, Vertica creates three objects, as follows:

  • The flexible table (flex_table)
  • An associated keys table (flex_table_keys)
  • A default view for the main table (flex_table_view)

The _keys and _view objects are dependents of the parent, flex_table. Dropping the flex table also removes its dependents, although you can drop the _keys or _view objects independently.

You can create a flex table without specifying any column definitions. When you do so, Vertica automatically creates two tables, the named flex table (such as darkdata) and its associated keys table, darkdata_keys:

=> CREATE flex table darkdata();
CREATE TABLE
=> \dt dark* List of tables Schema | Name | Kind | Owner | Comment --------+---------------+-------+---------+--------- public | darkdata | table | dbadmin | public | darkdata_keys | table | dbadmin | (2 rows)

Each flex table has two default columns, __raw__ and __identity__. The __raw__ column exists in every flex table to hold the data you load. The __identity__ column is auto-incrementing. Vertica uses the __identity__ column for segmentation and sort order when no other column definitions exist. The flex keys table has three columns:

=> SELECT * FROM darkdata_keys;
 key_name | frequency | data_type_guess
----------+-----------+-----------------
(0 rows)

If you define columns when creating a flex table, Vertica still creates the __raw__ column. However, the table has no __identity__ column because columns are specified for segmentation and sort order. Two tables are created automatically, as shown in the following example:

=> CREATE FLEX TABLE darkdata1 (name VARCHAR);
CREATE TABLE

=> SELECT * FROM darkdata1;
 __raw__ | name
---------+------
(0 rows)

=> \d darkdata1*
                                            List of Fields by Tables
 Schema |   Table   | Column  |          Type          |  Size  | Default | Not Null | Primary Key | Foreign Key
--------+-----------+---------+------------------------+--------+---------+----------+-------------+-------------
 public | darkdata1 | __raw__ | long varbinary(130000) | 130000 |         | t        | f           |
 public | darkdata1 | name    | varchar(80)            |     80 |         | f        | f           |
(2 rows)

=> \dt darkdata1*
                   List of tables
 Schema |      Name      | Kind  |  Owner  | Comment
--------+----------------+-------+---------+---------
 public | darkdata1      | table | dbadmin |
 public | darkdata1_keys | table | dbadmin |
(2 rows)

For more examples, see Creating Flex Tables.

Creating Superprojections Automatically

In addition to creating two tables for each flex table, Vertica creates superprojections for both the main flex table and its associated keys table. Using the \dj command, you can display the projections created automatically for the darkdata and darkdata1 tables in this set of examples:

=> \dj darkdata* 
                           List of projections
 Schema |          Name           |  Owner  |       Node       | Comment
--------+-------------------------+---------+------------------+---------
 public | darkdata1_b0            | dbadmin |                  |
 public | darkdata1_b1            | dbadmin |                  |
 public | darkdata1_keys_super    | dbadmin | v_vmart_node0001 |
 public | darkdata1_keys_super    | dbadmin | v_vmart_node0003 |
 public | darkdata1_keys_super    | dbadmin | v_vmart_node0004 |
 public | darkdata_b0             | dbadmin |                  |
 public | darkdata_b1             | dbadmin |                  |
 public | darkdata_keys__super    | dbadmin | v_vmart_node0001 |
 public | darkdata_keys_super     | dbadmin | v_vmart_node0003 |
 public | darkdata_keys_super     | dbadmin | v_vmart_node0004 |
(10 rows)

Default Flex Table View

When you create a flex table, you also create a default view. This default view uses the table name with a _view suffix. If you query the default view, you are prompted to use the COMPUTE_FLEXTABLE_KEYS_AND_BUILD_VIEW function. This view enables you to update the view after you load data so that it includes all keys and values:

=> \dv darkdata*
                  List of View Fields
 Schema |      View      | Column |     Type     | Size
--------+----------------+--------+--------------+------
 public | darkdata_view  | status | varchar(124) |  124
 public | darkdata1_view | status | varchar(124) |  124
(2 rows)

For more information, see Updating Flex Table Views.

Flex Functions

There are three sets of functions to support flex tables and extracting data into VMaps. See the following sections for more information:

Using Clients with Flex Tables

You can use the Vertica-supported client drivers with flex tables as follows:

  • To load data into a flex table, you can use the INSERT or COPY LOCAL statement with the appropriate flex parser.
  • The driver metadata APIs return only real columns. For example, if you select * from a flex table with a single materialized column, the statement returns that column and the __raw__. However, it does not return virtual columns from within __raw__. To access virtual columns and their values, query the associated _keys table, just as you would in vsql.