Understanding Flex Tables

You can create flex tables and then manage them with their associated helper, data, and map functions. Flex tables:

Do not require schema definitions
Do not need column definitions
Have full Unicode support
Support SQL queries

You can use flex tables to promote data directly from exploration to analytic operations. Flex table features include:

Ability to load different formats into one flex table, which lets you handle changing structure over time
Full support of delimited and JSON data
Extensive SQL queries and built-in analytics for the data you load
Usability functions, which let you explore your unstructured data and then use built-in functions to materialize the data

Exploration to Promotion

After you create a flex table, you can quickly load data, including social media content in JSON, log files, delimited files, and other information. Previously, working with such data required significant schema design and preparation. Now, you can load and query flex tables in a few steps.

Creating flex tables is similar to creating other tables, except column definitions are optional. When you create flex tables, with or without column definitions, Vertica implicitly adds a real column to your table, called __raw__. This column stores loaded data. The __raw__ column is a LONG VARBINARY column with a NOT NULL constraint. It contains the documented limits for its data type (see Long Data Types in the SQL Reference Manual. The __raw__ column's default maximum width is 130,000 bytes (with an absolute maximum of 32,000,000 bytes). You can change the default width with the FlexTablesRawSize configuration parameter.

If you create a flex table without other column definitions, the table includes a second default column, __identity__, declared as an auto-incrementing IDENTITY (1,1) column. When no other columns are defined, flex tables use the __identity__ column for segmentation and sort order.

Loading data into a flex table encodes the record into a VMap type and populates the __raw__ column. The VMap is a standard dictionary type, pairing keys with string values as virtual columns.

Flex Table Terms

This guide uses the following terms when describing how you work with flex tables to explore and analyze flexible data:

VMap: An internal map data format.
Virtual Columns: Key-value pairs contained in a flex table __raw__ column.
Real Columns: Fully featured columns in flex or columnar tables.
Promoted Columns : Virtual columns that have been materialized to real columns.
Map Keys: Map keys are the virtual column names within VMap data.

Is There Structure in a Flex Table?

The term unstructured data (sometimes called semi-structured or Dark Data) does not indicate that the data you load into flex tables is entirely without structure. However, you may not know the data's composition or the inconsistencies of its design. In some cases, the data may not be relational.

Your data may have some structure (like JSON and delimited data). Data may be semi-structured or stringently structured, but in ways that you either do not know about or do not expect. In this guide, the term flexible data encompasses these and other kinds of data. You can load your flexible data directly into a flex table, and query its contents with your favorite SQL SELECT or other statements.

To summarize, you can load data first, without knowing its structure, and then query its content after a few simple transformations. In some cases, you already know the data structure, such as some tweet map keys, like user.lang, user.screen_name, and user.url. If so, you can query these values explicitly as soon as you load the data.

Storing Flex Table Data

While you can store unstructured data in a flex table __raw__ column, that column is implemented as a real column.

Vertica compresses __raw__ column data by about one half (1/2). While this factor is less than the compression rate for real columns, the reduction is significant for large amounts ( more than 1TB) of unstructured data. After compression is complete, Vertica writes the data to disk (ROS). This approach maintains K-safety in your cluster and supports standard recovery processes should node failures occur. Flex tables are included in full backups (or, if you choose, in object-level backups).

What Happens When You Create Flex Tables?

Whenever you execute a CREATE FLEX TABLE statement, Vertica creates three objects, as follows:

The flexible table (flex_table)
An associated keys table (flex_table_keys)
A default view for the main table (flex_table_view)

The _keys and _view objects are dependents of the parent, flex_table. Dropping the flex table also removes its dependents, although you can drop the _keys or _view objects independently.

You can create a flex table without specifying any column definitions (such asdarkdata, in the next example). When you do so, Vertica automatically creates two tables, the named flex table (such as darkdata) and its associated keys table, darkdata_keys:

=> CREATE flex table darkdata();
CREATE TABLE
=> \dt dark*
                List of tables
 Schema |     Name      | Kind  |  Owner  | Comment
--------+---------------+-------+---------+--------- 
 public | darkdata      | table | dbadmin |
 public | darkdata_keys | table | dbadmin |
(2 rows)

Each flex table has two default columns, __raw__ and __identity__. The __raw__ column exists in every flex table to hold the data you load. The __identity__ column is auto-incrementing. Vertica uses the __identity__ column for segmentation and sort order when no other column definitions exist. The flex keys table (darkdata_keys) has three columns, as shown:

=> SELECT * FROM darkdata;
 __identity__ | __raw__
--------------+---------
(0 rows)


=> SELECT * FROM darkdata_keys;
 key_name | frequency | data_type_guess
----------+-----------+-----------------
(0 rows)

Creating a flex table with column definitions (such as darkdata1, in the next example) automatically generates a table with the __raw__ column. However, the table has no __identity__ column because columns are specified for segmentation and sort order. Two tables are created automatically, as shown in the following example:

=> CREATE FLEX TABLE darkdata1 (name VARCHAR);
CREATE TABLE

=> SELECT * FROM darkdata1;
 __raw__ | name
---------+------
(0 rows)

=> \d darkdata1*
                                            List of Fields by Tables
 Schema |   Table   | Column  |          Type          |  Size  | Default | Not Null | Primary Key | Foreign Key
--------+-----------+---------+------------------------+--------+---------+----------+-------------+-------------
 public | darkdata1 | __raw__ | long varbinary(130000) | 130000 |         | t        | f           |
 public | darkdata1 | name    | varchar(80)            |     80 |         | f        | f           |
(2 rows)

=> \dt darkdata1*
                   List of tables
 Schema |      Name      | Kind  |  Owner  | Comment
--------+----------------+-------+---------+---------
 public | darkdata1      | table | dbadmin |
 public | darkdata1_keys | table | dbadmin |
(2 rows)

Creating a flex table with at least one column definition (darkdata1 in the next example) also generates a table with the __raw__ column, but not an __identity__ column. Instead, the specified columns are used for segmentation and sort order. Two tables are also created automatically, as shown in the following example:

=> CREATE FLEX TABLE darkdata1 (name VARCHAR);
CREATE TABLE

=> \d darkdata1*
                               List of Fields by Tables
 Schema |   Table   | Column  |          Type          |  Size  | Default | Not Null | Primary Key | Foreign Key
--------+-----------+---------+------------------------+--------+---------+----------+-------------+-------------
 public | darkdata1 | __raw__ | long varbinary(130000) | 130000 |         | t        | f           |
 public | darkdata1 | name    | varchar(80)            |     80 |         | f        | f           |
(2 rows)

=> \dt darkdata1*
                   List of tables
 Schema |      Name      | Kind  |  Owner  | Comment
--------+----------------+-------+---------+---------
 public | darkdata1      | table | dbadmin |
 public | darkdata1_keys | table | dbadmin |
(2 rows)

For more examples, see Creating Flex Tables.

Creating Superprojections Automatically

In addition to creating two tables for each flex table, Vertica creates superprojections for both the main flex table and its associated keys table. Using the \dj command, you can display the projections created automatically for the darkdata and darkdata1 tables in this set of examples:

=> \dj darkdata* 
                           List of projections
 Schema |          Name           |  Owner  |       Node       | Comment
--------+-------------------------+---------+------------------+---------
 public | darkdata1_b0            | dbadmin |                  |
 public | darkdata1_b1            | dbadmin |                  |
 public | darkdata1_keys_super    | dbadmin | v_vmart_node0001 |
 public | darkdata1_keys_super    | dbadmin | v_vmart_node0003 |
 public | darkdata1_keys_super    | dbadmin | v_vmart_node0004 |
 public | darkdata_b0             | dbadmin |                  |
 public | darkdata_b1             | dbadmin |                  |
 public | darkdata_keys__super    | dbadmin | v_vmart_node0001 |
 public | darkdata_keys_super     | dbadmin | v_vmart_node0003 |
 public | darkdata_keys_super     | dbadmin | v_vmart_node0004 |
(10 rows)

Default Flex Table View

When you create a flex table, you also create a default view. This default view uses the table name with a _view suffix, as listed in the next example, which shows the list of views for darkdata and darkdata1. If you query the default view, you are prompted to use the COMPUTE_FLEXTABLE_KEYS_AND_BUILD_VIEW function. This view enables you to update the view after you load data so that it includes all keys and values.

=> \dv darkdata*
                  List of View Fields
 Schema |      View      | Column |     Type     | Size
--------+----------------+--------+--------------+------
 public | darkdata_view  | status | varchar(124) |  124
 public | darkdata1_view | status | varchar(124) |  124
(2 rows)

For more information, see Updating Flex Table Views.

Flex Functions

There are three sets of functions to support flex tables and extracting data into VMaps. See the following sections for more information:

Data (helper) functions (Flex Data Functions Reference)
Extractor functions (Flex Extractor Functions Reference )
Map functions (Flex Map Functions Reference)

Using Clients with Flex Tables

You can use the Vertica supported client drivers with flex tables as follows:

To load data into a flex table, you can use the INSERT statement or COPY LOCAL with the appropriate flex table parser.
The driver metadata APIs return only real columns. For example, using a SELECT * FROM myflex; statement, when myflex has a single materialized column (name), returns the __raw__ and name columns. However, it does not return virtual columns from within __raw__. To access virtual columns and their values, query the associated flextable_keys table, just as you would in vsql.