Understanding Flex Tables
You can create flex tables and then manage them with their associated helper, data, and map functions. Flex tables:
- Do not require schema definitions
- Do not need column definitions
- Have full Unicode support
- Support SQL queries
You can use flex tables to promote data directly from exploration to analytic operations. Flex table features include:
- Ability to load different formats into one flex table, which lets you handle changing structure over time
- Full support of delimited and JSON data
- Extensive SQL queries and built-in analytics for the data you load
- Usability functions, which let you explore your unstructured data and then use built-in functions to materialize the data
Exploration to Promotion
After you create a flex table, you can quickly load data, including social media content in JSON, log files, delimited files, and other information. Previously, working with such data required significant schema design and preparation. Now, you can load and query flex tables in a few steps.
Creating flex tables is similar to creating other tables, except column definitions are optional. When you create flex tables, with or without column definitions, Vertica implicitly adds a real column to your table, called __raw__
. This column stores loaded data. The __raw__
column is a LONG VARBINARY
column with a NOT NULL
constraint. It contains the documented limits for its data type (see Long Data Types in the SQL Reference Manual. The __raw__
column's default maximum width is 130,000 bytes (with an absolute maximum of 32,000,000 bytes). You can change the default width with the FlexTablesRawSize
configuration parameter.
If you create a flex table without other column definitions, the table includes a second default column, __identity__
, declared as an auto-incrementing IDENTITY (1,1) column. When no other columns are defined, flex tables use the __identity__
column for segmentation and sort order.
Loading data into a flex table encodes the record into a VMap type and populates the __raw__
column. The VMap is a standard dictionary type, pairing keys with string values as virtual columns.
Flex Table Terms
This guide uses the following terms when describing how you work with flex tables to explore and analyze flexible data:
- VMap: An internal map data format.
-
Virtual Columns: Key-value pairs contained in a flex table
__raw__
column. - Real Columns: Fully featured columns in flex or columnar tables.
- Promoted Columns : Virtual columns that have been materialized to real columns.
- Map Keys: Map keys are the virtual column names within VMap data.
Is There Structure in a Flex Table?
The term unstructured data (sometimes called semi-structured or Dark Data) does not indicate that the data you load into flex tables is entirely without structure. However, you may not know the data's composition or the inconsistencies of its design. In some cases, the data may not be relational.
Your data may have some structure (like JSON and delimited data). Data may be semi-structured or stringently structured, but in ways that you either do not know about or do not expect. In this guide, the term flexible data encompasses these and other kinds of data. You can load your flexible data directly into a flex table, and query its contents with your favorite SQL SELECT or other statements.
To summarize, you can load data first, without knowing its structure, and then query its content after a few simple transformations. In some cases, you already know the data structure, such as some tweet map keys, like user.lang, user.screen_name
, and user.url
. If so, you can query these values explicitly as soon as you load the data.
Storing Flex Table Data
While you can store unstructured data in a flex table __raw__
column, that column is implemented as a real column.
Vertica compresses __raw__
column data by about one half (1/2). While this factor is less than the compression rate for real columns, the reduction is significant for large amounts ( more than 1TB) of unstructured data. After compression is complete, Vertica writes the data to disk (ROS). This approach maintains K-safety in your cluster and supports standard recovery processes should node failures occur. Flex tables are included in full backups (or, if you choose, in object-level backups).
What Happens When You Create Flex Tables?
Whenever you execute a CREATE FLEX TABLE
statement, Vertica creates three objects, as follows:
- The flexible table (flex_table)
- An associated keys table (flex_table_keys)
- A default view for the main table (flex_table_view)
The _keys and _view objects are dependents of the parent, flex_table. Dropping the flex table also removes its dependents, although you can drop the _keys
or _view
objects independently.
You can create a flex table without specifying any column definitions (such asdarkdata
, in the next example). When you do so, Vertica automatically creates two tables, the named flex table (such as darkdata
) and its associated keys table, darkdata_keys
:
=> CREATE flex table darkdata();
CREATE TABLE
=> \dt dark* List of tables Schema | Name | Kind | Owner | Comment
--------+---------------+-------+---------+---------
public | darkdata | table | dbadmin |
public | darkdata_keys | table | dbadmin | (2 rows)
Each flex table has two default columns, __raw__
and __identity__
. The __raw__
column exists in every flex table to hold the data you load. The __identity__
column is auto-incrementing. Vertica uses the __identity__
column for segmentation and sort order when no other column definitions exist. The flex keys table (darkdata_keys
) has three columns, as shown:
=> SELECT * FROM darkdata; __identity__ | __raw__ --------------+--------- (0 rows)
=> SELECT * FROM darkdata_keys; key_name | frequency | data_type_guess ----------+-----------+----------------- (0 rows)
Creating a flex table with column definitions (such as darkdata1
, in the next example) automatically generates a table with the __raw__
column. However, the table has no __identity__
column because columns are specified for segmentation and sort order. Two tables are created automatically, as shown in the following example:
=> CREATE FLEX TABLE darkdata1 (name VARCHAR); CREATE TABLE => SELECT * FROM darkdata1; __raw__ | name ---------+------ (0 rows)
=> \d darkdata1* List of Fields by Tables Schema | Table | Column | Type | Size | Default | Not Null | Primary Key | Foreign Key --------+-----------+---------+------------------------+--------+---------+----------+-------------+------------- public | darkdata1 | __raw__ | long varbinary(130000) | 130000 | | t | f | public | darkdata1 | name | varchar(80) | 80 | | f | f | (2 rows)
=> \dt darkdata1* List of tables Schema | Name | Kind | Owner | Comment --------+----------------+-------+---------+--------- public | darkdata1 | table | dbadmin | public | darkdata1_keys | table | dbadmin | (2 rows)
Creating a flex table with at least one column definition (darkdata1
in the next example) also generates a table with the __raw__
column, but not an __identity__
column. Instead, the specified columns are used for segmentation and sort order. Two tables are also created automatically, as shown in the following example:
=> CREATE FLEX TABLE darkdata1 (name VARCHAR); CREATE TABLE
=> \d darkdata1* List of Fields by Tables Schema | Table | Column | Type | Size | Default | Not Null | Primary Key | Foreign Key --------+-----------+---------+------------------------+--------+---------+----------+-------------+------------- public | darkdata1 | __raw__ | long varbinary(130000) | 130000 | | t | f | public | darkdata1 | name | varchar(80) | 80 | | f | f | (2 rows)
=> \dt darkdata1* List of tables Schema | Name | Kind | Owner | Comment --------+----------------+-------+---------+--------- public | darkdata1 | table | dbadmin | public | darkdata1_keys | table | dbadmin | (2 rows)
For more examples, see Creating Flex Tables.
Creating Superprojections Automatically
In addition to creating two tables for each flex table, Vertica creates superprojections for both the main flex table and its associated keys table. Using the \dj
command, you can display the projections created automatically for the darkdata
and darkdata1
tables in this set of examples:
=> \dj darkdata* List of projections Schema | Name | Owner | Node | Comment --------+-------------------------+---------+------------------+--------- public | darkdata1_b0 | dbadmin | | public | darkdata1_b1 | dbadmin | | public | darkdata1_keys_super | dbadmin | v_vmart_node0001 | public | darkdata1_keys_super | dbadmin | v_vmart_node0003 | public | darkdata1_keys_super | dbadmin | v_vmart_node0004 | public | darkdata_b0 | dbadmin | | public | darkdata_b1 | dbadmin | | public | darkdata_keys__super | dbadmin | v_vmart_node0001 | public | darkdata_keys_super | dbadmin | v_vmart_node0003 | public | darkdata_keys_super | dbadmin | v_vmart_node0004 | (10 rows)
Default Flex Table View
When you create a flex table, you also create a default view. This default view uses the table name with a _view
suffix, as listed in the next example, which shows the list of views for darkdata and darkdata1. If you query the default view, you are prompted to use the COMPUTE_FLEXTABLE_KEYS_AND_BUILD_VIEW function. This view enables you to update the view after you load data so that it includes all keys and values.
=> \dv darkdata* List of View Fields Schema | View | Column | Type | Size --------+----------------+--------+--------------+------ public | darkdata_view | status | varchar(124) | 124 public | darkdata1_view | status | varchar(124) | 124 (2 rows)
For more information, see Updating Flex Table Views.
Flex Functions
There are three sets of functions to support flex tables and extracting data into VMaps. See the following sections for more information:
- Data (helper) functions (Flex Data Functions Reference)
- Extractor functions (Flex Extractor Functions Reference )
- Map functions (Flex Map Functions Reference)
Using Clients with Flex Tables
You can use the Vertica supported client drivers with flex tables as follows:
- To load data into a flex table, you can use the
INSERT
statement orCOPY LOCAL
with the appropriate flex table parser. - The driver metadata APIs return only real columns. For example, using a
SELECT * FROM myflex;
statement, when myflex has a single materialized column (name
), returns the__raw__
andname
columns. However, it does not return virtual columns from within__raw__
. To access virtual columns and their values, query the associatedflextable_keys
table, just as you would in vsql.