Vertica Analytics Platform Version 9.3.x Documentation

Getting Started

Getting Started describes the basics of creating, exploring, and using flex tables. The rest of this guide presents beyond the basics details using simple examples.

Create a Simple JSON File

Use this JSON data for the exercises in the rest of this section: 

{"name": "Everest", "type":"mountain", "height":29029, "hike_safety": 34.1}
{"name": "Mt St Helens", "type":"volcano", "height":29029, "hike_safety": 15.4}
{"name": "Denali", "type":"mountain", "height":17000, "hike_safety": 12.2}
{"name": "Kilimanjaro", "type":"mountain", "height":14000 }
{"name": "Mt Washington", "type":"mountain", "hike_safety": 50.6}
  1. Copy and paste the JSON data into your favorite editor.
  2. Save the file in any convenient location for loading into your Vertica database.

Create a Flex Table and Load Data

  1. Create a flex table called mountains:

    => CREATE flex table mountains();
    
  2. Load the JSON file you saved, using the flex table parser fjsonparser:

    => COPY mountains from '/home/dbadmin/data/flex/mountains.json' 
    parser fjsonparser();
     Rows Loaded
    -------------
               5
    (1 row)
    
  3. Query values from the sample file:

    => SELECT name, type, height from mountains;
         name      |   type   | height
    ---------------+----------+--------
     Everest       | mountain | 29029
     Mt St Helens  | volcano  | 29029
     Denali        | mountain | 17000
     Kilimanjaro   | mountain | 14000
     Mt Washington | mountain |
    (5 rows)
    

You have now created a flex table and loaded data. Next, learn more about using flex table data in your database.

Query More of Your Flex Table

  1. Query your flex table to see the data you loaded as it is stored in the __raw__ column. The example illustrates the table contents, with Return characters added for illustration: 

    => \x
    Expanded display is on.
    => SELECT * from mountains;
    [ RECORD 1 ]+----------------------------------------------------------------------------
    __identity__ | 1
    __raw__      | \001\000\000\000,\000\000\000\004\000\000\000\024\000\000\000\031\000\000\000\
    035\000\000\000$\000\000\0002902934.1Everestmountain\004\000\000\000\024\000\000\000\032\000\
    000\000%\000\000\000)\000\000\000heighthike_safetynametype
    [ RECORD 2 ]+----------------------------------------------------------------------------
    __identity__ | 2
    __raw__      | \001\000\000\0000\000\000\000\004\000\000\000\024\000\000\000\031\000\000\000\
    035\000\000\000)\000\000\0002902915.4Mt St Helensvolcano\004\000\000\000\024\000\000\000\032\000\
    000\000%\000\000\000)\000\000\000heighthike_safetynametype
    [ RECORD 3 ]+----------------------------------------------------------------------------
    __identity__ | 3
    __raw__      | \001\000\000\000+\000\000\000\004\000\000\000\024\000\000\000\031\000\000\000\
    035\000\000\000#\000\000\0001700012.2Denalimountain\004\000\000\000\024\000\000\000\032\000\000
    \000%\000\000\000)\000\000\000heighthike_safetynametype
    [ RECORD 4 ]+----------------------------------------------------------------------------
    __identity__ | 4
    __raw__      | \001\000\000\000(\000\000\000\003\000\000\000\020\000\000\000\025\000\000\000\
    000\000\00014000Kilimanjaromountain\003\000\000\000\020\000\000\000\026\000\000\000\032\000\
    000\000heightnametype
    [ RECORD 5 ]+----------------------------------------------------------------------------
    __identity__ | 5
    __raw__      | \001\000\000\000)\000\000\000\003\000\000\000\020\000\000\000\024\000\000\000\
    000\000\00050.6Mt Washingtonmountain\003\000\000\000\020\000\000\000\033\000\000\000\037\000\
    000\000hike_safetynametype
    
  2. Use the mapToString() function (with the __raw__ column of mountains) to inspect its contents in readable JSON text format: 

    => SELECT maptostring(__raw__) from mountains;
                                             MAPTOSTRING
    ------------------------------------------------------------------------------------------
      {
       "height" : "29029",
       "hike_safety" : "34.1",
       "name" : "Everest",
       "type" : "mountain"
    }
    
     {
       "height" : "29029",
       "hike_safety" : "15.4",
       "name" : "Mt St Helens",
       "type" : "volcano"
    }
    
     {
       "height" : "17000",
       "hike_safety" : "12.2",
       "name" : "Denali",
       "type" : "mountain"
    }
    
     {
       "height" : "14000",
       "name" : "Kilimanjaro",
       "type" : "mountain"
    }
    
     {
       "hike_safety" : "50.6",
       "name" : "Mt Washington",
       "type" : "mountain"
    }
    
    
  3. Now, use the compute_flextable_keys() function to populate the mountain_keys table. Vertica generates this table automatically when you create your flex table.

    => SELECT compute_flextable_keys('mountains');
    	compute_flextable_keys
    ---------------------------------------------------
    Please see public.mountains_keys for updated keys
    (1 row)
    
  4. Query the keys table (mountains_keys), and examine the results: 

    => SELECT * from public.mountains_keys;
      key_name   | frequency | data_type_guess
    -------------+-----------+-----------------
     hike_safety |         4 | varchar(20)
     name        |         5 | varchar(26)
     height      |         4 | varchar(20)
     type        |         5 | varchar(20)
    (4 rows)
    

Build a Flex Table View

  1. Use the build_flextable_view() function to populate a view generated from the mountains_keys table.

    => SELECT build_flextable_view('mountains');
                     build_flextable_view
    ------------------------------------------------------
     The view public.mountains_view is ready for querying
    (1 row)
    
  2. Query the view mountains_view

    => SELECT * from public.mountains_view;
     hike_safety |     name      |   type   | height
    -------------+---------------+----------+--------
     50.6        | Mt Washington | mountain |
     34.1        | Everest       | mountain | 29029
     22.8        | Kilimanjaro   | mountain | 14000
     15.4        | Mt St Helens  | volcano  | 29029
     12.2        | Denali        | mountain | 17000
    (5 rows)
  3. Use the view_columns system table to query the column_name and data_type columns for mountains_view

    => SELECT column_name, data_type from view_columns where table_name = 'mountains_view';
     column_name |  data_type
    -------------+-------------
     hike_safety | varchar(20)
     name        | varchar(26)
     type        | varchar(20)
     height      | varchar(20)
    (4 rows)
    
  4. Review the query results:

    • Notice the data_type column, its values and sizes. These are calculated when you compute keys for your flex table with compute_flextable_keys().
    • Did you also notice the data_type_guess column when you queried the mountains_keys table after invoking that function?
  5. With the data_type information from mountains_view, override the data_type_guess for hike_safety. Then, COMMIT the change, and rebuild the view with build_flextable_view()

    => UPDATE mountains_keys SET data_type_guess = 'float' where key_name = 'hike_safety';
     OUTPUT
    --------
          1
    (1 row)
    
    => commit;
    => SELECT build_flextable_view('mountains');
                     build_flextable_view
    ------------------------------------------------------
     The view public.mountains_view is ready for querying
    (1 row)
    
  6. Next, use the view_columns system table. Notice that hike_safety is now a float data type: 

    => SELECT column_name, data_type from view_columns where table_name = 'mountains_view';
     column_name |  data_type
    -------------+-------------
     hike_safety | float
     name        | varchar(26)
     type        | varchar(20)
     height      | varchar(20)
    (4 rows)
    

Create a Hybrid Flex Table

If you already know that some of the data you load and query regularly needs full Vertica performance and support, you can create a hybrid flex table. A hybrid flex table has one or more real columns that you define, and a __raw__ column to store any unstructured data you load. Querying real columns is faster than querying flexible data in the __raw__ column. You can define default values for the columns. 

  1. Create a hybrid flex table, and load the same sample JSON file: 

    => CREATE flex table mountains_hybrid(name varchar(41) default name::varchar(41), hike_safety float 
    default hike_safety::float);
    => COPY mountains_hybrid from '/home/dbadmin/Downloads/mountains.json' parser fjsonparser();
     Rows Loaded
    -------------
               5
    (1 row)
  2. Use the compute_flextable_keys_and_build_view() function to populate the keys table and build the view for mountains_hybrid:  

    => SELECT compute_flextable_keys_and_build_view('mountains_hybrid');
                                            compute_flextable_keys_and_build_view                        
    ------------------------------------------------------------------------------------------------
     Please see public.mountains_hybrid_keys for updated keys
    The view public.mountains_hybrid_view is ready for querying
    (1 row)
    
    
  3. Query the mountains_hybrid keys table. Review the data_type_guesses column values again. The types list the column definitions you declared when you created the hybrid table:  

    => SELECT * from mountains_hybrid_keys;
      key_name   | frequency | data_type_guess
    -------------+-----------+-----------------
     height      |         4 | varchar(20)
     name        |         5 | varchar(41)
     type        |         5 | varchar(20)
     hike_safety |         4 | float
    (4 rows)
    

If you create a basic flex table, and later find you want to promote one or more virtual columns to real columns, see Materializing Flex Tables to add columns.

Materialize Virtual Columns in a Hybrid Flex Table

After you explore your flex table data, you can promote one ore more virtual columns in your flex table to real columns. You do not need to create a separate columnar table.

  1. Invoke the materialize_flextable_columns() function on the hybrid table, specifying the number of virtual columns to materialize: 

    => SELECT materialize_flextable_columns('mountains_hybrid', 3);
                                     materialize_flextable_columns                                                              
    --------------------------------------------------------------------------------------------
     The following columns were added to the table public.mountains_hybrid:
            type
    For more details, run the following query:
    SELECT * FROM v_catalog.materialize_flextable_columns_results WHERE 
    table_schema = 'public' and table_name = 'mountains_hybrid';
    
    (1 row)
    
  2. You specified three (3) columns to materialize, but the table was created with two real columns (name and hike_safety). To fulfill your three-column specification, the function promotes only one other column, type. The next example has expanded display to list the columns vertically. Notice the ADDED status for the column that was just materialized, rather than EXISTS for the two columns you defined when creating the table: 

    => \x
    Expanded display is on.
    => SELECT * from materialize_flextable_columns_results where table_name = 'mountains_hybrid';
    -[ RECORD 1 ]-+-------------------------------------------------------
    table_id      | 45035996273766044
    table_schema  | public
    table_name    | mountains_hybrid
    creation_time | 2013-11-30 20:09:37.765257-05
    key_name      | type
    status        | ADDED
    message       | Added successfully
    -[ RECORD 2 ]-+-------------------------------------------------------
    table_id      | 45035996273766044
    table_schema  | public
    table_name    | mountains_hybrid
    creation_time | 2013-11-30 20:09:37.765284-05
    key_name      | hike_safety
    status        | EXISTS
    message       | Column of same name already exists in table definition
    -[ RECORD 3 ]-+-------------------------------------------------------
    table_id      | 45035996273766044
    table_schema  | public
    table_name    | mountains_hybrid
    creation_time | 2013-11-30 20:09:37.765296-05
    key_name      | name
    status        | EXISTS
    message       | Column of same name already exists in table definition
    
  3. Now, display the hybrid table definition, listing the __raw__ column and the three materialized columns. Flex table data types are derived from the associated keys tables, so you can update them as necessary. Notice that the __raw__ column has a default NOT NULL constraint:  

    => \d mountains_hybrid
    List of Fields by Tables
    -[ RECORD 1 ]------------------------------------------------------------
    Schema      | public
    Table       | mountains_hybrid
    Column      | __raw__
    Type        | long varbinary(130000)
    Size        | 130000
    Default     |
    Not Null    | t
    Primary Key | f
    Foreign Key |
    -[ RECORD 2 ]------------------------------------------------------------
    Schema      | public
    Table       | mountains_hybrid
    Column      | name
    Type        | varchar(41)
    Size        | 41
    Default     | (MapLookup(mountains_hybrid.__raw__, 'name'))::varchar(41)
    Not Null    | f
    Primary Key | f
    Foreign Key |
    -[ RECORD 3 ]------------------------------------------------------------
    Schema      | public
    Table       | mountains_hybrid
    Column      | hike_safety
    Type        | float
    Size        | 8
    Default     | (MapLookup(mountains_hybrid.__raw__, 'hike_safety'))::float
    Not Null    | f
    Primary Key | f
    Foreign Key |
    -[ RECORD 4 ]------------------------------------------------------------
    Schema      | public
    Table       | mountains_hybrid
    Column      | type
    Type        | varchar(20)
    Size        | 20
    Default     | (MapLookup(mountains_hybrid.__raw__, 'type'))::varchar(20)
    Not Null    | f
    Primary Key | f
    Foreign Key |

You have now completed getting started with flex table basics, hybrid flex tables, and using flex functions.