Loading Matches from Regular Expressions

You can load flex or columnar tables with the matched results of a regular expression, using the fregexparser. This section describes some examples of using the options that the flex parsers support.

Sample Regular Expression

These examples use the following regular expression, which searches information that includes the timestamp, date, thread_name, and thread_id strings.

For display purposes, this sample regular expression adds new line characters to split long lines of text. To use this expression in a query, first copy and edit the example to remove any new line characters.

This example expression loads any thread_id hex value, regardless of whether it has a 0x prefix, (<thread_id>(?:0x)?[0-9a-f]+).

'^(?<time>\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d\.\d+)
 (?<thread_name>[A-Za-z ]+):(?<thread_id>(?:0x)?[0-9a-f]+)
-?(?<transaction_id>[0-9a-f])?(?:[(?<component>\w+)] 
\<(?<level>\w+)\> )?(?:<(?<elevel>\w+)> @[?(?<enode>\w+)]?: )
?(?<text>.*)'

Using Regular Expression Matches for a Flex Table

You can load the results from a regular expression into a flex table, using the fregexparser. For a complete example of doing so, see FREGEXPARSER.

Using fregexparser for Columnar Tables

This section illustrates how to load the results of a regular expression used with a sample log file for a Vertica database. By using an external table definition, the section presents an example of using fregexparser to load data into a columnar table. Using a flex table parser for a columnar tables gives you the capability to mix data loads in one table. For example, you can load the results of a regular expression in one session, and JSON data in another.

The following basic examples illustrate this usage.

  1. Create a columnar table, vlog, with the following columns:  
  2. => CREATE TABLE vlog (
    	"text" 			varchar(2322), 
    	thread_id 		varchar(28), 
    	thread_name 	varchar(44), 
    	"time" 			varchar(46), 
    	component 		varchar(30), 
    	level 			varchar(20), 
    	transaction_id 	varchar(32), 
    	elevel 			varchar(20), 
    	enode 			varchar(34) 
    );
    
  3. Use COPY to load parts of a log file using the sample regular expression presented above, with the fregexparser. Be sure to remove any line characters from this expression example before trying it yourself: 
  4. => COPY v_log FROM '/home/dbadmin/data/flex/vertica.log' PARSER 
    FRegexParser(pattern='^(?<time>\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d\.\d+)
     (?<thread_name>[A-Za-z ]+):(?<thread_id>(?:0x)?[0-9a-f]+)
    -?(?<transaction_id>[0-9a-f])?(?:[(?<component>\w+)] 
    \<(?<level>\w+)\> )?(?:<(?<elevel>\w+)> @[?(?<enode>\w+)]?: )
    ?(?<text>.*)') rejected data as table fregex_reject;
    
  5. Query the time column: 
  6. => SELECT time FROM flogs limit 10;
              time
    -------------------------
     2014-04-02 04:02:02.613
     2014-04-02 04:02:02.613
     2014-04-02 04:02:02.614
     2014-04-02 04:02:51.008
     2014-04-02 04:02:51.010
     2014-04-02 04:02:51.012
     2014-04-02 04:02:51.012
     2014-04-02 04:02:51.013
     2014-04-02 04:02:51.014
     2014-04-02 04:02:51.017
    (10 rows)
    

Using External Tables with fregexparser

By creating an external columnar table for your Vertica log file, querying the table will return updated log information. The following basic example illustrate this usage.

  1. Create a columnar table, vertica_log, using the AS COPY clause and fregexparser to load matched results from the regular expression. For illustrative purposes, this regular expression has new line characters to split long text lines. Remove any line returns before testing with this expression:   
  2. => CREATE EXTERNAL TABLE public.vertica_log
    (
        "text" varchar(2322),
        thread_id varchar(28),
        thread_name varchar(44),
        "time" varchar(46),
        component varchar(30),
        level varchar(20),
        transaction_id varchar(32),
        elevel varchar(20),
        enode varchar(34)
    )
    AS COPY
    FROM '/home/dbadmin/data/vertica.log'
    PARSER FRegexParser(pattern='^(?<time>\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d\.\d+)
     (?<thread_name>[A-Za-z ]+):(?<thread_id>(?:0x)?[0-9a-f]+)
    -?(?<transaction_id>[0-9a-f])?(?:[(?<component>\w+)] 
    \<(?<level>\w+)\> )?(?:<(?<elevel>\w+)> @[?(?<enode>\w+)]?: )
    ?(?<text>.*)');
  3. Query from the external table to get updated results: 
  4. => SELECT component, thread_id, time FROM vertica_log limit 10;
     component | thread_id  |          time
    -----------+------------+-------------------------
     Init      | 0x16321430 | 2014-04-02 04:02:02.613
     Init      | 0x16321430 | 2014-04-02 04:02:02.613
     Init      | 0x16321430 | 2014-04-02 04:02:02.613
     Init      | 0x16321430 | 2014-04-02 04:02:02.613
     Init      | 0x16321430 | 2014-04-02 04:02:02.613
     Init      | 0x16321430 | 2014-04-02 04:02:02.613
     Init      | 0x16321430 | 2014-04-02 04:02:02.613
               | 0x16321430 | 2014-04-02 04:02:02.614
               | 0x16321430 | 2014-04-02 04:02:02.614
               | 0x16321430 | 2014-04-02 04:02:02.614
    (10 rows)