Rewriting Queries

You can use directed queries to change the semantics of a given query—that is, substitute one query for another. This can be especially important when you have little or no control over the content and format of input queries that your Vertica database processes. You can map these queries to directed queries that rewrite the original input for optimal execution.

The following sections describe two use cases:

Rewriting Join Queries

Many of your input queries join multiple tables. With the recent introduction of flattened tables, you've determined that in many cases, it would be more efficient to denormalize much of your data in several wide tables and query those tables directly. You cannot revise the input queries themselves. However, you can use directed queries to map these queries to queries on the flattened table data.

For example, the following query aggregates regional sales of white wine products, by joining three tables in the VMart database:

=> SELECT SD.store_region AS 'Sales Region', 
      SD.store_city AS 'City', 
      SUM(SF.gross_profit_dollar_amount) Total
    FROM store.store_sales_fact SF
    JOIN store.store_dimension SD ON SF.store_key=SD.store_key
  JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
  WHERE P.product_description ILIKE '%wine%' AND P.product_description ILIKE '%white%'
  GROUP BY ROLLUP (SD.store_region, SD.store_city)
  ORDER BY 1,3 DESC;

You can consolidate the joined table data in a single flattened table, and query this table instead. By doing so, you can access the same data faster. You can create this table with the following SQL:

CREATE TABLE store.store_sales_wide AS SELECT * FROM store.store_sales_fact;
ALTER TABLE store.store_sales_wide ADD COLUMN store_name VARCHAR(64) 
  SET USING (SELECT store_name FROM store.store_dimension 
  WHERE store.store_sales_wide.store_key=store.store_dimension.store_key);
ALTER TABLE store.store_sales_wide ADD COLUMN store_city varchar(64) 
  SET USING (SELECT store_city FROM store.store_dimension 
  WHERE store.store_sales_wide.store_key=store.store_dimension.store_key);
ALTER TABLE store.store_sales_wide ADD COLUMN store_state char(2) 
  SET USING (SELECT store_state char FROM store.store_dimension 
  WHERE store.store_sales_wide.store_key=store.store_dimension.store_key);
ALTER TABLE store.store_sales_wide ADD COLUMN store_region varchar(64) 
  SET USING (SELECT store_region FROM store.store_dimension 
  WHERE store.store_sales_wide.store_key=store.store_dimension.store_key);
ALTER TABLE store.store_sales_wide ADD column product_description VARCHAR(128) 
  SET USING (SELECT product_description FROM public.product_dimension 
  WHERE store_sales_wide.product_key||store_sales_wide.product_version = product_dimension.product_key||product_dimension.product_version);
ALTER TABLE store.store_sales_wide ADD COLUMN sku_number char(32) 
  SET USING (SELECT sku_number char FROM product_dimension 
  WHERE store_sales_wide.product_key||store_sales_wide.product_version = product_dimension.product_key||product_dimension.product_version);

SELECT REFRESH_COLUMNS ('store.store_sales_wide','', 'rebuild');

After creating this table and refreshing its SET USING columns, you can rewrite the earlier query as follows:

=> SELECT store_region AS 'Sales Region', 
      store_city AS 'City', 
      SUM(gross_profit_dollar_amount) AS Total
   FROM store.store_sales_wide
   WHERE product_description ILIKE '%wine%' AND product_description ILIKE '%white%'
   GROUP BY ROLLUP (store_region, store_city)
   ORDER BY 1,3 DESC;

 Sales Region |       City       |  Total
--------------+------------------+---------
 East         |                  |  762632
 East         | Sterling Heights |   67391
 East         | Allentown        |   64528
 East         | Elizabeth        |   58784
 East         | New Haven        |   57730
 East         | Boston           |   57315
 East         | Lowell           |   42734
 East         | Charlotte        |   40757
 East         | Erie             |   39070
 East         | Washington       |   38792
 East         | Waterbury        |   35369
 East         | Hartford         |   27684
 East         | Clarksville      |   25953
 East         | Stamford         |   25657
 East         | Memphis          |   25629
 East         | Baltimore        |   23999
 East         | Columbia         |   22573
 East         | Manchester       |   22496
 East         | Nashville        |   22353
 East         | Cambridge        |   14385
 East         | Philadelphia     |   13017
 East         | Alexandria       |   12273
 East         | New York         |   12261
 East         | Portsmouth       |   11882
 MidWest      |                  |  494182
 MidWest      | Lansing          |   69157
 MidWest      | Livonia          |   62269
 ...

Querying the flattened table is more efficient; however, you still must account for input queries that continue to use the earlier join syntax. You can do so by creating a custom directed query, which redirects these input queries to the desired syntax:

  1. Save the input query:

    => SAVE QUERY SELECT SD.store_region AS 'Sales Region', 
          SD.store_city AS 'City', 
          SUM(SF.gross_profit_dollar_amount) Total
       FROM store.store_sales_fact SF
       JOIN store.store_dimension SD ON SF.store_key=SD.store_key
       JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
       WHERE P.product_description ILIKE '%wine%' AND P.product_description ILIKE '%white%'
       GROUP BY ROLLUP (SD.store_region, SD.store_city)
       ORDER BY 1,3 DESC;
    SAVE QUERY
  2. Map the saved query to a directed query with the desired syntax, and activate the directed query:

    => CREATE DIRECTED QUERY CUSTOM 'RegionalSalesWhiteWine' 
        SELECT store_region AS 'Sales Region', 
          store_city AS 'City', 
          SUM(gross_profit_dollar_amount) AS Total
        FROM store.store_sales_wide
        WHERE product_description ILIKE '%wine%' AND product_description ILIKE '%white%'
        GROUP BY ROLLUP (store_region, store_city)
        ORDER BY 1,3 DESC;
    CREATE DIRECTED QUERY
    
    => ACTIVATE DIRECTED QUERY RegionalSalesWhiteWine;
    ACTIVATE DIRECTED QUERY

When directed query RegionalSalesWhiteWine is active, the query optimizer maps all queries that match the original input format to the directed query, as shown in the following query plan:

=> EXPLAIN SELECT SD.store_region AS 'Sales Region', 
       SD.store_city AS 'City', 
       SUM(SF.gross_profit_dollar_amount) Total
     FROM store.store_sales_fact SF
     JOIN store.store_dimension SD ON SF.store_key=SD.store_key
     JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
     WHERE P.product_description ILIKE '%wine%' AND P.product_description ILIKE '%white%'
     GROUP BY ROLLUP (SD.store_region, SD.store_city)
     ORDER BY 1,3 DESC;

 The following active directed query(query name: RegionalSalesWhiteWine) is being executed:
 SELECT store_sales_wide.store_region AS "Sales Region", 
   store_sales_wide.store_city AS City,
   sum(store_sales_wide.gross_profit_dollar_amount) AS Total FROM store.store_sales_wide 
   WHERE ((store_sales_wide.product_description ~~* '%wine%'::varchar(6)) 
   AND (store_sales_wide.product_description ~~* '%white%'::varchar(7)))
   GROUP BY GROUPING SETS((store_sales_wide.store_region, store_sales_wide.store_city), (store_sales_wide.store_region), ()) 
   ORDER BY store_sales_wide.store_region, sum(store_sales_wide.gross_profit_dollar_amount) DESC

 Access Path:
 +-SORT [Cost: 76K, Rows: 100] (PATH ID: 1)
 |  Order: store_sales_wide.store_region ASC, sum(store_sales_wide.gross_profit_dollar_amount) DESC
 |  Execute on: All Nodes
 | +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 76K, Rows: 100] (PATH ID: 2)
 | |      Aggregates: sum(store_sales_wide.gross_profit_dollar_amount)
 | |      Group By: store_sales_wide.store_region, store_sales_wide.store_city
 | |      Grouping Sets: (store_sales_wide.store_region, store_sales_wide.store_city, <SVAR>), (store_sales_wide.store_region, <SVAR>), (<SVAR>)
 | |      Execute on: All Nodes
 | | +---> STORAGE ACCESS for store_sales_wide [Cost: 23K, Rows: 2M] (PATH ID: 3)
 | | |      Projection: store.store_sales_wide_super
 | | |      Materialize: store_sales_wide.gross_profit_dollar_amount, store_sales_wide.store_city, store_sales_wide.store_region
 | | |      Filter: ((store_sales_wide.product_description ~~* '%wine%') AND (store_sales_wide.product_description ~~* '%white%'))
 | | |      Execute on: All Nodes

To compare the costs of executing the directed query and executing the original input query, deactivate the directed query and use EXPLAIN on the original input query. The optimizer reverts to creating a plan for the input query that incurs significantly greater cost, as shown in the following query plan:

=> DEACTIVATE DIRECTED QUERY RegionalSalesWhiteWine;
DEACTIVATE DIRECTED QUERY

=> EXPLAIN SELECT SD.store_region AS 'Sales Region', 
       SD.store_city AS 'City', 
       SUM(SF.gross_profit_dollar_amount) Total
     FROM store.store_sales_fact SF
     JOIN store.store_dimension SD ON SF.store_key=SD.store_key
     JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
     WHERE P.product_description ILIKE '%wine%' AND P.product_description ILIKE '%white%'
     GROUP BY ROLLUP (SD.store_region, SD.store_city)
     ORDER BY 1,3 DESC;

 Access Path:
 +-SORT [Cost: 192K, Rows: 100] (PATH ID: 1)
 |  Order: SD.store_region ASC, sum(SF.gross_profit_dollar_amount) DESC
 |  Execute on: All Nodes
 | +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 192K, Rows: 100] (PATH ID: 2)
 | |      Aggregates: sum(SF.gross_profit_dollar_amount)
 | |      Group By: SD.store_region, SD.store_city
 | |      Grouping Sets: (SD.store_region, SD.store_city, <SVAR>), (SD.store_region, <SVAR>), (<SVAR>)
 | |      Execute on: All Nodes
 | | +---> JOIN HASH [Cost: 17K, Rows: 5M] (PATH ID: 3) Inner (BROADCAST)
 | | |      Join Cond: (concat((SF.product_key)::varchar, (SF.product_version)::varchar) = concat((P.product_key)::varchar, (P.product_version)::varchar))
 | | |      Materialize at Input: SF.product_key, SF.product_version
 | | |      Materialize at Output: SF.gross_profit_dollar_amount
 | | |      Execute on: All Nodes
 | | | +-- Outer -> JOIN HASH [Cost: 5K, Rows: 5M] (PATH ID: 4) Inner (BROADCAST)
 | | | |      Join Cond: (SF.store_key = SD.store_key)
 | | | |      Execute on: All Nodes
 | | | | +-- Outer -> STORAGE ACCESS for SF [Cost: 4K, Rows: 5M] (PATH ID: 5)
 | | | | |      Projection: store.store_sales_fact_super
 | | | | |      Materialize: SF.store_key
 | | | | |      Execute on: All Nodes
 | | | | |      Runtime Filters: (SIP2(HashJoin): SF.store_key), (SIP1(HashJoin): concat((SF.product_key)::varchar, (SF.product_version)::varchar))
 | | | | +-- Inner -> STORAGE ACCESS for SD [Cost: 49, Rows: 250] (PATH ID: 6)
 | | | | |      Projection: store.store_dimension_super
 | | | | |      Materialize: SD.store_key, SD.store_city, SD.store_region
 | | | | |      Execute on: All Nodes
 | | | +-- Inner -> STORAGE ACCESS for P [Cost: 378, Rows: 60K] (PATH ID: 7)
 | | | |      Projection: public.product_dimension_super
 | | | |      Materialize: P.product_key, P.product_version
 | | | |      Filter: ((P.product_description ~~* '%wine%') AND (P.product_description ~~* '%white%'))
 | | | |      Execute on: All Nodes

Using Query Templates

You can use directed queries to implement multiple queries that are identical except for the predicate strings on which query results are filtered. For example, directed query RegionalSalesWhiteWine only handles input queries that filter on product_description values containing the strings wine and white. You can create a modified version of this directed query that matches the syntax of multiple input queries, which differ only in their pairs of input values—for example, wine and red.

You create this query template in the following steps:

  1. Use the input query on the flattened table to create an optimized-generated directed query:

    => CREATE DIRECTED QUERY optimizer RegionalSalesTemp 
        SELECT store_region AS 'Sales Region', 
          store_city AS 'City', 
          SUM(gross_profit_dollar_amount) AS Total
        FROM store.store_sales_wide
        WHERE product_description ILIKE '%wine%' 
          AND product_description ILIKE '%white%'
        GROUP BY ROLLUP (store_region, store_city)
        ORDER BY 1,3 DESC;
    CREATE DIRECTED QUERY
  2. Modify the original join input query with IGNORECONST hints and save it:

    => SAVE QUERY SELECT SD.store_region AS 'Sales Region', 
          SD.store_city AS 'City', 
          SUM(SF.gross_profit_dollar_amount) Total
        FROM store.store_sales_fact SF
        JOIN store.store_dimension SD ON SF.store_key=SD.store_key
        JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
        WHERE P.product_description ILIKE 'desc-string1' /*+IGNORECONST(1)*/ 
          AND P.product_description ILIKE 'desc-string2' /*+IGNORECONST(2)*/
        GROUP BY ROLLUP (SD.store_region, SD.store_city)
        ORDER BY 1,3 DESC;
    SAVE QUERY
  3. Get the optimizer-generated query from system table DIRECTED_QUERIES:

    => \x
    => SELECT annotated_query FROM DIRECTED_QUERIES WHERE query_name='RegionalSalesTemp';
    -[ RECORD 1 ]---+-------------------------------------------------------------------------
    annotated_query | SELECT /*+verbatim*/ store_sales_wide.store_region AS "Sales Region", 
      store_sales_wide.store_city AS City, 
      sum(store_sales_wide.gross_profit_dollar_amount) AS Total
    FROM store.store_sales_wide AS store_sales_wide/*+projs('store.store_sales_wide')*/
    WHERE ((store_sales_wide.product_description ~~* '%wine%'::varchar(6)) 
       AND (store_sales_wide.product_description ~~* '%white%'::varchar(7)))
    GROUP BY  /*+GByType(Hash)*/ GROUPING SETS((1, 2), (1), ())
    ORDER BY 1 ASC, 3 DESC
    (1 row)
    
  4. Create a modified version of the annotated query that qualifies the input values with IGNORECONST hints. Use this version to create a custom directed query, and then activate it:

    => CREATE DIRECTED QUERY CUSTOM RegionalSales
    SELECT /*+verbatim*/ store_sales_wide.store_region AS "Sales Region", 
      store_sales_wide.store_city AS City, 
      sum(store_sales_wide.gross_profit_dollar_amount) AS Total
    FROM store.store_sales_wide AS store_sales_wide/*+projs('store.store_sales_wide')*/
    WHERE ((store_sales_wide.product_description ~~* 'desc-str'/*+IGNORECONST(1)*/) 
       AND (store_sales_wide.product_description ~~* 'desc-str'/*+IGNORECONST(2)*/))
    GROUP BY  /*+GByType(Hash)*/ GROUPING SETS((1, 2), (1), ())
    ORDER BY 1 ASC, 3 DESC;
    CREATE DIRECTED QUERY
    
    => ACTIVATE DIRECTED QUERY RegionalSales;
    

After activating this directed query, Vertica can use it for input queries that match the template. These queries can use any pair of strings to filter the result set. For example, the following input query filters on the strings chicken and frozen:

EXPLAIN SELECT SD.store_region AS 'Sales Region', 
      SD.store_city AS 'City', 
      SUM(SF.gross_profit_dollar_amount) Total
    FROM store.store_sales_fact SF
    JOIN store.store_dimension SD ON SF.store_key=SD.store_key
    JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
    WHERE P.product_description ILIKE '%chicken%' AND P.product_description ILIKE '%frozen%'
    GROUP BY ROLLUP (SD.store_region, SD.store_city)
    ORDER BY 1,3 DESC;

 The following active directed query(query name: RegionalSales) is being executed:
 SELECT /*+verbatim*/  store_sales_wide.store_region AS "Sales Region", 
   store_sales_wide.store_city AS City, 
   sum(store_sales_wide.gross_profit_dollar_amount) AS Total 
 FROM store.store_sales_wide store_sales_wide/*+projs('store.store_sales_wide')*/ 
 WHERE ((store_sales_wide.product_description ~~* '%chicken%'::varchar(9)) 
    AND (store_sales_wide.product_description ~~* '%frozen%'::varchar(8))) 
 GROUP BY /*+GByType(Hash)*/  GROUPING SETS((store_sales_wide.store_region, store_sales_wide.store_city), (store_sales_wide.store_region), ()) 
 ORDER BY store_sales_wide.store_region, sum(store_sales_wide.gross_profit_dollar_amount) DESC

 Access Path:
 +-SORT [Cost: 214K, Rows: 100] (PATH ID: 1)
 |  Order: store_sales_wide.store_region ASC, sum(store_sales_wide.gross_profit_dollar_amount) DESC
 |  Execute on: All Nodes
 | +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 214K, Rows: 100] (PATH ID: 2)
 | |      Aggregates: sum(store_sales_wide.gross_profit_dollar_amount)
 | |      Group By: store_sales_wide.store_region, store_sales_wide.store_city
 | |      Grouping Sets: (store_sales_wide.store_region, store_sales_wide.store_city, <SVAR>), (store_sales_wide.store_region, <SVAR>), (<SVAR>)
 | |      Execute on: All Nodes
 | | +---> STORAGE ACCESS for store_sales_wide [Cost: 39K, Rows: 5M] (PATH ID: 3)
 | | |      Projection: store.store_sales_wide_super
 | | |      Materialize: store_sales_wide.gross_profit_dollar_amount, store_sales_wide.store_city, store_sales_wide.store_region
 | | |      Filter: ((store_sales_wide.product_description ~~* '%chicken%') AND (store_sales_wide.product_description ~~* '%frozen%'))
 | | |      Execute on: All Nodes

When you execute this query, it returns with the following results:

 Sales Region |       City       |  Total
--------------+------------------+---------
 East         |                  | 1421528
 East         | Sterling Heights |  122205
 East         | Elizabeth        |  121061
 East         | Boston           |  113776
 East         | Allentown        |  109808
 East         | Lowell           |  101147
 East         | New Haven        |   95566
 East         | Waterbury        |   76076
 East         | Erie             |   68937
 East         | Charlotte        |   68032
 East         | Washington       |   64503
 East         | Clarksville      |   50772
 East         | Columbia         |   48207
 East         | Memphis          |   47014
 East         | Stamford         |   46150
 East         | Baltimore        |   45588
 East         | Nashville        |   44868
 East         | Manchester       |   43682
 East         | Hartford         |   43605
 East         | Philadelphia     |   24460
 East         | New York         |   21851
 East         | Alexandria       |   21794
 East         | Cambridge        |   21750
 East         | Portsmouth       |   20676
 MidWest      |                  |  937225
 MidWest      | Lansing          |  130754
 MidWest      | Livonia          |  121409
...