CREATE PROJECTION (UDTFs)

Creates metadata in the Vertica catalog for projections that invoke user-defined transform functions (UDTFs).

Currently, live aggregate projections can only reference UDTFs that are developed in C++.

Syntax

CREATE PROJECTION [ IF NOT EXISTS ] [[database.]schema.]projection
   [ ( 
       { projection-columns | grouped-clause
          [ ENCODING encoding-type ] 
          [ ACCESSRANK integer ] 
       } [,…] 
   ) ]
AS {
      batch-query FROM { prepass-query sq-results | table [ [AS] ALIAS] }
      | prepass-query
   }

batch-query

SELECT { table-column | expr-with-table-columns }[,…], batch-udtf(batch-args)
   OVER (PARTITION BATCH BY partition-column-expr[,…] )
   [ AS (batch-output-columns) ]

                    
prepass-query

SELECT { table-column | expr-with-table-columns }[,…], prepass-udtf(prepass-args)
   OVER (PARTITION PREPASS BY partition-column-expr[,…] )
   [ AS (prepass-output-columns) ] FROM table-ref

Parameters

IF NOT EXISTS

Specifies to generate an informational message if an object already exists under the specified name. If you omit this option and the object exists, Vertica generates a ROLLBACK error message. In both cases, the object is not created.

The IF NOT EXISTS clause is useful for SQL scripts where you want to create an object if it does not already exist, and reuse the existing object if it does.

For related information, see ON_ERROR_STOP.

[database.]schema Specifies the schema for this projection and its anchor table, where schema must be the same for both. If you specify a database, it must be the current database.
projection

Identifies the projection to create, where projection conforms to conventions described in Identifiers. It must also be unique among all names of sequences, tables, projections, views, and models within the same schema. The projection is created in the same schema as the anchor table.

projection‑column

The name of a projection column.

If you do not specify projection column names, Vertica uses the anchor table column names that are specified in the SELECT statement.

grouped‑clause

See GROUPED Clause.

ENCODING encoding‑type

Specifies the column encoding type, by default set to AUTO.

ACCESSRANK integer

Overrides the default access rank for a column. Use this parameter to increase or decrease the speed at which Vertica accesses a column. For more information, see Overriding Default Column Ranking.

table-column
| expr-with-table-columns

A table column or expression of table columns to include in the projection.

batch-udtf(batch‑args)

The batch UDTF to invoke each time the following events occur:

  • Tuple mover mergeout
  • Queries on the projection
  • If invoked singly, on data load operations

If the projection definition includes a pre-pass subquery, batch‑args must exactly match the pre-pass UDTF output columns, in name and order.

prepass-udtf(prepass‑args)

The pre-pass UDTF to invoke on each load operation such as COPY or INSERT.

If specified in a subquery, the pre-pass UDTF returns transformed data to the batch query for further processing. Otherwise, the pre-pass query results are added to projection data storage.

OVER (PARTITION BATCH BY partition‑col‑expr[,…]

Specifies the UDTF type and how to partition the data it returns:

In both cases, the OVER clause specifies partitioning with one or more column expressions from the SELECT list. The first partition-column-expr is the first column expression in the SELECT list, the second partition-column-expr is the second column expression in the SELECT list, and so on.

The projection is implicitly segmented and ordered on PARTITION BY columns.

AS (batch-output-columns)
AS (prepass-output-columns)

Optionally names columns that are returned by the UDTF.

If a pre-pass subquery omits this clause, the outer batch query UDTF arguments (batch‑args) must reference the column names as they are defined in the pre-pass UDTF.

table [ [AS] ALIAS] Specifies the projection's anchor table, optionally qualified by an alias.
sq‑results Subquery result set that is returned to the outer batch UDTF.

Privileges

Superuser, or the following:

  • Anchor table owner
  • CREATE privilege on the schema
  • EXECUTE privileges on all UDTFs that are referenced by the projection

Restrictions

Vertica does not regard live aggregate projections as superprojections, even one that includes all table columns.

UDTF Types

CREATE PROJECTION can define live aggregate projections that invoke user-defined transform functions (UDTFs). Vertica invokes UDTFs at multiple points of projection processing:

  • Pre-pass UDTFs: Invoked when data is loaded into the projection's anchor table—for example through COPY or INSERT statements. A pre-pass UDTF transforms the new data before it is stored in the projection's ROS containers. You identify a pre-pass UDTF in the projection's PARTITION BY clause, through the keyword PREPASS.
  • Batch UDTFs: Invoked on three events: the Vertica tuple mover consolidates stored projection data (mergeout); the projection is queried; and if invoked singly, data load operations. In all cases, the UDTF aggregates projection data and stores the aggregated results. Aggregation is cumulative across mergeout and load operations, and is completed (if necessary) on query execution. You identify a batch UDTF in the projection's PARTITION BY clause, through the keyword BATCH.

Vertica stores all UDTF results in projection ROS containers, thereby enabling faster response time when you query the projection.

UDTF Specification Options

A projection definition can specify up to two UDTFs, in any of the following ways:

  • Single pre-pass UDTF: The pre-pass UDTF transforms newly loaded data and stores it in the projection. Use the following syntax:
    => CREATE PROJECTION projection-name 
        ({ projection-column | grouped-clause 
    	   [ ENCODING encoding-type ] 
              [ ACCESSRANK integer ] })
        AS SELECT ..., udtf(args)
        OVER(PARTITION PREPASS BY partition-columns) AS (prepass-output-columns) FROM table-ref;
  • Single batch UDTF: The batch UDTF transforms and aggregates projection data on mergeout, insert, and query operations. Use the following syntax:
    => CREATE PROJECTION projection-name 
        ({ projection-column | grouped-clause 
    	   [ ENCODING encoding-type ] 
              [ ACCESSRANK integer ] }) 
        AS SELECT ..., udtf(args)
        OVER(PARTITION BATCH BY partition-columns) AS (batch-output-columns) FROM table-ref;
  • Pre-pass and batch UDTFs: You can define a projection with a subquery that invokes a pre-pass UDTF. The pre-pass UDTF returns transformed data to the top-level batch query. The batch UDTF then iteratively aggregates all projection data. Use the following syntax:
    => CREATE PROJECTION projection-name 
        ({ projection-column | grouped-clause 
    	   [ ENCODING encoding-type ] 
              [ ACCESSRANK integer ] }) 
        AS SELECT ..., batch-udtf(batch-args)
        OVER ( PARTITION BATCH BY partition-columns ) AS (batch-output-columns)
        FROM ( SELECT ..., prepass-udtf(prepass-args) 
            OVER ( PARTITION PREPASS BY partition-columns) AS (prepass-output-columns)
            FROM table-ref ) sq-ref;

Examples

See Examples in Pre-Aggregating UDTF Results.

See Also

Pre-Aggregating UDTF Results