EXPORT TO PARQUET

Exports a table, columns from a table, or query results to files in the Parquet format. You can use an OVER() clause to partition the data before export. Partitioning data can improve query performance by enabling partition pruning; see Improving Query Performance for Data Stored in HDFS.

You can export data stored in Vertica in ROS format and data from external tables.

The exported files are owned by the superuser.

EXPORT TO PARQUET returns the number of rows written.

During the export, Vertica writes files to a temporary directory in the same location as the destination and renames the directory when the export is complete. Do not attempt to use the files in the temporary directory.

Syntax

EXPORT TO PARQUET ( directory = 'directory'
		    [, compression = 'compression_type' ]
		    [, rowGroupSizeMB = 'size' ] )
	[ OVER ( over-clause ) ]
	AS SELECT query-expression;

Parameters

directory

The destination directory for the Parquet files. The directory must not exist, and the current user must have permission to write to it. The directory can be in HDFS or on the local file system.

compression_type

Column compression type, one of Snappy or Uncompressed. The default is Snappy.

size

The uncompressed size of exported row groups, in MB. The maximum value is 512 and the default is 64. The row groups in the exported files are smaller because Parquet files are compressed on write. For best performance, set size to be smaller than the HDFS block size.
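For example, to cap uncompressed row groups at 128MB, pass both compression and rowGroupSizeMB along with the directory. This sketch reuses the public.T1 table from the Examples section; the directory path is illustrative.

=> EXPORT TO PARQUET(directory = 'hdfs:///user1/data',
                     compression = 'Uncompressed',
                     rowGroupSizeMB = '128')
   AS SELECT * FROM public.T1;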

Arguments

over-clause

Specifies how to partition table data using PARTITION BY. Within partitions, you can sort using ORDER BY. See SQL Analytics. This clause may contain column references but not expressions.

If you partition data, Vertica creates a Hive-style partition directory structure. See Using Partition Columns for a description of the structure.

If you omit this clause, Vertica optimizes for maximum parallelism.

query-expression

Specifies the data to be exported. See SELECT for the syntax.

Examples

The following example exports all columns of the T1 table in the public schema, using Snappy compression (the default).

=> EXPORT TO PARQUET(directory = 'hdfs:///user1/data') 
  AS SELECT * FROM public.T1;

 Rows Exported
---------------
         87436
(1 row)

The following example exports the results of a query that joins two tables.

=> EXPORT TO PARQUET(directory='hdfs:///data/sales_by_region') 
   AS SELECT sale.price, sale.date, store.region 
   FROM public.sales sale 
   JOIN public.vendor store ON sale.distribID = store.ID;
 Rows Exported
---------------
         23301
(1 row)

The following example partitions the data during export. EXPORT TO PARQUET first partitions the data on column b and then, within each partition, sorts by column d.

=> EXPORT TO PARQUET(directory = 'hdfs:///user2/data') 
  OVER(PARTITION BY b ORDER BY d) AS SELECT b, d FROM public.T2;

 Rows Exported
---------------
        120931
(1 row)

The following example uses a column alias (A) to label a select-list item that is an expression.

=> EXPORT TO PARQUET(directory = 'hdfs:///user3/data') 
  OVER(ORDER BY col1) AS SELECT col1 + col1 AS A, col2
  FROM public.T3;
 Rows Exported
---------------
         14336
(1 row)