EXPORT TO ORC
Exports a table, columns from a table, or query results to files in the ORC format.
You can use an OVER() clause to partition the data before export. You can partition data instead of or in addition to exporting the column data. Partitioning data can improve query performance by enabling partition pruning. See Improving Query Performance.
There are some limitations on the queries you can use in an export statement. See Query Restrictions.
You can export data stored in Vertica in ROS format and data from external tables.
This statement returns the number of rows written and logs information about exported files in a system table. See Monitoring Exports.
During an export to HDFS or an NFS mount point, Vertica writes files to a temporary directory in the same location as the destination and renames the directory when the export is complete. Do not attempt to use the files in the temporary directory. During an export to S3, GCS, or Azure, Vertica writes files directly to the destination path, so you must wait for the export to finish before reading the files. For more information, see Exporting to Object Stores.
Syntax
EXPORT [ /*+LABEL (label‑string)*/ ] TO ORC ( directory='path'[, param=value [,...] ] ) [ OVER (over‑clause ) ] AS SELECT query‑expression
Parameters
LABEL
|
Assigns a label to a statement to identify it for profiling and debugging. |
directory
|
The destination directory for the output files. The directory must not exist, and the current user must have permission to write it. The destination can be on any of the following file systems:
|
compression
|
Column compression type, one of:
|
stripeSizeMB
|
The uncompressed size of exported stripes in MB, an integer value between 1 and 1024, inclusive. Default: 250 |
rowIndexStride
|
Integer that specifies how frequently the exporter builds indexing statistics in the output, between 1 and 1000000 (1 million), inclusive. A value of 0 disables indexing. The exporter builds statistics after every Default: 1000 |
fileSizeMB
|
The maximum file size of a single output file. This value is a hint, not a hard limit. A value of 0 specifies no limit. This value affects the size of individual output files, not the total output size. For smaller values, Vertica divides the output into more files; all data is still exported. Default: 10GB |
fileMode
|
For writes to HDFS only, permission to apply to all exported files. You can specify the value in Unix octal format (such as Valid octal values range between When writing files to any destination other than HDFS, this parameter has no effect. Default: |
dirMode
|
For writes to HDFS only, permission to apply to all exported directories. Values follow the same rules as those for fileMode. Further, you must give the Vertica HDFS user full permission, at least When writing files to any destination other than HDFS, this parameter has no effect. Default: |
Arguments
over‑clause |
Specifies how to partition table data using PARTITION BY. Within partitions you can sort by using ORDER BY. See SQL Analytics. This clause may contain column references but not expressions. If you partition data, Vertica creates a Hive-style partition directory structure, transforming column names to lowercase. See Using Partition Columns for a description of the directory structure. If you omit this clause, Vertica optimizes for maximum parallelism. |
query‑expression |
Specifies the data to export. See Query Restrictions for important limitations. |
Privileges
Non-superusers:
- Source table: SELECT
- Source table schema: USAGE
- Destination directory: Write
Query Restrictions
You must provide an alias column label for selected column targets that are expressions.
The query can contain only a single outer SELECT statement. For example, you cannot use UNION as in the following example.
=> EXPORT TO ORC(directory = '/mnt/shared_nfs/accounts/rm') OVER(PARTITION BY hash) AS SELECT 1 as account_id, '{}' as json, 0 hash UNION ALL SELECT 2 as account_id, '{}' as json, 1 hash; ERROR 8975: Only a single outer SELECT statement is supported HINT: Please use a subquery for multiple outer SELECT statements
Instead, rewrite the query to use a subquery:
=> EXPORT TO ORC(directory = '/mnt/shared_nfs/accounts/rm') OVER(PARTITION BY hash) AS SELECT account_id, json FROM ( SELECT 1 as account_id, '{}' as json, 0 hash UNION ALL SELECT 2 as account_id, '{}' as json, 1 hash ) a; Rows Exported --------------- 2 (1 row)
To use composite statements such as UNION, INTERSECT, and EXCEPT, rewrite them as subqueries.
Data Types
EXPORT TO ORC converts Vertica data types to Hive data types as shown in the following table.
Vertica Data Type | Hive Data Type |
---|---|
INTEGER |
BIGINT |
FLOAT |
Corresponding Hive type |
VARCHAR LONG VARCHAR |
VARCHAR (max 64KB) or STRING (can be read as either) |
BINARY |
BINARY |
DATE |
DATE if supported by your version of Hive, otherwise INT96 (can be read as TIMESTAMP) |
TIMESTAMP |
TIMESTAMP. Vertica does not convert TIMESTAMP values to UTC. To avoid problems arising from time zones, use TIMESTAMPTZ instead of TIMESTAMP. |
TIME TIMEZ INTERVAL UUID |
Not supported |
ARRAY | ARRAY |
SET ROW |
Not supported |
Decimal precision must be <= 38.
The exported Hive types might not be identical to the Vertica types. For example, a Vertica INT is exported as a Hive BIGINT. When defining Hive external tables to read exported data, you might have to adjust column definitions.
This operation exports raw Flex columns as binary data.
You cannot export data from an external table that contains external, non-native data types, such as ROW. See Complex Types. This restriction applies even if your SELECT statement does not include those columns.
Output
The export operation always creates an output directory, even if the query produces zero rows.
Output file names follow the pattern: [8-character-hash]-[nodename]-[thread-id].orc
.
Column names in partition directories are lowercase.
Files exported to a local file system by any Vertica user are owned by the Vertica superuser. Files exported to HDFS or object stores are owned by the Vertica user who exported the data.
The following requirements and restrictions apply:
- Avoid concurrent exports to the same output destination. Doing so is an error on any file system and can produce incorrect results.
- Use a shared file location for output. If you use a directory in the local file system, it must be an NFS-mounted directory.
- For output to the local file system, you must have a USER storage location.
- When exporting to the local file system, the permission mode is 700 for directories and 600 for files. You cannot override these values.
For restrictions specific to object stores (S3, GCS, or Azure), see Exporting to Object Stores.
Examples
The following example demonstrates partitioning and exporting data. EXPORT TO ORC first partitions the data on region and then, within each partition, sorts by store.
=> EXPORT TO ORC(directory='gs://DataLake/user2/data') OVER(PARTITION BY store.region ORDER BY store.ID) AS SELECT sale.price, sale.date, store.ID FROM public.sales sale JOIN public.vendor store ON sale.distribID = store.ID;
For more examples, see EXPORT TO PARQUET, which (aside from a few parameters) behaves the same as EXPORT TO ORC.