Reading ORC and Parquet Formats

You can create external tables for data in any format that COPY supports. Vertica provides optimized readers for two of these, the columnar formats ORC (Optimized Row Columnar) and Parquet. These formats are common among Hadoop users but are not restricted to Hadoop; you can place Parquet files on S3, for example.
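As a sketch, Parquet data in an S3 bucket could be exposed as an external table as follows; the table name, column definitions, and bucket path are hypothetical:

```sql
-- Hypothetical external table over Parquet files in an S3 bucket.
-- The table name, columns, and path are illustrative only.
CREATE EXTERNAL TABLE sales (
    order_id  INT,
    region    VARCHAR(20),
    amount    NUMERIC(10,2)
) AS COPY FROM 's3://examplebucket/sales/*.parquet' PARQUET;
```

The same pattern applies to ORC data: replace the PARQUET keyword with ORC.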

ORC and Parquet, like ROS in Vertica, are columnar formats. The files contain metadata that allows Vertica to read only the portions needed for a query and to skip entire files. External tables with ORC or Parquet data therefore generally provide better performance than ones using delimited or other formats where the entire file must be scanned.
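To illustrate, consider a query against a hypothetical Parquet-backed external table named sales. Because the format is columnar, only the referenced columns are read, and file-level metadata may allow some files to be skipped entirely:

```sql
-- Only the region and amount columns are read from the Parquet files;
-- min/max metadata in the files may let Vertica skip files whose
-- values cannot match the predicate.
SELECT region, SUM(amount)
FROM sales
WHERE region = 'EMEA'
GROUP BY region;
```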

If you have ORC or Parquet data, you can take advantage of optimizations including partition pruning and predicate pushdown. If you export data from Vertica, consider exporting to one of these formats so that you can take advantage of their performance benefits when using external tables. See Exporting Data.
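For example, a Vertica table could be exported to Parquet with the EXPORT TO PARQUET statement; the directory path, table name, and partitioning column below are hypothetical:

```sql
-- Hypothetical export of query results to Parquet files,
-- partitioned by region so that later queries over an external
-- table on this directory can benefit from partition pruning.
EXPORT TO PARQUET (directory = '/data/sales_export')
  OVER (PARTITION BY region)
AS SELECT order_id, region, amount FROM sales;
```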

Requirements

ORC or Parquet files must not contain complex data types. Vertica supports all simple data types supported in Hive version 0.11 or later.

Files compressed by Hive or Impala must use Zlib (GZIP) or Snappy compression. Vertica does not support LZO compression for these formats.

In This Section