From Vertica 12.0.4 onwards, we support creating external tables over Parquet files stored in the Apache Iceberg table format. Iceberg is a high-performance table format that uses advanced filtering techniques and improves performance by avoiding file listing. A striking feature of external Iceberg tables is that you do not need to specify columns in the DDL: the schema is read automatically from the Iceberg metadata at query time.
How do you create an Iceberg external table in Vertica?
Apache Iceberg uses two layers: a metadata layer and a data layer. The metadata layer consists of metadata and manifest files, which hold information such as the table schema and the data files that belong to the table. The data layer consists of the actual Parquet files. To create an external table, pass either the path of the desired metadata.json file or the base location of the table. If you pass the base location, Vertica always uses the latest metadata file.
vertica1204db=> CREATE EXTERNAL TABLE test stored by Iceberg location '/home/dbadmin/warehouse/db/table/metadata/v2.metadata.json';
vertica1204db=> select * from test;
 id | data
----+------
  1 | a
  2 | b
  3 | c
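The base-location form mentioned above might look like the following sketch. It assumes that the table's base location is the directory that contains the metadata folder from the earlier example; the table name test_base is hypothetical.

```sql
-- Sketch only: the base location is assumed to be the parent of the
-- metadata/ directory used in the earlier example.
-- test_base is a hypothetical table name.
CREATE EXTERNAL TABLE test_base STORED BY ICEBERG
    LOCATION '/home/dbadmin/warehouse/db/table';

-- No column list is given; the schema comes from the Iceberg metadata.
SELECT * FROM test_base;
```

With this form, Vertica uses the latest metadata file, whereas pointing at a specific metadata.json file selects that particular file.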
Do the files need to exist on the Vertica cluster to create Iceberg external tables?
No, it is not mandatory. Metadata and data files can be on S3, HDFS, or on the Vertica nodes.
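As an illustration, a table whose files live on S3 could be defined as in the sketch below. The bucket name examplebucket, the path under it, and the table name test_s3 are all hypothetical, and the sketch assumes that S3 access is already configured for the database.

```sql
-- Sketch only: examplebucket and the path below are hypothetical.
-- Assumes S3 credentials and region are already configured in Vertica.
CREATE EXTERNAL TABLE test_s3 STORED BY ICEBERG
    LOCATION 's3://examplebucket/warehouse/db/table/metadata/v2.metadata.json';
```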
For more information, see CREATE EXTERNAL TABLE ICEBERG in the Vertica documentation.