Quick Tip: EXPORT TO PARQUET Compression with GZIP, Brotli, and ZSTD

Posted May 25, 2021 by James, Vertica Field Chief Technologist


EXPORT TO PARQUET exports a table, columns from a table, or query results to files in the Parquet format. These Parquet files use Snappy compression by default.

Starting in Vertica 10.1.1, EXPORT TO PARQUET supports the GZIP, Brotli, and ZSTD compression types!
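If you want to repeat this comparison on your own table, it can help to generate all four statements at once. The following is a minimal sketch. `build_export_sql` and `CODECS` are hypothetical names, not part of Vertica. The statements mirror the `EXPORT TO PARQUET (directory = ..., compression=...)` syntax used in this tip. The sketch assumes that Snappy, the default, can also be named explicitly, and it lowercases the directory names.

```python
# Hypothetical helper (not part of Vertica): builds one EXPORT TO PARQUET
# statement per compression type, mirroring the statements in this tip.
CODECS = ["Snappy", "GZIP", "Brotli", "ZSTD"]


def build_export_sql(table: str, base_dir: str, codec: str) -> str:
    """Return an EXPORT TO PARQUET statement that writes `table` using `codec`.

    Assumes `table` and `base_dir` are trusted values: nothing is quoted or escaped.
    """
    # Each codec gets its own output directory, e.g. /home/dbadmin/parq_zstd.
    directory = f"{base_dir}/parq_{codec.lower()}"
    return (
        f"EXPORT TO PARQUET (directory = '{directory}', compression='{codec}') "
        f"AS SELECT * FROM {table};"
    )


if __name__ == "__main__":
    # Print the statements so they can be pasted into vsql.
    for codec in CODECS:
        print(build_export_sql("big", "/home/dbadmin", codec))
```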

Let’s see how these compression types compare in disk usage:

verticademos=> SELECT COUNT(*) FROM big;
   COUNT
-----------
134217728
(1 row)

Snappy compression:

verticademos=> EXPORT TO PARQUET (directory = '/home/dbadmin/parq_snappy') AS SELECT * FROM big;
Rows Exported
---------------
     134217728
(1 row)

verticademos=> \! du --summarize -h /home/dbadmin/parq_snappy
2.6G    /home/dbadmin/parq_snappy
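The `du --summarize -h` call above totals the space used by the export directory. If you would rather measure from a script, the sketch below sums file sizes with Python's standard library. The function names are hypothetical. The results are only approximately comparable to `du`: `du` counts allocated disk blocks, while this sketch sums apparent file sizes.

```python
import os
import sys


def directory_size_bytes(path: str) -> int:
    """Sum the apparent sizes of all regular files under `path` (recursively).

    `du` reports allocated blocks instead, so its totals can differ slightly.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so linked files are not counted twice.
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def human_readable(num_bytes: int) -> str:
    """Format a byte count with 1024-based units, similar in spirit to `du -h`."""
    size = float(num_bytes)
    units = ["B", "K", "M", "G", "T"]
    for unit in units[:-1]:
        if size < 1024:
            return f"{size:.1f}{unit}"
        size /= 1024
    return f"{size:.1f}{units[-1]}"


if __name__ == "__main__":
    # Usage: python dirsize.py /home/dbadmin/parq_snappy /home/dbadmin/parq_gzip ...
    for path in sys.argv[1:]:
        print(f"{human_readable(directory_size_bytes(path))}\t{path}")
```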

GZIP compression:

verticademos=> EXPORT TO PARQUET (directory = '/home/dbadmin/parq_gzip', compression='GZIP') AS SELECT * FROM big;
Rows Exported
---------------
     134217728
(1 row)

verticademos=> \! du --summarize -h /home/dbadmin/parq_gzip
1.9G    /home/dbadmin/parq_gzip

Brotli compression:

verticademos=> EXPORT TO PARQUET (directory = '/home/dbadmin/parq_Brotli', compression='Brotli') AS SELECT * FROM big;
Rows Exported
---------------
     134217728
(1 row)

verticademos=> \! du --summarize -h /home/dbadmin/parq_Brotli
1.7G    /home/dbadmin/parq_Brotli

ZSTD compression:

verticademos=> EXPORT TO PARQUET (directory = '/home/dbadmin/parq_ZSTD', compression='ZSTD') AS SELECT * FROM big;
Rows Exported
---------------
     134217728
(1 row)

verticademos=> \! du --summarize -h /home/dbadmin/parq_ZSTD
1.7G    /home/dbadmin/parq_ZSTD
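To put these numbers in relative terms, the sketch below computes each method's disk-space savings against the Snappy default. It uses the rounded sizes that `du -h` reported in this run. Those figures give roughly 27% savings for GZIP and roughly 35% for both Brotli and ZSTD. They describe this one table and are not general benchmarks. The names `reported_gib` and `savings_vs_baseline` are illustrative.

```python
# Disk usage reported by `du --summarize -h` in the example above.
# du -h uses 1024-based units, so these values are GiB. Rounded figures, one run only.
reported_gib = {"Snappy": 2.6, "GZIP": 1.9, "Brotli": 1.7, "ZSTD": 1.7}


def savings_vs_baseline(sizes: dict[str, float], baseline: str = "Snappy") -> dict[str, float]:
    """Return the percentage of disk space each codec saves relative to `baseline`."""
    base = sizes[baseline]
    return {codec: (1 - size / base) * 100 for codec, size in sizes.items()}


if __name__ == "__main__":
    for codec, pct in savings_vs_baseline(reported_gib).items():
        print(f"{codec:<7} {reported_gib[codec]:.1f}G  saves {pct:4.1f}% vs. Snappy")
```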

Hint: Although Brotli and ZSTD save a similar amount of disk space in this example, disk usage is not the only factor to weigh when you choose a compression type. In practice, ZSTD performs much, much better than Brotli.

Have fun!