Monitoring Exports

You can review information about exports, including numbers of row groups, file sizes, and file names.

EXPORT TO PARQUET is a UDx. The UDX_EVENTS system table records events logged during UDx execution, including timestamps, node names, and session IDs. This table contains a column (__RAW__), which holds a VMap of whatever additional data an individual UDx logged. Parquet export logs details about the exported files in this table. While you can work with this table directly and materialize values from the VMap column, you might prefer to define a view to simplify your access.

The following statement defines a view showing only the events from EXPORT TO PARQUET, materializing the VMap values.

=> CREATE VIEW parquet_export_events AS
   SELECT
    report_time,
    node_name,
    session_id,
    user_id,
    user_name,
    transaction_id,
    statement_id,
    request_id,
    udx_name,
    file,
    created,
    closed,
    rows,
    row_groups,
    size_mb
   FROM
    v_monitor.udx_events
   WHERE
    udx_name ilike 'ParquetExport%';

EXPORT TO PARQUET reports the following UDx-specific columns:

Column Name Data Type Description
FILE VARCHAR Name of the output file.
CREATED TIMESTAMPTZ

When the file was created.

CLOSED TIMESTAMPTZ

When the file was closed after writing.

ROWS INTEGER

The total number of rows in the file.

ROW_GROUPS INTEGER The number of row groups.
SIZE_MB FLOAT File size.

The following example shows the results of a single export.

=> SELECT file,rows,row_groups,size_mb FROM PARQUET_EXPORT_EVENTS;
                                  file                                 | rows  | row_groups | size_mb
-----------------------------------------------------------------------+-------+------------+----------
 /data/outgZxN3irt/450c4213-v_vmart_node0001-139770732459776-0.parquet | 29696 | 1          | 0.667203
 /data/outgZxN3irt/9df1c797-v_vmart_node0001-139770860660480-0.parquet | 29364 | 1          | 0.660922
(2 rows)

In this table, the output directory name (/data/out) is appended with a generated string (gZxN3irt). For exports to HDFS or to local file systems (including NFS), EXPORT TO PARQUET first writes data into a scratch directory and then renames it at the end of the operation. The events are logged during export and so show the temporary name. Some output destinations, such as AWS S3, do not support rename operations, so in those cases this table does not show generated names.