Vertica 8.1.1 introduces an optional parameter to the FCSVPARSER function. The FCSVPARSER specifies how to load data into Vertica from a CSV data source. The new parameter allows you to define or override column names in the target file for data loaded from a CSV data source. Previously, the parser used default column names from the CSV header line or default column names when no header line was shown in the source data.
This functionality is useful if you want to load headerless CSV data and then specify the header names for each column. You can also override predefined header names in your CSV data and replace them with another list of header names. Vertica has similar functionality for MAPDELIMITEDEXTRACTOR and has extended the functionality to the FCSVPARSER() function.
The syntax for the function now looks like the following, where is a constant string of the header names:
The string is parsed the same way as any other CSV data source record.
Let’s take a look at an example:
In this chart, we see two different data sources – one that contains header-less data and another that contains a header line that we want to replace. As part of the COPY command for the cities1 data source, we name the headers City, State, and Zip using the new header_names parameter. For the cities2 data source, let’s replace City_Name, ST, and ZCODE with City, State and Zip.
The goal is to combine the two sources and align the data. This is a common operation that is executed on heterogeneous sources during the extraction, transformation, and load process. In the final step, we see both arrows point to the same target table, with the column data associated with our new header names. Clearly, using the header_names parameter makes these columns more readable and helps standardize column names at loading time, without executing additional SQL commands or custom scripts.
For more information, see FCSVPARSER in the Vertica documentation.