Common Use Cases
Vertica supports a variety of use cases for reading data. Some of the most common are summarized here with links for more information. This is not a complete list of capabilities.
The COPY statement is central to loading data. See Introduction to the COPY Statement for an overview of its use.
Loading Data from Files
You might have data in files, perhaps a large amount of it, that you want to load into Vertica. These files might reside on shared storage, in the cloud, or on local nodes, and might be in a variety of formats.
For information about source locations, see Specifying Where to Load Data From. To handle data in different formats you specify a parser; for more information about the options, see Parsers for Various Data Formats.
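As a minimal sketch of choosing a source location and a parser, the following statements load one delimited file and a set of Parquet files into the same table. The table `sales`, its columns, the file path, and the bucket name are all hypothetical, and the S3 example assumes that access to the bucket is already configured.

```sql
-- Hypothetical target table; column names and types are assumptions.
CREATE TABLE sales (
    sale_id   INT,
    store     VARCHAR(50),
    amount    NUMERIC(10,2),
    sale_date DATE
);

-- Delimited text file on a path the database nodes can read.
-- The delimited parser is the default, so only the delimiter is specified.
COPY sales FROM '/data/sales/2023.csv' DELIMITER ',';

-- Parquet files in cloud storage; the glob matches every file in the folder.
COPY sales FROM 's3://example-bucket/sales/*.parquet' PARQUET;
```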
You are not limited to loading data "as-is"; you can also transform it during load. See Transforming Data During Loads.
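For illustration, a transformation during load might look like the sketch below. It reuses the hypothetical `sales` table (columns `sale_id`, `store`, `amount`, `sale_date`) and assumes the store name in the file needs trimming and upper-casing. The `store_raw` name is invented for this example.

```sql
COPY sales (
    sale_id,
    store_raw FILLER VARCHAR(50),      -- read from the file but not stored
    store AS UPPER(TRIM(store_raw)),   -- computed from the filler column
    amount,
    sale_date
)
FROM '/data/sales/2023.csv' DELIMITER ',';
```

A FILLER column holds the raw input value only for the duration of the load, so the table stores the cleaned value without an intermediate staging table.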
Loading Data from Other Services
Apache Kafka is a platform for streaming data. Vertica supports streaming data to and from Kafka. See Integrating with Apache Kafka.
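A one-time manual load from Kafka could look roughly like the following sketch. The topic `web_events`, the broker address, and the target Flex table are assumptions made for illustration. The `stream` value names the topic, the partition, and the starting offset, where -2 requests the earliest available message.

```sql
-- Hypothetical Flex table to receive JSON messages.
CREATE FLEX TABLE web_events();

-- Read partition 0 of the topic from the beginning and stop at the end of the stream.
COPY web_events
    SOURCE KafkaSource(stream='web_events|0|-2',
                       brokers='kafka01.example.com:9092',
                       stop_on_eof=true)
    PARSER KafkaJSONParser();
```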
Apache Spark is a cluster-computing framework for distributed data. Vertica supports exchanging data with Spark. See Integrating with Apache Spark.
You can copy data directly from another Vertica cluster, instead of exporting to files and then loading those files. See Copying Data Between Vertica Databases.
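A direct copy between clusters can be sketched as follows. The database name `sourcedb`, the host, the credentials, and the table names are placeholders, and the target table is assumed to exist already with a compatible definition.

```sql
-- Open a connection from the target database to the source database.
CONNECT TO VERTICA sourcedb USER dbadmin PASSWORD 'example-password'
    ON 'source-host.example.com', 5433;

-- Pull rows from the source table into the local table.
COPY sales FROM VERTICA sourcedb.public.sales;

-- Close the connection when the copy is done.
DISCONNECT sourcedb;
```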
Read Data Where It Exists (Don't Import)
Instead of loading data into Vertica, you can read it in place using external tables. External tables can be advantageous in the following cases:
- If you want to explore data, such as in a data lake, before selecting data to load into Vertica.
- If you are one of several consumers sharing the same data, for example in a data lake, then reading it in place eliminates concerns about whether query results are up to date. There's only one copy, so all consumers see the same data.
- If your data changes rapidly but you do not want to stream it into Vertica, you can instead query the latest updates automatically.
- If you have lower-priority data in Vertica that you still want to be able to query.
When you query an external table, Vertica loads the data it needs from the external source. The Parquet and ORC columnar formats are optimized for this kind of load, so using external tables does not necessarily have a large effect on performance compared to loading data into Vertica-managed tables.
For more information about using external tables, see Working with External Data.
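As a sketch of reading data in place, the following defines an external table over Parquet files and then queries it like any other table. The table name, columns, and bucket path are hypothetical.

```sql
-- No data is loaded here; the definition only records where and how to read.
CREATE EXTERNAL TABLE sales_ext (
    sale_id   INT,
    store     VARCHAR(50),
    amount    NUMERIC(10,2),
    sale_date DATE
) AS COPY FROM 's3://example-bucket/sales/*.parquet' PARQUET;

-- Each query reads the files as they exist at query time.
SELECT store, SUM(amount) AS total
FROM sales_ext
WHERE sale_date >= '2023-01-01'
GROUP BY store;
```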
Complex Types
Some data formats support complex types such as arrays and structs (sets of property-value pairs). Depending on how you want to use them, Vertica provides several options:
- You can define external tables that read arrays, structs, and maps from Parquet data. See Reading Complex Types from Parquet Files.
- You can define columnar tables using one-dimensional arrays of scalar types. You can load data using the Parquet and delimited (default) parsers. See PARQUET (Parser) and Loading Delimited Data.
- You can define columnar tables using flexible (schemaless) complex types, allowing you to load types and type structures (such as arrays of structs or maps of arrays) that you would not be able to describe directly. You can load flexible complex types in the Parquet, JSON, and Avro formats. See Using Flexible Complex Types.
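The first two options above might be sketched as follows. All table names, column names, and paths are hypothetical, and the Parquet files are assumed to contain columns with matching types.

```sql
-- Columnar (Vertica-managed) table with a one-dimensional array of a scalar type.
CREATE TABLE orders (
    order_id INT,
    item_ids ARRAY[INT]
);
COPY orders FROM '/data/orders/*.parquet' PARQUET;

-- External table that reads a struct (ROW) and an array from Parquet.
CREATE EXTERNAL TABLE customers_ext (
    name    VARCHAR(100),
    address ROW(street VARCHAR(100), city VARCHAR(50)),
    phones  ARRAY[VARCHAR(20)]
) AS COPY FROM 's3://example-bucket/customers/*.parquet' PARQUET;

-- Struct fields use dot notation; array indexes start at 0.
SELECT name, address.city, phones[0] FROM customers_ext;
```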
Messy Data
Sometimes data is not clean; values might not match the declared data types, or required values might be missing, or the parser might not be able to interpret a row for other reasons. You might still want to be able to load and explore this data. You can specify how error-tolerant to be and where to record information about rejected data using parameters to the COPY statement. For more information, see Handling Messy Data.
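For example, a tolerant load might be sketched like this, reusing the hypothetical `sales` table and file path. The limit of 100 rejected rows and the name `sales_rejects` are arbitrary choices for illustration.

```sql
COPY sales FROM '/data/sales/2023.csv' DELIMITER ','
    REJECTED DATA AS TABLE sales_rejects   -- record rejected rows in a table
    REJECTMAX 100;                         -- fail the load if too many rows are rejected

-- Inspect what was rejected and why.
SELECT row_number, rejected_data, rejected_reason
FROM sales_rejects;
```

Storing rejections in a table rather than a file lets you explore the bad rows with ordinary queries.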
Unknown or Evolving Schema
You load data into a table, and to define a table you need to know what the schema is (columns and their data types). Sometimes, however, you do not know this or you need to support heterogeneous data in one table. Vertica Flex tables support loading data without a full schema. Loading into a Flex table is generally like loading into any other table, but for some formats you need to use a different parser. For more information about Flex tables and Flex-specific parsers, see Using Flex Tables.
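A minimal Flex table sketch, assuming a hypothetical JSON file at `/data/events.json`, looks like this:

```sql
-- No columns are declared; keys are discovered from the data.
CREATE FLEX TABLE events();

-- JSON requires the Flex-specific parser.
COPY events FROM '/data/events.json' PARSER fjsonparser();

-- Compute the keys found in the data and build a view named events_view.
SELECT compute_flextable_keys_and_build_view('events');

SELECT * FROM events_view LIMIT 10;
```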