Vertica Blog

Figure: data flow graph showing parallel data sharing between Spark and Vertica.

Integrating with Apache Spark

The Vertica Connector for Apache Spark is a fast, parallel connector that lets you use Apache Spark to pre-process data. Apache Spark is an open-source, general-purpose cluster-computing framework. The Spark framework is based on Resilient Distributed Datasets (RDDs), which are logical collections of data partitioned across machines. For more information, see the Apache...
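
To make the idea of a partitioned logical collection more concrete, here is a toy sketch in plain Python. It is not the Spark API or the Vertica connector: the names `partition`, `map_partitions`, and `collect` are hypothetical helpers chosen for illustration, and worker threads stand in for the machines of a real cluster. It only shows how one collection can be split into partitions, processed partition by partition in parallel, and read back as a single result.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def partition(records: Sequence[T], num_partitions: int) -> List[List[T]]:
    """Split records into num_partitions contiguous slices of near-equal size."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    size, extra = divmod(len(records), num_partitions)
    parts: List[List[T]] = []
    start = 0
    for i in range(num_partitions):
        # The first `extra` partitions each take one additional record.
        end = start + size + (1 if i < extra else 0)
        parts.append(list(records[start:end]))
        start = end
    return parts


def map_partitions(parts: List[List[T]], fn: Callable[[T], U]) -> List[List[U]]:
    """Apply fn to every record, using one worker thread per partition."""
    def work(part: List[T]) -> List[U]:
        return [fn(record) for record in part]

    # Threads are an assumption standing in for separate cluster machines.
    with ThreadPoolExecutor(max_workers=max(1, len(parts))) as pool:
        # pool.map returns results in the same order as the input partitions.
        return list(pool.map(work, parts))


def collect(parts: List[List[T]]) -> List[T]:
    """Flatten the partitions back into one ordered list."""
    return [record for part in parts for record in part]


if __name__ == "__main__":
    parts = partition(list(range(10)), 3)
    print(parts)
    print(collect(map_partitions(parts, lambda x: x * x)))
```

In this sketch the caller works with the collection as a whole, while the partition boundaries decide how the work is divided. That is the property a parallel connector can take advantage of when moving data between two distributed systems.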