What’s New in Vertica 8.1: the Connector for Apache Spark

Posted April 20, 2017 by Gary Gray, Principal Information Developer

The Vertica 8.1 release includes updates for the Vertica Connector for Apache Spark. The biggest new feature is Apache Spark 2.0 compatibility. This makes the connector compatible with the version of Spark included with most recent Hadoop distributions.

In addition, the connector supports multiple versions of the Scala programming language. Most Hadoop distributions that support Spark 2.0 use Scala version 2.11. However, the connector also supports using Scala 2.10 with Spark 2.0. See the Vertica Integration for Apache Spark section of the Vertica 8.1.x Supported Platforms document for more information.

Vertica supplies multiple Spark connector JAR files, each compatible with a specific combination of Vertica, Spark, and Scala versions. We’ve made it easy for you to pick the right connector file: each version combination is listed under the Apache Spark Connector portion of the Vertica 8.1.x download page. In addition, the connector JAR file names make it clear which version of each component they support. For example, if you need the connector that supports Vertica 8.1, Spark 2.0, and Scala 2.11, download the file named vertica-8.1.0-spark2.0_scala2.11.jar.
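If you script your deployments, the naming convention can be captured in a small helper. The following is a minimal Python sketch that generalizes from the single file name given above. The function name connector_jar_name is hypothetical, and the pattern is an assumption for other version combinations, so confirm the actual file name on the download page before relying on it.

```python
def connector_jar_name(vertica_version: str, spark_version: str, scala_version: str) -> str:
    """Build a connector JAR file name from its three version components.

    The pattern is inferred from one published file name and is an
    assumption for any other combination of versions.
    """
    return f"vertica-{vertica_version}-spark{spark_version}_scala{scala_version}.jar"


if __name__ == "__main__":
    # Vertica 8.1, Spark 2.0, Scala 2.11
    print(connector_jar_name("8.1.0", "2.0", "2.11"))
    # prints: vertica-8.1.0-spark2.0_scala2.11.jar
```

Note that the Vertica version appears in the file name with three parts (8.1.0), so pass the full version string rather than 8.1.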

SaveMode.Append Now Creates Missing Tables

Another change in version 8.1 is how the Vertica Connector for Apache Spark handles attempts to append data to a non-existent table. When you save data from Spark to Vertica, you tell the connector how it should save the data using the SaveMode parameter. If you specify SaveMode.Append when writing the data, the connector appends the data to an existing table.

Previous versions of the Spark connector returned an error if the table you told it to append to didn’t exist. Starting with version 8.1.0, the Spark connector creates the table if it doesn’t exist. This means you no longer have to check whether the target table exists before you tell the connector to append data.
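To illustrate, here is a minimal PySpark sketch of an append that needs no table-existence check beforehand. Several things in it are assumptions rather than details from this post: the data source class name, the option names (host, db, user, password, table, hdfs_url), and the helper names vertica_options and append_to_vertica, which are hypothetical. Check all of them against the connector documentation for your version. In PySpark, the string "append" corresponds to SaveMode.Append.

```python
# Data source class name for the connector. This is an assumption; verify it
# against the documentation for the connector version you downloaded.
VERTICA_SOURCE = "com.vertica.spark.datasource.DefaultSource"


def vertica_options(host, db, user, password, table, hdfs_url):
    """Collect connection settings into the option map passed to the connector.

    The option names are assumptions made for illustration.
    """
    return {
        "host": host,
        "db": db,
        "user": user,
        "password": password,
        "table": table,        # target table; need not exist with connector 8.1.0+
        "hdfs_url": hdfs_url,  # staging location, assumed to be required
    }


def append_to_vertica(df, options):
    """Append the rows of a Spark DataFrame to the table named in options.

    No existence check is made first: per the behavior described above,
    connector 8.1.0 and later create the table if it is missing.
    """
    (df.write
       .format(VERTICA_SOURCE)
       .options(**options)
       .mode("append")  # PySpark spelling of SaveMode.Append
       .save())
```

In a Spark session started with the connector JAR available (for example, through the --jars option of spark-submit), you would call append_to_vertica(df, vertica_options(...)) with your own connection values. With a connector older than 8.1.0, the same call fails if the table is missing, so you would have to create the table first.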

Have Vertica 8.0.x? Connect to Spark 2.0 Right Now!

Version 8.1.x of the Vertica Connector for Apache Spark is backwards compatible with Vertica version 8.0.x. Just download the new connector and use it to connect Spark to your existing 8.0.x database. You can begin integrating your existing Vertica 8.0.x database with Spark 2.0 even before you upgrade to the latest version of Vertica!

For more information about using the Vertica Connector for Apache Spark, see the Vertica documentation.