Getting the Spark Connector

The Vertica Connector for Apache Spark is packaged as a JAR file. You install this file on your Spark cluster to enable Spark and Vertica to exchange data. In addition to the Connector JAR file, you also need the Vertica JDBC client library. The Connector uses this library to connect to the Vertica database.

Both of these libraries are installed with the Vertica server and are available on all nodes in the Vertica cluster in the following locations:

  • The Spark Connector files are located in /opt/vertica/packages/SparkConnector/lib.
  • The JDBC client library is /opt/vertica/java/vertica-jdbc.jar.
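Before copying these files to your Spark cluster, you may want to confirm that both are present on a Vertica node. The following is a minimal bash sketch. It assumes the default /opt/vertica install prefix. The function name find_vertica_spark_libs and the VERTICA_ROOT override are hypothetical and are not part of Vertica.

```shell
# Hypothetical helper: list the Spark Connector JARs and the JDBC library
# under a Vertica install prefix (argument 1, else $VERTICA_ROOT, else /opt/vertica).
find_vertica_spark_libs() {
  local root="${1:-${VERTICA_ROOT:-/opt/vertica}}"
  local connector_dir="$root/packages/SparkConnector/lib"
  local jdbc_jar="$root/java/vertica-jdbc.jar"
  local jar found=0

  # Print each connector JAR. The [ -e ] check skips the literal glob
  # pattern that bash leaves behind when nothing matches.
  for jar in "$connector_dir"/*.jar; do
    [ -e "$jar" ] || continue
    echo "$jar"
    found=1
  done
  if [ "$found" -eq 0 ]; then
    echo "no connector JARs found in $connector_dir" >&2
    return 1
  fi

  # The Connector also needs the Vertica JDBC client library.
  if [ ! -f "$jdbc_jar" ]; then
    echo "missing JDBC library: $jdbc_jar" >&2
    return 1
  fi
  echo "$jdbc_jar"
}
```

On a Vertica node, running find_vertica_spark_libs prints every connector JAR followed by the JDBC library path. After you pick the connector that matches your Spark and Scala versions, one common way to make both files available to a Spark session is Spark's standard --jars option. For example, `spark-shell --jars /path/to/connector.jar,/opt/vertica/java/vertica-jdbc.jar`, where /path/to/connector.jar is a placeholder for the file you chose.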

Choosing the Correct Connector Version

Vertica supplies multiple versions of the Spark Connector JAR file. Each file is compatible with one or more versions of Apache Spark and a specific version of Scala. The Connector file you need depends on the versions of Apache Spark and Scala you have installed. You can determine your Spark and Scala versions by starting a Spark shell:

$ spark-shell
SPARK_MAJOR_VERSION is set to 2, using Spark2
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://node01:4040
Spark context available as 'sc' (master = local[*], app id = local-1488824765565).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0.2.6.0.3-8
      /_/
         
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_77)
Type in expressions to have them evaluated.
Type :help for more information.

The startup messages contain the version numbers of both Spark and Scala. In the previous example, the banner reports Spark version 2.1.0.2.6.0.3-8, and the "Using Scala version" line reports Scala 2.11.8.
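If you need these versions in a script rather than reading them off the screen, you can parse the same startup text. Below is a minimal bash sketch. The helper name parse_spark_versions is hypothetical, and it assumes the output contains a banner line ending in "version <number>" and a "Using Scala version <number>" line, as in the example above.

```shell
# Hypothetical helper: read Spark startup text on stdin and print
# "spark=<version> scala=<version>", or fail if either is missing.
parse_spark_versions() {
  local text spark scala
  text="$(cat)"
  # Spark version: the word after "version" on the banner line,
  # skipping the "Using Scala version" line, which also contains "version".
  spark="$(printf '%s\n' "$text" | grep -v 'Scala version' \
    | sed -n 's/.*[[:space:]]version \([^[:space:]]*\).*/\1/p' | head -n 1)"
  # Scala version: the dotted number after "Using Scala version".
  scala="$(printf '%s\n' "$text" \
    | sed -n 's/.*Using Scala version \([0-9][0-9.]*\).*/\1/p' | head -n 1)"
  if [ -z "$spark" ] || [ -z "$scala" ]; then
    echo "could not find Spark and Scala versions in input" >&2
    return 1
  fi
  echo "spark=$spark scala=$scala"
}
```

Fed the startup text shown above, the function prints `spark=2.1.0.2.6.0.3-8 scala=2.11.8`. On installations where `spark-submit --version` prints a similar banner, `spark-submit --version 2>&1 | parse_spark_versions` avoids opening an interactive shell. The 2>&1 redirect covers builds that write the banner to stderr. Check the exact output format on your own cluster before relying on it.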

The list in Vertica Integration for Apache Spark tells you which version of the Spark Connector JAR file you need for each combination of Spark and Scala. Note that some versions of the Spark Connector are compatible with multiple versions of Spark. For example, the connector for Spark 2.1 is also compatible with Spark 2.2.
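A reported version string can carry extra build fields. The example above reports Spark 2.1.0.2.6.0.3-8, while the list refers to versions such as Spark 2.1. This small bash sketch, using the hypothetical helper name major_minor, keeps only the first two dot-separated fields so you can look up your Spark and Scala combination:

```shell
# Hypothetical helper: reduce a version string to major.minor,
# for example 2.1.0.2.6.0.3-8 -> 2.1 and 2.11.8 -> 2.11.
major_minor() {
  printf '%s\n' "$1" | cut -d. -f1,2
}
```

For instance, `echo "Spark $(major_minor 2.1.0.2.6.0.3-8), Scala $(major_minor 2.11.8)"` prints `Spark 2.1, Scala 2.11`. Because one connector can cover several Spark versions, a match may appear under a neighboring Spark version in the list.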

Prior to Vertica version 9.1, the Spark Connector was distributed on the vertica.com website. If you have a version of Vertica prior to 9.1, you must download the correct connector file from that website.