Saving an Apache Spark DataFrame to a Vertica Table

Before you save an Apache Spark DataFrame to a Vertica table, make sure that you have the following set up:

• A Vertica cluster
• A Spark cluster
• An HDFS cluster

The Vertica Spark connector uses HDFS as intermediate storage before it writes the DataFrame to Vertica. The following checklist identifies problems you might encounter when using the Vertica Spark connector and how to solve them.

Problem	Solution
Vertica is not configured correctly to access HDFS.	Verify that Vertica is configured correctly to communicate with HDFS. To configure Vertica nodes for HDFS access, follow the Vertica and Hadoop configuration instructions in Configuring the hdfs Scheme in the Vertica documentation.
You are using a connector that is not compatible with the Spark and Scala version combination in your environment.	If you see either of the following errors, your Vertica Spark connector is not compatible with the Spark and Scala version combination in your environment: java.lang.ClassNotFoundException or java.lang.AbstractMethodError. Verify that you are using the connector built for your specific Spark and Scala combination. As of Vertica 8.1.1, five connectors are available, one for each of the following environments: Apache Spark 1.6/Scala 2.10; Apache Spark 2.0/Scala 2.10; Apache Spark 2.0/Scala 2.11; Apache Spark 2.1/Scala 2.10; Apache Spark 2.1/Scala 2.11. These connectors are available at https://my.vertica.com.
When loading Vertica data into Spark, your Spark script fails with a java.lang.IllegalArgumentException error.	Vertica can store numeric values with a higher precision than the column definition. When you create a DataFrame for a table that has NUMERIC columns, every NUMERIC column in the DataFrame is assigned the maximum precision supported in Spark (38 digits). If your script loads a value whose precision exceeds this Spark maximum into a DataFrame column, the script fails with the following error: `java.lang.IllegalArgumentException: requirement failed: Decimal precision 41 exceeds max precision 38`. There is no workaround for this issue. For more information, see Loading Vertica Data into a Spark DataFrame or RDD in the Vertica documentation.
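Although there is no workaround, it can help to know ahead of time which values will trigger the precision error. The following Python sketch is our own illustration. It is not part of the Vertica Spark connector or the Spark API, and the function names `required_precision` and `exceeds_spark_precision` are hypothetical. The sketch uses only the standard `decimal` module to compute how many digits a DECIMAL(p, s) type needs to hold a value, and flags values above Spark's 38-digit limit. It assumes you already have the values as Python `Decimal` objects, for example values retrieved from the Vertica table with a separate database client.

```python
from decimal import Decimal

# Spark's DecimalType supports at most 38 digits of precision
# (the "max precision 38" reported in the IllegalArgumentException).
SPARK_MAX_DECIMAL_PRECISION = 38


def required_precision(value: Decimal) -> int:
    """Return the precision p of the smallest DECIMAL(p, s) that holds value.

    Hypothetical helper for illustration only.
    """
    if not value.is_finite():
        raise ValueError(f"{value} has no finite decimal precision")
    _, digits, exponent = value.as_tuple()
    if exponent >= 0:
        # Integer value: stored digits plus the trailing zeros implied by the exponent.
        return len(digits) + exponent
    scale = -exponent
    # Fractional value: at least `scale` digits, more if there is an integer part.
    return max(len(digits), scale)


def exceeds_spark_precision(values):
    """Return the values that need more digits than Spark's DecimalType allows."""
    return [v for v in values if required_precision(v) > SPARK_MAX_DECIMAL_PRECISION]


if __name__ == "__main__":
    sample = [Decimal("12345.6789"), Decimal("1" * 41)]
    for v in sample:
        print(v, required_precision(v))
    # Only the 41-digit value is flagged.
    print(exceeds_spark_precision(sample))
```

The precision rule here follows the SQL definition of DECIMAL(p, s), where p counts both integer and fractional digits. That is why a value such as 0.001 needs a precision of 3 even though it has only one significant digit.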

Learn More

For complete details about integrating Vertica with Spark, see Integrating with Spark in the Vertica documentation.

About the Author

Soniya Shah
Information Developer

Currently a first-year law student with a background in science and technology. Experienced technical writer specializing in software documentation, big data, blog development, and website development. I build user-centered content that communicates complex technical information clearly.

I worked for Vertica full time for about three years, and I still work at Vertica part time while attending law school.

Update: Soniya is now doing her law internship, and no longer working at Vertica. Good luck, Soniya!
