Vertica Blog

Spark

Programmer

Saving an Apache Spark DataFrame to a Vertica Table

Before you save an Apache Spark DataFrame to a Vertica table, make sure that you have the following set up: • Vertica cluster • Spark cluster • HDFS cluster. The Vertica Spark connector uses HDFS as intermediate storage before it writes the DataFrame to Vertica. This checklist identifies potential problems you might encounter when using...
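As a rough illustration of the save path described in this teaser, here is a minimal PySpark-style sketch. It assumes the legacy connector's data source name (`com.vertica.spark.datasource.DefaultSource`) and the option keys `host`, `db`, `user`, `password`, `table`, and `hdfs_url`. These names, and the helper functions `build_vertica_options` and `save_to_vertica`, are assumptions made for illustration and should be checked against the documentation for your connector version.

```python
from typing import Dict

# Assumed data source name of the legacy Vertica Connector for Apache Spark.
VERTICA_SOURCE = "com.vertica.spark.datasource.DefaultSource"

# Assumed option keys; verify them against your connector version.
REQUIRED_KEYS = ("host", "db", "user", "password", "table", "hdfs_url")


def build_vertica_options(host: str, db: str, user: str, password: str,
                          table: str, hdfs_url: str) -> Dict[str, str]:
    """Collect connector options and reject empty values early."""
    opts = {
        "host": host,          # a Vertica node to connect to
        "db": db,              # Vertica database name
        "user": user,
        "password": password,
        "table": table,        # target Vertica table
        "hdfs_url": hdfs_url,  # HDFS directory used as intermediate storage
    }
    missing = [key for key in REQUIRED_KEYS if not opts[key]]
    if missing:
        raise ValueError("missing connector options: " + ", ".join(missing))
    return opts


def save_to_vertica(df, opts: Dict[str, str], mode: str = "overwrite") -> None:
    """Write a pyspark.sql.DataFrame through the connector (needs a live cluster)."""
    df.write.format(VERTICA_SOURCE).options(**opts).mode(mode).save()
```

Validating the options before calling `save_to_vertica` surfaces a missing HDFS location or credential before any Spark job starts, which is the kind of problem a setup checklist is meant to catch.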

What’s New in Vertica 9.1?

This blog post was authored by Soniya Shah. In Vertica 9.1 we introduce new functionality including: • Eon Mode, now available in production environments • Machine Learning Enhancements • Management Console Updates • Voltage SecureData Integration • Python UDTF • AWS Licensing Updates • Security Updates • Upgrade and Installation Changes • S3 Session Parameter...

Introducing the Parallel Streaming Transformation Loader (PSTL) Solution

This blog post was authored by Soniya Shah. At Vertica, we understand how important it is that our customers can make decisions in near real time. Doing this requires not only the massively parallel processing that Vertica offers, but also the ability to transform and ingest your data into Vertica as quickly as...
Figure: data flow graph showing parallel data sharing between Spark and Vertica

Integrating with Apache Spark

The Vertica Connector for Apache Spark is a fast parallel connector that allows you to use Apache Spark for pre-processing data. Apache Spark is an open-source, general-purpose cluster-computing framework. The Spark framework is based on Resilient Distributed Datasets (RDDs), which are logical collections of data partitioned across machines. For more information, see the ....
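To make the idea of a logical collection partitioned across machines more concrete, here is a small pure-Python sketch. It is not Spark's implementation: the functions `partition`, `map_partitions`, and `collect` are hypothetical stand-ins, and each inner list plays the role of the slice of data one machine would hold.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def partition(records: Iterable[T], num_partitions: int) -> List[List[T]]:
    """Distribute records round-robin into num_partitions slices."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    parts: List[List[T]] = [[] for _ in range(num_partitions)]
    for index, record in enumerate(records):
        parts[index % num_partitions].append(record)
    return parts


def map_partitions(parts: List[List[T]], fn: Callable[[T], U]) -> List[List[U]]:
    """Apply fn to each slice independently, as separate workers could."""
    return [[fn(record) for record in part] for part in parts]


def collect(parts: List[List[T]]) -> List[T]:
    """Gather all slices back into a single list."""
    return [record for part in parts for record in part]


parts = partition(range(6), 3)
squared = map_partitions(parts, lambda x: x * x)
print(sorted(collect(squared)))
```

In PySpark, the corresponding calls would be `sc.parallelize(data, numSlices)`, `rdd.map(fn)`, and `rdd.collect()`. The point of the sketch is only that each slice can be processed without reference to the others, which is what allows the work to run in parallel.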

Introducing the Connector for Apache Spark

In Vertica version 8.0.0, we added integration for Apache Spark through our Vertica Connector for Apache Spark. This is a fast parallel connector that allows you to transfer data between Apache Spark and Vertica.