Vertica Blog

Spark

Unleash the Power of Vertica and Apache Spark Using the Upgraded Spark Connector

This post is authored by Alex Le. What is Apache Spark? Apache Spark is a distributed compute engine that provides a robust API for data science, machine learning, and big data processing. It is fast, scalable, and simple, and it supports multiple languages, including Python, SQL, Scala, Java, and R. Backed by the Apache 2.0...

Going Beyond a Data Lakehouse

The way things get named in this industry can be pretty odd. A lot of people have been hearing noise about how cool a “Data Lakehouse” is, since it combines the power of a data lake with the power of a data warehouse. When I think of a lakehouse, I think of a great place...

Announcing Vertica Version 11

Vertica Unified Analytics Platform version 11 delivers GA support for Docker containers and Kubernetes, advanced machine learning and time series capabilities, and increased analytical performance.

Vertica Version 10 Launches Today!

There’s big news today, on the first day of the Virtual Vertica Big Data Conference 2020 – a major new version of Vertica blasts off! Vertica version 10 soars straight into the greatest deployment flexibility of any analytical data warehouse on the market. This fast and extremely scalable platform gives you more power than ever...

Saving an Apache Spark DataFrame to a Vertica Table

Before you save an Apache Spark DataFrame to a Vertica table, make sure that you have the following set up: a Vertica cluster, a Spark cluster, and an HDFS cluster. The Vertica Spark connector uses HDFS as intermediate storage before it writes the DataFrame to Vertica. This checklist identifies potential problems you might encounter when using...

What’s New in Vertica 9.1?

This blog post was authored by Soniya Shah. In Vertica 9.1, we introduce new functionality, including: • Eon Mode, now available in production environments • Machine Learning Enhancements • Management Console Updates • Voltage SecureData Integration • Python UDTF • AWS Licensing Updates • Security Updates • Upgrade and Installation Changes • S3 Session Parameter...

Introducing the Parallel Streaming Transformation Loader (PSTL) Solution

This blog post was authored by Soniya Shah. At Vertica, we understand how important it is that our customers can make decisions in near real time. Being able to do this not only requires the massive parallel processing that Vertica offers, but the ability to transform and ingest your data into Vertica as quickly as...

Integrating with Apache Spark

The Vertica Connector for Apache Spark is a fast parallel connector that allows you to use Apache Spark for pre-processing data. Apache Spark is an open-source, general-purpose cluster-computing framework. The Spark framework is based on Resilient Distributed Datasets (RDDs), which are logical collections of data partitioned across machines. For more information, see the Apache...

Introducing the Connector for Apache Spark

In Vertica version 8.0.0, we added integration for Apache Spark through our Vertica Connector for Apache Spark. This is a fast parallel connector that allows you to transfer data between Apache Spark and Vertica.