Derive maximum value from your Hadoop data lake with Vertica

Analyze your data in place to expand your Hadoop investment and unlock more timely insights

Many data-driven companies have adopted HDFS to collect, store, and manage large volumes of varying forms of data. However, multiple copies of everything, inadequate concurrent data access, and an overall lack of analytical performance have resulted in limited business value.

Vertica provides a Unified Analytics Warehouse so organizations can finally bring the data lake and data warehouse together. You can analyze Parquet and ORC data, including complex data types, in place and optimize key data sets for blazing-fast analytics, all using HDFS for safe distributed storage. With Vertica, your BI and data science teams can analyze data at the same time to squeeze maximum value out of your Hadoop investment.

Read the data sheet

Advanced analytics

Gain full-functionality ANSI SQL capability, not a subset of commands. Run 100% of TPC-DS benchmark queries with no modification.

More options for querying HDFS-managed data

Run Vertica as a SQL on Hadoop query engine against data stored in any major Hadoop distribution, including Cloudera and HPE MapR. And with Vertica in Eon Mode for HDFS communal storage, Vertica ROS data is stored in HDFS, so you can apply the full functionality of Vertica’s advanced analytics and machine learning to this data.

Broad data format and complex data type support

Query data across Parquet, ORC, JSON, and many other data formats, and analyze complex data types in Parquet format on HDFS and S3.

Comprehensive analysis

Use External Tables to analyze data in ORC and Parquet on the same HDFS nodes and execute JOINs between ROS data and External Tables for a more comprehensive view of all your data.

“Vertica 10 introduced expanded support for the analysis of semi-structured data types, especially the complex data types found in Maps, Arrays, and Structs in Parquet data. Instead of replicating the data in Vertica to run a query, Vertica can access the data directly in HDFS or S3 object storage. This eliminates the need for data storage duplication and enables much quicker answers to questions requiring both data stored in Vertica and in other data platforms.”

– John Santaferraro, Analyst, Enterprise Management Associates

Read the EMA White Paper

Maximize your Hadoop investments

If you need to keep some or all of your big data analytics on-premises, or on your Hadoop installation using commodity hardware, Vertica is the unified analytics warehouse you need. Because Vertica runs independently of your infrastructure, you can create a variety of hybrid deployments, including a mix of cloud, on-prem, and Hadoop resources.

Want to get technical? Read the “Hadoop Integration Guide”

Unifying the Data Warehouse and the Data Lake

Vertica offers the fastest way to perform SQL queries on your Hadoop data. Vertica SQL on Apache Hadoop® supports data discovery on your Hadoop data lake as well as highly optimized analytics for the most demanding SLAs. You can use HDFS as a shared object storage layer, and import data from HDFS to Vertica on-premises, as needed, via Vertica in Eon Mode for HDFS communal storage. You can even combine that data with AWS S3 data for an extensive hybrid environment that is as flexible as your big data storage and compute deployment needs to be.

Read about Vertica SQL on Apache Hadoop


Why you need an analytical database for big data on Hadoop

Although they offer an inexpensive way to store data, Hadoop-based solutions are no match for Vertica’s unified analytics warehouse, which is designed specifically for big data analytics. Learn how fast and complete the Vertica SQL on Hadoop engine is, as we put it to the test on the TPC-DS benchmark against Impala, Hive on Tez, and Apache Spark. Read the results in this report.

Read the benchmark study

Vertica in Eon Mode support for HDFS

Are you looking for more alternatives to cloud storage and compute for your big data analytics? Vertica in Eon Mode now supports HDFS communal storage as an additional on-premises deployment option, with a durable master copy of ROS files on HDFS. This extends Vertica’s deployment flexibility as the only analytical data warehouse that separates compute from storage for both on-premises data centers and multiple public clouds.

Read more about Vertica in Eon Mode

Hadoop Transition Service

Are you experiencing slow queries or a high incidence of query failures? The culprit is likely your open-source query engine. Impala, Hive, and Presto are fine for ad hoc data exploration, but they were designed for small teams of data scientists, not for enterprise organizations that require optimal performance for hundreds of concurrent users. With the Hadoop Transition Service, your organization can easily migrate from open-source query engines to Vertica, deriving even greater value from your HDFS data lake.

This streamlined service combines technical experts, proven migration methodology, and an end-to-end transition based on industry best practices.

Read about Vertica Transition Services for Hive, Impala, and Presto

How can you get the value out of Hadoop that you were promised?

Since Hadoop’s initial release 14 years ago, untold volumes of data have been stored in HDFS (Hadoop Distributed File System). Spread across a virtual landscape of data-inspired organizations, those data lakes are wide and deep. Companies have made tremendous investments in Hadoop over the years, and data continues to pour into their data lakes.

You might ask: have those been wise investments? We think so. Despite what the naysayers are claiming about Hadoop itself these days, it’s still true that vast quantities of data from useful sources can reveal lucrative patterns that make massive data collection worthwhile. Unfortunately, we believe most of those revelations have yet to be made. The problem is that many analytics teams are using open-source query engines that were designed to work with their Hadoop distros. Those query engines are simply not delivering the insights that HDFS data lakes can offer.

Read more