Announcing Vertica Version 9.2.1 – Take Analytics Efficiency to the Next Level

Posted May 10, 2019 by Paige Roberts, Vertica Open Source Relations Manager

Passenger balloon floating in blue sky
Photo by Aaron Burden on Unsplash

Some very cool things have gone into the newest version of the Vertica Analytics Platform this week: features that make your analytics fly and save on your AWS budget. The theme for this release is improving the integration of Vertica in Eon Mode with Amazon S3, and there are a whole bunch of new features designed to do exactly that. But there are also some great new features in this release for folks who aren’t using Vertica in Eon Mode, or aren’t even in the cloud. I’m not going to dive into everything, as it’s a pretty extensive list, but I want to highlight some of the high-flying new capabilities.

Vertica in Eon Mode Usability Enhancements and Faster Performance

First off, for folks using Vertica in Eon Mode on Amazon, we’ve made multiple changes to improve performance and usability, and provide better economics. For example, Amazon S3 provides inexpensive large-scale storage, but then charges for access to that data. To make data writes more efficient, we’ve added support for writing to S3 directly, or via a temporary data location.

One of the big differences between Vertica in Eon Mode and Enterprise Mode is what you get from adding nodes. In Enterprise deployments, you generally add more nodes to speed up query performance. In Eon Mode, you generally add more nodes to increase the number of queries you can run concurrently. In this version, we’ve added crunch scaling to Vertica in Eon Mode, so when your existing queries need to go faster, you can add more compute nodes, and they will fly.

Analyzing Statistics by Partition for Optimized Query Execution

Performance improvements aren’t limited to Eon Mode, though. One key improvement allows the analysis of statistics by partition, rather than by whole table. A lot of our customers have extremely large tables. Keeping track of statistics like minimums, maximums, and such for each chunk of that data, so queries only read the relevant chunks, vastly improves query performance. The trouble is, data never stays static, so you constantly have to recalculate those statistics when the data changes. When your tables get really huge, just recalculating statistics for the whole table becomes a lengthy process. Since, in general, the data that changes most often is the most recent data, we’ve provided the capability to recalculate statistics on just certain partitions, rather than the whole table. That keeps the statistics current and the queries fast and efficient, without taking the time to recalculate statistics for large sections of the data that haven’t changed in a while.
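To make the idea concrete, here is a minimal Python sketch of per-partition statistics. It is only an illustration of the concept described above, not how Vertica implements it: the `PartitionStats` structure, the helper functions, and the toy month-partitioned table are all hypothetical. It refreshes statistics for only the partition that changed, then uses the min/max ranges to decide which partitions a range query needs to read.

```python
from dataclasses import dataclass


@dataclass
class PartitionStats:
    """Summary statistics kept for one partition (hypothetical structure)."""
    minimum: int
    maximum: int
    row_count: int


def compute_stats(rows):
    """Scan one partition's values and summarize them."""
    return PartitionStats(min(rows), max(rows), len(rows))


def refresh_stats(partitions, stats, changed_keys):
    """Recompute statistics only for the partitions whose data changed."""
    for key in changed_keys:
        stats[key] = compute_stats(partitions[key])
    return stats


def partitions_to_scan(stats, low, high):
    """Return keys of partitions whose [min, max] range overlaps [low, high]."""
    return [key for key, s in stats.items()
            if s.maximum >= low and s.minimum <= high]


# Toy table partitioned by month; the values stand in for order amounts.
partitions = {
    "2019-03": [10, 25, 40],
    "2019-04": [55, 60, 90],
    "2019-05": [95, 120],
}
stats = {key: compute_stats(rows) for key, rows in partitions.items()}

# New data lands in the most recent partition only, so only that
# partition's statistics are recalculated.
partitions["2019-05"].append(300)
refresh_stats(partitions, stats, changed_keys=["2019-05"])

# A query for amounts between 200 and 400 only needs the May partition.
print(partitions_to_scan(stats, low=200, high=400))
```

The older partitions are never rescanned, which is where the time savings would come from on a really huge table.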

Monitoring Sub-Cluster Activity with Enhanced Management Console

We have added some helpful improvements to the Management Console, as well. Among others, you can now monitor and analyze sub-cluster activity. That should help with keeping better track of deployments where different sub-clusters have been designated for different departments or purposes.

Example Correlation Matrix

Finding Patterns More Easily with In-Database Machine Learning – Correlation Matrix Support

One of the coolest new additions is for people performing in-database machine learning in Vertica. It’s available regardless of which mode you prefer, or whether your deployment is on premises or in the cloud. Correlation Matrix support makes it far easier to summarize large amounts of data in a machine learning project and find interesting patterns. You can use the matrix as input for other analyses, such as exploratory factor analysis. You can also use the matrix as a diagnostic to check other analyses. For example, if you’re applying linear regression and a lot of the input variables are strongly correlated with each other, that suggests that the estimates might be unreliable.
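As a rough illustration of that diagnostic use, the Python sketch below computes a Pearson correlation matrix for a few columns and flags strongly correlated pairs. It is a plain-Python sketch and does not call Vertica: the function names, the 0.9 threshold, and the sample columns are all made up for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # A constant column has zero variance and would divide by zero here.
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Build a nested dict: matrix[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


def highly_correlated_pairs(matrix, threshold=0.9):
    """List each distinct column pair whose |correlation| meets the threshold."""
    names = list(matrix)
    return [(a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if abs(matrix[a][b]) >= threshold]


# Made-up sample data: site_visits rises almost in lockstep with ad_spend.
columns = {
    "ad_spend": [1.0, 2.0, 3.0, 4.0, 5.0],
    "site_visits": [2.1, 3.9, 6.2, 8.0, 9.8],
    "temperature": [3.0, 1.0, 4.0, 1.0, 5.0],
}
matrix = correlation_matrix(columns)

# Pairs flagged here are candidates to review before using both as
# predictors in the same linear regression.
print(highly_correlated_pairs(matrix))
```

With this sample data, only the ad_spend and site_visits pair crosses the threshold, which is the kind of signal that would prompt a closer look at a regression using both.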

Optimizing Your Amazon Budget with Support for 5th Generation Amazon Instance Types

One more cool feature you should really know about isn’t specific to Eon Mode, but it is aimed at saving our Amazon customers money. Each time Amazon adds support for a more advanced Intel chipset, the new processors tend to do the same work faster, using less energy. For example, the r-series instance types, which are memory optimized and often used for Vertica, have been running 4th Generation Intel processors, which are significantly better than Gen 3. This means Amazon charges the customer less for getting the same amount of work done, but only if your software can support the newest generation of chips. We are pleased to announce that this new version of Vertica supports 5th Generation Amazon instance types. Now, take off, and go get your analytics work done faster for less money on the Amazon cloud!

There’s a lot more in this release. Here’s a quick summary:

  • Vertica in Eon Mode improvements for AWS
    • Using temp space to write to S3 improvements
    • Better tracking of S3 connections
    • Support for optional temp data location on S3
    • Direct writes to S3
    • UDXs usability fix
    • Support for AWS EC2 5th gen instances
  • Performance improvements
    • Flattened Table enhancement to rebuild by partition
    • Performance improvement for wide varchar (VLTS)
    • Improvement to Analyze stats by partition
  • Client
    • Load balancing improvement to support alter statements
    • Prepared statements for Python Client
    • SSL connection using CONNECT statement
  • Analytics
    • Approximate Count Distinct (ACD) improvements
    • Correlation Matrix
  • Management Console
    • New instance types
    • Sub-cluster activity monitoring
    • Configurable keystore & truststore for JDBC SSL settings
    • Support SSL certificate when making JDBC calls to Vertica DB
  • Supported platform updates (add Ubuntu 18.04, Kafka 2.1, Mac 10.13, Windows Server 2019, CDH 6.0, etc.)

 

Try the New Vertica for Yourself!

And check out the upcoming webinar on the Vertica Management Console: Four Use Cases for Optimal Resource Management!