Unified advanced analytics database features advances in in-database machine learning, direct querying of Parquet data on AWS S3, support for Google Cloud Platform and Microsoft Power BI, and a beta release of cloud-optimized separation of compute and storage

Read the Press Release and check out our recent News Coverage.

The latest release introduces the beta of Vertica in Eon Mode, which enables organizations to evaluate the separation of compute and storage for Amazon Web Services (AWS) deployments. It also includes improvements across Vertica’s four core areas of platform development:

  • Analyze in the Right Place
  • Machine Learning and Advanced Analytics
  • Freedom from Underlying Infrastructure
  • Performance at Exabyte Scale

Vertica 9 will be generally available in October 2017. Questions about Vertica 9? Check out the FAQ Page and Solution Brief

Analyze in the Right Place

Query Parquet Data from S3 Data Lake

  • Challenge: Highly concurrent, interactive analysis of the exploding volume of data stored on AWS S3
  • Solution: External tables allow customers to query Parquet data stored in AWS S3 directly from Vertica
  • Benefits: Ability to explore S3 Data Lake with advanced SQL analytics and extreme performance

Data Access Restriction with Security Realms

  • Challenge: Managing security concerns on Hadoop so different groups of users have access to select sets of data
  • Solution: Leverage Kerberos Realms to support different data access policies for various groups
  • Benefit: Enhanced control and data access restriction over different groups of users

Integration with Apache Sentry

  • Challenge: Companies that have centralized the management of user permissions want Hadoop usernames to pass through when using Vertica to access data on HDFS
  • Solution: Vertica integrates with Apache Sentry so that the privileges associated with a Cloudera username can govern access control in Vertica
  • Benefit: Reduced operational burden with centralized security policies

Machine Learning and Advanced Analytics

Convert Categorical Data to Numerical Data

  • Challenge: When preparing data for statistical analysis, users must manually convert categorical data to numerical data
  • Solution: A built-in function automatically converts a given categorical column in Vertica into numerical one-hot encoded columns (dummy variables)
  • Benefit: Less time spent on manual data preparation and custom conversion scripts
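To make the one-hot encoding idea concrete, here is a minimal sketch in plain Python. It does not use Vertica's built-in function; the helper name `one_hot_encode` and the sample rows are hypothetical and only illustrate how one categorical column becomes one 0/1 column per category.

```python
def one_hot_encode(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    # Collect the distinct categories; sorting keeps the column order stable.
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every field except the categorical one.
        new_row = {key: value for key, value in row.items() if key != column}
        # Add one indicator column per category (hypothetical naming scheme).
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical sample data.
rows = [
    {"id": 1, "color": "red"},
    {"id": 2, "color": "blue"},
    {"id": 3, "color": "red"},
]
encoded = one_hot_encode(rows, "color")
for row in encoded:
    print(row)
```

Each output row keeps its `id` and gains `color_blue` and `color_red` indicator columns, which is the shape most statistical routines expect.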

Cross-Validate Machine Learning Models

  • Challenge: Data scientists need to review the comparative performance of various algorithms and choose from multiple possible hyper-parameter values
  • Solution: A cross-validation function evaluates a model’s performance more accurately by training and testing the model on multiple different subsets of the data
  • Benefit: Data scientists can more easily compare various models and avoid overfitting
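The cross-validation idea can be sketched generically as follows. This is an illustration of k-fold cross-validation in plain Python, not Vertica's function; every name here (`k_fold_indices`, `cross_validate`, the two toy models) is hypothetical. It assumes contiguous folds for simplicity, whereas real data is usually shuffled first.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds = []
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(xs, ys, k, fit, predict):
    """Return the mean squared error averaged over k held-out folds."""
    fold_errors = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        # Train on everything outside the held-out fold.
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        model = fit(train_x, train_y)
        # Score only on the held-out fold.
        errors = [(predict(model, xs[i]) - ys[i]) ** 2 for i in held_out]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / len(fold_errors)


# Candidate 1: always predict the mean of the training targets.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def predict_mean(model, x):
    return model


# Candidate 2: least-squares straight line.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return slope, mean_y - slope * mean_x


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


# Hypothetical data that follows y = 2x + 1 exactly.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
mse_mean = cross_validate(xs, ys, 5, fit_mean, predict_mean)
mse_line = cross_validate(xs, ys, 5, fit_line, predict_line)
print("mean model MSE:", mse_mean)
print("line model MSE:", mse_line)
```

Because both candidates are scored only on data they were not trained on, the comparison reflects how each would generalize rather than how well it memorizes the training set.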

Export Machine Learning Models

  • Challenge: Users with multiple Vertica clusters or separate development and production clusters need to train a machine learning model in one cluster and then move it to other clusters for scoring
  • Solution: Ability to import/export machine learning models across Vertica clusters
  • Benefit: Less time spent duplicating model building and training
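The train-once, score-elsewhere workflow can be illustrated with a small Python sketch. It is not Vertica's import/export mechanism; the JSON format and the function names (`train`, `export_model`, `import_model`, `score`) are assumptions chosen only to show that a trained model is data that can be serialized on one system and restored on another without retraining.

```python
import json


def train(xs, ys):
    """Fit y = slope * x through the origin by least squares."""
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return {"type": "linear_through_origin", "slope": slope}


def export_model(model):
    """Serialize the trained parameters (hypothetical JSON format)."""
    return json.dumps(model)


def import_model(payload):
    """Restore a model from its serialized form on another system."""
    return json.loads(payload)


def score(model, x):
    """Apply the restored model to a new input."""
    return model["slope"] * x


# "Development" side: train and export.
model = train([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
payload = export_model(model)

# "Production" side: import and score, with no retraining.
restored = import_model(payload)
print(score(restored, 10.0))
```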

Vertica’s new in-database machine learning capabilities are like gold! We are extremely excited to train our Machine Learning models on our data in Vertica and ship them with our platform to run on our customers’ clusters. This is something that is much harder with any other tool. Vertica’s in-database machine learning will improve our ability to offer new predictive analytics features quickly and easily to our growing customer base. It will improve our competitive positioning.

- Abhishek Sharma, Data Scientist, Fidelis Cybersecurity

Freedom from Underlying Infrastructure

Available on Google Cloud Marketplace “Launcher”

  • Challenge: Companies want more freedom from underlying infrastructure and to avoid being locked into one cloud
  • Solution: Regular publication of Vertica template images in the Google Cloud Marketplace that launch a guided provisioning process
  • Benefits: Vertica users running in the AWS or Azure clouds, or on premises, can more easily deploy their workloads to Google Cloud

Microsoft Power BI Certification

  • Challenge: Previous integration between Microsoft Power BI and Vertica limited the scale and performance of data loading and analytics
  • Solution: Power BI now connects to Vertica via a new DirectConnect approach
  • Benefits: Faster, more scalable and more secure data analytics with Vertica and Microsoft Power BI

Cloud Provisioning with Management Console (MC)

  • Challenge: Scripting requirements of cloud vendors’ provisioning tools make it difficult to get started in the cloud
  • Solution: Augment Cloud Service Provider (CSP) provisioning tools with a user-friendly GUI wizard in the Vertica Management Console, including post-provisioning steps such as data loading and querying
  • Benefits: Easier to get Vertica up and running in the cloud, saving time and resources

Eon Mode Beta: Separation of Compute and Storage

  • Challenge: Variable-demand workloads need to scale up for peak demand and scale down during periods of low activity
  • Solution: Separation of compute and storage so that compute can be reduced during low-demand periods
  • Benefits: Rapid elasticity, blazingly fast query performance and reduced infrastructure spending

To join the Vertica Eon Mode Beta Program, visit the sign-up page.

Performance at Exabyte Scale

Hierarchical Partition Management

  • Challenge: Partitioning data into slices can significantly improve query execution because the Vertica optimizer can isolate the relevant storage containers and eliminate the rest
  • Solution: Users can now create a custom, hierarchical definition of partition structure
  • Benefits: Faster queries at petabyte scale
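The pruning effect described above can be sketched in Python. This is only a conceptual illustration, not how Vertica stores data: the year-then-month hierarchy, the `build_partitions` and `query_month` helpers, and the sample rows are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def build_partitions(rows):
    """Group rows into containers keyed by year, then by month."""
    partitions = defaultdict(lambda: defaultdict(list))
    for row in rows:
        day = row["day"]
        partitions[day.year][day.month].append(row)
    return partitions


def query_month(partitions, year, month):
    """Return (rows, containers_scanned) for one month, skipping all other containers."""
    if year not in partitions or month not in partitions[year]:
        return [], 0
    return partitions[year][month], 1


# Hypothetical sample data spanning three (year, month) containers.
rows = [
    {"day": date(2016, 12, 31), "amount": 5},
    {"day": date(2017, 1, 15), "amount": 7},
    {"day": date(2017, 1, 20), "amount": 3},
    {"day": date(2017, 2, 1), "amount": 9},
]
partitions = build_partitions(rows)
matched, scanned = query_month(partitions, 2017, 1)
total_containers = sum(len(months) for months in partitions.values())
print("January 2017 total:", sum(r["amount"] for r in matched))
print("containers scanned:", scanned, "of", total_containers)
```

A query restricted to one month reads a single container and never touches the others, which is the source of the speedup.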

Universally Unique Identifier (UUID) Data Type

  • Challenge: Storing UUID data as text strings is an inefficient use of space
  • Solution: Allow customers to store UUID columns as a more space-efficient data type
  • Benefits: Data stored more efficiently
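The space difference is easy to see with Python's standard `uuid` module. The sketch compares the canonical text form of a UUID with its 128-bit binary form; how Vertica lays out the type internally is not stated here, and the sample value is arbitrary.

```python
import uuid

# An arbitrary example UUID.
value = uuid.UUID("123e4567-e89b-12d3-a456-426614174000")

as_text = str(value)     # canonical hyphenated hexadecimal string
as_bytes = value.bytes   # raw 128-bit binary representation

print("text length in characters:", len(as_text))
print("binary length in bytes:", len(as_bytes))

# The binary form is lossless: it round-trips to the same UUID.
round_tripped = uuid.UUID(bytes=as_bytes)
print("round trip matches:", round_tripped == value)
```

The text form needs 36 characters while the binary form needs 16 bytes, so a dedicated type can store the same identifier in less than half the space.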

Flattened Tables

  • Challenge: Many queries involve joins between a large fact table and multiple dimension tables, which increases query overhead and reduces performance
  • Solution: Flattened tables include columns that get their values by querying other tables, offering a flattened view of data stored in complex schemas
  • Benefits: Less overhead and faster query performance
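As a rough illustration of the trade-off, the sketch below uses Python's built-in `sqlite3` module rather than Vertica. The table and column names are hypothetical, and the flattened copy is built by hand here, whereas the description above implies Vertica populates such columns itself. The point is only that the same question can be answered without a join at query time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'EMEA'), (2, 'APAC');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 2, 40.0), (12, 1, 15.0);
""")

# Normalized schema: every query pays for the fact-to-dimension join.
joined = conn.execute("""
    SELECT c.region, SUM(o.amount)
    FROM orders o JOIN customers c ON c.customer_id = o.customer_id
    GROUP BY c.region ORDER BY c.region
""").fetchall()

# Flattened copy: the dimension value is stored alongside each fact row.
conn.executescript("""
    CREATE TABLE orders_flat AS
    SELECT o.order_id, o.customer_id, o.amount, c.region
    FROM orders o JOIN customers c ON c.customer_id = o.customer_id;
""")

# Same question, no join at query time.
flat = conn.execute("""
    SELECT region, SUM(amount)
    FROM orders_flat
    GROUP BY region ORDER BY region
""").fetchall()

print(joined)
print(flat)
conn.close()
```

Both queries return the same totals per region; the flattened version simply moves the join cost from query time to load time.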