Vertica for Time Series Analytics and IoT

Posted May 3, 2022 by Steve Sarsfield, Vertica Senior Product Marketing Manager – Partner Ecosystem

When you want to make a prediction – setting stocking levels for items in your store, forecasting energy usage so supply is ready, or anticipating equipment failures – you often rely on time-series data, a series of data points measuring the same signal or process through time. By analyzing that data, companies can drive revenue and efficiency in new ways.

However, time-series and Internet of Things (IoT) data comes with technical challenges. It doesn’t always arrive at regular intervals, and data sets that need to work together may arrive at different intervals. Time-series data may also lack the quality that analytics requires. Performance is a challenge too, because this data tends to be high volume. The database industry has even seen a category of time-series databases emerge to solve these specific problems.

According to Verified Market Research, the Global Time Series Database Software Market was valued at USD 273.56 Million in 2020 and is projected to reach USD 575.03 Million by 2028. Behind this demand is a growing number of analytical use cases that require time series analysis, including seasonality detection in retail sales, weather prediction and forecasting, change-point detection in stock prices, sensor data monitoring, asset tracking, and more.

Through a combination of time-series-specific functions and SQL extensions, Vertica has been the go-to solution for time series analysis for many companies for many years.

What to look for in a time-series database:

We’ve seen the emergence of many time-series databases in the market, but how effective are they when it comes to handling massive data sets like IoT messages, sensor data, clickstream data, or machine logs?

  • Irregular Data Interval Handling – Your time-series database should have the ability to store values at any interval. Some databases deal with data at fixed intervals only.
  • SQL and Python Support – Some solutions lack SQL support, forcing you to learn a new query language. Standard SQL also makes it easier to find staff. Other databases don’t give you the freedom to work in Python and Jupyter notebooks for easy interactive experimentation, as Vertica does.
  • Concurrency – Some solutions have severe limitations on how many users or jobs can execute simultaneously. This reduces the value you can get from the data.
  • Elasticity – Modern databases allow you to elastically scale compute up and down easily. With many time-series solutions, basic cluster support is an extra expense.
  • Modern Architectures – There is a strong movement to put new workloads on a cloud, but existing workloads and data sets are often on-premises. Hybrid cloud architectures are the most common by far, and a database that functions as well on-premises as in any cloud gives you the freedom to deploy where it makes sense, and lets your teams learn a single way to do analytics that works anywhere.
  • Ecosystem – Make sure your time-series database can connect to the ETL and business intelligence (BI) tools you use. A lot of time-series data streams come in via Apache Kafka and other streaming platforms like Apache Spark and Amazon Kinesis. A good time-series database should integrate smoothly with the entire analytics ecosystem.
  • Data Variety – Time-series data comes in a lot of forms – from CSV to JSON to log files to sensor readings, or a pile of data stored in Parquet, or compressed in gzip. You need a database that can accept data and add structure later, parse data automatically, and either analyze data in external formats or import it easily.
  • Windowing – Often, you need to understand time-series data in terms of time windows, moving averages, and similar time slices.
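To make the windowing and irregular-interval points above concrete, here is a minimal Python sketch of a trailing moving average over readings that arrive at uneven intervals. It only illustrates the concept and is not Vertica's implementation or API. The function name `moving_average` and the sample data are hypothetical, and the sketch assumes the readings are already sorted by time and that the window is a positive duration.

```python
from collections import deque
from datetime import datetime, timedelta


def moving_average(readings, window):
    """Yield (timestamp, mean of the values in the trailing time window).

    `readings` is an iterable of (datetime, float) pairs sorted by time.
    The window is defined by elapsed time, not by row count, so unevenly
    spaced readings are handled naturally.
    """
    buf = deque()   # readings currently inside the window
    total = 0.0     # running sum of the values in `buf`
    for ts, value in readings:
        buf.append((ts, value))
        total += value
        # Evict readings at or before (ts - window); the newest reading
        # always stays, so `buf` is never empty here.
        while buf[0][0] <= ts - window:
            _, old_value = buf.popleft()
            total -= old_value
        yield ts, total / len(buf)
```

Calling `list(moving_average(readings, timedelta(seconds=60)))` returns one average per input reading, each computed over the preceding 60 seconds regardless of how many readings fell into that span.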

When evaluating an analytical database for time-series data analysis, remember to choose an option that supports both industry standards and modern capabilities.

What makes Vertica an ideal choice for time-series analytics?

  • Vertica is designed to handle massive amounts of data at extremely high ingestion rates.
  • With Vertica, you can instantly expand your clusters by adding new nodes (elastic scaling).
  • Time-series and IoT data analytics capabilities in Vertica are built in and do not rely on external or supplementary offerings.
  • Vertica supports high availability and high concurrency.
  • Vertica’s integration with BI tools, ETL tools, Kafka, Spark, etc. is simple and robust.

While these capabilities make Vertica an ideal choice for processing time-series data, Vertica stands out from niche time-series databases when your analytical workloads rely on additional forms of data. Here are some examples and benefits of this expanded approach:

  • Data Quality/Data Preparation – Through features such as gap filling and interpolation, Vertica can join irregular-interval or different-interval data sets on an interpolated key, then fill in the gaps using your choice of method. It can also merge time-series data with contextual data such as weather or atmospheric pressure at the time the reading was taken.
  • Finding and matching patterns in data – Through event-based windows and event pattern matching, Vertica enables you to divide time-series data into windows that correspond to major occurrences. This is useful, for example, with financial data, where specific occurrences can trigger additional activities.
  • Clickstream analytics – Vertica has a high-performance mechanism for sending data to and receiving data from third-party message buses. Vertica also offers sessionization, a feature that is frequently used to analyze clickstreams, such as recognizing web browsing sessions from recorded web clicks.
  • Broad Analytic Capabilities – Vertica has a wide range of other time-series analysis functions, as well as data preparation, BI, and data science capabilities, with over 650 built-in functions.
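Two of the ideas in the list above, gap filling with interpolation and sessionization, can be sketched in a few lines of Python. This is a conceptual illustration only, not Vertica's implementation or SQL syntax. The names `gap_fill_linear` and `sessionize` are hypothetical, linear interpolation is just one possible fill method, and both functions assume their input is sorted by time with no duplicate timestamps.

```python
from datetime import datetime, timedelta


def gap_fill_linear(readings, step):
    """Resample sorted (datetime, float) readings onto a regular grid.

    The grid starts at the first timestamp and advances by `step`.
    A grid point that falls between two readings gets a linearly
    interpolated value.
    """
    if not readings:
        return []
    out = []
    i = 0
    t = readings[0][0]
    last_ts = readings[-1][0]
    while t <= last_ts:
        # Move `i` to the latest reading at or before grid point `t`.
        while i + 1 < len(readings) and readings[i + 1][0] <= t:
            i += 1
        t0, v0 = readings[i]
        if t0 == t or i + 1 == len(readings):
            out.append((t, v0))
        else:
            t1, v1 = readings[i + 1]
            frac = (t - t0) / (t1 - t0)  # timedelta / timedelta gives a float
            out.append((t, v0 + frac * (v1 - v0)))
        t += step
    return out


def sessionize(click_times, timeout):
    """Return a session number for each sorted click timestamp.

    A new session starts whenever the gap since the previous click
    exceeds `timeout`. Session numbers start at 0.
    """
    sessions = []
    session_id = 0
    previous = None
    for ts in click_times:
        if previous is not None and ts - previous > timeout:
            session_id += 1
        sessions.append(session_id)
        previous = ts
    return sessions
```

For example, two readings 40 seconds apart resampled with a 10-second step produce five evenly spaced points, and a list of clicks with a 30-second timeout is split wherever the user paused for longer than that.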

Vertica Finishes Where Time-Series Databases Just Start

More and more companies, from financial services to utilities to telecom to technology, rely on Vertica for time series and IoT data analytics.

Vertica’s columnar, massively parallel processing (MPP) architecture, combined with more than 650 built-in advanced analytics and machine learning capabilities and delivered with elastic scalability, provides organizations with proven performance advantages at high volume. So, when you have a time-series workload, consider Vertica to make managing and analyzing time-series data faster and more efficient.