1 - What are Complex Data Types? | OpenText™ Vertica™

Cubes with binary ones and zeroes on them clustered together floating in blue space with light shining out

Co-authored by James Clampffer, Deepak Majeti. – First in a series

A lot of projects require querying ORC or Parquet files, or other data that may have internal types that are a bit more complex than usual. It’s a little hard to explain what we mean when we say, “complex.”

So, this is the first in a series of blog posts on complex types. Over the course of this series, we’ll talk about everything from the high level, “What are complex data types?” and “What are the benefits of querying complex data types?” down to the technical nitty gritty of, “How do I query complex data types in a SQL analytics warehouse like Vertica?” We will also keep you updated as to which types Vertica supports, and what we’re adding next.

What are Complex Data Types?

Let’s start with a definition. Complex types are nested data structures composed of primitive data types. These data structures can also be composed of other complex types. Some examples of complex types include struct(row), array/list, map and union.

Complex types are supported by most programming languages including Python, C++ and Java. They are also supported by databases like PostgreSQL which introduced composite (struct) type in 8.0.

And, they are supported in Vertica. What we are adding is a richer, more complete support for these types, not just querying complex types in ORC and Parquet, but doing so rapidly, without converting the file to a far larger row and column format first. We are even adding the capability to store data in efficient complex types in Vertica’s own ROS storage for even faster processing.

Examples of Complex Types

Array – Array is a homogenous collection of elements of the same data type. These can be int, float, or any kind of string (string, char, varchar, etc.) or another complex type. The Array data type is quite similar to Java’s arraylist which is a strongly typed list but also lets you nest the lists to create arrays.

Example of an array could be PhoneNumbers where each number is defined as a string data type. All elements in this array would have the same data type, ie. string.

Code to define an array, and an example.

Map – Map data types are key-value pairs. This type associates a key with a value. For example, a city could be associated with a zipcode, or an address with a phone number.

In Vertica, the data types for the key and the value in a Map do not have to be the same. The keys can only be made of the base types (not a complex type). Map values can be any type that Array supports. For example, the key could be a string and the value could be an int.

Array and Map complex types are also called “collections.”

Struct – Struct is a composite data type that can contain other complex and simple data types. The data types in a struct do not need to be the same (as they do in arrays.) This flexibility allows users to combine multiple data types together under a single name. While many other member variables of similar or different data types can be part of a struct, all these variables need to be defined at the time the struct is created.

An example of struct below shows a SimpleCustomer struct which contains the name (varchar) and id (int) of the customer. Additional fields can similarly be added at the time of struct definition.

Nested Complex Types

Just when you thought things were complex enough, you can also create complex types by nesting other complex types.

For example, we might want to create a struct with all the webevents tracked by customers. For storing the phone numbers of the customers, we can create an ARRAY PhoneNumbers. Now, we can use this array of phone numbers that we have when creating another complex data type to store customer records. Instead of recreating the array, we can instead define the type of “contact” in Customer struct as type PhoneNumbers which is based on the Array data type.

Subsequently, when creating a webEvents struct, we can use basic data types like int and timestamp and define customer record based on the struct Customer we created in the previous step. We also defined another variable http_requests based on a complex data type (MAP) that we already created.

code showing previous examples combined to make a nested Struct

Now, you see why they’re called complex types. Yet, you can also see how using nested complex types like this actually simplifies code and saves time.

Be sure to check out the next post in this series:

2 – What Are the Benefits of Using Complex Data Types in SQL?

And don’t miss the on-demand webinar: Data Preparation for Machine Learning at Scale

About the Author

Waqas Dhillon
Product Manager - Machine Learning, Vertica

Waqas is the product management lead for machine learning with Vertica. In his current role, he drives the strategy and implementation of advanced analytics and machine learning features in the Vertica MPP platform. Waqas holds a bachelor’s degree in computer software engineering from NUST and a master’s degree in management from Harvard University.
Prior to his current role, Waqas has worked in multiple positions where he applied data analytics and machine learning for consumer research and revenue growth for companies in consumer packaged goods and telecommunication industries.

Product Overview

Vertica Announces Vertica 12 for Future-Proof Analytics

Harness the Internet of Things (IoT)

Support & Services

Partners

Vertica Inside – Embedded Analytics at Scale

Resources

About Vertica

Stay Informed

1 – What are Complex Data Types?

About the Author

Search The Blog

Explore Popular Topics

Subscribe For Email Updates

Product Overview

Vertica Announces Vertica 12 for Future-Proof Analytics

Harness the Internet of Things (IoT)

Support & Services

Partners

Vertica Inside – Embedded Analytics at Scale

Resources

About Vertica

Stay Informed

1 – What are Complex Data Types?

About the Author

Search The Blog

Explore Popular Topics

Subscribe For Email Updates

See More Under the Hood Posts