Co-authored by James Clampffer, Deepak Majeti.
A lot of projects require querying ORC or Parquet files, or other data that may have internal types that are a bit more complex than usual. We support some of those complex types now, and are dedicated to adding full support for all complex types in Vertica.
It’s a little hard to explain what we mean when we say, “complex.” It can be an even bigger leap to explain why this is great news for people who use Vertica.
So, this is the first in a series of blog posts on complex types. Over the course of this series, we’ll talk about everything from the high level, “What is a complex type?” and “What can it do for me?” down to the technical nitty gritty of, “How do I use that in Vertica?” And, as we go, we’ll keep you updated as to which types we support, and what we’re adding next.
What are Complex Data Types?
Let’s start with a definition. Complex types are nested data structures composed of primitive data types. These data structures can also be composed of other complex types. Some examples of complex types include struct(row), array/list, map and union.
Complex types are supported by most programming languages including Python, C++ and Java. They are also supported by databases like PostgreSQL which introduced composite (struct) type in 8.0.
And, they are supported in Vertica. What we are adding is a richer, more complete support for these types.
Examples of Complex Types
Array – Array is a homogenous collection of elements of the same data type. These can be int, float, or any kind of string (string, char, varchar, etc.) or another complex type. The Array data type is quite similar to Java’s arraylist which is a strongly typed list but also lets you nest the lists to create arrays.
Example of an array could be PhoneNumbers where each number is defined as a string data type. All elements in this array would have the same data type, ie. string.
Map – Map data types are key-value pairs. This type associates a key with a value. For example, a city could be associated with a zipcode, or an address with a phone number.
In Vertica, the data types for the key and the value in a Map do not have to be the same. The keys can only be made of the base types (not a complex type). Map values can be any type that Array supports. For example, the key could be a string and the value could be an int.
Array and Map complex types are also called “collections.”
Struct – Struct is a composite data type that can contain other complex and simple data types. The data types in a struct do not need to be the same (as they do in arrays.) This flexibility allows users to combine multiple data types together under a single name. While many other member variables of similar or different data types can be part of a struct, all these variables need to be defined at the time the struct is created.
An example of struct below shows a SimpleCustomer struct which contains the name (varchar) and id (int) of the customer. Additional fields can similarly be added at the time of struct definition.
Nested Complex Types
Just when you thought things were complex enough, you can also create complex types by nesting other complex types.
For example, we might want to create a struct with all the webevents tracked by customers. For storing the phone numbers of the customers, we can create an ARRAY PhoneNumbers. Now, we can use this array of phone numbers that we have when creating another complex data type to store customer records. Instead of recreating the array, we can instead define the type of “contact” in Customer struct as type PhoneNumbers which is based on the Array data type.
Subsequently, when creating a webEvents struct, we can use basic data types like int and timestamp and define customer record based on the struct Customer we created in the previous step. We also defined another variable http_requests based on a complex data type (MAP) that we already created.
Now, you see why they’re called complex types. Yet, you can also see how using nested complex types like this actually simplifies code and saves time.
Be sure to check out the next post in this series:
2 – Benefits of Using Complex Types in SQL
And don’t miss the upcoming webinar: Data Preparation for Machine Learning at Scale
Quick Tip: Changing the Data Type of a Column in an External Table
In-Database Machine Learning 1 – Why?
In-Database Machine Learning 2: Calculate a Correlation Matrix – A Data Exploration Post
Quick Tip: Exporting to Parquet
Announcing Vertica Version 9.3 – Ride the Winds of Change