Vertica on Amazon Elastic Kubernetes Service (EKS)

Introduction

In my previous blog, I showed you the steps to run Vertica on Kubernetes (K8s) on your laptop. That’s fine if you want to try things out on a small scale, but what if you want to run it on a larger system? In this blog, I will teach you how to deploy Vertica on Amazon Elastic Kubernetes Service (EKS). I will highlight some of the new features in the latest version of the operator that make connecting Vertica to an AWS S3 bucket simple.

This tutorial assumes that you have the following:

Some EKS experience
K8s cluster running on EKS
CLI access to your K8s cluster
Helm CLI local installation
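
Before you start, it can help to confirm that the command-line tools used in this blog are on your PATH. The following is a small sketch rather than an official check. The check_tool helper is a hypothetical name used only for illustration, and eksctl is only needed if you follow the IRSA steps later on.

```shell
# check_tool is a hypothetical helper: it reports whether a command is on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
  fi
}

# Tools used in this blog; eksctl is only needed for the IRSA steps.
for tool in kubectl helm eksctl; do
  check_tool "$tool"
done
```

If kubectl is found, running kubectl config current-context shows which cluster your commands will be sent to.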

Deploying the Operator

First, we must deploy the operator. Deploying the operator creates a Deployment object that runs a pod with our operator in it. It also registers a few custom resource definitions (CRDs) that provide an API to deploy and manage Vertica clusters in K8s.

We have two deployment options for the operator: the official Vertica Helm chart or the Operator Lifecycle Manager (OLM). In this blog, I install it with the official Vertica Helm chart.

Run these commands to deploy the operator:

$ helm repo add vertica-charts https://vertica.github.io/charts 
$ helm repo update vertica-charts 
$ helm install vdb-op --wait --namespace blog --create-namespace vertica-charts/verticadb-operator

These commands install the operator in a new namespace called blog. The operator only works with objects in a single namespace, so you need to deploy the operator multiple times if you want to have multiple Vertica clusters in different namespaces.

Because of the --wait option, the helm install command returns only after the operator pod is up and running.
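
If you want to double-check the result of the install, the sketch below prints the Helm release status and the pods in the blog namespace. The function name show_operator_status is hypothetical. The release name vdb-op and the namespace come from the helm install command above.

```shell
# show_operator_status is a hypothetical helper name. It prints the Helm
# release status and then lists the pods in the operator's namespace.
show_operator_status() {
  helm status vdb-op --namespace blog
  kubectl get pods --namespace blog
}
```

Call show_operator_status after the install. The operator pod should be listed with a Running status.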

Communal Storage Authentication Options

Now, we can deploy a Vertica database. The operator installs a few CRDs that allow you to create and manage a Vertica database. You create instances of these CRDs, called custom resources (CRs), just like you would for other K8s objects. You can list the Vertica-specific resource types with the following command:

$ kubectl api-resources | grep -E '^NAME|vertica.com'
NAME                 SHORTNAMES   APIVERSION            NAMESPACED   KIND
verticaautoscalers   vas          vertica.com/v1beta1   true         VerticaAutoscaler
verticadbs           vdb          vertica.com/v1beta1   true         VerticaDB

The CR named VerticaDB creates and manages an entire Vertica cluster. The other CR, VerticaAutoscaler, manages autoscaling of a VerticaDB. This blog focuses exclusively on the VerticaDB CR. For information on the VerticaAutoscaler, refer to the documentation.

Before we start up a Vertica cluster, we must decide on how Vertica will access the AWS S3 communal storage. You can authenticate to AWS S3 with one of the following options:

Secret key and access key
IAM role attached to the EC2 instances
IAM role attached to a ServiceAccount (IRSA)

Storing a secret key and access key means the credentials live in Vertica, which makes them harder to maintain because you have to rotate the keys. An IAM role attached to the EC2 instances is better because there are no keys to rotate, but every pod running on the K8s node has access to the IAM role, which can be a very broad privilege. The IRSA option gives you the most control over who has access to the IAM role because it only applies to pods that run with a specific ServiceAccount.

I will only teach you how to set up a Vertica cluster using the IAM role approaches. You can refer to the documentation to set up with a secret key and access key.

IAM Role Attached to the EC2 Instance

There isn’t any special setup in Vertica when choosing this option. When you create the node group in EKS, select a Node IAM role that has access to the S3 bucket that you want to use for communal storage.
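
Either IAM role approach needs a policy that grants access to the bucket. The sketch below writes a sample policy document to a local file. The list of S3 actions is an assumption, not taken from the Vertica documentation, so check the documentation for the permissions your version requires. The file name s3-policy.json and the policy name in the comment are hypothetical.

```shell
# Write a sample policy document. The action list is an assumption; verify
# it against the Vertica documentation before using it.
cat << 'EOF' > s3-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::<your-bucket>"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::<your-bucket>/*"
    }
  ]
}
EOF

# To create the policy in AWS (the policy name here is hypothetical):
#   aws iam create-policy --policy-name vertica-communal-s3 \
#     --policy-document file://s3-policy.json
```

The create-policy command prints the ARN of the new policy, which you can then attach to the Node IAM role or pass to the eksctl command in the IRSA section.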

IAM Role Attached to a ServiceAccount (IRSA)

This approach is new in version 12.0.3 of the Vertica server, so the image you specify in the VerticaDB CR must be 12.0.3 or later.

The node group that you set up for this approach should not use a Node IAM role that gives access to your bucket. Instead, the IAM role that has bucket access is attached to a ServiceAccount. There are a few commands you need to run to set that up. If you are going to authenticate with this approach, I encourage you to read this AWS blog that describes what you need to do and why.

The default ServiceAccount is named verticadb-operator-controller-manager. You can change this by setting the serviceAccountNameOverride Helm chart parameter. The steps here assume you are using the default ServiceAccount.

Here are the commands from the AWS blog, adapted for the ServiceAccount and namespace we are using. Be sure to substitute the name of your EKS cluster for <cluster-name>, and the ARN of the IAM policy that grants access to your S3 bucket for <s3-arn-policy-name>:

$ eksctl utils associate-iam-oidc-provider --cluster <cluster-name> --approve
$ eksctl create iamserviceaccount --name verticadb-operator-controller-manager --namespace blog --cluster <cluster-name> --attach-policy-arn <s3-arn-policy-name> --approve --override-existing-serviceaccounts

The creation of the IAM service account uses the option --override-existing-serviceaccounts because the service account was already created when the operator was deployed.
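
To confirm that the role was attached, you can look for the eks.amazonaws.com/role-arn annotation on the ServiceAccount, which is how IRSA links a ServiceAccount to an IAM role. This is a sketch; irsa_role_arn is a hypothetical helper name, and it assumes the default ServiceAccount name and the blog namespace used above.

```shell
# irsa_role_arn is a hypothetical helper name. It prints the IAM role ARN
# recorded in the ServiceAccount annotation, or nothing if it is not set.
irsa_role_arn() {
  kubectl get serviceaccount verticadb-operator-controller-manager \
    --namespace blog \
    --output jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}'
}
```

If irsa_role_arn prints nothing, the annotation is missing and the eksctl create iamserviceaccount step is worth re-checking.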

Deploy VerticaDB

The next step is to deploy the Vertica cluster. We do that by creating an instance of the VerticaDB CR. Here is a sample YAML manifest you can use; substitute your S3 bucket for <your-bucket>.

$ cat << EOF > vdb.yaml
apiVersion: vertica.com/v1beta1
kind: VerticaDB
metadata:
  name: v
spec:
  # A version 12.0.3 or higher is needed to use IRSA
  image: vertica/vertica-k8s:12.0.3-0-minimal
  communal:
    path: "s3://<your-bucket>"
    endpoint: https://s3.amazonaws.com
  subclusters:
  - name: sc
EOF
$ kubectl apply --namespace blog -f vdb.yaml

The operator reacts to this new CR and begins to create a new database with a 3-node subcluster named sc. Database creation happens asynchronously to the kubectl command. To wait for it to finish, you can use this command:

$ kubectl wait --for=condition=DBInitialized=True --namespace blog vdb/v
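
By default, kubectl wait gives up after 30 seconds, which may not be enough time for a new database to be created. The sketch below passes a longer --timeout. The wait_for_db helper name and the 600-second default are assumptions chosen for illustration, so pick a value that suits your cluster.

```shell
# wait_for_db is a hypothetical helper name. It waits for the operator to
# report that the database is initialized. The optional first argument is
# the timeout; 600s is an arbitrary default chosen for this example.
wait_for_db() {
  wait_timeout="${1:-600s}"
  kubectl wait --for=condition=DBInitialized=True \
    --namespace blog --timeout="$wait_timeout" vdb/v
}
```

For example, wait_for_db 15m waits up to fifteen minutes, and wait_for_db with no argument uses the default.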

Client Access

Now, you can allow client access to your database. When deploying Vertica in K8s, all client access goes through Service objects. Generally, we create one Service object for each subcluster defined in the VerticaDB. You can find the service objects that clients can connect to with the following command:

$ kubectl get service --namespace blog --selector app.kubernetes.io/instance=v,vertica.com/svc-type=external

By default, the service objects are created with the type ClusterIP. This means they can only be accessed by pods running inside the same K8s cluster. If you have external clients that need to connect to the database, you need a service object with the LoadBalancer type. This can be set by doing a patch of our previous VerticaDB instance:

$ kubectl patch vdb/v --namespace blog --type=json --patch='[{"op": "replace", "path": "/spec/subclusters/0/serviceType", "value": "LoadBalancer"}]'

The kubectl command will return immediately while the operator applies the update to the service object in the background. You can wait for the update by re-issuing the kubectl get service command from before:

$ kubectl get service --namespace blog --selector app.kubernetes.io/instance=v,vertica.com/svc-type=external

The output should look something like this when the update is complete. Notice that the EXTERNAL-IP column is now filled in with the fully qualified domain name (FQDN) that clients can use to connect to the database:

NAME   TYPE           CLUSTER-IP     EXTERNAL-IP                                                               PORT(S)                         AGE
v-sc   LoadBalancer   172.20.50.35   a9f8495728b99485fad6cea81dacb35c-1847055740.us-east-1.elb.amazonaws.com   5433:32002/TCP,5444:30316/TCP   22m
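
If you want to use the FQDN in a script rather than copy it from the table, you can read it from the Service status. This is a sketch; lb_hostname is a hypothetical helper name, and it assumes the service is named v-sc as in the output above.

```shell
# lb_hostname is a hypothetical helper name. It prints the hostname that
# AWS assigned to the load balancer, or nothing if it is not ready yet.
lb_hostname() {
  kubectl get service v-sc --namespace blog \
    --output jsonpath='{.status.loadBalancer.ingress[0].hostname}'
}

# Example client connection, assuming the vsql client is installed locally
# and the database uses the dbadmin superuser:
#   vsql -h "$(lb_hostname)" -p 5433 -U dbadmin
```

Port 5433 is the client port shown in the PORT(S) column.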

AWS supports additional load balancer configuration, such as restricting access to specific CIDR IP address ranges. Set these options in the CR with the spec.subclusters[i].serviceAnnotations field.
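
As an illustration, the sketch below extends the earlier manifest with a LoadBalancer service type and one annotation. It assumes the service.beta.kubernetes.io/load-balancer-source-ranges annotation is honored by your load balancer setup. The CIDR range 203.0.113.0/24 is a placeholder and the file name vdb-lb.yaml is hypothetical.

```shell
# Write a variant of the earlier manifest that requests a load balancer
# and restricts which client IP ranges may connect to it.
cat << 'EOF' > vdb-lb.yaml
apiVersion: vertica.com/v1beta1
kind: VerticaDB
metadata:
  name: v
spec:
  image: vertica/vertica-k8s:12.0.3-0-minimal
  communal:
    path: "s3://<your-bucket>"
    endpoint: https://s3.amazonaws.com
  subclusters:
  - name: sc
    serviceType: LoadBalancer
    serviceAnnotations:
      # Assumption: this annotation limits the client IP ranges that the
      # load balancer accepts. Replace the placeholder with your own range.
      service.beta.kubernetes.io/load-balancer-source-ranges: "203.0.113.0/24"
EOF

# Apply it the same way as the original manifest:
#   kubectl apply --namespace blog -f vdb-lb.yaml
```

Keeping the annotations in the manifest means they are reapplied whenever the operator recreates the service object.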

Conclusion

We successfully deployed Vertica on EKS and set it up so that it can be accessed by clients outside of K8s. There are many features we did not touch on in this blog. You can find more information about what the operator can do by reading the documentation.

About the Author

Matt Spilchen
Senior Systems Software Engineer

I have over 20 years' experience as a developer on various database engines. Most recently with Vertica, I led the development of Vertica’s integration with Kubernetes. I am the key contributor to Vertica's open-source Kubernetes operator (https://github.com/vertica/vertica-kubernetes), which aims to automate the management of the database.
