Vertica on Kubernetes

Posted January 26, 2023 by Matt Spilchen, Senior Systems Software Engineer


This blog post has been updated from an earlier one to use new features that make deploying on your own system easier.

Vertica released the VerticaDB operator in August 2021, which began Vertica’s integration with Kubernetes. The operator automates many Vertica administrator tasks, such as restarting Vertica if any of the nodes go down, upgrading Vertica to a new version while keeping the database online, and integrating with the Kubernetes HorizontalPodAutoscaler.

This blog post shows you how easy it is to get Vertica up and running inside Kubernetes.

You don’t need a full understanding of Kubernetes to follow this post. It does not explain Kubernetes concepts in detail; plenty of great resources online cover them.

Kubernetes Setup

Developers often assume that Kubernetes must run on a large multi-node cluster. However, for testing purposes, Kubernetes is lightweight enough to run on your own computer. That is what we are going to do in this tutorial.

There are a few ways to run Kubernetes locally. This tutorial uses kind, which stands for “Kubernetes IN Docker”.

In addition to kind, you must install the following:

  • Docker
  • kubectl
  • Helm

First, download the kind binary. If you are running Linux, you can copy and paste the commands below to download it. For other operating systems, follow the instructions at the kind quick start page.

$ curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.14.0/kind-linux-amd64
$ chmod +x ./kind
$ sudo mv ./kind /usr/local/bin

The next tool is kubectl, the CLI that you use to talk to a Kubernetes cluster. If you are running Linux, you can copy the commands below. For other operating systems, see the kubectl download page.

$ curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
$ sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl

And the last tool you need is Helm. This is a package manager for Kubernetes, and we will use it to install the operator. You can download its installer with these commands:

$ curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
$ chmod 700 get_helm.sh
$ ./get_helm.sh 
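Before moving on, it can help to confirm that all four tools are on your PATH. The snippet below is a minimal sketch: missing_tools is a hypothetical helper name we made up for this post, not part of any of the tools.

```shell
# Print the name of every tool given as an argument that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage: an empty result means every prerequisite was found.
# missing_tools docker kind kubectl helm
```

If the call prints nothing, you are ready to create the cluster. Otherwise, it lists the tools that still need to be installed.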

After you download these tools, you can create the Kubernetes cluster. First, create a config file. You can copy/paste the command below in your shell or create a config file using your favorite editor:

$ cat << EOF > kind.yaml 
kind: Cluster 
apiVersion: kind.x-k8s.io/v1alpha4 
nodes: 
- role: control-plane 
  extraPortMappings: 
  - containerPort: 32001 
    hostPort: 32001   
  extraMounts: 
    - hostPath: /tmp 
      containerPath: /host 
EOF

One thing that you might want to change in the config file is the extraMounts.hostPath. We use hostPath to store the communal data for the database. If you pick a path other than /tmp, make sure it is writable by uid 5000. This is the uid we use for dbadmin in the container.
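If you do pick a different directory, a quick check like the one below can tell you whether uid 5000 is likely to be able to write to it. This is only a sketch under a few assumptions: writable_by_uid is a hypothetical helper, it relies on GNU stat (the -c option), and it looks only at the owner and "other" permission bits, ignoring group membership and ACLs.

```shell
# Succeed if directory $1 looks writable by numeric uid $2, judging only by
# the owner and "other" permission bits (group membership is ignored).
# Assumes GNU stat (the -c option), as found on most Linux systems.
writable_by_uid() {
  dir="$1"
  uid="$2"
  owner="$(stat -c '%u' "$dir")" || return 2
  mode="$(stat -c '%a' "$dir")" || return 2
  # World-writable directories such as /tmp (mode 1777) pass immediately.
  [ $(( 0$mode & 0002 )) -ne 0 ] && return 0
  # Otherwise the uid must own the directory and have the owner write bit.
  [ "$owner" = "$uid" ] && [ $(( 0$mode & 0200 )) -ne 0 ]
}

# Usage (the path is a placeholder for whatever directory you chose):
# writable_by_uid /path/to/communal 5000 || echo "fix ownership or permissions first"
```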

After you create the config, you can create the cluster:

$ kind create cluster --config kind.yaml
Creating cluster "kind" …
✓ Ensuring node image (kindest/node:v1.24.0) 🖼
✓ Preparing nodes 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
✓ Joining worker nodes 🚜 
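
If you want to confirm that the cluster is usable before installing anything, you can wait for its nodes to become Ready. The helper below is a sketch; wait_for_nodes is a name we chose, and the 120 second timeout is an arbitrary value. It relies on kind naming the kubectl context "kind-" followed by the cluster name, which is "kind" by default.

```shell
# Block until every node in the kind cluster reports the Ready condition.
# $1 is the kind cluster name (default "kind"); kind names the kubectl
# context "kind-<cluster name>".
wait_for_nodes() {
  kubectl --context "kind-${1:-kind}" wait --for=condition=Ready nodes --all --timeout=120s
}

# Usage:
# wait_for_nodes
```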

Deploy the Operator

There are two official ways to install the operator. This tutorial installs the operator using Helm. You can also deploy it with the operator lifecycle manager. For additional details, see the official documentation.

Run these commands to install the operator from the official Helm chart:

$ helm repo add vertica-charts https://vertica.github.io/charts
$ helm repo update
$ helm install vdb-op --wait --namespace my-verticadb-operator --create-namespace vertica-charts/verticadb-operator

This installs the operator in the Kubernetes namespace my-verticadb-operator. Because of the --wait flag, the install command does not return until the pod running the operator is ready.

VerticaDB

At this point, we have created the Kubernetes cluster and deployed the VerticaDB operator. Now, we are ready to create the Vertica database.

To create a Vertica database, we must create an instance of the VerticaDB custom resource (CR). The operator reacts by creating the necessary Kubernetes objects and bootstrapping a new database.

We use the following sample CR:

$ cat << EOF > vdb.yaml 
apiVersion: vertica.com/v1beta1 
kind: VerticaDB 
metadata: 
  name: verticadb-sample 
spec: 
  annotations: 
    VERTICA_MEMDEBUG: "2"  # Required if running macOS with an Arm-based chip
  communal: 
    path: "/communal/vertica-db-tutorial" 
    includeUIDInPath: true 
  subclusters: 
    - name: sc 
  volumes: 
  - name: hostpath 
    hostPath: 
      path: /host 
  volumeMounts: 
  - name: hostpath 
    mountPath: /communal 
EOF

NOTE: There are many more parameters for the CR—the above is the required minimum. Refer to the Vertica documentation for a complete list.

This CR creates a new database using a host path on your system for communal storage. This only works because we created a single-node Kubernetes cluster. If you run on a multi-node Kubernetes cluster, you need to use a communal source accessible by all nodes. The database has a single three-node subcluster, named sc. This subcluster uses the community edition license, so you are restricted to at most three Vertica pods.

To begin creating the database, you can apply the CR with the following command:

$ kubectl apply --namespace my-verticadb-operator -f vdb.yaml

The above command returns immediately, but it will take a few minutes to download the Vertica image and create the database. You can wait for this to complete with the following command:

$ kubectl wait --for=condition=DBInitialized=True --namespace my-verticadb-operator vdb/verticadb-sample --timeout=10m 

This command does not show any output until everything is set up. There are a few things to check to monitor the progress. If it is busy downloading the Vertica containers, you can view its progress by issuing this command:

$ kubectl get pods --namespace my-verticadb-operator --selector app.kubernetes.io/instance=verticadb-sample 
NAME                    READY   STATUS              RESTARTS   AGE
verticadb-sample-sc-0   0/1     ContainerCreating   0          97s
verticadb-sample-sc-1   0/1     ContainerCreating   0          97s
verticadb-sample-sc-2   0/1     ContainerCreating   0          97s
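
If you would rather see a single number than scan the table, you can count the pods whose READY column shows all containers ready. The function below is a rough sketch (count_ready_pods is our own name). It reads the pod listing from standard input, so it expects the output of the command above with the --no-headers flag added.

```shell
# Count pods whose READY column shows every container ready (e.g. "1/1").
# Reads `kubectl get pods --no-headers` style lines on stdin.
count_ready_pods() {
  awk '{ split($2, r, "/"); if (r[1] == r[2] && r[2] > 0) n++ } END { print n + 0 }'
}

# Usage:
# kubectl get pods --namespace my-verticadb-operator \
#   --selector app.kubernetes.io/instance=verticadb-sample --no-headers | count_ready_pods
```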

If it is creating the database, the operator logs events as it calls various commands under the hood. You can view the events to gain more visibility into what the operator is doing. To see the events, run the describe command against your CR:

$ kubectl describe --namespace my-verticadb-operator vdb/verticadb-sample
…
Events:
Type    Reason             Age    From                Message
----    ------             ----   ----                -------
Normal  CreateDBStart      2m59s  verticadb-operator  Calling 'admintools -t create_db'
Normal  CreateDBSucceeded  113s   verticadb-operator  Successfully created database with subcluster sc. It took 1m6.188642254s

You can also see a quick status line about the VerticaDB by issuing the following command:

$ kubectl --namespace my-verticadb-operator get vdb
NAME               AGE     SUBCLUSTERS   INSTALLED   DBADDED   UP
verticadb-sample   9m51s   1             3           3         3

Client Access

Now you have a database—how can you access it? For ad-hoc queries, you can run vsql directly from one of the pods:

$ kubectl exec -it --namespace my-verticadb-operator verticadb-sample-sc-0 -- vsql
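
For scripted, non-interactive queries, you can pass a statement to vsql with its -c option and drop the -it flags. The wrapper below is a sketch: vsql_in_pod is a hypothetical name, and the namespace is hard-coded to the one used in this tutorial.

```shell
# Run a single SQL statement with vsql inside the given pod.
# $1: pod name, $2: SQL statement. No TTY is requested, so this also
# works from scripts and CI jobs.
vsql_in_pod() {
  pod="$1"
  query="$2"
  kubectl exec --namespace my-verticadb-operator "$pod" -- vsql -c "$query"
}

# Usage (example query against the nodes system table):
# vsql_in_pod verticadb-sample-sc-0 "SELECT node_name, node_state FROM nodes;"
```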

However, this isn’t application-friendly: you need to know the name of the pod and it must be in the UP state. The best way to connect to Vertica is through service objects. By default, we create one service object for each subcluster. You can see the service objects with this command:

$ kubectl --namespace my-verticadb-operator get service --selector vertica.com/svc-type=external 
NAME               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE 
verticadb-sc       ClusterIP   10.96.110.181   <none>        5433/TCP,5444/TCP   24m

For easier discovery, the name of each service object contains both the VerticaDB name and the subcluster name. Kubernetes uses a flat networking model, which means that any pod can reach any other pod. Service names are registered in the cluster DNS, so a service can be reached through its fully qualified domain name. Service objects also load balance among a set of pods and only pick the pods that are in the “Ready” state. For pods running Vertica, this means that the server is UP and connectable. However, by default, service objects are accessible only from within Kubernetes on a cluster-internal IP, known as ClusterIP.
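
To make the DNS naming concrete, the sketch below builds the fully qualified domain name that a pod inside the cluster could use. It assumes the default Kubernetes cluster domain, cluster.local, which your cluster may override. service_fqdn is a helper name we invented, and the service name in the usage line is a guess based on the naming scheme described above, so check the real name with the get service command first.

```shell
# Build the in-cluster DNS name of a service.
# $1: service name, $2: namespace, $3: cluster domain (default cluster.local).
service_fqdn() {
  printf '%s.%s.svc.%s\n' "$1" "$2" "${3:-cluster.local}"
}

# Usage (service name is an assumption; list services to find yours):
# service_fqdn verticadb-sample-sc my-verticadb-operator
```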

If you want to access the service from outside Kubernetes, there are a few options that depend on where Kubernetes is deployed. All major cloud vendors provide a LoadBalancer type that exposes the service externally through the cloud vendor’s load balancer. This isn’t available out-of-the-box with kind. Our only option is to use NodePort, which exposes the service through a port on all Kubernetes nodes. When we created the kind cluster, there was a port mapping added for this purpose.

To use NodePort so that you can access Vertica from your host machine, run this patch:

$ kubectl patch vdb --namespace my-verticadb-operator verticadb-sample --type=merge --patch '{"spec": {"subclusters": [{"name": "sc", "serviceType": "NodePort", "nodePort": 32001}]}}'
verticadb.vertica.com/verticadb-sample patched

This forces the operator to change the service object type. Now, you can access Vertica from your host machine with localhost using port 32001:

$ vsql -U dbadmin -h localhost -p 32001
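
If the connection is refused, the service may not have switched to NodePort yet. One way to check is to probe the port before starting vsql. This is only a sketch: port_open is a hypothetical helper that relies on bash's /dev/tcp redirection and the coreutils timeout command, and the two second limit is arbitrary.

```shell
# Succeed if a TCP connection to host $1, port $2 opens within two seconds.
# Relies on bash's /dev/tcp redirection and the coreutils timeout command.
port_open() {
  timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage:
# port_open localhost 32001 && vsql -U dbadmin -h localhost -p 32001
```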

Cleanup

When you are done with your system, you must delete your K8s cluster and delete any files in the hostPath that was set in the config file (/tmp, unless changed).

To delete the K8s cluster, issue the following command:

$ kind delete cluster
Deleting cluster "kind" … 
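
To find the leftover communal data on the host, you can trace the path through the two mounts: the kind config maps /tmp on the host to /host on the node, and the CR mounts /host at /communal in the pod. The sketch below does that mapping; communal_host_dir is a name we made up, and the arguments in the usage line are the values from this tutorial.

```shell
# Map a communal path inside the pod to the matching directory on the host.
# $1: host directory mounted into kind (extraMounts.hostPath, /tmp here)
# $2: pod mount point of the volume (volumeMounts.mountPath, /communal here)
# $3: communal.path from the VerticaDB CR
communal_host_dir() {
  printf '%s%s\n' "${1%/}" "${3#"$2"}"
}

# Usage: print the directory, inspect it, then remove it by hand.
# communal_host_dir /tmp /communal /communal/vertica-db-tutorial
```

The files were written by uid 5000, so you may need elevated privileges to remove the directory it prints.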

This should be enough to get you started using Vertica in Kubernetes. The CR that we used is minimal—there are many more parameters that are available to customize your database. For more information, visit the GitHub page that hosts the operator, or our official documentation.