Vertica on Kubernetes

Posted June 6, 2022 by Matt Spilchen, Senior Systems Software Engineer

Kubernetes and containers

Vertica released the VerticaDB operator in August 2021, which began Vertica’s integration with Kubernetes. The operator automates many Vertica administrator tasks, such as restarting Vertica if any node goes down, upgrading Vertica to a new version while keeping the database online, and integrating with the Kubernetes horizontal pod autoscaler.

This blog post shows you how easy it is to get Vertica up and running inside Kubernetes.

You don’t need a full understanding of Kubernetes to follow this post. It does not go into detail on Kubernetes concepts; plenty of great resources online cover them.

Kubernetes Setup

Typically, when you first think of Kubernetes, you think of it running inside of a large multi-node cluster. However, for testing purposes, Kubernetes is small enough that you can run it on your own computer. That is what we are going to do in this tutorial.

There are a few ways to run Kubernetes locally. This tutorial uses kind, which stands for “Kubernetes IN Docker”.

kind runs its cluster nodes as Docker containers, so you need to install Docker first.

Next, download the kind binary. If you are running Linux, you can copy and paste the commands below. For other operating systems, follow the instructions on the kind quick start page. Note that Vertica only runs on Intel (x86-64) chips, so this tutorial will not work on an Arm-based Mac.

$ curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.14.0/kind-linux-amd64
$ chmod +x ./kind
$ sudo mv ./kind /usr/local/bin

You also need kubectl, the CLI that you use to talk to a Kubernetes cluster. The commands below install it on Linux. For other operating systems, see the kubectl download page.

$ curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
$ sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
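
Before creating the cluster, it can be worth confirming that the prerequisites are in place. The following is a minimal sketch and not part of the official instructions: the helper names missing_tools and is_x86_64 are made up for illustration, and the architecture check simply mirrors the Intel-only note above.

```shell
# Print the name of every tool passed as an argument that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Succeed only for x86-64 machine names.
# With no argument, the check uses the output of `uname -m` on this host.
is_x86_64() {
  case "${1:-$(uname -m)}" in
    x86_64|amd64) return 0 ;;
    *) return 1 ;;
  esac
}
```

Running missing_tools docker kind kubectl prints nothing when all three tools are installed, and is_x86_64 returns a non-zero status on an Arm host.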

With the tools downloaded, you can create the Kubernetes cluster. First, create a config file. You can copy/paste the command below in your shell or create a config file using your favorite editor:

$ cat << EOF > kind.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
  extraPortMappings:
  - containerPort: 32001
    hostPort: 32001
- role: worker
EOF

With the config file created, you can then create the cluster:

$ kind create cluster --config kind.yaml
Creating cluster "kind" …
✓ Ensuring node image (kindest/node:v1.24.0) 🖼
✓ Preparing nodes 📦 📦 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
✓ Joining worker nodes 🚜 
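
kind names the kubectl context after the cluster, so the default cluster created above is reachable as kind-kind. As a hedged sketch, the helper below (wait_for_nodes is a hypothetical name, and the 2m default timeout is an arbitrary choice) blocks until all three nodes report Ready before you move on to the operator:

```shell
# Block until every node in a kind cluster reports the Ready condition.
# Arguments: cluster name (default: kind), timeout (default: 2m).
wait_for_nodes() {
  kubectl --context "kind-${1:-kind}" wait --for=condition=Ready nodes --all \
    --timeout="${2:-2m}"
}
```

Calling wait_for_nodes with no arguments targets the cluster from this tutorial.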

Deploy Operator

There are two official ways to install the operator. This tutorial installs the operator using the operator lifecycle manager (OLM). There is also a Helm chart that deploys the operator. For additional details, see the official documentation.

Run this command to set up OLM in your Kubernetes instance:

$ curl -sL https://github.com/operator-framework/operator-lifecycle-manager/releases/download/v0.21.2/install.sh | bash -s v0.21.2

Now you are ready to deploy the operator. You can start that with this command:

$ kubectl create -f https://operatorhub.io/install/verticadb-operator.yaml

This installs the operator in the Kubernetes namespace my-verticadb-operator.

This command returns quickly, but the operator is still installing in the background. Run this command to view the installation progress:

$ kubectl get csv --namespace my-verticadb-operator --watch

This command does not return as it watches the installation status. Instead, it prints out a line each time the installation status changes. The output looks like this:

NAME                        DISPLAY              VERSION   REPLACES                    PHASE
verticadb-operator.v1.4.0   VerticaDB Operator   1.4.0     verticadb-operator.v1.3.1
verticadb-operator.v1.4.0   VerticaDB Operator   1.4.0     verticadb-operator.v1.3.1   Pending
verticadb-operator.v1.4.0   VerticaDB Operator   1.4.0     verticadb-operator.v1.3.1   InstallReady
verticadb-operator.v1.4.0   VerticaDB Operator   1.4.0     verticadb-operator.v1.3.1   Installing
verticadb-operator.v1.4.0   VerticaDB Operator   1.4.0     verticadb-operator.v1.3.1   Succeeded

The installation is finished when the PHASE column displays Succeeded, and you can hit CTRL+C to return to your command prompt.
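
If you are scripting the setup, watching and pressing CTRL+C is inconvenient. One possible alternative, sketched below, is to poll the PHASE value until it reads Succeeded. The function name wait_for_csv and its retry defaults are assumptions made for this example; the jsonpath expression reads the status.phase field that backs the PHASE column.

```shell
# Poll until every ClusterServiceVersion in a namespace reports Succeeded.
# Arguments: namespace, max attempts (default: 60), seconds between attempts (default: 5).
# Returns 0 on success, 1 if the attempts run out first.
wait_for_csv() {
  ns=$1 tries=${2:-60} delay=${3:-5}
  while [ "$tries" -gt 0 ]; do
    phases=$(kubectl get csv --namespace "$ns" -o jsonpath='{.items[*].status.phase}')
    if [ -n "$phases" ]; then
      all_ok=1
      # Unquoted on purpose: one word per CSV phase.
      for phase in $phases; do
        [ "$phase" = Succeeded ] || all_ok=0
      done
      [ "$all_ok" -eq 1 ] && return 0
    fi
    tries=$((tries - 1))
    sleep "$delay"
  done
  return 1
}
```

For this tutorial, the call would be wait_for_csv my-verticadb-operator.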

Communal Storage

Vertica on Kubernetes only deploys Eon Mode databases. Before you can run the database, you first need to set up some communal storage. The communal storage does not have to be inside Kubernetes, but it must be accessible from Kubernetes. The operator has feature parity with the Vertica server: you can use any of the major supported cloud vendors for communal storage (AWS, Azure, and GCP), in addition to HDFS or any S3-compatible endpoint, like MinIO.

In this tutorial, we set up MinIO because it is easy and free. MinIO has its own operator, and we install it in much the same way we installed the VerticaDB operator. First, begin the MinIO operator installation with this command:

$ kubectl create -f https://operatorhub.io/install/minio-operator.yaml

This returns quickly, but the installation is happening in the background. Run this command to view the installation progress:

$ kubectl get csv --namespace operators --watch

As with the installation of the Vertica operator, this command does not return because we use the --watch option. The installation is finished when the PHASE column displays Succeeded, and you can hit CTRL+C to return to your command prompt. The last line should look like this when it is done:

NAME                     DISPLAY          VERSION   REPLACES   PHASE
...
minio-operator.v4.4.20   Minio Operator   4.4.20               Succeeded

With the MinIO operator installed, we can create an S3-compatible endpoint. We do this by creating an instance of Tenant, a custom resource (CR) that the MinIO operator manages. To create the tenant, run the following command:

$ kubectl apply --namespace my-verticadb-operator -f https://raw.githubusercontent.com/vertica/vertica-kubernetes/main/config/samples/minio.yaml

After you run this command, the rest of the setup completes in the background, including a job that creates the S3 bucket. You can use the following command to wait for that job to finish:

$ kubectl wait --for=condition=Complete=True --namespace my-verticadb-operator job/create-s3-bucket --timeout=5m
job.batch/create-s3-bucket condition met

VerticaDB

At this point, we have created the Kubernetes cluster, deployed the VerticaDB operator, and set up the communal storage. We are now ready to create the Vertica database.

To create a Vertica database, we must create an instance of the VerticaDB CR. The operator reacts by creating the necessary Kubernetes objects and bootstrapping a new database.

We use the following sample CR:

apiVersion: vertica.com/v1beta1
kind: VerticaDB
metadata:
  name: verticadb-sample
spec:
  communal:
    path: "s3://nimbusdb/db"
    endpoint: http://minio
    credentialSecret: s3-auth
  subclusters:
    - name: defaultsubcluster

The CR has many more parameters; the above is the required minimum. Refer to the Vertica documentation for a complete list. This CR creates a new database using MinIO communal storage in s3://nimbusdb/db. The database has a single subcluster, named defaultsubcluster, of size 3. This uses the community edition license, so you are restricted to at most 3 Vertica pods.
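
If you would rather keep a local copy of the CR that you can edit, a sketch like the following writes one out in the same way we wrote kind.yaml. The function name write_vdb_manifest and the file name vdb.yaml are hypothetical, and the explicit size field is an addition for illustration that spells out the subcluster size described above.

```shell
# Write a VerticaDB manifest based on the sample CR, with an explicit size.
# Arguments: output file (default: vdb.yaml), subcluster size (default: 3).
write_vdb_manifest() {
  cat << EOF > "${1:-vdb.yaml}"
apiVersion: vertica.com/v1beta1
kind: VerticaDB
metadata:
  name: verticadb-sample
spec:
  communal:
    path: "s3://nimbusdb/db"
    endpoint: http://minio
    credentialSecret: s3-auth
  subclusters:
    - name: defaultsubcluster
      size: ${2:-3}
EOF
}
```

You could then pass the generated file to kubectl apply with -f vdb.yaml in place of the URL. Keep the size at 3 or below while using the community edition license.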

To begin creating the database, you can apply the CR with the following command:

$ kubectl apply --namespace my-verticadb-operator -f https://raw.githubusercontent.com/vertica/vertica-kubernetes/main/config/samples/v1beta1_verticadb.yaml
verticadb.vertica.com/verticadb-sample created

The above command returns immediately, but it will take a few minutes to download the Vertica image and create the database. You can wait for this to be completed with the following command:

$ kubectl wait --for=condition=DBInitialized=True --namespace my-verticadb-operator vdb/verticadb-sample --timeout=10m
verticadb.vertica.com/verticadb-sample condition met

This command does not show any output until everything is set up. In the meantime, there are a few ways to monitor the progress. While the Vertica image is still downloading, you can check the state of the pods with this command:

$ kubectl get pods --namespace my-verticadb-operator --selector app.kubernetes.io/instance=verticadb-sample
NAME                                   READY   STATUS              RESTARTS   AGE
verticadb-sample-defaultsubcluster-0   0/1     ContainerCreating   0          97s
verticadb-sample-defaultsubcluster-1   0/1     ContainerCreating   0          97s
verticadb-sample-defaultsubcluster-2   0/1     ContainerCreating   0          97s

While the database is being created, the operator logs events as it calls various commands under the hood. You can view the events to gain more visibility into what the operator is doing. To see the events, run the describe command against your CR:

$ kubectl describe --namespace my-verticadb-operator vdb/verticadb-sample
…
Events:
Type    Reason             Age    From                Message
----    ------             ----   ----                -------
Normal  CreateDBStart      2m59s  verticadb-operator  Calling 'admintools -t create_db'
Normal  CreateDBSucceeded  113s   verticadb-operator  Successfully created database with subcluster 'defaultsubcluster'. It took 1m6.188642254s

You can also see a quick status line about the VerticaDB by issuing the following command:

$ kubectl --namespace my-verticadb-operator get vdb
NAME               AGE     SUBCLUSTERS   INSTALLED   DBADDED   UP
verticadb-sample   9m51s   1             3           3         3
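
In a script, the same status line can serve as a simple health check. The sketch below assumes the six-column layout shown above (NAME, AGE, SUBCLUSTERS, INSTALLED, DBADDED, UP) and succeeds only when the UP count matches the INSTALLED count. The function name vdb_all_up is made up for this example.

```shell
# Succeed when a VerticaDB reports as many UP nodes as INSTALLED nodes.
# Arguments: namespace, VerticaDB name.
# Assumes the column order printed by `kubectl get vdb`: $4 = INSTALLED, $6 = UP.
vdb_all_up() {
  kubectl --namespace "$1" get vdb "$2" --no-headers |
    awk 'NF >= 6 && $4 == $6 && $6 > 0 { ok = 1 } END { exit ok ? 0 : 1 }'
}
```

For example, vdb_all_up my-verticadb-operator verticadb-sample returns a zero exit status once all pods are running Vertica.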

Client Access

Now you have a database—how can you access it? For ad-hoc queries, you can run vsql directly from one of the pods:

$ kubectl exec -it --namespace my-verticadb-operator verticadb-sample-defaultsubcluster-0 -- vsql

However, this isn’t application-friendly: you need to know the name of the pod and it must be in the UP state. The best way to connect to Vertica is through service objects. By default, we create one service object for each subcluster. You can see the service objects with this command:

$ kubectl --namespace my-verticadb-operator get service --selector vertica.com/svc-type=external
NAME                                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
verticadb-sample-defaultsubcluster   ClusterIP   10.96.110.181   <none>        5433/TCP,5444/TCP   24m

Notice that the name of the service object contains both the VerticaDB name and the subcluster name for easier discovery. The Kubernetes networking model is flat, which means that any pod can reach any other pod. Service names are registered in the cluster DNS, so a service can be reached through its fully qualified domain name. Service objects also load balance across a set of pods and route only to pods that are in the “Ready” state. For pods running Vertica, this means that the server is UP and accepting connections. However, by default a service object is accessible only from within Kubernetes on a cluster-internal IP, known as ClusterIP.
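
To make the naming concrete, here is a small sketch that builds the fully qualified domain name of a subcluster service. The helper svc_fqdn is hypothetical, and cluster.local is the usual default cluster domain, which is an assumption that may not hold on every cluster.

```shell
# Build the in-cluster DNS name for a subcluster service.
# Arguments: VerticaDB name, subcluster name, namespace,
#            cluster domain (default: cluster.local, an assumption).
svc_fqdn() {
  printf '%s-%s.%s.svc.%s\n' "$1" "$2" "$3" "${4:-cluster.local}"
}
```

From another pod in the cluster, an application could pass the result of svc_fqdn verticadb-sample defaultsubcluster my-verticadb-operator as the host and connect on port 5433.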

If you want to access the service from outside Kubernetes, there are a few options that depend on where Kubernetes is deployed. All major cloud vendors provide a LoadBalancer type that exposes the service externally through the cloud vendor’s load balancer. This isn’t available out of the box when running kind, as in this tutorial. Our only option is to use NodePort, which exposes the service through a port on all the Kubernetes nodes. When we created the kind cluster, we added a mapping for port 32001 for this purpose.

To use NodePort so that you can access Vertica from your host machine, run this patch:

$ kubectl patch vdb --namespace my-verticadb-operator verticadb-sample --type=merge --patch '{"spec": {"subclusters": [{"name": "defaultsubcluster", "serviceType": "NodePort", "nodePort": 32001}]}}'
verticadb.vertica.com/verticadb-sample patched

This forces the operator to change the service object type. Now, from your host machine you can access Vertica if you connect to your localhost using port 32001:

$ vsql -U dbadmin -h localhost -p 32001
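
If the connection is refused, it can help to confirm that the operator has actually switched the service to NodePort. The sketch below reads the type and node ports straight from the service object; svc_type_and_ports is a hypothetical helper name.

```shell
# Print the service type followed by the node ports assigned to its ports.
# Arguments: namespace, service name.
svc_type_and_ports() {
  kubectl --namespace "$1" get service "$2" \
    -o jsonpath='{.spec.type} {.spec.ports[*].nodePort}'
}
```

For svc_type_and_ports my-verticadb-operator verticadb-sample-defaultsubcluster, the output should start with NodePort and include 32001 once the patch has been applied.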

Cleanup

This should be enough to get you started using Vertica in Kubernetes. The CR that we used is minimal—there are a lot more parameters that are available to customize your database. For more information, visit the GitHub page that hosts the operator, or our official documentation.

When you are done with your system, you can issue the following command to delete the Kubernetes cluster:

$ kind delete cluster
Deleting cluster "kind" ...