Containerization is an idea that seems tasty and wonderful at first. Automated deployment on any platform is a particularly enticing advantage, especially for Vertica, a database with infrastructure freedom as a core principle. Fast deployment without human intervention makes containers incredibly useful, especially in environments that require dynamic scaling. And containers are small, quick to deploy, and efficient to execute. Unlike virtual machines, they don’t have to carry around an entire operating system.
But the way microservices work is completely counter to the way databases work. You have to break down a big “monolith” application into independent little microservices that each do only one action, and don’t depend on other microservices to work.
Two key aspects of microservices:
- Each container instance (pod) does its part, without worrying about the other containers. It doesn’t keep track of anything; it just does its job, then vanishes. Nothing is preserved about where it started or ended. Another way to say this is that microservices are stateless.
- After a container copy does its job and disappears, nothing has to stay behind. It doesn’t have to, for instance, store data that outlives it.
This concept is great and works beautifully for many kinds of applications.
Then there’s Vertica, which does a lot of things, but at its heart, it’s a database.
What two things does a database need to do?
- Databases are the opposite of stateless – they need to keep careful track of every change, and the current and past states of things. Databases are stateful.
- Databases need to store and persist data. If no one deletes it, that data should always be kept.
So, that means databases are not good candidates to be broken into microservices.
However, databases need to scale efficiently, and to deploy pretty much anywhere, and to be deployed rapidly. Databases need the advantages containerization brings, especially a database like Vertica that has deployment freedom as one of its core principles.
So, a few years back, the Vertica developers, led by Deepak Majeti, came up with a concept of how this could be done. They had help from some folks at GoodData, especially principal software engineer Jan Soubusta, who needed to containerize Vertica to make it easier to manage deployment for hundreds of users. Deepak and I gave a lecture on the concept at Strata, and Jan talked about how he’d implemented it at GoodData in a Data Disruptors webinar.
But that was independent and theoretical and not officially supported. In the latest release of Vertica, 10.1.1, theory became officially supported reality.
Vertica in containers is based on a different way to deploy containers in Kubernetes, called StatefulSets. A StatefulSet assigns a unique, stable identifier to each container copy, or pod. It also provides the capability to store and track data in a persistent volume (PV) that is completely separate from the pods. The PV functions somewhat like the depot in Eon mode: it retrieves data needed for analysis from the main storage, and writes back changes as needed. The PV is connected to a particular pod ID by a persistent volume claim (PVC). When the ephemeral pods vanish, the data persists in the PV assigned to that pod. If a replacement pod is created, it is assigned the same identifier and the same PVC, so it can connect back to the same data in the same PV.
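To make the pod, PVC, and PV relationship concrete, here is a toy model in Python. It is only a sketch of the idea, not the Kubernetes API and not how Vertica implements anything: the class names, the `vertica` set name, the `data` claim template name, and the `pv-` prefix are all made up for illustration. The one detail taken from Kubernetes itself is the naming pattern, where a pod is named `<set name>-<ordinal>` and its claim is named `<claim template>-<set name>-<ordinal>`.

```python
from dataclasses import dataclass, field


@dataclass
class PersistentVolume:
    """Storage that lives outside any pod (hypothetical toy class)."""
    name: str
    data: dict = field(default_factory=dict)


class StatefulSetSketch:
    """Toy model of stable pod identity plus per-pod persistent storage."""

    def __init__(self, name, claim_template="data"):
        self.name = name
        self.claim_template = claim_template
        self.claims = {}  # PVC name -> PersistentVolume; outlives pods
        self.pods = {}    # pod name -> PVC name; entries come and go

    def pod_name(self, ordinal):
        # Stable identity: the same ordinal always yields the same name.
        return f"{self.name}-{ordinal}"

    def claim_name(self, ordinal):
        # The claim name is derived from the pod identity, so a
        # replacement pod finds the same claim again.
        return f"{self.claim_template}-{self.name}-{ordinal}"

    def start_pod(self, ordinal):
        pod = self.pod_name(ordinal)
        claim = self.claim_name(ordinal)
        # Create the volume only the first time this identity appears.
        if claim not in self.claims:
            self.claims[claim] = PersistentVolume(name=f"pv-{claim}")
        self.pods[pod] = claim
        return pod

    def delete_pod(self, ordinal):
        # Only the pod goes away; the claim and its volume are kept.
        self.pods.pop(self.pod_name(ordinal), None)

    def volume_for(self, ordinal):
        pod = self.pod_name(ordinal)
        if pod not in self.pods:
            raise KeyError(f"pod {pod} is not running")
        return self.claims[self.pods[pod]]


# Usage: write through pod 0, lose the pod, bring it back, read again.
sset = StatefulSetSketch("vertica")
sset.start_pod(0)
sset.volume_for(0).data["catalog"] = "v1"
sset.delete_pod(0)
sset.start_pod(0)
assert sset.volume_for(0).data["catalog"] == "v1"
```

The design point is that the data is keyed by the pod's identity rather than by the pod itself, so deleting a pod removes only the entry in `pods` and leaves `claims` untouched.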
The Vertica container implementation is now internally tested, has been open sourced, and is out on Docker Hub and GitHub for beta testing by whoever wants to give it a good run. Vertica is shooting for GA in the next release, but it’s out there in the open source world now for you to test out.
Vertica CEO Colin Mahony and GoodData CEO Roman Stanek joined a fireside chat webinar with Vertica VP of Product and Go-to-Market Strategy Joy King to discuss the future of unified analytics and how data as a service helps companies become data-driven. The brilliance of Vertica’s developer team, help from smart partners like GoodData, and Kubernetes StatefulSets mean Vertica users now have all the advantages of containerized deployment and get to keep their data, too.