One on One with Davin Potts: 5. Vertica, KNIME, Python and the New Uber Created Vertica-Python Interface

Posted March 11, 2019 by Paige Roberts, Vertica Open Source Relations Manager

Vertica One on One with Davin Potts, CEO Appliomics, Founder KNIME, Core Python Committer
At the recent Data Day Texas event, I sat down with Davin Potts and had a long conversation about a wide variety of subjects. I divided the conversation into multiple chunks by subject, and have been posting them one chunk at a time. In the first post, we discussed the wide variety of programming languages and tools in use for data science projects right now, and how he became a core Python committer. In the second post, we discussed the advantages of KNIME for a data science consultant like Potts, and the advantages of using SQL in a database to do data manipulation and analysis. In the third post, we dove into a cool new feature coming in the next version of Python. In the fourth post, Potts gave a few tips on how anyone who uses open source projects like Python can contribute in an important way without being an expert.

In this post, we discuss how open source and Vertica interact, with focus on the new open source Python interface for Vertica.



Paige Roberts: We’re actually incredibly grateful to Uber, which is one of our customers. They created a new Python interface for Vertica and open sourced it.

Davin Potts: Oh, I have not seen that. That’s awesome.

Roberts: I just started at Vertica a few months ago and I was surprised to find out they had an open source Python interface, as well as Python UDFs so you could use custom Python algorithms inside Vertica. I thought that was pretty neat.

Potts: Nifty. That is cool.

For the KNIME stuff, there’s the integration that I showed, where you don’t just call Python code from KNIME, but you can have Python code call into KNIME. If you’re a Python shop or love using Jupyter Notebooks, fantastic. You can see your KNIME workflows from inside of Jupyter Notebooks. You can trigger the execution of them, and effectively it’s just another function that you call from Python, except that it runs other things that people created in KNIME.

I can see a lot of potential use cases for that. Say a team tells you, “We really don’t like using SQL, but we know how to use KNIME to interact with databases.” Then do all your database things in KNIME; it’ll use the database in a really efficient way. And then you can call it from Python, and there’s your meeting point, if that’s what you want to do.

Or you can use the Python-Vertica interface.

Whatever floats your boat because most data stores have some sort of Python interface whether or not that’s the first thing that people think of.

If you live in a Python universe, Python is your world, if a data store doesn’t have a Python interface, you’re not going to talk to it.

Yep.

And KNIME can be a bridge across that gap, but it’s even better if you have something that’s a little closer.

Potts: How well is Vertica promoting the existence of the Python open source interface?

Roberts: The new Uber interface, Vertica-Python, came out recently. We’ve had a Python interface and a Python UDF framework for a while. But Vertica-Python is a pure Python interface. Our old one had C++ code that we built in-house for better performance and compiled to Python. But even internally, we’re moving all our support to the new one. We haven’t done a lot of promotion on it yet, but it’s one of the things I’d like to get the word out about a little more. About a third of our business is OEM. So, a lot of applications are Python or something else on the surface, with Vertica embedded inside.

Potts: That makes sense. I knew about the OEM business, but I didn’t know, relatively speaking, how much time and attention they got. That is very cool.
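
For readers who have not tried the interface discussed above, here is a minimal sketch of what querying Vertica from Python might look like. It assumes the open source vertica-python package is installed and that a Vertica server is reachable. The helper names build_conn_info and fetch_rows are hypothetical and chosen only for illustration, and the host, user, password, and database values in the usage comment are placeholders, not settings from this interview.

```python
def build_conn_info(host, user, password, database, port=5433):
    """Collect connection settings in a dict that can be passed to vertica_python.connect().

    Hypothetical helper for illustration. 5433 is the usual Vertica client port.
    """
    return {
        "host": host,
        "port": port,
        "user": user,
        "password": password,
        "database": database,
    }


def fetch_rows(conn_info, sql):
    """Run one query and return all result rows.

    Hypothetical helper for illustration. The driver is imported here, not at
    module level, so this file can be loaded without vertica-python installed.
    """
    import vertica_python  # assumption: installed with `pip install vertica-python`

    # The connection works as a context manager and is closed on exit.
    with vertica_python.connect(**conn_info) as connection:
        cursor = connection.cursor()
        cursor.execute(sql)
        return cursor.fetchall()


# Example usage (placeholder values; needs a running Vertica server):
# info = build_conn_info("localhost", "dbadmin", "secret", "VMart")
# print(fetch_rows(info, "SELECT version()"))
```

Because the interface is pure Python, there is nothing to compile on the client side; the connection settings are an ordinary dictionary and the cursor follows the familiar Python database pattern of execute and fetch.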



Don’t miss the earlier parts of this discussion with Davin Potts. In the first post, we discussed the wide variety of programming languages and tools in use for data science projects right now, and how he became a core Python committer. In the second post, we discussed the advantages of KNIME for a data science consultant like Potts, and the advantages of using SQL to do data manipulation and analysis. In the third post, Potts shared some exciting news about the upcoming Python 3.8 release. In the fourth post, Potts gave a few tips on how anyone who uses open source projects like Python can contribute in an important way without being an expert.

And stay tuned for the final post of this interview, coming up the day after tomorrow. Davin Potts and I dive into some of the surprising advantages of doing machine learning inside a database.



Learn more about the open source Python-Vertica interface.

Learn more about the intersection between Vertica and open source.