At the recent Data Day Texas event, I sat down with Davin Potts, whom I have known for many years, and had a long conversation about a wide variety of subjects. I divided the conversation into multiple chunks by subject and will post them one subject at a time. In this first post, we discussed the wide variety of programming languages and tools in use right now for data science projects, and how he became a core Python committer.
Paige Roberts: Let’s start by introducing you to the blog readers who may not know who you are yet.
I’m Davin Potts. I have my own consultancy, Appliomics, based in Austin, Texas, where I mostly work on mathematical modeling, scientific software, and other data-science-related things. Sometimes that involves cool tools like KNIME. Often it involves things like Python. But honestly, it covers a wide gamut, depending on whatever tools other people are choosing to use.
I’m happy to switch and adapt. My talk, “Choosing Sides When Choosing Tools Hurts,” was all about that. Fortran, C, C++, Erlang: these are all fair game, and they’re being actively used by groups I’ve done work for in just the last two years.
Roberts: Wow. That’s all over the place.
Roberts: Java, Python, R?
Java, Python and Scala, yes. R, I tend to hesitate with. I haven’t done anything with R in the last two years.
What’s the hesitation with R?
I think R is a fantastic tool. It’s made a lot of people highly effective in a short period of time, and ggplot rocks. What makes me hesitant to start new projects with R is that I’ve been asked too many times to help on projects where clients built up a corpus of R code that they have now decided they need to move away from. A common theme is that, as they were building up their code, they were not thinking about the architecture around it or about how to get that code to scale.
That’s not to say that R isn’t capable. It means that people have dug themselves into this hole repeatedly. And more often than not, when they try to switch to something else, from what I’ve seen (which is a limited view of the world), they tend to want to switch either to Python or to the C or C++ stack. For that reason, if a group is already using R, fantastic, I’m not going to talk anybody out of anything. …
But you’re not going to start out using R code?
I’m not going to start fresh with it, because people who don’t have the mindset from the beginning of planning ahead for taking the code to production have been getting surprised. Groups like Revolution Analytics have tried their darnedest to deliver tools to help people achieve performance with R, and that’s helped lots of groups, but it can’t help everyone.
Do you see people moving to Spark, or are Python and C++ the two preferred tools?
Again, I don’t have an explanation for that, and it may purely be what I happened to be exposed to. I haven’t met a group that decided, “We need to move out of R into Spark,” or “We need to move into Scala.” I don’t know why that is.
Facebook is one of the most publicly visible users of PHP. They have gone to the extent of writing their own compilers for it, because it was part of their framework from the ground up. They’ve invested a huge amount of effort to squeeze every last ounce of performance they can out of it. They have publicly talked about different aspects of their efforts to transition from PHP to Python. And when you see big companies like that transition to Python, it probably does influence others to think, “Ooh, I’ve got to get me some of that.”
Or maybe others are seeing the same factors that drove Facebook to make that switch.
That’s the thought as well. Facebook clearly sat down and they thought about it. They probably had no shortage of arguments over what they should switch to before they finally made that choice. And stories like that are highly influential, especially for smaller groups that don’t have the time to put a dozen people on studying that sort of thing.
I was also interested to learn today that you are one of the core committers on Python. How did that come about?
Multiple different funny ways, but it came, first of all, from doing a little too much Python coding.
[laughs] You have to watch out for that.
Secondly, I took aside one of the existing long-standing Python core committers and said, “I really think something needs to be done about this particular thing.” That person knew me pretty well, so the response I got was not just “Yeah, yeah.” It was more like, “You’re right. That really needs some TLC. Would you be interested in helping in a very serious way?”
And my initial answer was “No. No, no. Not at all.”
That was not the purpose of this conversation. The purpose was to make you aware of an issue.
To get you to fix it, not to make me fix it.
No, I was thinking of pitching in. But he was bringing up the notion of something much more long term. I was thinking, how can I help in a short-term way.
But you got volunteered.
The idea slowly grew on me. Becoming one of the core developers is not a highly formalized process. It’s a tight-knit group, and one person could easily poison the well, so to speak. Finding people with the right style of insanity, who are there to try to move Python in a positive direction, is important. On top of that, the core developers have a public reputation for being highly approachable, friendly, easy-to-talk-to folks. Finding that combination of characteristics is difficult, and it’s not something they do flippantly, even on just one person’s recommendation.
I can see where that’s a challenge.
Don’t miss the next discussion with Davin Potts, where we’ll dive into how KNIME makes a data science consultant’s life easier and the advantages of in-database analytics.