The Vertica team is happy to share a milestone in our “VerticaPy journey”: We just reached 100 stars in our GitHub repo, and the count is growing every day. (“Repo” is short for “repository,” for those of you unfamiliar with GitHub.) Repos accumulate stars as an indication of user interest; think of them as bookmarks in a user’s profile. The more stars, the more evidence of a repo’s popularity and value to the community.
VerticaPy started back in 2018 as an open-source project to support the Vertica community’s Python users. “The idea behind VerticaPy is quite simple: Combine the scalability of Vertica with the flexibility of Python,” explains Vertica chief data scientist Badr Ouali. “For a while, we took one star at a time. But in these past few months, we have seen an increase in interest in our amazing Python API for Vertica Data Science at Scale.”
The development work that has gone into the VerticaPy repository is all based on a simple principle: Make data science easier and more performant.
But as most GitHub enthusiasts know, it is not easy to gain wide adoption at first. Most successful GitHub projects are plugins or add-ons for technologies already in use. Building something new and creating a new community can take years. Still, Ouali feels that since adoption has increased in recent months, it will probably continue to increase into the next year, “especially as the VerticaPy team works to make the software easier to install and to use,” he says.
“I am very excited to be part of this amazing effort to democratize data science,” says data science developer Umar Farooq Ghumman, who has contributed significantly to the VerticaPy project. “This is an amazing project which is simplifying many complex data science tasks. We will continue to explore opportunities to keep making things simpler and user-friendly.”
Ghumman believes that VerticaPy has potential even beyond Vertica, because it incorporates some of the other Python libraries. “As new users come into Python, VerticaPy will remove friction for them as they start their journey into data science. These users could be new or entry-level employees in our customer organizations, and they could even be students and researchers working with data.”
VerticaPy offers a broad range of algorithms: classification algorithms like Random Forest or XGBoost, regression algorithms like Linear Regression or SVM, clustering algorithms like KMeans or Bisecting KMeans, anomaly detection with algorithms such as Isolation Forest and Global ZScore, and time series with ARIMA.
“It is a complete statistical package with everything for ML,” says Badr Ouali. “That includes data preparation (time series / geospatial joins, pattern matching, missing values imputation, and much more) and even data exploration (integration with Matplotlib and Highcharts).”