
VerticaPy
VerticaPy is a Python library that exposes scikit-like functionality for conducting data science projects on data stored in Vertica, taking advantage of Vertica’s speed and built-in analytics and machine learning capabilities. It supports the entire data science life cycle, uses a ‘pipeline’ mechanism (the Virtual DataFrame) to sequence data transformation operations, and offers several options for graphical rendering.
Advantages
- Vertica Scalability
  All computationally heavy operations are pushed to Vertica. Get the advantages of a columnar MPP infrastructure custom-built for analytics.
- Python Flexibility
  The API is coded entirely in Python and generates SQL code on the fly to interact with your database.
- Pandas-Like Functionality
  The vDataFrame object provides functionality similar to the DataFrame in pandas, but pushes all the heavy computations to Vertica, allowing a seamless transition from small to big data.
- In-Database Data Science
  VerticaPy takes advantage of Vertica's in-database ML and advanced SQL functionality, providing plenty of abstractions to simplify the data science process.
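The idea of generating SQL on the fly and pushing the heavy work to the database can be illustrated with a toy sketch. The `LazyFrame` class below is hypothetical: it is not VerticaPy's API or implementation, only an illustration of how chained transformations can be recorded as nested SQL and run by the database when a result is requested. SQLite (from the Python standard library) stands in for Vertica so the example runs anywhere, and the table and column names are made up.

```python
import sqlite3


class LazyFrame:
    """Hypothetical lazy, SQL-backed frame (illustration only, not VerticaPy)."""

    def __init__(self, conn, sql):
        self.conn = conn
        self.sql = sql  # the query describing the current state of the data

    @classmethod
    def from_table(cls, conn, table):
        return cls(conn, f"SELECT * FROM {table}")

    def with_column(self, name, expr):
        # Wrap the current query; nothing is executed yet.
        return LazyFrame(self.conn, f"SELECT *, {expr} AS {name} FROM ({self.sql}) AS t")

    def filter(self, condition):
        # Wrap the current query with a predicate; still nothing is executed.
        return LazyFrame(self.conn, f"SELECT * FROM ({self.sql}) AS t WHERE {condition}")

    def avg(self, column):
        # The database computes the aggregate; only one value comes back to Python.
        query = f"SELECT AVG({column}) FROM ({self.sql}) AS t"
        return self.conn.execute(query).fetchone()[0]


# SQLite is a stand-in for Vertica here (assumption made so the sketch is runnable).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE passengers (name TEXT, age REAL, fare REAL)")
conn.executemany(
    "INSERT INTO passengers VALUES (?, ?, ?)",
    [("a", 22.0, 10.0), ("b", 38.0, 70.0), ("c", 4.0, 20.0)],
)

adults = (
    LazyFrame.from_table(conn, "passengers")
    .with_column("fare_per_year", "fare / age")
    .filter("age >= 18")
)

print(adults.sql)          # the SQL accumulated by the chained calls
print(adults.avg("fare"))  # the query runs in the database only at this point
```

Each method returns a new object wrapping the previous query, so the sequence of transformations stays inspectable as plain SQL until a result is actually needed.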
Features
VerticaPy blends the scalability of Vertica with the flexibility of Python, bringing a unique and indispensable set of data science tools.
Explore your data.
Prepare your data with advanced feature engineering using advanced analytical functions and moving windows.
Create a model with the highly scalable Vertica ML. Effortlessly build and evaluate models that optimize for efficiency and performance using many of the in-database, scalable ML algorithms.
This all takes place where it should: your database. By aggregating your data with Vertica, you can build, analyze, and model anything without modifying your data.
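As a rough illustration of what a moving-window feature looks like once it is expressed in SQL, the sketch below builds a standard window-function query for a rolling average. The helper `rolling_avg_sql`, the `sales` table, and its columns are hypothetical and not part of VerticaPy; SQLite (3.25 or later, for window-function support) is again used as a stand-in for Vertica so the query can actually be run.

```python
import sqlite3


def rolling_avg_sql(table, column, order_by, preceding, partition_by=None):
    """Build a rolling-average query (hypothetical helper, not a VerticaPy function)."""
    partition = f"PARTITION BY {partition_by} " if partition_by else ""
    return (
        f"SELECT *, AVG({column}) OVER ("
        f"{partition}ORDER BY {order_by} "
        f"ROWS BETWEEN {preceding} PRECEDING AND CURRENT ROW"
        f") AS {column}_rolling_avg FROM {table}"
    )


# SQLite stands in for Vertica (assumption made so the sketch is runnable).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0)],
)

sql = rolling_avg_sql("sales", "amount", order_by="day", preceding=1)
rows = sorted(conn.execute(sql).fetchall())  # sort for a deterministic row order

print(sql)
for row in rows:
    print(row)  # (day, amount, average of this row and the previous one)
```

The window is computed entirely by the database engine, and the source table is left unmodified; the new feature exists only in the query result.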