VerticaPy

Python API for Vertica Data Science at Scale

Prerequisites


Python 3.7-3.9


VerticaPy runs on Python 3.6-3.9. Other versions may work, but they are not yet supported. You can install Python directly from the official Python website.
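
As a quick sanity check, the interpreter version can be compared against the range stated above. This is a minimal sketch, not part of VerticaPy: the helper name is_supported_python and the two constants are illustrative, and the bounds simply mirror the 3.6-3.9 range given in the text.

```python
import sys

# Bounds copied from the text above (assumption: 3.6 to 3.9 inclusive).
SUPPORTED_MIN = (3, 6)
SUPPORTED_MAX = (3, 9)


def is_supported_python(version_info=sys.version_info):
    """Return True if the (major, minor) pair lies in the supported range."""
    major_minor = (version_info[0], version_info[1])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


if not is_supported_python():
    # Other versions may still work; this only flags that they are untested.
    print("Warning: Python %d.%d is outside the supported range."
          % sys.version_info[:2])
```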


Jupyter Lab


For a comfortable interactive environment, we recommend installing the latest version of Jupyter. Installation instructions are available on the Jupyter website.


Vertica 8.1+


VerticaPy requires Vertica 8.1 or later.
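
If you are unsure which version your server runs, the version string can be checked against this minimum. The sketch below assumes the string has the form returned by SELECT version() in Vertica, for example "Vertica Analytic Database v10.1.1-0". The helper names parse_vertica_version and meets_minimum are hypothetical and not part of VerticaPy.

```python
import re

# Minimum (major, minor) version taken from the requirement above.
MIN_VERTICA = (8, 1)


def parse_vertica_version(version_string):
    """Extract (major, minor) from a string such as
    'Vertica Analytic Database v10.1.1-0' (assumed format)."""
    match = re.search(r"v(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError("Unrecognized version string: %r" % version_string)
    return (int(match.group(1)), int(match.group(2)))


def meets_minimum(version_string, minimum=MIN_VERTICA):
    """Return True if the parsed version is at least the minimum."""
    return parse_vertica_version(version_string) >= minimum
```

Comparing tuples rather than strings avoids the common mistake of ranking "10.1" below "8.1" lexicographically.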


Vertica is the most advanced analytics data warehouse based on a massively scalable architecture. It features the broadest set of analytical functions spanning event and time series, geospatial and end-to-end in-database machine learning, and pattern matching. Vertica lets you easily apply these powerful functions to the largest and most demanding analytical workloads, arming you and your customers with predictive business insights faster than any other analytics data warehouse on the market.


Vertica provides a unified analytics platform across major public clouds and on-premises data centers and integrates data in cloud object storage and HDFS without forcing you to move any of your data.


To learn more about the Vertica database, see the official Vertica website.


If you already have Vertica installed, you can skip this step. Otherwise, you have some options for trying out Vertica for free.

Install VerticaPy

To install VerticaPy with all the dependencies, including some geospatial packages such as GeoPandas and Descartes, run the following pip command:

root@ubuntu:~$ pip3 install verticapy[all]

If you do not want to install extra dependencies, you can use the following command:

root@ubuntu:~$ pip3 install verticapy
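
To confirm that the installation succeeded, you can check which packages the current interpreter can find without importing them. This is a minimal sketch using only the standard library. The helper name missing_packages is illustrative, and the list of names assumes the import names verticapy, geopandas, and descartes.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level package names that cannot be found."""
    return [name for name in names
            if importlib.util.find_spec(name) is None]


# The two geospatial packages are only expected after installing the extras.
missing = missing_packages(["verticapy", "geopandas", "descartes"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("VerticaPy and the optional geospatial packages are available.")
```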

To start playing with the API, create a new connection:

In [2]:
import verticapy as vp
vp.new_connection({"host": "10.211.55.14", 
                   "port": "5433", 
                   "database": "testdb", 
                   "password": "XxX", 
                   "user": "dbadmin"},
                   name = "My_New_Vertica_Connection")
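
The example above hardcodes the password for brevity. One possible alternative is to build the same dictionary from environment variables so that credentials stay out of notebooks. This is a sketch under stated assumptions: the variable names (VERTICA_HOST, VERTICA_PORT, VERTICA_DATABASE, VERTICA_USER, VERTICA_PASSWORD) and the helper conn_info_from_env are hypothetical, and the fallback values other than "localhost" are taken from the example above.

```python
import os


def conn_info_from_env(env=os.environ):
    """Build a connection dictionary from environment variables.

    The variable names are illustrative; adapt them to your setup.
    """
    return {
        "host": env.get("VERTICA_HOST", "localhost"),
        "port": env.get("VERTICA_PORT", "5433"),
        "database": env.get("VERTICA_DATABASE", "testdb"),
        "user": env.get("VERTICA_USER", "dbadmin"),
        # No default here: a missing password raises KeyError immediately.
        "password": env["VERTICA_PASSWORD"],
    }


# Usage, mirroring the call shown above (requires a reachable database):
# import verticapy as vp
# vp.new_connection(conn_info_from_env(), name="My_New_Vertica_Connection")
```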

After a connection is created, you can use the connect function to reconnect.

In [3]:
vp.connect("My_New_Vertica_Connection")

Create a VerticaPy schema for native VerticaPy models (that is, models available in VerticaPy, but not Vertica itself):

In [2]:
vp.create_verticapy_schema()

You can now start playing with the library! A good starting point is the Quick Start guide.