Python API for Vertica Data Science at Scale


Welcome to our lessons on VerticaPy. My name is Badr Ouali and I'll be guiding you through the data science life cycle and introducing you to the VerticaPy library.

During these lessons, we'll also work through key aspects of the data science life cycle and how they relate to VerticaPy. Each lesson takes about 10 minutes to complete.

Setting Up the Environment

To begin, you must:

  • Have access to a machine that has Vertica installed
  • Install Python on your machine
  • Install VerticaPy

When everything is ready, you must connect to the Vertica database. You can do this with any of several database drivers, each of which provides a cursor:

  • Native (with vertica_python)
  • ODBC (with pyodbc)
  • JDBC (with jaydebeapi)

Each step is covered on our installation page.
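
Before picking one of the three options above, it can help to check which driver packages are installed. The following is a minimal sketch and not part of VerticaPy: `available_drivers` is a hypothetical helper name, and it only checks whether each module can be found, not whether a database is reachable.

```python
from importlib.util import find_spec

# Module names taken from the driver list above.
DRIVER_MODULES = ("vertica_python", "pyodbc", "jaydebeapi")


def available_drivers(candidates=DRIVER_MODULES):
    """Return the candidate module names that can be imported in this environment."""
    return [name for name in candidates if find_spec(name) is not None]


# Prints the subset of drivers installed on this machine.
print(available_drivers())
```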

First Steps

For example, let's create a connection using vertica_python.

In [6]:
import vertica_python

# Connection using all the DSN information
conn_info = {'host': "", 
             'port': 5433, 
             'user': "dbadmin", 
             'password': "XxX", 
             'database': "testdb"}
cur = vertica_python.connect(**conn_info).cursor()
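
Once a cursor exists, queries go through the standard Python DB-API methods `execute` and `fetchone`. Here is a minimal sketch, assuming `cur` is a cursor like the one created above; `server_version` is a hypothetical helper, and `SELECT version()` is used only as a harmless test query.

```python
def server_version(cur):
    """Run a trivial query through a DB-API cursor and return the first column."""
    cur.execute("SELECT version()")
    row = cur.fetchone()
    # fetchone() returns None when the query produced no rows.
    return row[0] if row else None


# Usage (requires a live connection, so it is left commented out):
# print(server_version(cur))
```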

Most functions in VerticaPy use a database cursor to perform queries. We can save the connection settings in the VerticaPy directory so that they can be reused.

In [7]:
from verticapy.connect import *
# Save a new connection
new_auto_connection({"host": "", 
                     "port": "5433", 
                     "database": "testdb", 
                     "password": "XxX", 
                     "user": "dbadmin"},
                    name = "VerticaDSN")
# Set the main auto connection
change_auto_connection("VerticaDSN")

When no cursor is defined, VerticaPy will automatically create a new one using the saved credentials. Keep in mind: when using an auto-connection like this, some connections might not get closed.
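
If unclosed connections are a concern, one option is to manage the connection yourself and close it explicitly. The sketch below is an illustration rather than a VerticaPy feature: `managed_cursor` is a hypothetical helper that accepts any DB-API style `connect` callable (for example `vertica_python.connect`) and closes both the cursor and the connection when the block exits.

```python
from contextlib import contextmanager


@contextmanager
def managed_cursor(connect, **conn_info):
    """Yield a cursor, then always close the cursor and its connection."""
    conn = connect(**conn_info)
    try:
        cur = conn.cursor()
        try:
            yield cur
        finally:
            cur.close()
    finally:
        conn.close()


# Usage (assumes vertica_python is installed and conn_info is defined):
# import vertica_python
# with managed_cursor(vertica_python.connect, **conn_info) as cur:
#     cur.execute("SELECT 1")
```

The cursor obtained this way can then be passed to the VerticaPy functions that accept one.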

As you can see, some arguments must be formatted in a specific way. If you forget the expected format, use Python's built-in help function.

In [22]:
help(new_auto_connection)

Help on function new_auto_connection in module verticapy.connect:

new_auto_connection(dsn:dict, name:str='DSN')
    Saves a connection to automatically create DB cursors. This will create a 
    used-as-needed file to automatically set up a connection, avoiding redundant 
    dsn: dict
            Dictionnary containing the information to set up the connection.
                    database : Database Name
                    host     : Server ID
                    password : User Password
                    port     : Database Port (optional, default: 5433)
                    user     : User ID (optional, default: dbadmin)
    name: str, optional
            Name of the auto connection.
    See Also
    change_auto_connection : Changes the current auto creation.
    read_auto_connect      : Automatically creates a connection.
    vertica_conn           : Creates a Vertica Database connection.
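
The help text above marks only `port` and `user` as optional, so a dictionary can be checked before it is saved. The following is a minimal sketch under that reading: `normalize_dsn` is a hypothetical helper, not a VerticaPy function, and the defaults are copied from the help output.

```python
# Keys the help text does not mark as optional (treated as required here).
REQUIRED_KEYS = ("database", "host", "password")
# Defaults as listed in the help text.
DEFAULTS = {"port": 5433, "user": "dbadmin"}


def normalize_dsn(dsn):
    """Return a copy of dsn with defaults filled in; raise KeyError if a key is missing."""
    missing = [key for key in REQUIRED_KEYS if key not in dsn]
    if missing:
        raise KeyError("missing DSN keys: " + ", ".join(missing))
    # Values supplied by the caller override the defaults.
    return {**DEFAULTS, **dsn}
```

The returned dictionary could then be passed as the `dsn` argument of `new_auto_connection`.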

You can also use the vHelp function.

In [14]:
from verticapy.utilities import vHelp
vHelp()

📢 Welcome to the VERTICAPY help Module. You are about to use a new fantastic way to analyze your data !

You can learn quickly how to set up a connection, how to create a Virtual DataFrame and much more.

What do you want to know?

  • [Enter 0] Do you want to know why you should use this library ?
  • [Enter 1] You don't have data to play with and you want to load an available dataset ?
  • [Enter 2] Do you want to look at a quick example ?
  • [Enter 3] Do you want to get a link to the VERTICAPY github ?
  • [Enter 4] Do you want to know how to display the Virtual DataFrame SQL code generation and the time elapsed to run the query ?
  • [Enter 5] Do you want to know how to load your own dataset inside Vertica ?
  • [Enter 6] Do you want to know how you writing direct SQL queries in Jupyter ?
  • [Enter -1] Exit

And with that, you're ready to start using VerticaPy!