VerticaPy

Python API for Vertica Data Science at Scale

Welcome

Welcome to our lessons on VerticaPy. My name is Badr Ouali and I'll be guiding you through the data science life cycle and introducing you to the VerticaPy library.

During these lessons, we'll also work through key aspects of the data science life cycle and how they relate to VerticaPy. Each lesson takes about 10 minutes to complete.

Setting Up the Environment

To begin, you must:

  • Have access to a machine that has Vertica installed
  • Install Python on your machine
  • Install VerticaPy

Each step is covered on our installation page.

First Steps

Throughout these tutorials, we will use the following VerticaPy version.

In [69]:
import verticapy as vp
vp.__version__
Out[69]:
'0.9.0'
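If you want to check that your installed version is at least the one used in these tutorials, the following is a minimal sketch. The helper names `version_tuple` and `is_at_least` are hypothetical and not part of VerticaPy; the sketch assumes a plain dotted version string like the one shown above.

```python
import re

def version_tuple(version: str) -> tuple:
    """Turn a dotted version string such as '0.9.0' into a tuple of ints.

    Hypothetical helper, not part of VerticaPy. Parsing stops at the first
    piece that does not start with a digit.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)

def is_at_least(installed: str, required: str) -> bool:
    """Return True if `installed` is the same as or newer than `required`."""
    return version_tuple(installed) >= version_tuple(required)

# In a notebook, you would pass vp.__version__ as the first argument.
print(is_at_least("0.9.0", "0.9.0"))
```

Comparing tuples of integers rather than raw strings matters once a component reaches two digits: as strings, '0.10.1' sorts before '0.9.0', while as tuples it compares as newer.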

To get started, you must connect to a Vertica database.

If you have previously created a connection, use the connect function to reconnect.

In [66]:
vp.connect("Vertica_New_Connection")

To connect to a Vertica database for the first time, you can use the new_connection function.

In [70]:
import verticapy as vp
vp.new_connection({"host": "10.211.55.14", 
                   "port": "5433", 
                   "database": "testdb", 
                   "password": "XxX", 
                   "user": "dbadmin"},
                   name = "Vertica_New_Connection")
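To keep the password out of the notebook, you could build the connection dictionary from environment variables instead of typing the values inline. The following is a minimal sketch: the helper names and the `VERTICA_*` variable names are assumptions made for illustration, not VerticaPy conventions. The dictionary keys and the default port and user match the documentation of new_connection.

```python
import os

def conn_info_from_env(env=None) -> dict:
    """Build a conn_info dictionary from environment variables.

    Hypothetical helper: the VERTICA_* variable names are assumptions.
    Pass a plain dict as `env` to try it without touching the real environment.
    """
    env = os.environ if env is None else env
    required = ("VERTICA_HOST", "VERTICA_DATABASE", "VERTICA_PASSWORD")
    missing = [key for key in required if key not in env]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {
        "host": env["VERTICA_HOST"],
        "port": env.get("VERTICA_PORT", "5433"),     # documented default port
        "database": env["VERTICA_DATABASE"],
        "password": env["VERTICA_PASSWORD"],
        "user": env.get("VERTICA_USER", "dbadmin"),  # documented default user
    }

def save_connection(name="Vertica_New_Connection", env=None):
    """Save the connection under `name` using vp.new_connection."""
    # Imported here so conn_info_from_env can be used without VerticaPy installed.
    import verticapy as vp
    vp.new_connection(conn_info_from_env(env), name=name)
```

After setting the variables in your shell, calling `save_connection()` once stores the connection, and later sessions can reconnect with `vp.connect("Vertica_New_Connection")`.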

Some parameters expect their arguments in a specific format. To view these requirements, pass the function to Python's built-in help function.

In [67]:
help(vp.new_connection)
Help on function new_connection in module verticapy.connect:

new_connection(conn_info:dict, name:str='vertica_connection', auto:bool=True, overwrite:bool=True)
    ---------------------------------------------------------------------------
    Saves the new connection in the VerticaPy connection file.
    The function 'get_connection_file' returns the connection file path.
    
    Parameters
    ----------
    conn_info: dict
            Dictionnary containing the information to set up the connection.
                    database : Database Name.
                    host     : Server ID.
                    password : User Password.
                    port     : Database Port (optional, default: 5433).
                    user     : User ID (optional, default: dbadmin).
            ...
    name: str, optional
            Name of the connection.
    auto: bool, optional
        If set to True, the connection will become the new auto-connection.
    overwrite: bool, optional
        If set to True and the connection already exists, it will be 
        overwritten.

For an interactive overview of the library, you can also use the help_start function.

In [68]:
vp.help_start()

📢 Welcome to the VerticaPy help module.

From here, you can learn how to connect to Vertica, create a Virtual DataFrame, load your data, and more.

  • [Enter 0] Overview of the library
  • [Enter 1] Load an example dataset
  • [Enter 2] View an example of data analysis with VerticaPy
  • [Enter 3] Contribute on GitHub
  • [Enter 4] View the SQL code generated by a vDataFrame and the time elapsed for the query
  • [Enter 5] Load your own dataset into Vertica
  • [Enter 6] Write SQL queries in Jupyter
  • [Enter -1] Exit
0

And with that, you're ready to start using VerticaPy!