VerticaPy

Python API for Vertica Data Science at Scale

Installation

If you already have Vertica and Python3 installed in your environment. Run the following command to install VerticaPy.


pip3 install verticapy

Vertica Installation

Vertica is the most advanced analytics data warehouse based on a massively scalable architecture. It features the broadest set of analytical functions spanning event and time series, geospatial and end-to-end in-database machine learning, and pattern matching. Vertica lets you to easily apply these powerful functions to the largest and most demanding analytical workloads, arming you and your customers with predictive business insights faster than any other analytics data warehouse on the market.


Vertica provides a unified analytics platform across major public clouds and on-premises data centers and integrates data in cloud object storage and HDFS without forcing you to move any of your data.


To learn more about the Vertica database, check out the Vertica Official Website.


If you already have Vertica installed, you can skip this step. Otherwise, you have some options for trying out Vertica for free.

Making the VM IP address static - Optional

If you want to work on your own OS (instead of through the VM), you'll have to open a connection between your OS and your VM. Open the parameters of your VM and create an additional adapter. In the Attached to: dropdown, select Bridged Adapter.



View the IP address of your machine using the ifconfig command in your terminal.



My local inet IP address is 192.168.1.8, so let's pick 192.168.1.150 as the static IP address for my VM (I could choose 192.168.1.x - the important thing is that the IP address I choose isn't already taken by another machine). Let's make our VM IP address static.

  1. In your VM, click on the system tray on the top right and select Wired Settings
  2. In the Network window, click on the gear icon.
  3. Click on IPv4.
  4. Enter your preferred static IP address for the VM in the Address box.
  5. Click Apply.
  6. In the Network window, turn the connection off and then back on.
  7. Verify that ipconfig reflects your VM's static IP address.




Now you can interact with your VM through your primary OS by connecting to it using SSH. In my case, I chose 192.168.1.150 for my VM's static IP address, so I would connect to: dbadmin@192.168.1.150

Create a DataBase DSN (Data Source Name) - Optional

Creating a DSN can save you a lot of time when working with databases. To create a DSN, you can follow the instructions in the Vertica documentation.


DSN configuration is a little bit more complicated in MacOS.


Vertica includes its own drivers for every major platform. Download the drivers from the Vertica website.


Select your version of Vertica and download the corresponding package to install in your operating system. Vertica client drivers after and 8.1.0 offer backwards compatiblity. So, for example, a Vertica 8.1.1 database is compatible with version 9 client drivers.

Install the package by double-clicking in the file. Since the package was downloaded from an unknown developer, MacOS will ask for permission to install it. To finish the installation, go to System Preferences > Security and Privacy > General and accept the installation.


By default, the drivers install to the directory "/Library/Vertica/ODBC/lib".


root@ubuntu:~$ ls /Library/Vertica/ODBC/lib/

There are two files:

After installing the drivers, we should modify some configuration files:

The easiest way to do this is to use ODBC Administrator, a GUI Tool to configure the odbc.ini and odbcinst.ini The ODBC Administrator can be downloaded from the Apple website.

After installing and running the ODBC Administrator, you can add a DSN:



We confirm that the DSN was added by checking the contents of odbc.ini.


root@ubuntu:~$ cat ~/Library/ODBC/odbc.ini

# Output
[ODBC Data Sources]
VMart = Vertica
MLTesting = Vertica
MyMLTesting = Vertica

[ODBC]
Trace = 0
TraceAutoStop = 0
TraceFile =
TraceLibrary =
?.

[MLTesting]
Driver = /Library/Vertica/ODBC/lib/libverticaodbc.dylib
Description = MLTesting on Azure
Servername = vazure
Database = MLTesting
UID = dbadmin
PWD =
Port = 5433

The vertica.ini configuration file is preconfigured, but you can copy it to the same directory as the other ODBC files.


root@ubuntu:~$ cp /Library/Vertica/ODBC/lib/vertica.ini ~/Library/ODBC/

# Change the encoding parameter from UTF-32 to UTF-16 in the copied vertica.ini.
[Driver]
ErrorMessagesPath=/Library/Vertica/ODBC/messages/
ODBCInstLib=/usr/lib/libiodbcinst.dylib
DriverManagerEncoding=UTF-16

Add the environment variables VERTICAINI and ODBCINI. You can add these to your bash_profile (or /etc/profile for a system-wide change).


root@ubuntu:~$ vim ~/.bash_profile

## Parameters for ODBC-Vertica compatibility
[?]
export ODBCINI=/Library/ODBC/odbc.ini
export VERTICAINI=/Library/ODBC/vertica.ini
[?]

Install Python3

Installing Python3 is as easy as downloading a file. Follow the instructions in the Python website.

Install Jupyter - Optional

Jupyter offers a really beautiful interface interact with Python. You can install Jupyter by following the instructions in the Jupyter website.

Install VerticaPy

To install VerticaPy, run the following pip command:

root@ubuntu:~$ pip3 install verticapy

To connect to Vertica, you'll need to install one of the following modules:

  • vertica_python (Native Python Client)
  • pyodbc (ODBC)
  • jaydebeapi (JDBC)

For example, to install the vertica_python module, run the following command:

root@ubuntu:~$ pip3 install vertica_python

If you have have a data source name (DSN), you can connect to your Vertica database with the following command.

In [1]:
#
#
# vertica_cursor
#
from verticapy import vertica_conn
cur = vertica_conn("VerticaDSN").cursor()

To connect with the Vertica Native Python Client vertica_python, you can either provide the full set of credentials or the DSN.

In [4]:
#
#
# vertica_python
#
import vertica_python

# Connecting with DSN credentials
conn_info = {'host': "10.211.55.14", 
             'port': 5433, 
             'user': "dbadmin", 
             'password': "XxX", 
             'database': "testdb"}
cur = vertica_python.connect(** conn_info).cursor()

# Connecting with the DSN
from verticapy.connections.connect import to_vertica_python_format 
dsn = "VerticaDSN"
cur = vertica_python.connect(** to_vertica_python_format(dsn), autocommit = True).cursor()

To create an ODBC connection with pyodbc, you can either provide the full set of credentials or the DSN.

In [ ]:
#
#
# pyodbc
#
import pyodbc

# Connecting with DSN credentials
driver = "/Library/Vertica/ODBC/lib/libverticaodbc.dylib"
server = "10.211.55.14"
database = "testdb"
port = "5433"
uid = "dbadmin"
pwd = "XxX"
dsn = ("DRIVER={}; SERVER={}; DATABASE={}; PORT={}; UID={}; PWD={};").format(driver, 
                                                                             server, 
                                                                             database, 
                                                                             port, 
                                                                             uid, 
                                                                             pwd)
cur = pyodbc.connect(dsn).cursor()

# Connecting with the DSN
dsn = ("DSN=VerticaDSN")
cur = pyodbc.connect(dsn, autocommit = True).cursor()

Setting up a JDBC connection with jaydebeapi.

In [ ]:
#
#
# jaydebeapi
#
import jaydebeapi

# Vertica Server Details
database = "testdb"
hostname = "10.211.55.14"
port = "5433"
uid = "dbadmin"
pwd = "XxX"

# Vertica JDBC class name
jdbc_driver_name = "com.vertica.jdbc.Driver"

# Vertica JDBC driver path
jdbc_driver_loc = "/Library/Vertica/JDBC/vertica-jdbc-9.3.1-0.jar"

# JDBC connection string
connection_string = 'jdbc:vertica://' + hostname + ':' + port + '/' + database
url = '{}:user={};password={}'.format(connection_string, uid, pwd)
conn = jaydebeapi.connect(jdbc_driver_name, 
                          connection_string, 
                          {'user': uid, 'password': pwd}, 
                          jars = jdbc_driver_loc)
cur = conn.cursor()

You can save your credentials in VerticaPy to prevent the creation of redundant cursors.

In [5]:
from verticapy.connections.connect import *
# Save a new connection
new_auto_connection({"host": "10.211.55.14", 
                     "port": "5433", 
                     "database": "testdb", 
                     "password": "XxX", 
                     "user": "dbadmin"},
                    name = "VerticaDSN")
# Set the primary auto-connection
change_auto_connection("VerticaDSN")