Connection#
Various Methods to Connect with VerticaPy
Discover the versatility of connecting to VerticaPy through a multitude of methods, providing users with flexibility tailored to their preferences. With a user-friendly approach, VerticaPy offers seamless connections, allowing users to effortlessly integrate their Python workflows with Vertica’s powerful backend. Whether you’re navigating the parameters of the vertica_python client, exploring auto connections with automatic credential handling, or opting for a more manual setup, VerticaPy caters to different levels of expertise and security preferences.
Important
In the next sections, we will use placeholders such as your_host
, your_user
, your_database
, etc. Please replace them with your actual connection details.
Vertica Python Client Parameters#
Important
This section introduces the vertica_python
client connector and covers all possible parameters. It is used to establish the foundation for the final VerticaPy connection. If you prefer to directly learn how to create a VerticaPy connection, you can skip this part.
The new_connection()
function in VerticaPy relies on the vertica_python
native client, a powerful tool designed for seamless integration with the VerticaPy ecosystem. This native client acts as a bridge between VerticaPy and the underlying Vertica database, facilitating efficient communication and data retrieval.
Note
Visit the official vertica_python GitHub for more information.
The conn_info
parameter utilized by the new_connection()
function is a dictionary that aligns with the parameters supported by the vertica_python
client. These parameters encompass various connection details, such as the host, port, database name, user credentials, and more. Each parameter in the conn_info
dictionary corresponds to a specific aspect of the connection setup, ensuring users have fine-grained control over their connectivity preferences.
Hint
The example below shows how to create a conn_info
dictionary:
conn_info = {
'host': 'your_host', # ex: 127.0.0.1
'port': 5433,
'user': 'your_user', # ex: dbadmin
'password': 'some_password',
'database': 'your_database', # ex: testdb
# autogenerated session label by default,
'session_label': 'some_label',
# default throw error on invalid UTF-8 results
'unicode_error': 'strict',
# SSL is disabled by default
'ssl': False,
# autocommit is off by default
'autocommit': True,
# using server-side prepared statements is disabled by default
'use_prepared_statements': False,
# connection timeout is not enabled by default
# 5 seconds timeout for a socket operation
# (Establishing a TCP connection or read/write operation)
'connection_timeout': 5,
}
Discover the details of the majority of supported parameters in the table below.
Connection Option |
Description |
host |
The server host of the connection. This can be a host name or an IP address. Default: “localhost” |
port |
The port of the connection. Default: 5433 |
user |
The database user name to use to connect to the database. Default: OS login user name |
password |
The password to use to log into the database. Default: “” |
database |
The database name. Default: “” |
autocommit |
Autocommit. Default: False |
backup_server_node |
Connection Failover. Default: [] |
binary_transfer |
Data Transfer Format. Default: False (use text format transfer) |
connection_load_balance |
Connection Load Balancing. Default: False (disabled) |
connection_timeout |
The number of seconds (can be a nonnegative floating point number) the client waits for a socket operation (Establishing a TCP connection or read/write operation). Default: None (no timeout) |
disable_copy_local |
Whether copying local files into the database is disabled. Default: False |
kerberos_host_name |
Kerberos Authentication. Default: the value of connection option host |
kerberos_service_name |
Kerberos Authentication. Default: “vertica” |
log_level |
Logging Level. |
log_path |
Logging Path. |
oauth_access_token |
To authenticate via OAuth, provide an OAuth Access Token that authorizes a user to the database. Default: “” |
request_complex_types |
SQL Data conversion to Python objects. Default: True |
session_label |
Sets a label for the connection on the server. This value appears in the client_label column of the v_monitor.sessions system table. Default: an auto-generated label with format of verticapy-{version}-{random_uuid} |
ssl |
TLS/SSL. Default: False (disabled) |
unicode_error |
UTF-8 encoding issues. Default: ‘ignore’ (ignore invalid UTF-8 results) |
use_prepared_statements |
Passing parameters to SQL queries. Default: False |
workload |
Sets the workload name associated with this session. Valid values are workload names that already exist in a workload routing rule on the server. If a workload name that doesn’t exist is entered, the server will reject it and it will be set to the default. Default: “” |
dsn |
Set Properties with Connection String. |
For additional examples and more in-depth information, please refer to the official vertica_python GitHub
Additionally, the new_connection()
function introduces an extra parameter, env
which holds significance in enhancing connection security. This parameter, discussed in more detail later, enables users to configure an environment variable within the conn_info
dictionary, offering an additional layer of control and customization for a secure and tailored connection experience.
Advanced Authentication Options (Kerberos & OAuth)#
VerticaPy supports authentication via OAuth and Kerberos, offering secure and versatile options for accessing your Vertica database. This section provides a step-by-step guide on setting up authentication using OAuth access tokens or Kerberos authentication.
OAuth Authentication#
To authenticate via OAuth, follow these steps:
Obtain OAuth Access Token: Acquire an OAuth access token from your OAuth provider, ensuring it has the necessary permissions to access your Vertica database.
Use OAuth Access Token in Connection: Incorporate the obtained access token into your connection dictionary when establishing a connection in VerticaPy.
connection_info = {
"host": "your_host", # ex: 127.0.0.1
"port": 5433,
"user": "your_user", # ex: dbadmin
"oauth_access_token": "your_oauth_access_token",
"database": "your_database", # ex: testdb
}
Important
You may need to install additional dependencies.
Note
For additional examples and more in-depth information, please refer to the official vertica_python GitHub
Kerberos Authentication#
To authenticate via Kerberos, follow these steps:
Set Up Kerberos Credentials: Ensure you have valid Kerberos credentials configured on your system.
Update Connection Dictionary for Kerberos: Modify your connection dictionary to include the Kerberos-related parameters.
connection_info = {
"host": "your_host", # ex: 127.0.0.1
"port": 5433,
"user": "your_user", # ex: dbadmin
"kerberos_host_name": "your_kerberos_host_name",
"kerberos_service_name": "your_kerberos_service_name",
"database": "your_database", # ex: testdb
}
Important
You may need to install additional dependencies.
Note
For additional examples and more in-depth information, please refer to the official vertica_python GitHub
Auto Connection (Recommended)#
Auto connections provide a hands-free option, simplifying the user experience by automatically managing credentials. This approach enhances ease of use but requires careful consideration, as credentials are stored in plaintext. Users can easily modify auto connections using functions like change_auto_connection()
and delete_connection()
, offering both flexibility and control over their connection setup. The connection details are stored in a file determined by the VERTICAPY_CONNECTION
environment variable or in a hidden folder (.vertica
) in the home directory by default. For those prioritizing security, the option to use environment variables within the input connection dictionary ensures a more robust and protected connection without compromising simplicity.
In the provided example, we establish a new auto connection named VerticaDSN
. By utilizing the setting of auto=True
(which is also the default), the connection is automatically saved to a designated directory. The path for saving this connection is determined by the VERTICAPY_CONNECTION
environment variable. If this variable exists, the associated path is used; otherwise, the file is stored in a hidden folder named .vertica located in the home directory. This streamlined approach ensures that connections are efficiently managed and, if desired, conveniently located based on user preferences or system configurations.
# Import VerticaPy.
import verticapy as vp
# Creating a vertica_python connection directory.
conn_info = {
"host": "your_host", # ex: 127.0.0.1
"port": "5433",
"database": "your_database", # ex: testdb
"password": "XxX",
"user": "your_user", # ex: dbadmin
}
# Creating a new auto connection.
vp.new_connection(
conn_info,
name = "VerticaDSN",
auto = True,
overwrite = True,
)
Hint
Execute the following code to determine the location of the credentials file.
from verticapy.connection import get_connection_file
get_connection_file()
Important
The information is stored in plain text, which might pose a security risk. If you share your workspace with other team members, we recommend considering one of the alternative connection methods below for enhanced security.
PROS:
Convenient: Auto connections with credential storage simplify the connection process, eliminating the need for manual input of credentials.
Ease of Retrieval: Connection details can be easily retrieved.
CONS:
Credential Exposure: While credentials are securely stored, the method of storage involves a file or folder that could potentially be accessed.
Not Ideal for All Scenarios: May not be recommended for scenarios where manual credential input or specific configurations are necessary.
VerticaPy introduces an additional parameter, env
within the vertica_python
connection dictionary. This parameter, of type bool
, serves as an indicator for whether the user and password should be substituted by their corresponding environment variables. When set to True
, VerticaPy retrieves the associated environment variables in lieu of explicitly writing and utilizing the username and password within the connection. For instance, the dictionary might look like this: {'user': 'ENV_USER', 'password': 'ENV_PASSWORD'}
.
It’s important to note that this functionality is applicable exclusively to the user and password parameters. The actual values of other variables remain stored in plaintext within the VerticaPy connection file. The utilization of environment variables offers an added layer of security, especially in scenarios where the local machine is shared, ensuring that the username and password details remain hidden.
In the provided example, we employ two environment variables, namely MYENVPASS
and MYENVUSER
, to securely store the user’s password and username, respectively. This approach enhances security by ensuring that sensitive credentials are accessed through environment variables rather than being directly specified in the code. This is particularly beneficial in scenarios where security and confidentiality are paramount concerns.
# Import VerticaPy.
import verticapy as vp
# Creating a vertica_python connection directory.
conn_info = {
"host": "your_host", # ex: 127.0.0.1
"port": "5433",
"database": "your_database", # ex: testdb
"password": "MYENVPASS",
"user": "MYENVUSER",
}
# Creating a new auto connection.
vp.new_connection(
conn_info,
name = "VerticaDSN",
auto = True,
overwrite = True,
)
Hint
The following code demonstrates how to set and retrieve an environment variable in Python:
# Import the OS module.
import os
# Setting the env variable.
os.environ["MYENVUSER"] = "dbadmin"
# Getting the env variable.
os.getenv("MYENVUSER")
Out[3]: 'dbadmin'
In this example, the code sets an environment variable named MYENVUSER
with the value ‘dbadmin’ and then retrieves its value. This is a common and straightforward way to work with environment variables in Python.
PROS:
Secure Storage: User credentials are stored securely, enhancing confidentiality.
Shared Environments: Ideal for shared environments where multiple team members may need to access the same database.
CONS:
Dependency on Environment Variables: The location of the connection file relies on the
VERTICAPY_CONNECTION
environment variable, which might not be set in all environments.Limited Flexibility: Auto connections may not be suitable for scenarios requiring specific or dynamic connection configurations.
Available Connections
Before proceeding with data analytics operations in VerticaPy, it’s helpful to inspect the available connections.
# Import VerticaPy.
import verticapy as vp
# Look at all available connection
vp.available_connections()
# Output should be ['VerticaDSN'] as we've
# set up this connection in the example.
Using an Existing Connections
VerticaPy provides flexibility in using existing connections, allowing you to seamlessly integrate established connections into your workflows.
# Connect using an existing connection.
vp.connect("VerticaDSN")
Changing an Auto-Connection
If you wish to designate this connection as your primary auto connection, follow these steps:
# 'VerticaDSN2' is now the new default auto connection.
vp.change_auto_connection("VerticaDSN2")
Accessing the Connection object
You can also access the current connector object if needed.
# Getting the current connector object.
cursor = vp.current_connection().cursor()
# Executing a query.
cursor.execute("SELECT VERSION();")
# Fetching the result.
cursor.fetchall()
Important
All these methods are designed to streamline the entire connection process. Choose your methodology wisely, and thoroughly understand the associated pros and cons.
Creating a Connection Directly from a Connector#
Whether leveraging vertica_python
, exploring supported protocols, or interfacing with ODBC/JDBC connectors, VerticaPy allows users to tailor their connection method to specific needs.
Explore the various tabs to learn how to establish a connection using the specific connection.
In certain scenarios, you may need to create a connection directly from a connector in VerticaPy. This section provides a step-by-step guide on establishing a connection using a connector, offering flexibility in managing connections within your Python environment.
To create a connection directly from a cursor, follow these steps:
# Import VerticaPy.
import verticapy as vp
# Import the client.
import vertica_python
# Creating a vertica_python connection directory.
conn_info = {
"host": "your_host", # ex: 127.0.0.1
"port": "5433",
"database": "your_database", # ex: testdb
"password": "XxX",
"user": "your_user", # ex: dbadmin
}
# Setting the connection.
conn = vertica_python.connect(** conn_info)
# Linking this connection to VerticaPy.
vp.set_connection(conn)
Now, the connection conn
will be utilized throughout the entire API.
Warning
As we are directly using the connector, the ENV
parameter is not supported.
Important
This connection is not saved by VerticaPy, and it cannot be automatically reused. In the event of a connection loss, manual recreation and reconfiguration are required.
The same process can be applied to set up an ODBC connection. We create the connector and then set it in VerticaPy.
# Import VerticaPy.
import verticapy as vp
# Import the ODBC Module.
import pyodbc
# Option 1 - credentials.
# Connecting with DSN credentials
# Credentials.
driver = "/Library/Vertica/ODBC/lib/libverticaodbc.dylib"
server = "10.211.55.14"
database = "testdb"
port = "5433"
uid = "dbadmin"
pwd = "XxX"
dsn = (
"DRIVER={}; SERVER={}; DATABASE={}; PORT={}; UID={}; PWD={};"
).format(
driver,
server,
database,
port,
uid,
pwd,
)
# Connect.
conn = pyodbc.connect(dsn)
# Option 2 - DSN.
# Connecting with the DSN
# DSN.
dsn = ("DSN=VerticaDSN")
# Connect.
conn = pyodbc.connect(dsn, autocommit = True)
# Setting the connection in the VerticaPy API.
vp.set_connection(conn)
Warning
This type of connection is not recommended, as some functionalities might not be supported. With the native client, we strive to accommodate all continuous changes, including the integration of complex data types, vmaps, special data types and more…
The same process can be applied to set up an JDBC connection. We create the connector and then set it in VerticaPy.
# Import VerticaPy.
import verticapy as vp
# Import the JDBC Module.
import jaydebeapi
# Credentials.
database = "testdb"
hostname = "your_host"
port = "5433"
uid = "dbadmin"
pwd = "XxX"
# Vertica JDBC class name.
jdbc_driver_name = "com.vertica.jdbc.Driver"
# Vertica JDBC driver path.
jdbc_driver_loc = "/Library/Vertica/JDBC/vertica-jdbc-9.3.1-0.jar"
# JDBC connection string.
connection_string = 'jdbc:vertica://' + hostname + ':' + port + '/' + database
url = '{}:user={};password={}'.format(connection_string, uid, pwd)
conn = jaydebeapi.connect(
jdbc_driver_name,
connection_string,
{'user': uid, 'password': pwd},
jars = jdbc_driver_loc,
)
# Setting the connection in the VerticaPy API.
vp.set_connection(conn)
Warning
This type of connection is not recommended, as some functionalities might not be supported. With the native client, we strive to accommodate all continuous changes, including the integration of complex data types, vmaps, special data types and more…
Closing Connections#
Closing connections is a crucial step in managing your database connections with VerticaPy. This section outlines the recommended practices for closing connections once they are no longer needed, ensuring proper resource management and security.
To close a connection in VerticaPy, follow these steps:
# Import VerticaPy.
import verticapy as vp
# Close the connection.
vp.close_connection()
Note
Closing the connection promptly is essential to free up resources and avoid potential issues related to open connections.
In this comprehensive guide, we’ve explored the various facets of managing connections in VerticaPy, from creating connections and leveraging auto connections to advanced functionalities. The flexibility offered by VerticaPy allows you to tailor your approach to meet the unique requirements of your data analytics projects.
As you navigate through the connection options, it’s essential to weigh the advantages and disadvantages of each method. Whether you opt for manual connections, auto connections, or utilize existing connections, understanding the nuances ensures a seamless and secure workflow.
Remember that efficient connection management is the cornerstone of successful data analytics. By making informed choices and leveraging the capabilities of VerticaPy, you empower yourself to harness the full potential of your data.