JDBC Routable Query API: Best Practices
About the JDBC Routable Query API
The JDBC driver can route queries directly to a single node using a routable connection. Known previously as the JDBC Key/Value API, this routable query was renamed to JDBC Routable Query API, effective in Vertica 7.1 SP1. The JDBC Routable Query API introduces the VerticaRoutableExecutor
interface. This interface allows you to use traditional SQL to query a single node. It supports table joins and the GROUP BY clause.
This JDBC Routable Query API: Best Practices guide uses Vertica Server 7.1SP1 (7.1.1) and Vertica JDBC Driver 7.1SP1 (7.1.1) for running examples and obtaining results.
This document assumes you are already familiar with both Vertica and Java programming.
Benefits of Using the JDBC Routable Query API
The JDBC Routable Query API is ideal for high-volume short requests that return a small number of results on a single node. The Routable Query API provides lower latency, increases throughput, and uses fewer system resources than distributed queries. However, you must segment the data such that the JDBC client can determine on which node the data resides. The following table shows the increase in throughput by using the Routable Query API compared to not using the Routable Query API.
Parameters | Not Using the Routable Query API | Using the Routable Query API |
---|---|---|
Clusters and cores |
3-node cluster with 24 cores |
3-node cluster with 24 cores |
Concurrency |
24 |
96 |
Latency |
2 |
0.5 |
Throughput |
12 queries per second |
192 queries per second |
A typical analytic Vertica query requires dense computation on data across all the nodes in the cluster. You can improve the performance by having all the nodes involved in the planning and the execution phases of the queries. The following graphic shows an unsegmented query directed to multiple nodes. The query sends the query plan to other nodes to compute the query and sends the results back to the client. Unsegmenting the query and computing it on multiple nodes increases the execution time.
The Routable Query API executes the query on a single node that contains the data for high-volume queries. The table must be segmented by at least one column. You can also segment the tables by multiple columns. This segmentation allows the client to directly connect to that node and fulfill requests. The following graphic shows a segmented query directed to a specific node. The node that has specific data computes the query and sends the results back to the client. The segmented query reduces computation due to column segmentation.
The JDBC Routable Query Feature Support
Since Vertica Server 7.0 SP1, the Routable Query API has supported flex tables and live aggregate projections. However, Vertica does not support the addOutputExpression()
method when querying the flex tables.
The Routable Query API supports the following data types.
Supported | Not Supported |
---|---|
BOOLEAN CHAR DATE DOUBLE FLOAT DOUBLE REAL LONG (BIGINT) LONG VARCHAR NUMERIC TIME TIMESTAMP VARCHAR |
BINARY INTERVALS LONG VARBINARY TIMESTAMPTZ TIMETZ VARBINARY |
Creating Tables and Projections for the JDBC Routable Query API
For routable queries, the client should determine the appropriate node to fetch the data. To do so, the client compares all the projections available for the table and determines the best projection to find the single node that contains the data.
To use the JDBC Routable Query API, you must:
- Segment the projection by hash for the routable queries.
- Define a segmentation key using the significant columns on at least one table.
You can create new tables and projections to segment the data or use the existing projections with the segmented data. Be sure to verify the segmented data in existing projections before using them.
Creating New Tables for the JDBC Routable Query API
To create a table for use with the Routable Query API, segment by hash the table on a uniformly distributed column. You can accelerate your search results by sorting the projection on the segmented columns, as shown in the following code snippet.
=> CREATE TABLE s1.table1 ( customer_key INT NOT NULL, date_key INT, username CHAR(100) SEGMENTED BY HASH(customer_key) ALL NODES;
This table is segmented based on the customer_key
column and sorted by the customer_key
column to accelerate the search results. You are not limited to segmenting with one column. Instead, you can segment by multiple columns.
Creating New Projections for the JDBC Routable Query API
To create a projection for use with the Routable Query API, segment by hash the projection on a uniformly distributed column. You can accelerate your search results by sorting the projection on the segmented columns, as shown in the following code snippet.
=> CREATE TABLE s1.table1 ( customer_key INT NOT NULL, date_key INT, username CHAR(100) ; => CREATE PROJECTION s1.table1_p1 AS (SELECT * FROM s1.table1) SEGMENTED BY HASH(customer_key) ALL NODES;
This projection is segmented based on the customer_key
column and sorted by the customer_key
column to accelerate the search results.
Can I Use Existing Projections?
If you have existing tables that are segmented by hash (for example, on a customer_key
column), you can determine what predicates are needed to query the table. To do so, use the SELECT GET_TABLE_PROJECTIONS('tablename') command to view the projections associated with the table. You can then verify the existing projections using either of these functions:
- GET_TABLE_PROJECTIONS
- EXPORT_OBJECTS
If you use the GET_TABLE_PROJECTIONS function, the output consists of the segmented column as Seg Cols
. If you use the EXPORT_OBJECT function, the output consists of the segmented column as SEGMENTED BY HASH
. Both Segs Cols
and SEGMENTED BY HASH
are the same.
Using the GET_TABLE_PROJECTIONS Function
The GET_TABLE_PROJECTIONS function returns information relevant to the status of a table. To verify the segmented column, check Seg Cols
in the table, as shown in the following code snippet.
=> SELECT GET_TABLE_PROJECTIONS('s1.table1'); GET_TABLE_PROJECTIONS Current system K is 1. # of Nodes: 4. Table s1.table1 has 2 projections. Projection Name: [Segmented] [Seg Cols] [# of Buddies] [Buddy Projections] [Safe] [UptoDate] [Stats] s1.table1_b1 [Segmented: Yes] [Seg Cols: "s1.table1.customer_key"] [K: 1] [s1.table1_b0] [Safe: Yes] [UptoDate: Yes] [Stats: No] s1.table1_b0 [Segmented: Yes] [Seg Cols: "s1.table1.customer_key"] [K: 1] [s1.table1_b1] [Safe: Yes] [UptoDate: Yes] [Stats: No]
Using the EXPORT_OBJECTS Function
The EXPORT_OBJECTS function generates a SQL script that you can use to recreate the catalog objects on a different cluster. The EXPORT_OBJECTS function always attempts to recreate the projection statements with KSAFE clauses, if they existed in the original definitions. To verify the segmented column, check the SEGMENTED BY clause of projection DDLs, as shown in the following code snippet.
=> SELECT EXPORT_OBJECTS('','s1.table1'); EXPORT_OBJECTS CREATE TABLE s1.table1 ( customer_key INT NOT NULL, date_key INT, username CHAR(100)) ); CREATE PROJECTION s1.table1 /*+createtype(P)*/ ( customer_key, date_key, username ) AS SELECT table1.customer_key, table1.date_key, table1.username FROM s1.table1 ORDER BY table1.customer_key, table1.date_key, table1.username SEGMENTED BY HASH(table1.customer_key) ALL NODES KSAFE 1; SELECT MARK_DESIGN_KSAFE(1);
Creating a Connection for the JDBC Routable Query API
To interact with Vertica, the Java application must create a connection. Connecting to Vertica using JDBC is similar to connecting to most other databases. The Routable Query API provides the VerticaRoutableConnection
interface with advanced routing capabilities to connect to a cluster. You can create a routable connection using one of the following:
- A JDBC data source
- The DriverManager API
Both methods result in an identical connection object conn
.
Using a JDBC Data Source
You enable access to the VerticaRoutableConnection
interface by setting the EnableRoutableQueries
JDBC connection property to true, as shown in the following code snippet.
com.vertica.jdbc.DataSource jdbcSettings = new com.vertica.jdbc.DataSource(); jdbcSettings.setDatabase("dbname"); jdbcSettings.setHost("ipaddress"); jdbcSettings.setUserID(“username"); jdbcSettings.setPassword(“password"); jdbcSettings.setEnableRoutableQueries(true); jdbcSettings.setPort(5433); VerticaRoutableConnection conn = (VerticaRoutableConnection) jdbcSettings.getConnection();
Using the DriverManager API
You enable access to the VerticaRoutableConnection
interface by setting the EnableRoutableQueries
JDBC connection property to true, as shown in the following code snippet.
VerticaRoutableConnection conn = (VerticaRoutableConnection) DriverManager.getConnection("jdbc:vertica://ipaddress:5433/dbname?user=username&password=password&EnableRoutableQueries=true");
PooledConnection Interface
The Routable Query API adds the PooledConnection interface settings to the Vertica JDBC connection class. You may need to change the PooledConnection interface settings because the defaults are restrictive. MaxPooledConnections and MaxPooledConnectionsPerNode are a subset of the PooledConnection interface:
- MaxPooledConnections: Cluster-wide maximum number of connections to keep in the internal pool stored in VerticaRoutableConnection. The default is 20.
jdbcSettings.setMaxPooledConnections(200);
- MaxPooledConnectionsPerNode: Per-node maximum number of connections to keep in the internal pool stored in VerticaRoutableConnection. The default is 5.
jdbcSettings.setMaxPooledConnectionsPerNode(24);
Creating a Query for the JDBC Routable Query API
The VerticaRoutableConnection
interface provides access to the VerticaRoutableExecutor
interface and the VGet
interface. The Routable Query API provides two interfaces for performing routable queries:
- The VerticaRoutableExecutor interface provides a powerful SQL-based API that adds flexibility to the API.
- The VGet interface provides a more structured API for programmatic access.
Both the VerticaRoutableExecutor interface and the VGet interface directly query the Vertica nodes that have all the data needed for the query. Thus, you can avoid the distributed planning and the execution costs associated with a normal Vertica execution.
Using the VGet Interface
You can access the table data directly from a single node when you do not need to join the tables or use a GROUP BY clause. Using the VGet
interface helps you perform key/value lookups on single tables. The VGet
interface does not use traditional SQL syntax. Instead, the VGet
interface allows you to build a data structure by defining the predicates and the outputs.
VGet get = conn.prepareGet( “s1”/*schema*/, “table1” /*table or projection*/); get.addOutputColumn(“customer_key"); get.addOutputColumn(“date_key"); get.addOutputColumn(“username"); get.addPredicate(“customer_key", 1); get.setLimit(1); ResultSet rs = get.execute();
Using the VerticaRoutableExecutor Interface
If you need to join tables or use a GROUP BY clause, use the VerticaRoutableExecutor
interface. This interface uses traditional SQL with a reduced feature set to query data on a single node.
String schema = "s1"; String table = "table1"; VerticaRoutableExecutor q = conn.createRoutableExecutor(schema, table); String column = "customer_key"; Integer value = 1; Map<String, Integer> map = new HashMap<String, Integer>(); map.put(column, value); String query = "select customer_key, date_key, username "; query += " from s1.table1"; query += " where a = " + map.get("a") ; ResultSet rs = q.execute(query, column, value);
Verify That the API Runs on a Single Node
Set the setFailOnMultiNodePlans
to true to check if the Routable Query API works only on one node. If the parameter is set to true, the query fails if it does not use the Routable Query API.
jdbcSettings.setFailOnMultiNodePlans(true);
Alternatively, you can check the query logs. If a query uses the Routable Query API, the query logs show /* +KV */ as an identifier.
Managing Resources for the JDBC Routable Query API
The resources you allocate to a query determine the performance of that query. Vertica resource management helps you allocate resources to best suit your business needs. To optimize the performance of short queries, create a dedicated resource pool. Different parameters specified in the resource pool help you optimize the resource pool statement.
Create a resource pool for a user as follows. This procedure assumes you have created a user named vmart_user:
- Create a dedicated resource pool for the user running the query. Set the EXECUTIONPARALLELISM parameter to 1 to force single-threaded queries. This setting improves the routable query performance.
=> CREATE RESOURCE POOL vmart_pool executionparallelism 1;
- Grant your user permission to access the resource pool.
=> GRANT USAGE ON RESOURCE POOL vmart_pool TO vmart_user;
- Associate the user to the resource pool.
=> ALTER USER vmart_user RESOURCE POOL vmart_pool;
Performance Test Results
Performance testing on the JDBC Routable Query API used the following configurations:
Component | Conditions |
---|---|
Server Specification |
HPE ProLiant DL380 Gen9 CPU: Intel XeonE5-2690v3 (2.60GHz, 2P/24C, 64bit) Memory: 256GB Disk: OS:600GBx2(RAID1) Data:600GBx14(RAID10) |
Operating System |
RedHat Enterprise Linux6 Update5 |
Vertica Database |
Vertica Analytic Database 7.1.1-10 |
Vertica JDBC Driver |
Vertica JDBC Driver 7.1.1-3 |
JDBC Routable Query API Query |
SELECT date_key, customer_key, transaction_time, product_key FROM store.store_sales_fact WHERE date_key=?; |
JDBC Connection Settings |
setMaxPooledConnections = 200 setMaxPooledConnectionsPerNode = 24 |
Resource Pool Settings |
EXECUTIONPARALLELISM = 1 |
Data Set Size |
359 MB |
Average Number of Rows per date_key Column |
2738 |
Testing occured for 8 nodes, followed by 4 nodes, 2 nodes, and 1 node. The following graph shows the performance results for 1 node to 8 nodes using the Routable Query API.
Important recommendations:
- Configuring your servers and components that best suit your business needs to achieve optimal performance.
- Adding nodes to achieve higher throughput.
For More Information
See Routing JDBC Queries Directly to a Single Node in the Vertica documentation.