JDBC Routable Query API: Best Practices

About the JDBC Routable Query API

The JDBC driver can route queries directly to a single node using a routable connection. Known previously as the JDBC Key/Value API, this routable query was renamed to JDBC Routable Query API, effective in Vertica 7.1 SP1. The JDBC Routable Query API introduces the VerticaRoutableExecutor interface. This interface allows you to use traditional SQL to query a single node. It supports table joins and the GROUP BY clause.

This JDBC Routable Query API: Best Practices guide uses Vertica Server 7.1SP1 (7.1.1) and Vertica JDBC Driver 7.1SP1 (7.1.1) for running examples and obtaining results.

This document assumes you are already familiar with both Vertica and Java programming.

Benefits of Using the JDBC Routable Query API

The JDBC Routable Query API is ideal for high-volume short requests that return a small number of results on a single node. The Routable Query API provides lower latency, increases throughput, and uses fewer system resources than distributed queries. However, you must segment the data such that the JDBC client can determine on which node the data resides. The following table shows the increase in throughput by using the Routable Query API compared to not using the Routable Query API.

Parameters Not Using the Routable Query API Using the Routable Query API

Clusters and cores

3-node cluster with 24 cores

3-node cluster with 24 cores

Concurrency

24

96

Latency

2

0.5

Throughput

12 queries per second

192 queries per second

A typical analytic Vertica query requires dense computation on data across all the nodes in the cluster. You can improve the performance by having all the nodes involved in the planning and the execution phases of the queries. The following graphic shows an unsegmented query directed to multiple nodes. The query sends the query plan to other nodes to compute the query and sends the results back to the client. Unsegmenting the query and computing it on multiple nodes increases the execution time.

The Routable Query API executes the query on a single node that contains the data for high-volume queries. The table must be segmented by at least one column. You can also segment the tables by multiple columns. This segmentation allows the client to directly connect to that node and fulfill requests. The following graphic shows a segmented query directed to a specific node. The node that has specific data computes the query and sends the results back to the client. The segmented query reduces computation due to column segmentation.

The JDBC Routable Query Feature Support

Since Vertica Server 7.0 SP1, the Routable Query API has supported flex tables and live aggregate projections. However, Vertica does not support the addOutputExpression() method when querying the flex tables.

The Routable Query API supports the following data types.

Supported Not Supported

BOOLEAN

CHAR

DATE

DOUBLE FLOAT

DOUBLE REAL

LONG (BIGINT)

LONG VARCHAR

NUMERIC

TIME

TIMESTAMP

VARCHAR

BINARY

INTERVALS

LONG VARBINARY

TIMESTAMPTZ

TIMETZ

VARBINARY

Creating Tables and Projections for the JDBC Routable Query API

For routable queries, the client should determine the appropriate node to fetch the data. To do so, the client compares all the projections available for the table and determines the best projection to find the single node that contains the data.

To use the JDBC Routable Query API, you must:

  • Segment the projection by hash for the routable queries.
  • Define a segmentation key using the significant columns on at least one table.

You can create new tables and projections to segment the data or use the existing projections with the segmented data. Be sure to verify the segmented data in existing projections before using them.

Creating New Tables for the JDBC Routable Query API

To create a table for use with the Routable Query API, segment by hash the table on a uniformly distributed column. You can accelerate your search results by sorting the projection on the segmented columns, as shown in the following code snippet.

=> CREATE TABLE s1.table1 (
   customer_key INT NOT NULL,
   date_key INT,
   username CHAR(100)
   SEGMENTED BY HASH(customer_key)
   ALL NODES;

This table is segmented based on the customer_key column and sorted by the customer_key column to accelerate the search results. You are not limited to segmenting with one column. Instead, you can segment by multiple columns.

Creating New Projections for the JDBC Routable Query API

To create a projection for use with the Routable Query API, segment by hash the projection on a uniformly distributed column. You can accelerate your search results by sorting the projection on the segmented columns, as shown in the following code snippet.

=> CREATE TABLE s1.table1 (
   customer_key INT NOT NULL,
   date_key INT,
   username CHAR(100)
   ;

=> CREATE PROJECTION s1.table1_p1 AS
   (SELECT * FROM s1.table1)
   SEGMENTED BY HASH(customer_key)
   ALL NODES;

This projection is segmented based on the customer_key column and sorted by the customer_key column to accelerate the search results.

Can I Use Existing Projections?

If you have existing tables that are segmented by hash (for example, on a customer_key column), you can determine what predicates are needed to query the table. To do so, use the SELECT GET_TABLE_PROJECTIONS('tablename') command to view the projections associated with the table. You can then verify the existing projections using either of these functions:

  • GET_TABLE_PROJECTIONS
  • EXPORT_OBJECTS

If you use the GET_TABLE_PROJECTIONS function, the output consists of the segmented column as Seg Cols. If you use the EXPORT_OBJECT function, the output consists of the segmented column as SEGMENTED BY HASH. Both Segs Cols and SEGMENTED BY HASH are the same.

Using the GET_TABLE_PROJECTIONS Function

The GET_TABLE_PROJECTIONS function returns information relevant to the status of a table. To verify the segmented column, check Seg Cols in the table, as shown in the following code snippet.

=> SELECT GET_TABLE_PROJECTIONS('s1.table1');

   GET_TABLE_PROJECTIONS 
   Current system K is 1.
   # of Nodes: 4. 
   Table s1.table1 has 2 projections. 

   Projection Name: [Segmented] [Seg Cols] [# of Buddies] [Buddy Projections] [Safe] [UptoDate] [Stats] 
   s1.table1_b1 [Segmented: Yes] [Seg Cols: "s1.table1.customer_key"] [K: 1] [s1.table1_b0] [Safe: Yes] [UptoDate: Yes] [Stats: No] 
   s1.table1_b0 [Segmented: Yes] [Seg Cols: "s1.table1.customer_key"] [K: 1] [s1.table1_b1] [Safe: Yes] [UptoDate: Yes] [Stats: No]

Using the EXPORT_OBJECTS Function

The EXPORT_OBJECTS function generates a SQL script that you can use to recreate the catalog objects on a different cluster. The EXPORT_OBJECTS function always attempts to recreate the projection statements with KSAFE clauses, if they existed in the original definitions. To verify the segmented column, check the SEGMENTED BY clause of projection DDLs, as shown in the following code snippet.

=> SELECT EXPORT_OBJECTS('','s1.table1');
   EXPORT_OBJECTS
   CREATE TABLE s1.table1
   (
      customer_key INT NOT NULL,
      date_key INT,
      username CHAR(100))
   );
   CREATE PROJECTION s1.table1 /*+createtype(P)*/
   (
      customer_key,
      date_key,
      username
   )
   AS
     SELECT table1.customer_key,
            table1.date_key,
            table1.username
     FROM s1.table1
     ORDER BY table1.customer_key,
              table1.date_key,
              table1.username
   SEGMENTED BY HASH(table1.customer_key) ALL NODES KSAFE 1;
   SELECT MARK_DESIGN_KSAFE(1);

Creating a Connection for the JDBC Routable Query API

To interact with Vertica, the Java application must create a connection. Connecting to Vertica using JDBC is similar to connecting to most other databases. The Routable Query API provides the VerticaRoutableConnection interface with advanced routing capabilities to connect to a cluster. You can create a routable connection using one of the following:

  • A JDBC data source
  • The DriverManager API

Both methods result in an identical connection object conn.

Using a JDBC Data Source

You enable access to the VerticaRoutableConnection interface by setting the EnableRoutableQueries JDBC connection property to true, as shown in the following code snippet.

com.vertica.jdbc.DataSource jdbcSettings = new com.vertica.jdbc.DataSource();
jdbcSettings.setDatabase("dbname");
jdbcSettings.setHost("ipaddress");
jdbcSettings.setUserID(“username");
jdbcSettings.setPassword(“password");
jdbcSettings.setEnableRoutableQueries(true);
jdbcSettings.setPort(5433);

VerticaRoutableConnection conn = (VerticaRoutableConnection) 
jdbcSettings.getConnection();

Using the DriverManager API

You enable access to the VerticaRoutableConnection interface by setting the EnableRoutableQueries JDBC connection property to true, as shown in the following code snippet.

VerticaRoutableConnection conn = (VerticaRoutableConnection) 
DriverManager.getConnection("jdbc:vertica://ipaddress:5433/dbname?user=username&password=password&EnableRoutableQueries=true");

PooledConnection Interface

The Routable Query API adds the PooledConnection interface settings to the Vertica JDBC connection class. You may need to change the PooledConnection interface settings because the defaults are restrictive. MaxPooledConnections and MaxPooledConnectionsPerNode are a subset of the PooledConnection interface:

  • MaxPooledConnections: Cluster-wide maximum number of connections to keep in the internal pool stored in VerticaRoutableConnection. The default is 20.
jdbcSettings.setMaxPooledConnections(200);
  • MaxPooledConnectionsPerNode: Per-node maximum number of connections to keep in the internal pool stored in VerticaRoutableConnection. The default is 5.
jdbcSettings.setMaxPooledConnectionsPerNode(24);

Creating a Query for the JDBC Routable Query API

The VerticaRoutableConnection interface provides access to the VerticaRoutableExecutor interface and the VGet interface. The Routable Query API provides two interfaces for performing routable queries:

  • The VerticaRoutableExecutor interface provides a powerful SQL-based API that adds flexibility to the API.
  • The VGet interface provides a more structured API for programmatic access.

Both the VerticaRoutableExecutor interface and the VGet interface directly query the Vertica nodes that have all the data needed for the query. Thus, you can avoid the distributed planning and the execution costs associated with a normal Vertica execution.

Using the VGet Interface

You can access the table data directly from a single node when you do not need to join the tables or use a GROUP BY clause. Using the VGet interface helps you perform key/value lookups on single tables. The VGet interface does not use traditional SQL syntax. Instead, the VGet interface allows you to build a data structure by defining the predicates and the outputs.

VGet get = conn.prepareGet( “s1”/*schema*/, “table1” /*table or projection*/);
get.addOutputColumn(“customer_key");
get.addOutputColumn(“date_key");
get.addOutputColumn(“username");
get.addPredicate(“customer_key", 1);
get.setLimit(1);
ResultSet rs = get.execute();

Using the VerticaRoutableExecutor Interface

If you need to join tables or use a GROUP BY clause, use the VerticaRoutableExecutor interface. This interface uses traditional SQL with a reduced feature set to query data on a single node.

String schema = "s1";
String table = "table1";
VerticaRoutableExecutor q = conn.createRoutableExecutor(schema, table);
String column = "customer_key";
Integer value = 1;
Map<String, Integer> map = new HashMap<String, Integer>();
map.put(column, value);
String query = "select customer_key, date_key, username ";
      query += " from s1.table1";
      query += " where a = " + map.get("a") ;
ResultSet rs = q.execute(query, column, value);

Verify That the API Runs on a Single Node

Set the setFailOnMultiNodePlans to true to check if the Routable Query API works only on one node. If the parameter is set to true, the query fails if it does not use the Routable Query API.

jdbcSettings.setFailOnMultiNodePlans(true);

Alternatively, you can check the query logs. If a query uses the Routable Query API, the query logs show /* +KV */ as an identifier.

Managing Resources for the JDBC Routable Query API

The resources you allocate to a query determine the performance of that query. Vertica resource management helps you allocate resources to best suit your business needs. To optimize the performance of short queries, create a dedicated resource pool. Different parameters specified in the resource pool help you optimize the resource pool statement.

Create a resource pool for a user as follows. This procedure assumes you have created a user named vmart_user:

  1. Create a dedicated resource pool for the user running the query. Set the EXECUTIONPARALLELISM parameter to 1 to force single-threaded queries. This setting improves the routable query performance.
=> CREATE RESOURCE POOL vmart_pool executionparallelism 1;
  1. Grant your user permission to access the resource pool.
=> GRANT USAGE ON RESOURCE POOL vmart_pool TO vmart_user;     
  1. Associate the user to the resource pool.
=> ALTER USER vmart_user RESOURCE POOL vmart_pool;

Performance Test Results

Performance testing on the JDBC Routable Query API used the following configurations:

Component Conditions

Server Specification

HPE ProLiant DL380 Gen9

CPU: Intel XeonE5-2690v3 (2.60GHz, 2P/24C, 64bit)

Memory: 256GB

Disk: OS:600GBx2(RAID1) Data&colon;600GBx14(RAID10)

Operating System

RedHat Enterprise Linux6 Update5

Vertica Database

Vertica Analytic Database 7.1.1-10

Vertica JDBC Driver

Vertica JDBC Driver 7.1.1-3

JDBC Routable Query API Query

SELECT date_key, customer_key, transaction_time, product_key FROM store.store_sales_fact WHERE date_key=?;

JDBC Connection Settings

setMaxPooledConnections = 200

setMaxPooledConnectionsPerNode = 24

Resource Pool Settings

EXECUTIONPARALLELISM = 1

Data Set Size

359 MB

Average Number of Rows per date_key Column

2738

Testing occured for 8 nodes, followed by 4 nodes, 2 nodes, and 1 node. The following graph shows the performance results for 1 node to 8 nodes using the Routable Query API.

Performance Testing.png

Important recommendations:

  • Configuring your servers and components that best suit your business needs to achieve optimal performance.
  • Adding nodes to achieve higher throughput.

For More Information

See Routing JDBC Queries Directly to a Single Node in the Vertica documentation.