Tuning the jvm Resource Pool for Vertica Pulse
Note: You must modify the jvm resource pool to match the capabilities of your hardware so that Vertica Pulse has adequate resources to perform queries. If a cluster does not have sufficient resources to run a Vertica Pulse query, the query can fail with an Out Of Memory (OOM) exception.
Vertica Pulse runs as a Java UDx (User Defined eXtension) and uses the jvm resource pool to define the resources available to run Vertica Pulse queries.
Vertica starts a Java Virtual Machine (JVM) when you perform a Vertica Pulse query. The session from which you issue the query reserves resources for the JVM (across all nodes in the cluster) and releases them when the session ends. You can also explicitly close the JVM attached to the session by running the command SELECT release_jvm_memory();
The most critical resource pool settings that affect Vertica Pulse are MAXMEMORYSIZE and PLANNEDCONCURRENCY.
- MAXMEMORYSIZE defines the amount of RAM that a JVM can use. By default MAXMEMORYSIZE is set to either 10% of system memory or 2GB, whichever is smaller.
- PLANNEDCONCURRENCY defines how many JVMs are allowed to run across the cluster, and therefore how many Pulse sessions you can run cluster-wide. By default, PLANNEDCONCURRENCY is set to AUTO, which is the lower of the number of cores on the node or memory / 2GB, but it is never automatically set to less than 4.
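The default rules in the two bullets above can be approximated with a short shell sketch. This is only an illustration of the rules as stated here, not Vertica's actual implementation. The function names are hypothetical, and rounding memory / 2GB down to a whole number is an assumption.

```shell
# Sketch only: mirrors the default rules described in the text.
# Function names are hypothetical; nothing here queries Vertica.

# default_maxmemorysize_gb TOTAL_GB
# Prints the smaller of 10% of system memory or 2GB.
default_maxmemorysize_gb() {
    awk -v total="$1" 'BEGIN {
        v = total * 0.10
        if (v > 2) v = 2
        printf "%.2f\n", v
    }'
}

# auto_plannedconcurrency CORES TOTAL_GB
# Prints the lower of the core count or memory / 2GB, but never less than 4.
# Assumption: memory / 2GB is rounded down to a whole number.
auto_plannedconcurrency() {
    awk -v cores="$1" -v total="$2" 'BEGIN {
        by_mem = int(total / 2)
        v = (cores + 0 < by_mem) ? cores + 0 : by_mem
        if (v < 4) v = 4
        print v
    }'
}
```

For example, `auto_plannedconcurrency 8 64` prints 8, while `auto_plannedconcurrency 2 4` prints 4 because of the floor of 4.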
The amount of memory that each JVM is allocated is determined by MAXMEMORYSIZE / PLANNEDCONCURRENCY. For example, suppose MAXMEMORYSIZE is set to 8G and PLANNEDCONCURRENCY is set to 2. In this case, at most 2 sessions can run Vertica Pulse queries, and each session JVM has a maximum memory size of 4GB.
Tip: The basic thing to remember is that PLANNEDCONCURRENCY controls the number of sessions across the entire cluster that can run the sentimentAnalysis() function. If set to 1, then only a single session can run Pulse functions. No other sessions are able to run Pulse or Java UDx functions until the session currently running Pulse functions is closed.
While resource pool settings are based on the resources of a node, they apply across the entire cluster. A session with a Vertica Pulse query reserves the same resources for its JVM on all nodes in the cluster. Therefore, it doesn't matter if the cluster contains 3 nodes or 30 nodes; each node reserves, for example, 4GB of the node's memory for the JVM used by the Vertica Pulse session, and PLANNEDCONCURRENCY limits the number of sessions that can run Pulse cluster-wide. If PLANNEDCONCURRENCY is 1, then only 1 vsql session (or client connection) in the entire cluster can run Pulse.
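The per-JVM figure (MAXMEMORYSIZE / PLANNEDCONCURRENCY) is simple to check by hand, but a small helper makes it easy to try different settings. The sketch below assumes both values are given as plain numbers, with memory in GB. The function name is hypothetical.

```shell
# Sketch only: per-JVM memory = MAXMEMORYSIZE / PLANNEDCONCURRENCY.
# per_jvm_gb MAXMEMORYSIZE_GB PLANNEDCONCURRENCY
per_jvm_gb() {
    awk -v max="$1" -v pc="$2" 'BEGIN {
        if (pc < 1) { exit 1 }          # guard against division by zero
        printf "%.2f\n", max / pc
    }'
}

# Example: 8G shared by 2 planned sessions gives 4.00 (GB) per JVM.
per_jvm_gb 8 2
```

Each node reserves this amount for every session that runs Pulse, regardless of how many nodes are in the cluster.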
You can display the current settings for the jvm resource pool with the following command:
select name, MAXMEMORYSIZE, PLANNEDCONCURRENCY from V_CATALOG.RESOURCE_POOLS
where name = 'jvm';
Configuring the jvm Resource Pool for Your System
Do not use the default jvm resource pool settings for Vertica Pulse. You must configure the jvm resource pool to match your hardware and workload requirements. Specifically, set PLANNEDCONCURRENCY and MAXMEMORYSIZE to match your hardware.
You may need to experiment to find the optimal settings for your hardware and your specific workloads. As a best practice, allow:
- At least 2GB of memory per session for Vertica Pulse
- At least 25% of the memory available for general Vertica overhead. Essentially, MAXMEMORYSIZE must never exceed 75% of total system memory.
Note: If you run many queries outside of Vertica Pulse, leave more memory available outside of the jvm resource pool.
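Taken together, the two guidelines above put an upper bound on how many Pulse sessions a node's memory can support: 75% of total memory divided by 2GB per session. The sketch below computes that bound. The function name is hypothetical, and it considers memory only, not core count or non-Pulse workloads.

```shell
# Sketch only: memory-based upper bound on Pulse sessions.
# Assumes the 75% cap and the 2GB-per-session minimum from the guidelines above.
# max_pulse_sessions TOTAL_GB
max_pulse_sessions() {
    awk -v total="$1" 'BEGIN {
        print int((total * 0.75) / 2)   # round down to whole sessions
    }'
}

# Example: a 16GB node supports at most 6 sessions at 2GB each.
max_pulse_sessions 16
```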
To configure your system for Vertica Pulse:
- Determine the number of cores on a node. Your PLANNEDCONCURRENCY setting cannot exceed this value. For example, you can run the following from a shell to determine cores:
  cat /proc/cpuinfo | egrep "core id|physical id" | tr -d "\n" | sed s/physical/\\nphysical/g | grep -v ^$ | sort | uniq | wc -l
- Determine the amount of memory in GB on a node. Your MAXMEMORYSIZE cannot exceed 75% of the total system memory. For example, you can run the following from a shell to determine the total system memory in GB for any particular node:
  awk '/MemTotal/ {printf "%f GB\n", $2/1024/1024}' /proc/meminfo
- Use the formula MAXMEMORYSIZE / PLANNEDCONCURRENCY to determine how much memory each Vertica Pulse JVM receives. For example, you can use
  (.75 * Total System Memory) / PLANNEDCONCURRENCY
  if you plan to use most of your RAM for Vertica Pulse. The result of the formula must be 2 (GB) or greater. For example, if you have 8GB of total system memory and your estimated PLANNEDCONCURRENCY is 3, then the formula results in 2, which is acceptable. However, if you have the same amount of memory and PLANNEDCONCURRENCY is set to 4, then the result of the formula is 1.5, which is below the recommended minimum of 2GB. You can either add more RAM to the system or reduce PLANNEDCONCURRENCY to bring the result up to 2.
- Finally, alter the jvm resource pool. For example, if each node in the cluster has 16GB of memory and you decide to use up to 75% of the total system memory (0.75 * 16GB = 12GB) for Vertica Pulse, then you can set the resource pool as follows:
  ALTER RESOURCE POOL jvm MAXMEMORYSIZE '12G' PLANNEDCONCURRENCY 3;
Note: For evaluation purposes on systems with lower memory, set MAXMEMORYSIZE to 75% and PLANNEDCONCURRENCY to 1: ALTER RESOURCE POOL jvm MAXMEMORYSIZE '75%' PLANNEDCONCURRENCY 1;
While these settings are unsupported, they do allow you to run simple Vertica Pulse queries. You may experience Out Of Memory exceptions and slow performance.
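The sizing procedure above can be sketched as a single shell function that applies the 75% cap, checks the 2GB-per-JVM minimum, and prints the matching ALTER RESOURCE POOL statement. This is an illustrative helper only. The function name is hypothetical, rounding MAXMEMORYSIZE down to whole gigabytes is an assumption, and the function prints the statement without running it against a database.

```shell
# Sketch only: derive jvm pool settings from node memory and desired concurrency.
# pulse_jvm_pool_sql TOTAL_GB PLANNEDCONCURRENCY
# Prints an ALTER RESOURCE POOL statement, or fails if a JVM would get < 2GB.
pulse_jvm_pool_sql() {
    awk -v total="$1" -v pc="$2" -v q="'" 'BEGIN {
        if (pc < 1) {
            print "PLANNEDCONCURRENCY must be at least 1" > "/dev/stderr"
            exit 1
        }
        max_gb = int(total * 0.75)      # 75% cap, rounded down to whole GB
        per_jvm = max_gb / pc           # memory each session JVM receives
        if (per_jvm < 2) {
            printf "per-JVM memory %.2fGB is below 2GB; reduce PLANNEDCONCURRENCY or add RAM\n", per_jvm > "/dev/stderr"
            exit 1
        }
        printf "ALTER RESOURCE POOL jvm MAXMEMORYSIZE %s%dG%s PLANNEDCONCURRENCY %d;\n", q, max_gb, q, pc
    }'
}

# Example: 16GB nodes with 3 planned sessions.
pulse_jvm_pool_sql 16 3
```

With 16GB and 3 sessions, the function prints the same statement shown in the last step. With 8GB and 4 sessions it exits with an error, because each JVM would receive only 1.5GB.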
For additional details, see: