Managing Workload Resources in an Eon Mode Database

In an Eon Mode database, the primary way that you control workloads is through subclusters. For example, you can create subclusters for specific use cases, such as ETL or query workloads, or you can create subclusters for different groups of users to isolate workloads. Within each subcluster, you can create individual resource pools to optimize resource allocation according to workload. See Managing Subclusters for more information about how Vertica uses subclusters.

Global and Subcluster-specific Resource Pools

You can create global resource pools that are shared among all nodes in the database, or you can create resource pools for a specific subcluster. If you create both, you can override global MEMORYSIZE, MAXMEMORYSIZE, and MAXQUERYMEMORYSIZE settings with subcluster-specific resource pool settings.
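For example, the following statements sketch this pattern: create a global resource pool and then override one of its memory settings for a single subcluster. The reporting pool name, analytics subcluster name, and percentage values are illustrative, not part of the scenario that follows:

=> CREATE RESOURCE POOL reporting MEMORYSIZE '8%' MAXMEMORYSIZE '25%';
=> ALTER RESOURCE POOL reporting FOR SUBCLUSTER analytics MAXMEMORYSIZE '50%';

Nodes in the analytics subcluster use the 50% MAXMEMORYSIZE override; nodes in all other subclusters use the global 25% setting.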

The GENERAL pool requires at least 25% of available memory to function properly. If you attempt to set MEMORYSIZE for a user-defined resource pool to more than 75%, Vertica returns an error.

This feature lets you effectively remove global resource pools that a subcluster does not need. For example, the built-in TM resource pool allocates 3 gigabytes of memory on each subcluster by default. Because secondary subclusters do not run Tuple Mover operations, you can override the TM pool's MEMORYSIZE to 0% on secondary subclusters and reallocate that memory to other workloads. The following statement overrides the TM MEMORYSIZE setting for the analytics subcluster:

=> ALTER RESOURCE POOL TM FOR SUBCLUSTER analytics MEMORYSIZE '0%';

Additionally, you can create a resource pool with settings that are adequate for most subclusters, and then tailor the settings for specific subclusters as needed.
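As a sketch of this approach, the following statements create a pool whose per-query memory cap suits most subclusters, then tighten that cap for one subcluster. The app_pool name, dashboard subcluster name, and size values are illustrative:

=> CREATE RESOURCE POOL app_pool MEMORYSIZE '10%' MAXQUERYMEMORYSIZE '8G';
=> ALTER RESOURCE POOL app_pool FOR SUBCLUSTER dashboard MAXQUERYMEMORYSIZE '2G';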

Optimizing Primary and Secondary Subclusters

Overriding resource pool settings at the subcluster level allows you to isolate built-in and user-defined resource pools and optimize them by workload. Primary subclusters perform ETL tasks and execute DDL statements that alter the database, so they are less efficient at executing queries than secondary subclusters. To increase efficiency, you can fine-tune your primary subclusters by allocating more resources to ETL and DDL tasks. Secondary subclusters are optimized for executing queries. Typically, you configure each secondary subcluster to run one of two workloads: in-depth, long-running queries, or shorter-running "dashboard" queries that you want to finish quickly. After you define the type of queries that each subcluster executes, you can create a subcluster-specific resource pool that is optimized for that workload.

The following scenario optimizes three subclusters by workload:

  • primary_etl: The primary subcluster that you want to optimize for Tuple Mover operations.
  • dashboard: A secondary subcluster that you want to designate for short-running queries executed by a large number of users to refresh a web page.
  • analytics: A secondary subcluster that you want to designate for long-running queries.

See Best Practices for Managing Workload Resources for additional scenarios about resource pool tuning.

Primary Subcluster
Tuple Mover operations run only on primary subclusters, so you can alter the MAXCONCURRENCY setting to increase the number of threads available for mergeout operations:

=> ALTER RESOURCE POOL TM MAXCONCURRENCY 10;

See Tuning Tuple Mover Pool Settings for additional information about Tuple Mover resources for primary subclusters.

Secondary Subclusters
Secondary subclusters do not run Tuple Mover operations, so you can zero out the global TM resource pool's memory allocation on your secondary subclusters and make those resources available to other workloads. The following statements override the TM MEMORYSIZE setting on both secondary subclusters:

=> ALTER RESOURCE POOL TM FOR SUBCLUSTER analytics MEMORYSIZE '0%';
=> ALTER RESOURCE POOL TM FOR SUBCLUSTER dashboard MEMORYSIZE '0%';

To confirm the overrides, query the SUBCLUSTER_RESOURCE_POOL_OVERRIDES table:

=> SELECT pool_oid, name, subcluster_name, memorysize FROM SUBCLUSTER_RESOURCE_POOL_OVERRIDES;
     pool_oid      | name | subcluster_name | memorysize
-------------------+------+-----------------+------------
 45035996273705058 | tm   | analytics       | 0%
 45035996273705058 | tm   | dashboard       | 0%
(2 rows)
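To verify the effect on each node, you can also query the RESOURCE_POOL_STATUS system table. The query below assumes that table's node_name, pool_name, and memory_size_kb columns; a value of 0 for memory_size_kb on secondary-subcluster nodes confirms the override took effect:

=> SELECT node_name, pool_name, memory_size_kb
     FROM RESOURCE_POOL_STATUS
     WHERE pool_name = 'tm';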

To optimize the dashboard subcluster for short-running queries on a web page, create a dash_pool subcluster-level resource pool that uses 70% of the subcluster's memory. Additionally, increase PLANNEDCONCURRENCY to utilize all of the machine's logical cores, and limit EXECUTIONPARALLELISM to no more than half of the machine's available cores:

=> CREATE RESOURCE POOL dash_pool FOR SUBCLUSTER dashboard
     MEMORYSIZE '70%'
     PLANNEDCONCURRENCY 16
     EXECUTIONPARALLELISM 8;

To optimize the analytics subcluster for long-running queries, create an analytics_pool subcluster-level resource pool that uses 70% of the subcluster's memory. Additionally, set EXECUTIONPARALLELISM to AUTO to utilize all cores available on the node to process a query, and limit PLANNEDCONCURRENCY to no more than 8 concurrent queries:

=> CREATE RESOURCE POOL analytics_pool FOR SUBCLUSTER analytics
      MEMORYSIZE '70%'
      EXECUTIONPARALLELISM AUTO
      PLANNEDCONCURRENCY 8;