Managing Workload Resources in an Eon Mode Database
In an Eon Mode database, the primary way that you control workloads is through subclusters. For example, you can create subclusters for specific use cases, such as ETL or query workloads, or you can create subclusters for different groups of users to isolate workloads. Within each subcluster, you can create individual resource pools to optimize resource allocation according to workload. See Managing Subclusters for more information about how Vertica uses subclusters.
Global and Subcluster-specific Resource Pools
You can create global resource pools that are shared among all nodes in the database, or you can create resource pools for a specific subcluster. If you create both, you can override the global MEMORYSIZE, MAXMEMORYSIZE, and MAXQUERYMEMORYSIZE settings with subcluster-specific resource pool settings.
The GENERAL pool requires at least 25% of available memory to function properly. If you attempt to set MEMORYSIZE for a user-defined resource pool to more than 75%, Vertica returns an error.
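For example, a statement like the following fails, because an 80% allocation would leave the GENERAL pool with less than the 25% of available memory it requires (the pool name here is illustrative):

=> CREATE RESOURCE POOL oversized_pool MEMORYSIZE '80%';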
This feature is useful for effectively removing global resource pools that are not needed on a subcluster. For example, the built-in TM resource pool allocates 3 gigabytes of memory on each subcluster by default. Because secondary subclusters do not run Tuple Mover operations, you can override the TM MEMORYSIZE allocation to 0% on secondary subclusters and reallocate that memory to other workloads. The following statement sets the TM MEMORYSIZE allocation to 0% for the analytics subcluster:
=> ALTER RESOURCE POOL TM FOR SUBCLUSTER analytics MEMORYSIZE '0%';
Additionally, you can create a resource pool with settings that are adequate for most subclusters, and then tailor the settings for specific subclusters as needed.
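For example, you could create a pool with settings that suit most subclusters and then override a single setting for one subcluster (the pool and subcluster names here are illustrative):

=> CREATE RESOURCE POOL reporting_pool MEMORYSIZE '10%' PLANNEDCONCURRENCY 4;
=> ALTER RESOURCE POOL reporting_pool FOR SUBCLUSTER reporting_sc MEMORYSIZE '20%';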
Optimizing Primary and Secondary Subclusters
Overriding resource pool settings at the subcluster level allows you to isolate built-in and user-defined resource pools and optimize them by workload. Primary subclusters perform ETL tasks and execute DDL statements that alter the database, so they are less efficient at executing queries than secondary subclusters. To increase efficiency, you can fine-tune your primary subclusters and allocate more resources to ETL and DDL tasks. Secondary subclusters are optimized for executing queries. You typically configure each secondary subcluster to run one of two workloads: in-depth, long-running queries, or shorter-running "dashboard" queries that you want to finish quickly. After you define the type of queries executed by each subcluster, you can create a subcluster-specific resource pool that is optimized for that workload.
The following scenario optimizes three subclusters by workload:
primary_etl: The primary subcluster that you want to optimize for Tuple Mover operations.
dashboard: A secondary subcluster that you want to designate for short-running queries executed by a large number of users to refresh a web page.
analytics: A secondary subcluster that you want to designate for long-running queries.
See Best Practices for Managing Workload Resources for additional scenarios about resource pool tuning.
Primary Subcluster
Tuple Mover operations run only on primary subclusters, so you can alter the MAXCONCURRENCY setting to increase the number of threads available for mergeout operations:
=> ALTER RESOURCE POOL TM MAXCONCURRENCY 10;
See Tuning Tuple Mover Pool Settings for additional information about Tuple Mover resources for primary subclusters.
Secondary Subclusters
Secondary subclusters do not run Tuple Mover operations, so remove the global TM resource pool from your secondary subclusters and make its resources available for other workloads. The following statements override the TM MEMORYSIZE setting on your secondary subclusters:
=> ALTER RESOURCE POOL TM FOR SUBCLUSTER analytics MEMORYSIZE '0%';
=> ALTER RESOURCE POOL TM FOR SUBCLUSTER dashboard MEMORYSIZE '0%';
To confirm the overrides, query the SUBCLUSTER_RESOURCE_POOL_OVERRIDES table:
=> SELECT pool_oid, name, subcluster_name, memorysize
   FROM SUBCLUSTER_RESOURCE_POOL_OVERRIDES;
     pool_oid      | name | subcluster_name | memorysize
-------------------+------+-----------------+------------
 45035996273705058 | tm   | analytics       | 0%
 45035996273705058 | tm   | dashboard       | 0%
(2 rows)
To optimize the dashboard subcluster for short-running queries on a web page, create a dash_pool subcluster-level resource pool that uses 70% of the subcluster's memory. Additionally, increase PLANNEDCONCURRENCY to utilize all of the machine's logical cores, and limit EXECUTIONPARALLELISM to no more than half of the machine's available cores:
=> CREATE RESOURCE POOL dash_pool FOR SUBCLUSTER dashboard MEMORYSIZE '70%' PLANNEDCONCURRENCY 16 EXECUTIONPARALLELISM 8;
To optimize the analytics subcluster for long-running queries, create an analytics_pool subcluster-level resource pool that uses 70% of the subcluster's memory. Additionally, set EXECUTIONPARALLELISM to AUTO to utilize all cores available on the node to process a query, and limit PLANNEDCONCURRENCY to no more than 8 concurrent queries:
=> CREATE RESOURCE POOL analytics_pool FOR SUBCLUSTER analytics MEMORYSIZE '70%' EXECUTIONPARALLELISM AUTO PLANNEDCONCURRENCY 8;
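To confirm that the subcluster-level pools exist, you can query the RESOURCE_POOLS system table. This query assumes your Vertica version exposes a subcluster_name column in that table, which distinguishes subcluster-level pools from global pools:

=> SELECT name, subcluster_name, memorysize FROM RESOURCE_POOLS WHERE subcluster_name IS NOT NULL;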