Managing the Tuple Mover
The Tuple Mover is preconfigured to handle typical workloads. However, some situations might require you to adjust Tuple Mover behavior. You can do so in various ways:
- Configure resource pools: TM and WOSDATA
- Manage data loading
- Add threads
- Manage active data partitions
Configuring TM Resource Pool
The Tuple Mover has its own TM resource pool. You can increase the Tuple Mover's ability to cope with high load rates by adjusting several configuration parameters:
MEMORYSIZE
Specifies how much memory is allocated to the TM pool per node. By default, this parameter is set to 5% of available memory. If MEMORYSIZE of the GENERAL resource pool is also set to a percentage, the TM pool can compete with it for memory. Increasing TM MEMORYSIZE to a large percentage can cause regressions in memory-sensitive queries that run in the GENERAL pool.
MAXCONCURRENCY
Sets the maximum number of concurrent execution slots available to the TM pool across all nodes, 3 by default. This specifies the maximum number of merges that can occur simultaneously on multiple threads.
PLANNEDCONCURRENCY
Specifies the preferred number of queries to execute concurrently in the resource pool, across all nodes. The Resource Manager uses PLANNEDCONCURRENCY to calculate the target memory that is available to a given query:
TM-memory-size / PLANNEDCONCURRENCY
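For example, if the TM pool's MEMORYSIZE is 4GB and PLANNEDCONCURRENCY is 4, each concurrent Tuple Mover operation targets approximately 1GB of memory.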
The setting for PLANNEDCONCURRENCY must be proportional to the size of RAM, the CPU, and the storage subsystem. Depending on the storage type, increasing PLANNEDCONCURRENCY for the Tuple Mover threads might create a storage I/O bottleneck. Monitor the storage subsystem; if it becomes saturated, with long I/O queues (more than two) and high read and write latency, adjust the PLANNEDCONCURRENCY parameter to keep the storage subsystem resources below the saturation level. You might also need to:
- Partition storage data files
- Adjust block-size optimization on storage subsystems such as RAID 5 or RAID 10
- Identify the optimal number of disks in the RAID array
The following statement illustrates how to increase the size of the TM resource pool and set the concurrency settings for the pool:
=> ALTER RESOURCE POOL tm MEMORYSIZE '4G' PLANNEDCONCURRENCY 4 MAXCONCURRENCY 5;
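To verify the pool's current settings after the change, you can query the RESOURCE_POOLS system table. This is a minimal sketch; the column names are assumed to match those in your Vertica version:
=> SELECT name, memorysize, plannedconcurrency, maxconcurrency
   FROM resource_pools WHERE name = 'tm';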
Configuring WOSDATA Resource Pool
WOSDATA resource pool settings also indirectly affect the Tuple Mover. INSERT and COPY queries use this resource pool's PLANNEDCONCURRENCY setting to determine whether data is small enough to store in WOS, or should be written directly to ROS. Therefore, set this value to be the number of concurrent loads you expect to perform in your database. The WOSDATA resource pool also determines how much RAM is available to WOS.
=> ALTER RESOURCE POOL wosdata MAXMEMORYSIZE '4G' PLANNEDCONCURRENCY 3;
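To watch how much WOS memory is actually in use under load, one option is to query the RESOURCE_POOL_STATUS system table. This is a sketch; the column names are assumptions based on typical Vertica versions:
=> SELECT node_name, pool_name, memory_inuse_kb, max_memory_size_kb
   FROM resource_pool_status WHERE pool_name = 'wosdata';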
Managing Data Loads
By default, Vertica automatically decides whether to place data in WOS or load it directly to ROS containers, based on the amount of data processed by a DML statement: COPY/COPY FROM VERTICA, INSERT, MERGE, and UPDATE. Vertica stores large loads directly to disk and stores smaller loads in memory, which it later moves to disk.
For low-latency access to data, use small loads. The automatic Tuple Mover settings are the best option for handling such smaller loads. One exception is for single-node deployments, where a system failure would cause in-memory data to be lost. In this case, you might want to force all data loads to go directly to disk.
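For example, the DIRECT load method forces a COPY or INSERT to bypass the WOS and write straight to ROS. The table name sales, the file path, and the inserted values here are hypothetical:
=> COPY sales FROM '/data/sales.dat' DIRECT;
=> INSERT /*+ DIRECT */ INTO sales VALUES (101, '2017-01-15', 49.99);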
For high load rates, you might want the Tuple Mover to check for jobs more frequently by changing the MoveOutInterval configuration parameter. Reduce MoveOutInterval if you expect peak load rates to fill the WOS quickly.
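For example, the following statement lowers the interval to 60 seconds; the value is illustrative, not a recommendation:
=> SELECT SET_CONFIG_PARAMETER('MoveOutInterval', 60);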
In some cases, you might want to control how Vertica loads data into ROS or WOS. Vertica provides various options to control how it loads data for specific tables and DML operations. This can also affect Tuple Mover behavior. For details, see Choosing a Load Method.
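As one sketch of a per-statement option, COPY accepts a load-method keyword. Assuming a table named sales, TRICKLE writes only to the WOS (and returns an error if the WOS fills), while AUTO leaves the decision to Vertica:
=> COPY sales FROM '/data/sales.dat' TRICKLE;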
Adding Threads
If your database is receiving a large volume of data to load or if it is performing many DIRECT loads or inserts, consider allowing the Tuple Mover to perform more operations concurrently by increasing the TM resource pool until it can keep up with anticipated peak loads. For example:
=> ALTER RESOURCE POOL TM MEMORYSIZE '4G' PLANNEDCONCURRENCY 4 MAXCONCURRENCY 5;
See ALTER RESOURCE POOL and Built-In Pools in the SQL Reference Manual.
Managing Active Data Partitions
The Tuple Mover assumes that all loads and updates to a partitioned table are targeted to one or more partitions that it identifies as active. In general, the partitions with the largest partition keys—typically, the most recently created partitions—are regarded as active. As the partition ages, it commonly transitions to a mostly read-only workload and requires much less activity.
You can specify how many partitions are active for partitioned tables at two levels:
- Configuration parameter ActivePartitionCount determines how many partitions are active for partitioned tables in the database. By default, ActivePartitionCount is set to 1. The Tuple Mover applies this setting to all tables that do not set their own active partition count.
- Individual tables can supersede ActivePartitionCount by setting their own active partition count with CREATE TABLE and ALTER TABLE, as shown in the sketch after this list.
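The following statements sketch both levels. The count of 2 is illustrative, the table name store_orders is hypothetical, and the per-table ACTIVEPARTITIONCOUNT clause assumes a Vertica version that supports table-level active partition counts:
=> SELECT SET_CONFIG_PARAMETER('ActivePartitionCount', 2);
=> ALTER TABLE store_orders SET ACTIVEPARTITIONCOUNT 2;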
For details, see Active and Inactive Partitions.