Managing the Tuple Mover
The Tuple Mover is preconfigured to handle typical workloads. However, some situations might require you to adjust Tuple Mover behavior. You can do so in various ways:
- Configure resource pools: TM and WOSDATA
- Manage data loading
- Add threads
- Manage active data partitions
The Tuple Mover has its own
TM resource pool. You can increase the Tuple Mover's ability to cope with high load rates by adjusting several configuration parameters:
MEMORYSIZE: Specifies how much memory is allocated to the TM pool per node. By default, this parameter is set to 5% of available memory. If MEMORYSIZE of the GENERAL resource pool is also set to a percentage, the TM pool can compete with it for memory. Setting MEMORYSIZE to a large percentage can cause regressions in memory-sensitive queries that run in the GENERAL pool.
MAXCONCURRENCY: Sets, across all nodes, the maximum number of concurrent execution slots available to the TM pool, by default 3. This specifies the maximum number of merges that can occur simultaneously on multiple threads.
PLANNEDCONCURRENCY: Specifies the preferred number of queries to execute concurrently in the resource pool, across all nodes. The Resource Manager uses
PLANNEDCONCURRENCY to calculate the target memory that is available to a given query:
TM-memory-size / PLANNEDCONCURRENCY
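As a concrete illustration, you can compute this per-query target from the pool's current settings. The query below assumes the V_MONITOR.RESOURCE_POOL_STATUS system table, which reports per-pool memory in kilobytes:

```sql
-- Illustrative only: per-query target memory for the TM pool,
-- assuming the memory_size_kb and planned_concurrency columns
-- of V_MONITOR.RESOURCE_POOL_STATUS.
=> SELECT node_name,
          memory_size_kb / planned_concurrency AS target_kb_per_query
     FROM v_monitor.resource_pool_status
    WHERE pool_name = 'tm';
```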
The setting for
PLANNEDCONCURRENCY must be proportional to the size of RAM, the CPU, and the storage subsystem. Depending on the storage type, increasing
PLANNEDCONCURRENCY for the Tuple Mover threads might create a storage I/O bottleneck. Monitor the storage subsystem; if it becomes saturated, with long I/O queues (more than two) and high read and write latency, adjust the
PLANNEDCONCURRENCY parameter to keep storage subsystem resources below the saturation level. You might also need to:
- Partition storage data files
- Adjust block-size optimization on storage subsystems such as RAID 5 or RAID 10
- Identify the optimal number of disks in the RAID array
The following statement illustrates how to increase the size of the
TM resource pool and set the concurrency settings for the pool:
=> ALTER RESOURCE POOL tm MEMORYSIZE '4G' PLANNEDCONCURRENCY 4 MAXCONCURRENCY 5;
WOSDATA resource pool settings also indirectly affect the Tuple Mover.
COPY queries use this resource pool's
PLANNEDCONCURRENCY setting to determine whether data is small enough to store in WOS, or should be written directly to ROS. Therefore, set this value to the number of concurrent loads you expect to perform in your database. The
WOSDATA resource pool also determines how much RAM is available to WOS.
=> ALTER RESOURCE POOL wosdata MAXMEMORYSIZE '4G' PLANNEDCONCURRENCY 3;
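With the settings above, each of the three planned concurrent loads is budgeted roughly a third of the WOS. You can verify the pool's current values with a query such as the following, assuming the V_MONITOR.RESOURCE_POOL_STATUS system table:

```sql
-- Illustrative check of the wosdata pool's settings.
=> SELECT pool_name, max_memory_size_kb, planned_concurrency
     FROM v_monitor.resource_pool_status
    WHERE pool_name = 'wosdata';
```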
By default, Vertica automatically decides whether to place data in WOS or load it directly to ROS containers, based on the amount of data processed by a DML statement such as COPY FROM VERTICA or UPDATE. Vertica stores large loads directly to disk and stores smaller loads in memory, which it later moves to disk.
For low-latency access to data, use small loads. The automatic Tuple Mover settings are the best option for handling such smaller loads. One exception is for single-node deployments, where a system failure would cause in-memory data to be lost. In this case, you might want to force all data loads to go directly to disk.
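For example, the DIRECT keyword on COPY forces a load straight to ROS on disk; the table name and file path below are placeholders:

```sql
-- Hypothetical table and path; DIRECT bypasses the WOS.
=> COPY sales FROM '/data/sales.dat' DELIMITER '|' DIRECT;
```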
For high load rates, you might want the Tuple Mover to check for jobs more frequently by changing the
MoveOutInterval configuration parameter. Reduce
MoveOutInterval if you expect peak load rates to fill the WOS quickly.
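For example, you can inspect the current interval and then lower it. The parameter is expressed in seconds, and 120 here is an arbitrary illustrative value:

```sql
-- Check the current moveout interval, then reduce it for peak load rates.
=> SELECT parameter_name, current_value
     FROM v_monitor.configuration_parameters
    WHERE parameter_name = 'MoveOutInterval';
=> SELECT SET_CONFIG_PARAMETER('MoveOutInterval', 120);
```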
In some cases, you might want to control how Vertica loads data into ROS or WOS. Vertica provides various options to control how it loads data for specific tables and DML operations. This can also affect Tuple Mover behavior. For details, see Choosing a Load Method.
If your database is receiving a large volume of data to load, or if it is performing many DIRECT loads or inserts, consider allowing the Tuple Mover to perform more operations concurrently by increasing the size of the
TM resource pool until it can keep up with anticipated peak loads. For example:
=> ALTER RESOURCE POOL TM MEMORYSIZE '4G' PLANNEDCONCURRENCY 4 MAXCONCURRENCY 5;
The Tuple Mover assumes that all loads and updates to a partitioned table are targeted to one or more partitions that it identifies as active. In general, the partitions with the largest partition keys—typically, the most recently created partitions—are regarded as active. As the partition ages, it commonly transitions to a mostly read-only workload and requires much less activity.
You can specify how many partitions are active for partitioned tables at two levels:
- Configuration parameter
ActivePartitionCount determines how many partitions are active for partitioned tables in the database. By default,
ActivePartitionCount is set to 1. The Tuple Mover applies this setting to all tables that do not set their own active partition count.
- Individual tables can supersede
ActivePartitionCount by setting their own active partition count.
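For example, both levels can be set as follows; store_orders is a placeholder table name, and 2 is an illustrative count:

```sql
-- Database-wide default:
=> SELECT SET_CONFIG_PARAMETER('ActivePartitionCount', 2);
-- Per-table override:
=> ALTER TABLE store_orders SET ACTIVEPARTITIONCOUNT 2;
```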
For details, see Active and Inactive Partitions.