Managing the Tuple Mover
The Tuple Mover is preconfigured to handle typical workloads. However, some situations can require you to adjust Tuple Mover behavior. You can do so in various ways:
The Tuple Mover draws its resources from the
TM resource pool. Adding more resources (RAM) to this pool, and changing its concurrency setting, can make the Tuple Mover more effective in dealing with high load rates.
The TM resource pool's concurrency setting is the value of
MAXCONCURRENCY minus one. This determines the number of merges that can occur simultaneously across multiple threads. As a side effect of the concurrency setting, the Tuple Mover dedicates some threads to aggressively merging small ROS containers, while reserving other threads for merges of ROS containers in the lower strata.
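For example, if MAXCONCURRENCY is set to 3, the Tuple Mover can run up to two merges at once. You can inspect the pool's current settings by querying the RESOURCE_POOLS system table:
=> SELECT name, plannedconcurrency, maxconcurrency
   FROM resource_pools WHERE name = 'tm';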
For the TM pool,
PLANNEDCONCURRENCY must be proportional to the capacity of the RAM, the CPU, and the storage subsystem. Depending on the storage type, increasing
PLANNEDCONCURRENCY for the Tuple Mover threads can create a storage I/O bottleneck. Monitor the storage subsystem; if it shows signs of saturation, such as I/O queue depths greater than two or high read and write latency, lower the
PLANNEDCONCURRENCY parameter to keep the storage subsystem below its saturation level. In addition, you might need to:
- Partition storage data files
- Adjust block-size optimization on storage subsystems such as RAID 5 or RAID 10
- Identify the optimal number of disks in the RAID array
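Operating-system tools such as iostat report queue depth and latency directly. From within Vertica, a steadily growing number of ROS containers per projection is an indirect sign that mergeout cannot keep up with the load rate. For example, the following query against the STORAGE_CONTAINERS system table lists the projections with the most ROS containers:
=> SELECT node_name, projection_name, COUNT(*) AS ros_count
   FROM storage_containers
   GROUP BY node_name, projection_name
   ORDER BY ros_count DESC LIMIT 10;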
The following statement illustrates how to increase the size of the
TM resource pool and set the concurrency settings for the pool:
=> ALTER RESOURCE POOL tm MEMORYSIZE '4G' PLANNEDCONCURRENCY 4 MAXCONCURRENCY 5;
WOSDATA resource pool settings also indirectly affect the Tuple Mover. In automatic mode,
COPY commands use the pool's PLANNEDCONCURRENCY setting to determine whether the data is small enough to store in WOS or whether it should be written to ROS. Therefore, set this value to the number of concurrent loads you expect to perform in your database. The
WOSDATA resource pool also determines how much RAM the WOS can use.
=> ALTER RESOURCE POOL wosdata MAXMEMORYSIZE '4G' PLANNEDCONCURRENCY 3;
By default, Vertica automatically decides whether data should be placed in WOS or stored directly in ROS containers, based on the amount of data processed by an
INSERT command. Vertica writes large loads directly to disk and stores smaller loads in memory, which it later moves to disk.
For low-latency access to data, use small loads. The automatic Tuple Mover settings are the best option for handling such smaller loads. One exception is for single-node deployments, where a system failure would cause in-memory data to be lost. In this case, you might want to force all data loads to go directly to disk.
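For example, to force a single load to bypass the WOS and write directly to ROS, add the DIRECT hint to the COPY statement (the table and file names here are illustrative):
=> COPY sales FROM '/data/sales.dat' DIRECT;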
For high load rates, you might want the Tuple Mover to check for jobs more frequently by changing the
MoveOutInterval and MergeOutInterval configuration parameters. Reduce
MoveOutInterval if you expect the peak load rate to fill the WOS quickly. Reduce
MergeOutInterval if you anticipate performing many DIRECT loads or inserts.
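Both parameters are set with the SET_CONFIG_PARAMETER function. For example, the following statements shorten both intervals to 120 seconds (the values shown are illustrative; check the defaults for your version before changing them):
=> SELECT SET_CONFIG_PARAMETER('MoveOutInterval', 120);
=> SELECT SET_CONFIG_PARAMETER('MergeOutInterval', 120);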
In some cases, you might wish to control how Vertica loads data into ROS or WOS. Vertica provides various options for controlling how it loads data for specific tables and DML operations. This can also affect Tuple Mover behavior. For more information, see Choosing a Load Method.
If your database receives a large volume of data to load, or if it performs many DIRECT loads or inserts, consider allowing the Tuple Mover to perform more operations concurrently by increasing the resources of the
TM resource pool until it can keep up with the anticipated peak load. For example:
=> ALTER RESOURCE POOL tm MEMORYSIZE '4G' PLANNEDCONCURRENCY 4 MAXCONCURRENCY 5;
By default, the Tuple Mover assumes that all loads and updates for partitioned tables are going to the same active partition. For example, if a table is partitioned by month, the Tuple Mover expects that after the start of a new month, no data is loaded into the partition for the prior month.
If loads and updates occur to more than one partition, set the
ActivePartitionCount parameter to reflect the number of partitions that will be loading data. For example, if your database receives data for the current month as well as updates to the prior month, set
ActivePartitionCount to 2. For tables partitioned by non-temporal attributes, set
ActivePartitionCount to reflect the number of partitions that will be loaded simultaneously.
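For example, if you load data into the current month's partition and also apply updates to the prior month's partition:
=> SELECT SET_CONFIG_PARAMETER('ActivePartitionCount', 2);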
For more information, see Managing Partitions.