Managing the Tuple Mover
The Tuple Mover is preconfigured to handle typical workloads. However, some situations can require you to adjust Tuple Mover behavior. You can do so in various ways:
Configuring Resource Pools
The Tuple Mover draws its resources from the TM
resource pool. Adding more resources (RAM) to this pool, and changing its concurrency setting, can make the Tuple Mover more effective in dealing with high load rates.
The TM
resource pool concurrency setting is determined by subtracting one from the value of MAXCONCURRENCY
. This calculates the number of merges that can occur simultaneously through multiple threads. As a side effect of the concurrency setting, the Tuple Mover dedicates some threads to aggressively address small ROS containers, while other threads are reserved to work only on merges of ROS containers in the lower strata.
For the TM pool, PLANNEDCONCURRENCY
must be proportional to the size of the RAM, the CPU, and the storage subsystem. Depending on the storage type, if you increase PLANNEDCONCURRENCY
for the Tuple Mover threads, you might create a storage I/O bottleneck. Monitor the storage subsystem; if it becomes saturated with long I/O queues, more than two I/O queues, and long latency in read and write, adjust the PLANNEDCONCURRENCY
parameter to keep the storage subsystem resources below saturation level. In addition, you might need to:
- Partition storage data files
- Adjust block-size optimization on storage subsystems such as RAID 5 or RAID 10
- Identify the optimal number of disks in the RAID array
The following statement illustrates how to increase the size of the TM
resource pool and set the concurrency settings for the pool:
=> ALTER RESOURCE POOL tm MEMORYSIZE '4G' PLANNEDCONCURRENCY 4 MAXCONCURRENCY 5;
The WOSDATA
resource pool settings also indirectly affect the Tuple Mover. In automatic mode, INSERT
and COPY
commands use the concurrency setting to determine whether data is small enough to store in WOS or if it should be written to ROS. Therefore, set this value to be the number of concurrent loads you expect to perform in your database. The WOSDATA
resource pool also determines how much RAM the WOS can use.
=> ALTER RESOURCE POOL wosdata MAXMEMORYSIZE '4G' PLANNEDCONCURRENCY 3;
See Managing Workloads and Resource Pool Architecture in this guide and ALTER RESOURCE POOL and Built-In Pools in the SQL Reference Manual.
Managing Data Loads
By default, Vertica automatically decides whether the data should be placed in WOS or stored directly in ROS containers based on the amount of data processed by a COPY
or INSERT
command. Vertica stores large loads directly to disk and stores smaller loads in memory, which it later moves to disk.
For low-latency access to data, use small loads. The automatic Tuple Mover settings are the best option for handling such smaller loads. One exception is for single-node deployments, where a system failure would cause in-memory data to be lost. In this case, you might want to force all data loads to go directly to disk.
For high load rates, you might want the Tuple Mover to check for jobs more frequently by changing the MergeOutInterval
and MoveOutInterval
configuration parameters. Reduce the MoveOutInterval
if you expect the peak load rate to fill the WOS quickly. Reduce MergeOutInterval
if you anticipate performing many DIRECT loads or inserts.
In some cases, you might wish to control how Vertica loads data into ROS or WOS. Vertica provides various options for controlling how it loads data for specific tables and DML operations. This can also affect Tuple Mover behavior. For more information, see Choosing a Load Method.
Adding Threads
If your database is receiving a large volume of data to load or if it is performing many DIRECT loads or inserts, consider allowing the Tuple Mover to perform more operations concurrently by increasing the TM
resource pool until it can keep up with anticipated peak loads. For example:
=> ALTER RESOURCE POOL TM MEMORYSIZE '4G' PLANNEDCONCURRENCY 4 MAXCONCURRENCY 5;
See ALTER RESOURCE POOL and Built-In Pools in the SQL Reference Manual.
Managing Active Data Partitions
By default, the Tuple Mover assumes that all loads and updates for partitioned tables are going to the same active partition. For example, if a table is partitioned by month, the Tuple Mover expects that after the start of a new month, no data is loaded into the partition for the prior month.
If loads and updates occur to more than one partition, set the ActivePartitionCount
parameter to reflect the number of partitions that will be loading data. For example, if your database receives data for the current month as well as updates to the prior month, set ActivePartitionCount
to 2. For tables partitioned by non-temporal attributes, set ActivePartitionCount
to reflect the number of partitions that will be loaded simultaneously.
For more information, see Managing Partitions.