Managing the Tuple Mover

The Tuple Mover is preconfigured to handle typical workloads. However, some situations can require you to adjust Tuple Mover behavior. You can do so in various ways:

Configure resource pools
Manage data loading
Add threads
Manage active data partitions

Configuring Resource Pools

The Tuple Mover draws its resources from the TM resource pool. Adding more resources (RAM) to this pool, and changing its concurrency setting, can make the Tuple Mover more effective in dealing with high load rates.

The TM resource pool concurrency setting is determined by subtracting one from the value of MAXCONCURRENCY. This calculates the number of merges that can occur simultaneously through multiple threads. As a side effect of the concurrency setting, the Tuple Mover dedicates some threads to aggressively address small ROS containers, while other threads are reserved to work only on merges of ROS containers in the lower strata.

For the TM pool, PLANNEDCONCURRENCY must be proportional to the size of the RAM, the CPU, and the storage subsystem. Depending on the storage type, if you increase PLANNEDCONCURRENCY for the Tuple Mover threads, you might create a storage I/O bottleneck. Monitor the storage subsystem; if it becomes saturated with long I/O queues, more than two I/O queues, and long latency in read and write, adjust the PLANNEDCONCURRENCY parameter to keep the storage subsystem resources below saturation level. In addition, you might need to:

Partition storage data files
Adjust block-size optimization on storage subsystems such as RAID 5 or RAID 10
Identify the optimal number of disks in the RAID array

The following statement illustrates how to increase the size of the TM resource pool and set the concurrency settings for the pool:

=> ALTER RESOURCE POOL tm MEMORYSIZE '4G' PLANNEDCONCURRENCY 4 MAXCONCURRENCY 5;

The WOSDATA resource pool settings also indirectly affect the Tuple Mover. In automatic mode, INSERT and COPY commands use the concurrency setting to determine whether data is small enough to store in WOS or if it should be written to ROS. Therefore, set this value to be the number of concurrent loads you expect to perform in your database. The WOSDATA resource pool also determines how much RAM the WOS can use.

=> ALTER RESOURCE POOL wosdata MAXMEMORYSIZE '4G' PLANNEDCONCURRENCY 3;

See Managing Workloads and Resource Pool Architecture in this guide and ALTER RESOURCE POOL and Built-In Pools in the SQL Reference Manual.

Managing Data Loads

By default, Vertica automatically decides whether the data should be placed in WOS or stored directly in ROS containers based on the amount of data processed by a COPY or INSERT command. Vertica stores large loads directly to disk and stores smaller loads in memory, which it later moves to disk.

For low-latency access to data, use small loads. The automatic Tuple Mover settings are the best option for handling such smaller loads. One exception is for single-node deployments, where a system failure would cause in-memory data to be lost. In this case, you might want to force all data loads to go directly to disk.

For high load rates, you might want the Tuple Mover to check for jobs more frequently by changing the MergeOutInterval and MoveOutInterval configuration parameters. Reduce the MoveOutInterval if you expect the peak load rate to fill the WOS quickly. Reduce MergeOutInterval if you anticipate performing many DIRECT loads or inserts.

In some cases, you might wish to control how Vertica loads data into ROS or WOS. Vertica provides various options for controlling how it loads data for specific tables and DML operations. This can also affect Tuple Mover behavior. For more information, see Choosing a Load Method.

Adding Threads

If your database is receiving a large volume of data to load or if it is performing many DIRECT loads or inserts, consider allowing the Tuple Mover to perform more operations concurrently by increasing the TM resource pool until it can keep up with anticipated peak loads. For example:

=> ALTER RESOURCE POOL TM MEMORYSIZE '4G' PLANNEDCONCURRENCY 4 MAXCONCURRENCY 5;

See ALTER RESOURCE POOL and Built-In Pools in the SQL Reference Manual.

Managing Active Data Partitions

By default, the Tuple Mover assumes that all loads and updates for partitioned tables are going to the same active partition. For example, if a table is partitioned by month, the Tuple Mover expects that after the start of a new month, no data is loaded into the partition for the prior month.

If loads and updates occur to more than one partition, set the ActivePartitionCount parameter to reflect the number of partitions that will be loading data. For example, if your database receives data for the current month as well as updates to the prior month, set ActivePartitionCount to 2. For tables partitioned by non-temporal attributes, set ActivePartitionCount to reflect the number of partitions that will be loaded simultaneously.

For more information, see Managing Partitions.

Managing the Tuple Mover

Configuring Resource Pools

Managing Data Loads

Adding Threads

Managing Active Data Partitions

See Also