Managing the Tuple Mover

The Tuple Mover is preconfigured to handle typical workloads. However, some situations might require you to adjust Tuple Mover behavior. You can do so in various ways:

Configuring TM Resource Pool

The Tuple Mover has its own TM resource pool. You can increase the Tuple Mover's ability to cope with high load rates by adjusting several configuration parameters:

MEMORYSIZE
Specifies how much memory is allocated to the TM pool per node. By default, this parameter is set to 5% of available memory. If MEMORYSIZE of the GENERAL resource pool is also set to a percentage, the TM pool can compete with it for memory.

Increasing TM MEMORYSIZE to a large percentage can cause regressions in memory-sensitive queries that run in the GENERAL pool.
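
To see how much memory is currently allocated to the TM pool before changing it, you can query the RESOURCE_POOLS system table. This is a minimal sketch; the column names assume the V_CATALOG.RESOURCE_POOLS table in a recent Vertica version:

=> SELECT name, memorysize, maxmemorysize FROM resource_pools WHERE name = 'tm';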

MAXCONCURRENCY
Sets, across all nodes, the maximum number of concurrent execution slots available to the TM pool. In databases created in Vertica version 9.3 and later, this value defaults to 7. In databases created in earlier versions, it defaults to 3. This setting specifies the maximum number of merges that can occur simultaneously on multiple threads.

PLANNEDCONCURRENCY
Specifies the preferred number of queries to execute concurrently in the resource pool, across all nodes. The Resource Manager uses PLANNEDCONCURRENCY to calculate the target memory available to a given query:

TM-memory-size / PLANNEDCONCURRENCY
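
For example, with a TM pool MEMORYSIZE of 4GB and a PLANNEDCONCURRENCY of 4, each concurrent merge operation targets roughly 1GB of memory.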

The setting for PLANNEDCONCURRENCY must be proportional to the size of RAM, the CPU, and the storage subsystem. Depending on the storage type, increasing PLANNEDCONCURRENCY for the Tuple Mover threads might create a storage I/O bottleneck. Monitor the storage subsystem; if it becomes saturated, with long I/O queues (more than two) and high read and write latency, lower PLANNEDCONCURRENCY to keep the storage subsystem below its saturation level. This setting has a default value of 6.

You might also need to:

  • Partition storage data files
  • Adjust block-size optimization on storage subsystems such as RAID 5 or RAID 10
  • Identify the optimal number of disks in the RAID array

The following statement illustrates how to increase the size of the TM resource pool and set the concurrency settings for the pool:

=> ALTER RESOURCE POOL tm MEMORYSIZE '4G' PLANNEDCONCURRENCY 4 MAXCONCURRENCY 5;

Configuring WOSDATA Resource Pool

In new databases created in Vertica version 9.3 and later, the WOSDATA resource pool is sized to 0 by default. As a result, Vertica always writes data directly to ROS.

In databases created in earlier versions of Vertica, WOSDATA resource pool settings also indirectly affect the Tuple Mover. INSERT and COPY queries use this resource pool's PLANNEDCONCURRENCY setting to determine whether data is small enough to store in WOS or should be written directly to ROS. Therefore, set this value to the number of concurrent loads you expect to perform in your database. The WOSDATA resource pool also determines how much RAM is available to WOS.

=> ALTER RESOURCE POOL wosdata MAXMEMORYSIZE '4G' PLANNEDCONCURRENCY 3;

Managing Data Loads

By default, Vertica loads data directly to ROS containers, regardless of the amount of data processed by a DML statement: COPY/COPY FROM VERTICA, INSERT, MERGE, and UPDATE. In databases created in earlier versions of Vertica, large loads are stored directly to disk (ROS), while smaller loads are stored in memory (WOS) and later moved to disk by the Tuple Mover.

For low-latency access to data, you can choose to use small loads. The automatic Tuple Mover settings are the best option for handling such smaller loads. One exception is for single-node deployments, where a system failure would cause in-memory data to be lost. In this case, you might want to force all data loads to go directly to disk.

For high load rates, you might want the Tuple Mover to check for jobs more frequently by changing the MoveOutInterval configuration parameter. Reduce MoveOutInterval if you expect peak load rates to fill the WOS quickly.
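
For example, assuming the default MoveOutInterval of 300 seconds, the following statement uses the SET_CONFIG_PARAMETER meta-function to make the Tuple Mover check for moveout jobs more often. The value of 120 seconds is illustrative, not a recommendation:

=> SELECT SET_CONFIG_PARAMETER('MoveOutInterval', 120);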

In some cases, you might want to control whether Vertica loads data into ROS or WOS. Vertica provides options to control the load method for specific tables and DML operations, which can also affect Tuple Mover behavior. For details, see Choosing a Load Method.
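
For example, in a database that still uses WOS, a COPY statement can bypass WOS and write straight to ROS with the DIRECT keyword. The table name and file path here are hypothetical:

=> COPY sales FROM '/data/sales.dat' DIRECT;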

Adding Threads

If your database is receiving a large volume of data to load, or if it is performing many DIRECT loads or inserts, consider allowing the Tuple Mover to perform more operations concurrently by increasing the TM resource pool's size and concurrency settings until it can keep up with anticipated peak loads. For example:

=> ALTER RESOURCE POOL tm MEMORYSIZE '4G' PLANNEDCONCURRENCY 4 MAXCONCURRENCY 5;

See ALTER RESOURCE POOL and Built-In Pools in the SQL Reference Manual.

Managing Active Data Partitions

The Tuple Mover assumes that all loads and updates to a partitioned table are targeted to one or more partitions that it identifies as active. In general, the partitions with the largest partition keys (typically, the most recently created partitions) are regarded as active. As a partition ages, it commonly transitions to a mostly read-only workload and requires much less merge activity.

You can specify how many partitions are active for partitioned tables at two levels:

  • Configuration parameter ActivePartitionCount determines how many partitions are active for partitioned tables in the database. By default, ActivePartitionCount is set to 1. The Tuple Mover applies this setting to all tables that do not set their own active partition count.
  • Individual tables can supersede ActivePartitionCount by setting their own active partition count with CREATE TABLE and ALTER TABLE.
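
For example, the following statements set a database-wide default of two active partitions and then override that count for a single table. The table name is hypothetical:

=> SELECT SET_CONFIG_PARAMETER('ActivePartitionCount', 2);
=> ALTER TABLE store_orders SET ACTIVEPARTITIONCOUNT 3;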

For details, see Active and Inactive Partitions.