Tuning Linux Dirty Data Parameters for Vertica

Flushing Dirty Data

By default, Linux allows 120 seconds for dirty data in the cache to be flushed to disk. During this time, your Vertica process may appear to be hung.

If the disk is overloaded, 120 seconds may not be sufficient, and you see an error like the following in the dmesg or /var/log/messages output:

"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
vertica       D 0000000000000002     0 60068      1 0x00000080
ffff88024d9e7c98 0000000000000082 ffffffff81ed09f0 ffff882012336ae0
ffff88024d9e7c68 ffffffff810aa570 ffff88024d9e7ca0 ffff882012336ae0
ffff882012337098 ffff88024d9e7fd8 000000000000fb88 ffff882012337098
Call Trace:
[<ffffffff810aa570>] ? exit_robust_list+0x90/0x160
[<ffffffff81072f95>] exit_mm+0x95/0x180
[<ffffffff810733df>] do_exit+0x15f/0x870
[<ffffffff81063340>] ? wake_up_state+0x10/0x20
[<ffffffff81073b48>] do_group_exit+0x58/0xd0
[<ffffffff81088e16>] get_signal_to_deliver+0x1f6/0x460
[<ffffffff8100a265>] do_signal+0x75/0x800
[<ffffffff81435b25>] ? sys_sendto+0x185/0x190
[<ffffffff8100bbee>] ? invalidate_interrupt1+0xe/0x20
[<ffffffff8100bc2e>] ? invalidate_interrupt3+0xe/0x20
[<ffffffff8100bbce>] ? invalidate_interrupt0+0xe/0x20
[<ffffffff810ace0b>] ? sys_futex+0x7b/0x170
[<ffffffff8100aa80>] do_notify_resume+0x90/0xc0
[<ffffffff8100b341>] int_signal+0x12/0x17
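
To check whether a node has logged this condition, you can search saved kernel log output for the hung_task_timeout_secs hint that accompanies the report. The following is a minimal Python sketch, not a Vertica tool: the function name is hypothetical, and it assumes the kernel words the message as in the excerpt above.

```python
from typing import Iterable, List

# Marker taken from the kernel message quoted above (assumption: other
# kernel versions may word the hint differently).
HUNG_TASK_MARKER = "hung_task_timeout_secs"


def find_hung_task_reports(lines: Iterable[str]) -> List[str]:
    """Return the log lines that mention the hung-task timeout hint."""
    return [line.rstrip("\n") for line in lines if HUNG_TASK_MARKER in line]
```

For example, passing the lines of a file that holds saved dmesg output returns only the lines that carry the hint, which you can then match against the timestamps of any apparent Vertica hangs.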

Root Cause

The root cause of this problem is an imbalance between compute resources and disk throughput. On a Vertica node, memory and processing power (CPU) may be high while disk throughput is comparatively low, so dirty data accumulates in memory faster than the disks can write it out. All Vertica nodes must have an appropriate balance of CPU, RAM, disk throughput, and network bandwidth.

This error usually occurs on a system that:

  • Has a large amount of memory and CPU processing power relative to the amount of disk throughput
  • Has low disk throughput (for example, virtual machines or machines using storage area networks (SANs) where disk throughput is limited)

Solution

To prevent this problem, consider the following tunings. These tunings are valid for all versions of Vertica:

  1. Reduce vm.dirty_ratio to reduce the maximum amount of memory allocated to dirty pages.

If you have 400 MB/sec write throughput, you can write 6 GB of data in 15 seconds. To calculate vm.dirty_ratio, divide the amount of data you can write in 15 seconds by the amount of RAM and express the result as a percentage. For example:

- For a 256 GB system: 6/256 = 2.3%, set vm.dirty_ratio to 3 (approx.)

- For a 128 GB system: 6/128 = 4.7%, set vm.dirty_ratio to 5 (approx.)

If you double the disk throughput, for example from 400 MB/sec to 800 MB/sec, you should also double vm.dirty_ratio.
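
The calculation can be sketched in Python. This is an illustrative helper, not part of Vertica or Linux. The function name, the conversion of 1 GB = 1000 MB (implied by 400 MB/sec for 15 seconds giving 6 GB), and rounding up to the next whole percent are assumptions chosen to match the two examples.

```python
import math


def suggested_dirty_ratio(write_mb_per_sec: float, ram_gb: float,
                          flush_seconds: float = 15.0) -> int:
    """Estimate vm.dirty_ratio as a whole percentage of RAM.

    Assumptions (not stated by the kernel or by Vertica):
    - 1 GB = 1000 MB, as implied by 400 MB/sec * 15 s = 6 GB.
    - The result is rounded up and never drops below 1.
    """
    writable_gb = write_mb_per_sec * flush_seconds / 1000.0
    percent = writable_gb / ram_gb * 100.0
    return max(1, math.ceil(percent))


print(suggested_dirty_ratio(400, 256))  # 6/256 = 2.34%, rounded up to 3
print(suggested_dirty_ratio(400, 128))  # 6/128 = 4.69%, rounded up to 5
```

Because the result is rounded to a whole percent, doubling the throughput does not always exactly double the returned value. Treat the output as a starting point for tuning.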

  2. Reduce the vm.dirty_background_ratio value to flush the cache to disk more frequently. In particular, set vm.dirty_background_ratio to half the value of vm.dirty_ratio. That way, the task of flushing to disk starts when dirty data reaches half of the allowed maximum. For example, if vm.dirty_ratio is 6, set vm.dirty_background_ratio to 3.

The vm.dirty_background_ratio parameter specifies a percentage of system memory occupied by dirty data. When this percentage is reached, Linux begins to flush dirty data to disk in the background. The appropriate value for vm.dirty_background_ratio depends on the available disk throughput and is inversely related to the total memory on the system.
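
As a sketch of the half-value rule, the following Python helper derives vm.dirty_background_ratio from a chosen vm.dirty_ratio and renders both settings in the "name = value" form that /etc/sysctl.conf accepts. The function names are hypothetical. Rounding down for odd values and keeping a minimum of 1 are assumptions, not Vertica guidance.

```python
from typing import List


def background_ratio(dirty_ratio: int) -> int:
    """Half of vm.dirty_ratio, rounded down, but at least 1 (assumption)."""
    return max(1, dirty_ratio // 2)


def sysctl_conf_lines(dirty_ratio: int) -> List[str]:
    """Render both settings as lines suitable for /etc/sysctl.conf."""
    return [
        f"vm.dirty_ratio = {dirty_ratio}",
        f"vm.dirty_background_ratio = {background_ratio(dirty_ratio)}",
    ]


# Example from the text: vm.dirty_ratio of 6 gives a background ratio of 3.
print("\n".join(sysctl_conf_lines(6)))
```

After you add lines like these to /etc/sysctl.conf as root, running sysctl -p loads them without a reboot.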

  3. To prevent a kernel panic while Vertica is paused during the flush, set kernel.hung_task_panic to 0. With this setting, the kernel does not panic when it detects the blocked task, so the process has as much time as it needs to flush the data and the node is not shut down.
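
Before changing any of these settings, it can help to record the current values. The following Python sketch assumes the standard Linux mapping of a sysctl name to a file under /proc/sys, where dots become path separators. The helper names are hypothetical, and a parameter that the running kernel does not expose is reported as None.

```python
from pathlib import Path
from typing import Dict, Optional

# Parameters discussed in this article.
PARAMETERS = (
    "vm.dirty_ratio",
    "vm.dirty_background_ratio",
    "kernel.hung_task_panic",
    "kernel.hung_task_timeout_secs",
)


def sysctl_path(name: str, root: str = "/proc/sys") -> Path:
    """Map a sysctl name such as vm.dirty_ratio to its /proc/sys file."""
    return Path(root).joinpath(*name.split("."))


def read_sysctl(name: str, root: str = "/proc/sys") -> Optional[str]:
    """Return the current value as text, or None if it cannot be read."""
    try:
        return sysctl_path(name, root).read_text().strip()
    except OSError:
        return None


def current_settings(root: str = "/proc/sys") -> Dict[str, Optional[str]]:
    """Collect the current value of each parameter listed above."""
    return {name: read_sysctl(name, root) for name in PARAMETERS}
```

Calling current_settings() on a node returns a dictionary that you can print or save before and after tuning to confirm that the new values took effect.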

For More Information

See Linux Kernel panic issue: How to fix hung_task_timeout_secs and blocked for more than 120 seconds pr....