Tuning Linux Dirty Data Parameters for Vertica
Flushing Dirty Data
When flushing dirty data from the cache to disk, Linux by default allows 120 seconds for the flush to complete. During this time, your Vertica process may appear to hang.
If the disk is overloaded, 120 seconds is not sufficient and you see an error like the following in the dmesg or /var/log/messages output:
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message vertica D 0000000000000002 0 60068 1 0x00000080 ffff88024d9e7c98 0000000000000082 ffffffff81ed09f0 ffff882012336ae0 ffff88024d9e7c68 ffffffff810aa570 ffff88024d9e7ca0 ffff882012336ae0 ffff882012337098 ffff88024d9e7fd8 000000000000fb88 ffff882012337098 Call Trace: 2401 [<ffffffff810aa570>] ? exit_robust_list+0x90/0x160 2402 [<ffffffff81072f95>] exit_mm+0x95/0x180 2403 [<ffffffff810733df>] do_exit+0x15f/0x870 2404 [<ffffffff81063340>] ? wake_up_state+0x10/0x20 2405 [<ffffffff81073b48>] do_group_exit+0x58/0xd0 2406 [<ffffffff81088e16>] get_signal_to_deliver+0x1f6/0x460 2407 [<ffffffff8100a265>] do_signal+0x75/0x800 2408 [<ffffffff81435b25>] ? sys_sendto+0x185/0x190 2409 [<ffffffff8100bbee>] ? invalidate_interrupt1+0xe/0x20 2410 [<ffffffff8100bc2e>] ? invalidate_interrupt3+0xe/0x20 2411 [<ffffffff8100bbce>] ? invalidate_interrupt0+0xe/0x20 2412 [<ffffffff810ace0b>] ? sys_futex+0x7b/0x170 2413 [<ffffffff8100aa80>] do_notify_resume+0x90/0xc0 2414 [<ffffffff8100b341>] int_signal+0x12/0x17
Root Cause
The root cause of this problem is an imbalance in processing power: the node's memory and CPU processing power may be high while its disk throughput is comparatively too low to keep up. All Vertica nodes must have an appropriate balance of CPU, RAM, disk throughput, and network bandwidth.
This error usually occurs on a system that:
- Has a large amount of memory and CPU processing power relative to the amount of disk throughput
- Has low disk throughput (for example, virtual machines or machines using storage area networks (SANs) where disk throughput is limited)
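Before tuning, it helps to measure the node's actual sequential write throughput, because the calculations in the solution below depend on it. The following is a rough sketch using dd; the path /data/ddtest is only an example scratch file on the Vertica data volume, and Vertica's vioperf utility gives a more thorough measurement:

    # Write 4 GB directly to the data volume, bypassing the page cache, and report throughput
    dd if=/dev/zero of=/data/ddtest bs=1M count=4096 oflag=direct
    # Remove the scratch file afterwards
    rm -f /data/ddtest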
Solution
To prevent this problem, consider the following tunings, which are valid for all versions of Vertica:
- Reduce vm.dirty_ratio to lower the maximum amount of memory that can be occupied by dirty pages. If you have 400 MB/sec of write throughput, you can write 6 GB of data in 15 seconds. To calculate vm.dirty_ratio, divide the amount of data you can transfer in 15 seconds by the amount of RAM. For example:
  - For a 256 GB system: 6/256 ≈ 2.3%, so set vm.dirty_ratio to 3 (approx.)
  - For a 128 GB system: 6/128 ≈ 4.7%, so set vm.dirty_ratio to 5 (approx.)
If you double the disk throughput, say from 400 MB/sec to 800 MB/sec, you should also double vm.dirty_ratio. A sketch showing how to apply this and the following settings appears after this list.
- Reduce the vm.dirty_background_ratio value to flush the cache to disk more frequently. In particular, set vm.dirty_background_ratio to half the value of vm.dirty_ratio. That way, flushing to disk starts when the cache is half full. For example, if vm.dirty_ratio = 6, set vm.dirty_background_ratio to 3.
The vm.dirty_background_ratio parameter specifies the percentage of system memory that can be occupied by dirty data before Linux begins to flush the buffer in the background. The appropriate value depends on the available disk throughput and is inversely related to the total memory on the system.
- To prevent a kernel panic while Vertica is paused during the flush, set hung_task_panic to 0. With this setting, the kernel only logs the warning instead of panicking and shutting down the node, so the process gets as much time as it needs to finish flushing the data (see the sketch after this list).
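Taken together, a minimal sketch of applying these three settings on a node with 256 GB of RAM and roughly 400 MB/sec of write throughput might look like the following. The values come from the example calculation above and are illustrative, not universal recommendations; adjust them to your own RAM size and measured throughput.

    # vm.dirty_ratio: 400 MB/sec * 15 sec = 6 GB; 6/256 ≈ 2.3%, rounded up to 3
    sysctl -w vm.dirty_ratio=3

    # vm.dirty_background_ratio: roughly half of vm.dirty_ratio, so background
    # flushing starts well before the hard limit is reached
    sysctl -w vm.dirty_background_ratio=2

    # Do not panic the kernel when a task (such as the Vertica process) is blocked
    # on a long flush; only log the warning
    sysctl -w kernel.hung_task_panic=0

    # To persist the settings across reboots, add the same keys to /etc/sysctl.conf:
    #   vm.dirty_ratio = 3
    #   vm.dirty_background_ratio = 2
    #   kernel.hung_task_panic = 0
    # and reload them with: sysctl -p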