CPU and Memory Starvation in SPREAD

This blog post was authored by Sumeet Keswani.

What is Spread?

Vertica uses an open source toolkit, Spread, to provide a high-performance control message service. Spread daemons start automatically when your database starts up for the first time. The Spread daemons run on the control nodes in your cluster, and the control nodes manage message communication.

On highly loaded, very busy Vertica nodes, the Spread process might be starved of resources. If Spread is starved of memory, CPU, or network resources, it can result in:

• Nodes being dropped from the cluster
• Vertica being unable to open new sessions
• Existing sessions not being cleaned up
• Users being unable to run queries

This blog post explains some techniques you can use to dedicate CPU and memory resources to the Spread process and prevent CPU and memory starvation. Apply any changes you make on all nodes. The settings described in this post remain in effect only while the database instance they were applied to is up. They do not persist, so you must run these commands again each time the Spread process is restarted.
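Because the settings must be reapplied after every Spread restart, it can help to wrap the per-process commands described later in this post (renice and taskset) in a small helper. The following is a minimal sketch, not a Vertica tool: the tune_spread name, the RUN variable, and the default of CPU 0 are assumptions made for illustration.

```shell
# Hypothetical helper (not part of Vertica): reapply the per-process settings
# from this post to one Spread PID. Set RUN=echo to preview the commands
# instead of running them through sudo.
tune_spread() (
    pid=$1
    cpu=${2:-0}    # CPU to pin Spread to; 0 matches the examples in this post
    ${RUN:-sudo} renice -n -20 -p "$pid"
    ${RUN:-sudo} taskset -pc "$cpu" "$pid"
)

# Preview the commands for PID 19358:
#   RUN=echo tune_spread 19358
# Apply them after Spread restarts:
#   tune_spread "$(pgrep spread$)"
```

The preview mode makes it easy to check the commands on one node before running them on every node in the cluster.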

CPU Resources

If your machine shows a very high load average and your monitors show CPU usage above 80%, your machine might be CPU bound. In this case, you can raise the scheduling priority of the Spread process by lowering its nice value. A negative nice value means a higher priority, and -20 is the highest priority possible.

[@:/]$ uptime
 11:32:17 up  1:42,  4 users,  load average: 45.50, 42.50, 40.52
 
[@:/]$ ps -o pid,command,nice -p $(pgrep spread$)
  PID COMMAND                      NI
19358 /opt/vertica/bin/spread      0
 
 
# give spread the highest possible priority, -20
[@:/]$ sudo renice -20 -p $(pgrep spread$)
19358 (process ID) old priority 0, new priority -20
 
 
[@:/]$ ps -o pid,command,nice -p $(pgrep spread$)
  PID COMMAND                      NI
19358 /opt/vertica/bin/spread     -20
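One rough way to judge whether a node is CPU bound is to compare the 1-minute load average with the number of CPUs. The sketch below assumes that a load average above the CPU count signals contention. That threshold and the is_cpu_bound name are illustrative and do not come from Vertica. On Linux the load average also counts tasks waiting on disk I/O, so treat the result as a hint rather than proof.

```shell
# Hypothetical helper: succeed when the load average exceeds the CPU count.
# Usage: is_cpu_bound LOAD NCPUS
is_cpu_bound() {
    # "+ 0" forces a numeric comparison in awk
    awk -v load="$1" -v cpus="$2" 'BEGIN { exit !(load + 0 > cpus + 0) }'
}

# Check the live system (Linux):
#   is_cpu_bound "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)" && echo "possibly CPU bound"
```

With the numbers from the uptime output above, a load of 45.50 on an 8-CPU machine would count as CPU bound under this assumption.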

You can also dedicate an exclusive CPU to Spread and exclude Vertica from using that CPU. If you do this, note that the Vertica process must perform the same work with one fewer CPU.

Option 1:

# assign CPU 0 to spread
[@:/]$ sudo taskset -pc 0    $(pgrep spread$)
pid 19358's current affinity list: 0-7
pid 19358's new affinity list: 0
 
# assign CPU 1-7 to vertica (rest of the CPUs)
[@:/]$ sudo taskset -pc 1-7    $(pgrep vertica$)
pid 19360's current affinity list: 0-7
pid 19360's new affinity list: 1-7
 
 
[@:/]$ sudo taskset -p    $(pgrep spread$)
pid 19358's current affinity mask: 1
[@:/]$ sudo taskset -p    $(pgrep vertica$)
pid 19360's current affinity mask: fe
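The taskset -pc form takes a CPU list, while taskset -p reports a hexadecimal bit mask in which bit N stands for CPU N. The helper below is a small sketch of that conversion, so you can confirm that a mask such as fe really means CPUs 1-7. The cpulist_to_mask name is made up for this example.

```shell
# Hypothetical helper: convert a CPU list such as "0", "1-7", or "0,2-3"
# into the hexadecimal affinity mask that taskset -p prints.
# The function body runs in a subshell so its variables and IFS stay local.
cpulist_to_mask() (
    mask=0
    IFS=','
    for part in $1; do
        lo=${part%-*}    # start of the range (or the single CPU)
        hi=${part#*-}    # end of the range (or the single CPU)
        cpu=$lo
        while [ "$cpu" -le "$hi" ]; do
            mask=$(( mask | (1 << cpu) ))
            cpu=$(( cpu + 1 ))
        done
    done
    printf '%x\n' "$mask"
)

cpulist_to_mask 0      # prints 1, the mask shown for spread
cpulist_to_mask 1-7    # prints fe, the mask shown for vertica
```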

Option 2:

You can create a control group to dedicate a CPU to Spread and disable swapping when you do that. The following example assigns a single CPU exclusively to spread and disables swapping:

# create a control group for spread..
sudo cgcreate -t dbadmin:verticadba -g cpu,cpuset,cpuacct,memory:spread
# assign cpu 0 to spread
sudo cgset -r cpuset.cpus=0 spread
sudo cgset -r cpuset.cpu_exclusive=1 spread
# restrict all other process by any user to cpu 1-7
sudo cgset -r cpuset.cpus=1-7 /user
 
# prevent process from swapping
sudo cgset -r memory.swappiness=0 spread
 
 
# allocate 100% of cpu 0 to spread
sudo cgset -r cpu.cfs_period_us=1000000 spread
sudo cgset -r cpu.cfs_quota_us=1000000 spread
 
 
# enable cpu accounting for spread
sudo cgset -r cpuacct.usage=0 spread
 
 
# attach spread process to cgroup
sudo cgclassify -g cpu,memory:spread $(pgrep spread$)
 
 
# check which control groups spread belongs to
cat /proc/$(pgrep spread$)/cgroup
# or make a persistent config file and copy it to /etc/cgconfig.conf
sudo touch /etc/cgsnapshot_blacklist.conf && cgsnapshot -s


# clean up, if you don't like what you did
sudo cgdelete -g cpu,cpuset,memory:spread

The following is a sample control group configuration:

group spread {
    perm {
        admin {
            uid = root;
            gid = root;
        }
        task {
            uid = dbadmin;
            gid = verticadba;
        }
    }
    cpuset {
        cpuset.cpu_exclusive="1";
        cpuset.cpus="0";
    }
    cpu {
        cpu.cfs_period_us="1000000";
        cpu.cfs_quota_us="1000000";
        cpu.shares="1024";
    }
    cpuacct {
        cpuacct.usage="0";
    }
    memory {
        memory.swappiness="0";
    }
}
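The cpu.cfs_quota_us and cpu.cfs_period_us values work as a pair: the group may run for at most quota microseconds of CPU time in each period. Equal values therefore allow 100% of one CPU, which is what the configuration above requests. The sketch below shows the arithmetic. The cfs_cpu_percent name is illustrative, and the result uses integer division.

```shell
# Hypothetical helper: percentage of one CPU granted by a CFS quota/period pair.
# A quota of -1 means the group has no bandwidth limit.
cfs_cpu_percent() (
    quota_us=$1
    period_us=$2
    if [ "$quota_us" -lt 0 ]; then
        echo "unlimited"
    else
        echo $(( quota_us * 100 / period_us ))
    fi
)

cfs_cpu_percent 1000000 1000000    # prints 100, the setting used in this post
cfs_cpu_percent 500000 1000000     # prints 50, half of one CPU
```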

Option 3:

# alternatively, run all queries in a new resource pool restricted to CPUs 1-7 (for example)
dbadmin=> CREATE RESOURCE POOL constraint_pool CPUAFFINITYSET '1-7' CPUAFFINITYMODE EXCLUSIVE EXECUTIONPARALLELISM 7;
CREATE RESOURCE POOL
dbadmin => SELECT name,cpuaffinityset, cpuaffinitymode, executionparallelism FROM resource_pools WHERE name = 'constraint_pool';
      name       | cpuaffinityset | cpuaffinitymode | executionparallelism
-----------------+----------------+-----------------+----------------------
 constraint_pool | 1-7            | EXCLUSIVE       | 7
(1 row)

Memory Resources

To check whether a running Spread process has been swapped to disk, use the following command:

grep VmSwap /proc/$(pgrep spread$)/status
VmSwap:        0 kB
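If you want to run this check from a monitoring script, the VmSwap line can be parsed into a number. The following sketch assumes the /proc status format shown above. The vmswap_kb and is_swapped names are invented for this example.

```shell
# Hypothetical helper: print the VmSwap value (in kB) from a
# /proc/<pid>/status style file.
vmswap_kb() {
    awk '$1 == "VmSwap:" { print $2 }' "$1"
}

# Hypothetical helper: succeed when that process has pages in swap.
# A missing VmSwap line is treated as 0 kB.
is_swapped() (
    kb=$(vmswap_kb "$1")
    [ "${kb:-0}" -gt 0 ]
)

# Check the running Spread process:
#   is_swapped "/proc/$(pgrep spread$)/status" && echo "spread is using swap"
```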

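Then, keep the kernel from swapping processes to disk unless it must. Make sure that /etc/sysctl.conf sets vm.swappiness to 0 (after editing the file, run sudo sysctl -p to load it):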

grep swappiness /etc/sysctl.conf
# The kernel will swap only to avoid an out of memory condition, otherwise swapping is disabled
vm.swappiness = 0

If Vertica is using most of the memory and leaving too little for Spread, reduce the maximum memory size of the Vertica general pool. Be aware that reducing the general pool memory can affect other pools that borrow from it and queries that use it directly:

dbadmin => SELECT name, maxmemorysize FROM resource_pools WHERE name='general';
  name   | maxmemorysize
---------+---------------
 general | Special: 95%
(1 row)
 
 
dbadmin => ALTER resource pool general maxmemorysize '80%';
NOTICE 2585:  Change takes effect upon restart.  Recovering nodes will use the new value
ALTER RESOURCE POOL
 
 
dbadmin => SELECT name, maxmemorysize FROM resource_pools WHERE name='general';
  name   | maxmemorysize
---------+---------------
 general | Special: 80%
(1 row)
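To get a rough sense of how much memory the change leaves for the operating system and Spread, you can compute what falls outside the general pool. This sketch assumes the percentage is taken against total physical memory and uses a hypothetical 64 GB node. The kb_outside_pool name is illustrative.

```shell
# Hypothetical helper: memory (in kB) left outside a pool that is capped at
# PERCENT of TOTAL_KB. Uses integer division, so the result is rounded down.
kb_outside_pool() (
    total_kb=$1
    percent=$2
    echo $(( total_kb * (100 - percent) / 100 ))
)

# Example for an assumed node with 64 GB (67108864 kB) of RAM:
kb_outside_pool 67108864 95    # prints 3355443 (about 3.2 GB)
kb_outside_pool 67108864 80    # prints 13421772 (about 12.8 GB)

# Use the real total on a Linux node:
#   kb_outside_pool "$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)" 80
```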

About the Author

Soniya Shah
Information Developer

Currently, a first year law student with a background in science and technology. Experienced technical writer, with specializations in software documentation, big data, blog development, and website development. I build user-centered content to communicate complex and technical information more easily.

I used to work for Vertica full time for about 3 years. I still work at Vertica part time while going to law school.

Update: Soniya is now doing her law internship, and no longer working at Vertica. Good luck, Soniya!
