
Spread Debugging

What is Spread?

Vertica uses an open source toolkit, Spread, to provide a high-performance messaging service that is resilient to network failures. Spread daemons start automatically when your database starts up for the first time. The Spread daemons run on the control nodes in your cluster, and the control nodes manage message communication.

Vertica Process and Spread Daemon Pairs

The Vertica installation procedure installs the Spread daemons on your database. On each node, the Vertica process communicates with its local Spread daemon through a domain socket. Between nodes, communication flows over two channels: the data channel (TCP) and the control channel (the UDP main channel and the UDP token channel). The following image shows a 4-node cluster with a domain socket and the two channels.

  • The Vertica process and the Spread daemon on the same node connect to each other through a domain socket.
  • The Vertica process on one node connects to the Vertica process on another node through a TCP socket.
  • The Spread daemon on one node connects to the Spread daemon on another node through the UDP token and main channels.
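
As a quick check from the shell, you can confirm that these connections exist on a node. The following commands are a minimal sketch using ss; the ports are assumptions based on the example cluster shown later in this article (client port 5433, Spread port 4803):

# Domain socket between the local Vertica process and Spread daemon
$ ss -xp | grep -i spread
# TCP connections between Vertica processes (assumes port 5433)
$ ss -tnp | grep 5433
# UDP sockets for the Spread main and token channels (assumes port 4803)
$ ss -unp | grep 4803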

How Vertica and Spread Work Together

If all the nodes are healthy, Vertica and Spread communicate seamlessly with each other.

Spread daemons communicate with one another by exchanging UDP packets.  The two UDP channels (or ports) used are:

  1. The main channel, where Spread daemons send Spread-related control messages to other daemons, as well as Vertica-related control messages originating from the Vertica servers.
  2. The token channel, where a special message called a Spread token is passed from one Spread daemon to the next. The token is important for membership, fault tolerance, and for enforcing consistent ordering of control messages.

If the Spread daemon on one node stops communicating with the other Spread daemons in the cluster, the remaining Spread daemons remove that node from the cluster membership. The Spread daemons wait 8 seconds before removing a node from the membership.

When Vertica and Spread Cannot Communicate

You may encounter situations when the communication between the Spread daemon and Vertica fails.

Spread Token Timeout

The Spread daemons monitor each other's health using a token mechanism. The token is a special Spread message passed from one node to the next in the current active membership using the User Datagram Protocol (UDP). The circulating token verifies that the Spread daemons on all nodes are active. The token timeout is 8 seconds.

Vertica Process Killed Abruptly Using kill -9

The Vertica process creates a PROBE message for the Spread daemon. However, if the Vertica process crashes or is terminated abruptly using kill -9, that Vertica process cannot send a PROBE message to the other nodes or call the Spread API SP_disconnect.

If the Spread daemon for the killed Vertica process is still running, the Spread daemon receives a message about that killed Vertica process through the TCP socket connection. Then, the Spread daemon notifies the other nodes in the cluster that the Vertica process has left the current membership.

Cluster is DOWN

If the user kills a Spread daemon, the Spread daemons on the other nodes in the cluster detect the killed daemon through a token timeout. For example, suppose you have an 8-node cluster with a K-safety value of 1. If 4 nodes go down, or 2 buddy nodes go down, the cluster goes down. When the cluster goes down, the membership protocol is invoked so that the remaining Spread daemons can create a new membership.
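
To see how close a cluster is to this limit, you can check node states and fault tolerance from vsql. This is a minimal sketch; the dbadmin user is an assumption:

# List each node and its current state
$ /opt/vertica/bin/vsql -U dbadmin -c "SELECT node_name, node_state FROM nodes;"
# Show how many nodes are down and how many more failures the cluster can tolerate
$ /opt/vertica/bin/vsql -U dbadmin -c "SELECT node_count, node_down_count, current_fault_tolerance FROM system;"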

Troubleshooting and Problem Resolution

Spread Logging

By default, Spread logging is disabled. If it is enabled, it delays token transmission, which can result in nodes not receiving the token for a period of 8 seconds. After 8 seconds, the Spread daemons eject the node from the cluster due to the lack of communication. To avoid this delay and disable Spread logging, contact Vertica Support.

Network Bandwidth Starvation

Use the following commands to check and reduce your network usage.

  1. Check the network usage of your system by querying the DC_NETWORK_INFO system table. Specify the start time and end time to see network usage for a specific timeframe.
=> SELECT node_name
     ,start_time StartTime
     ,end_time EndTime
     ,tx_kbytes_per_sec
     ,rx_kbytes_per_sec
     ,tx_kbytes_per_sec + rx_kbytes_per_sec total_kbytes_per_sec
FROM (
     SELECT node_name
           ,round(min(start_time), 'SS') AS start_time
           ,round(max(end_time), 'SS') AS end_time
           ,round(((sum(tx_bytes_end_value - tx_bytes_start_value) / 1024) / (datediff('millisecond', min(start_time), max(end_time)) / 1000))::FLOAT, 2) AS tx_kbytes_per_sec
           ,round(((sum(rx_bytes_end_value - rx_bytes_start_value) / 1024) / (datediff('millisecond', min(start_time), max(end_time)) / 1000))::FLOAT, 2) AS rx_kbytes_per_sec
     FROM dc_network_info_by_second
     WHERE start_time > '2016-09-13 16:00:00-04'
           AND end_time < '2016-09-13 17:00:00-04'
            AND interface_id LIKE 'eth0'
     GROUP BY node_name
           ,round(start_time, 'SS')
     ) a
ORDER BY 2,node_name;
 
     node_name     |      StartTime      |       EndTime       | tx_kbytes_per_sec | rx_kbytes_per_sec | total_kbytes_per_sec
-------------------+---------------------+---------------------+-------------------+-------------------+---------------------
v_vmartdb_node0001 | 2016-09-14 05:00:03 | 2016-09-14 05:00:04 |                 0 | 6.68              | 6.68
v_vmartdb_node0002 | 2016-09-14 05:00:03 | 2016-09-14 05:00:04 |                 0 | 6.66              | 6.66
v_vmartdb_node0003 | 2016-09-14 05:00:03 | 2016-09-14 05:00:04 |                 0 | 6.67              | 6.67
v_vmartdb_node0001 | 2016-09-14 05:00:04 | 2016-09-14 05:00:05 |              0.38 | 0.19              | 0.57
  2. Measure the network performance of your hosts using vnetperf. Compare the megabytes per second sent and received with the available bandwidth. In the following example, the available bandwidth is about 128 MB/s (1 Gbps).
$ /opt/vertica/bin/vnetperf
 
(example)
Date                    | Test           | Rate Limit (MB/s) | Node    | MB/s (sent) | MB/s (rec)  |
----------------------------------------------------------------------------------------------------
2016-09-14_11:33:37,973 | tcp-throughput | 32                | average | 31.1396     | 31.1396     |
2016-09-14_11:33:40,21  | tcp-throughput | 64                | average | 61.6482     | 61.6467     |
2016-09-14_11:33:42,65  | tcp-throughput | 128               | average | 122.681     | 122.683     |
2016-09-14_11:33:44,125 | tcp-throughput | 256               | average | 144.845     | 148.393     |
2016-09-14_11:33:46,664 | tcp-throughput | 512               | average | 154.858     | 160.587     |
2016-09-14_11:33:49,291 | tcp-throughput | 640               | average | 147.637     | 151.906     |
2016-09-14_11:33:51,928 | tcp-throughput | 768               | average | 152.137     | 156.577     |
2016-09-14_11:33:54,455 | tcp-throughput | 1024              | average | 150.206     | 153.423     |
2016-09-14_11:33:56,965 | tcp-throughput | 2048              | average | 155.692     | 157.993     |
  3. Use Database Designer to tune your queries. Unoptimized projections may cause high network usage due to the RESEGMENT/BROADCAST operators; you can check a query plan for these operators as shown below.
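One way to spot these operators is to grep the EXPLAIN output of a suspect query. This is a minimal sketch; the dbadmin user and the sample VMart join are assumptions, so substitute your own query:
$ /opt/vertica/bin/vsql -U dbadmin -c "EXPLAIN SELECT * FROM store.store_sales_fact f JOIN store.store_dimension d USING (store_key);" | grep -E 'RESEGMENT|BROADCAST'
If the plan resegments or broadcasts a large table, running Database Designer on that query is a reasonable next step.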

If you continue to see high network usage, upgrade your network.

Network Retransmission Problems

If the Vertica host has a large amount of RAM and more memory than the default is used for write caching, the result can be a full flush. A full flush blocks the Spread daemons for a long time (about 8 seconds), triggering token timeouts.

If the disks cannot keep up with the incoming write requests from the application, the write cache grows beyond a threshold. The kernel then blocks all I/O requests until the cache is flushed below the defined threshold. Under high load, this can sometimes lead to a node exiting the cluster. The relevant kernel parameters are:

vm.dirty_background_ratio = 10
vm.dirty_ratio = 20

The first parameter, vm.dirty_background_ratio, defines the threshold of write-cache usage as a percentage of total memory. When this threshold is reached, the kernel starts flushing dirty pages to disk in the background.

The second parameter, vm.dirty_ratio, defines the maximum percentage of memory that can hold dirty pages. If this threshold is exceeded, the kernel blocks other I/O requests until a flush completes.
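
To inspect the current thresholds on a node, and to lower them if full flushes are stalling Spread, you can use sysctl. This is a minimal sketch; the lower values shown are an example starting point, not a tuning recommendation:

# Check the current write-cache thresholds
$ sysctl vm.dirty_background_ratio vm.dirty_ratio
# Lower them at runtime (example values only)
$ sudo sysctl -w vm.dirty_background_ratio=5
$ sudo sysctl -w vm.dirty_ratio=10

To make the change persistent, add the same settings to the /etc/sysctl.conf file and run sysctl -p.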

Network UDP Receive Error

If you continue to see a large number of UDP packet receive errors, you may need to tune the kernel parameters as follows.

  1. Track the number of UDP network packet receive errors using the following command.
$ netstat -su

Udp:
    490180944 packets received
    359 packets to unknown port received.
    43303030 packet receive errors
    37289 packets sent
  2. If you see a large number of UDP packet receive errors, add the following parameters to the /etc/sysctl.conf file.

The following parameters are relevant to UDP error issues:

  • net.core.rmem_max
  • net.core.rmem_default
  • net.core.netdev_max_backlog
  • net.ipv4.udp_mem
  • net.ipv4.udp_rmem_min
  • net.ipv4.udp_wmem_min
# Increase number of incoming connections
net.core.somaxconn = 1024 
# Sets the send socket buffer maximum size in bytes.
net.core.wmem_max = 16777216
# Sets the receive socket buffer maximum size in bytes.
net.core.rmem_max = 16777216
# Sets the send socket buffer default size in bytes.
net.core.wmem_default = 262144
# Sets the receive socket buffer default size in bytes.
net.core.rmem_default = 262144
# Sets the maximum number of packets allowed to queue when a particular interface receives packets faster than the kernel can process them.
# increase the length of the processor input queue 
net.core.netdev_max_backlog = 2000
net.ipv4.tcp_mem = 16777216 16777216 16777216
net.ipv4.tcp_wmem = 8192 262144 8388608
net.ipv4.tcp_rmem = 8192 262144 8388608 
net.ipv4.udp_mem = 16777216 16777216 16777216
net.ipv4.udp_rmem_min = 16384
net.ipv4.udp_wmem_min = 16384

If you still see UDP errors with net.core.rmem_max=16777216, you can double this value by using the following command:

sudo sysctl -w net.core.rmem_max=33554432

If you have a high-concurrency workload and Vertica is CPU bound, you can increase the depth of the processor input queue by using the following command:

sudo sysctl -w net.core.netdev_max_backlog=2000
  3. After adding the parameters to the /etc/sysctl.conf file, run the following command to apply them.
$ sysctl -p
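
As a quick sanity check, confirm that the new values are in effect and watch whether the receive-error counter keeps growing. A minimal sketch:

# Confirm that the new values are active
$ sysctl net.core.rmem_max net.core.netdev_max_backlog
# Re-check the UDP receive-error counter; it should stop increasing
$ netstat -su | grep -i 'receive errors'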

Memory Starvation

Use the following commands to check and reduce memory usage.

  1. Check memory usage on your system.
=> SELECT node_name, round(start_time, 'SS') as start_time,
   round(end_time, 'SS') as end_time,
   round(100 -
           ( free_memory_sample_sum       / free_memory_sample_count +
             buffer_memory_sample_sum     / free_memory_sample_count +
             file_cache_memory_sample_sum / file_cache_memory_sample_count ) /
           ( total_memory_sample_sum      / total_memory_sample_count ) * 100.0, 2.0)
   as average_memory_usage_percent
FROM dc_memory_info_by_second
WHERE start_time between '2016-09-13 15:00:00-04' and '2016-09-13 16:00:00-04'
order by start_time, node_name; 

    node_name       |     start_time      |      end_time       | average_memory_usage_percent
--------------------+---------------------+---------------------+-----------------------------
 v_vmartdb_node0001 | 2016-09-14 04:00:00 | 2016-09-14 04:00:01 | 79.52
 v_vmartdb_node0002 | 2016-09-14 04:00:00 | 2016-09-14 04:00:01 | 71.29
 v_vmartdb_node0003 | 2016-09-14 04:00:00 | 2016-09-14 04:00:01 | 71.47
 v_vmartdb_node0001 | 2016-09-14 04:00:01 | 2016-09-14 04:00:02 | 79.52
 v_vmartdb_node0002 | 2016-09-14 04:00:01 | 2016-09-14 04:00:02 | 71.29
 v_vmartdb_node0003 | 2016-09-14 04:00:01 | 2016-09-14 04:00:02 | 71.47
  2. Check the size of the catalog.
=> SELECT node_name
       ,max(ts) AS ts
       ,max(catalog_size_in_MB) AS catalog_size_in_MB
FROM (
       SELECT node_name
             ,trunc((dc_allocation_pool_statistics_by_second."time")::TIMESTAMP, 'SS'::VARCHAR(2)) AS ts
             ,sum((dc_allocation_pool_statistics_by_second.total_memory_max_value - dc_allocation_pool_statistics_by_second.free_memory_min_value)) / (1024 * 1024) AS catalog_size_in_MB
       FROM dc_allocation_pool_statistics_by_second
       GROUP BY 1,trunc((dc_allocation_pool_statistics_by_second."time")::TIMESTAMP, 'SS'::VARCHAR(2))
       ) foo
GROUP BY 1
ORDER BY 1;
 
    node_name       |         ts          |  catalog_size_in_MB
--------------------+---------------------+-----------------------
 v_vmartdb_node0001 | 2016-09-29 19:50:16 | 5343.6447143554687500
 v_vmartdb_node0002 | 2016-09-29 19:50:16 | 4889.1784667968750000
 v_vmartdb_node0003 | 2016-09-29 19:50:16 | 4861.3525390625000000
  3. If the catalog size is larger than 4% of total memory size, reduce the Vertica GENERAL pool memory usage and restart the cluster.

For example, if the catalog size is 5 GB and the physical memory size is 100 GB, the default MAXMEMORYSIZE of 95% gives the GENERAL pool 95 GB. Because the catalog is loaded outside of the Vertica memory space, the 5 GB catalog plus the 95 GB pool would consume all of physical memory. Hence, you need to reduce the MAXMEMORYSIZE of the GENERAL pool.

  4. To reduce the MAXMEMORYSIZE of the general pool, use the following commands.
=> SELECT name, maxmemorysize FROM resource_pools WHERE name='general';
 
  name   | maxmemorysize
---------+---------------
 general | Special: 95%
(1 row)
 
=> ALTER RESOURCE POOL general maxmemorysize '90%';
NOTICE 2585:  Change takes effect upon restart.  Recovering nodes will use the new value
ALTER RESOURCE POOL
 
=> SELECT name, maxmemorysize FROM resource_pools WHERE name='general';
 
  name   | maxmemorysize
---------+---------------
 general | Special: 90%
(1 row)

Alternatively, you can create a new dedicated resource pool for the catalog.

To create a dedicated pool that reserves memory for the catalog, use the following command.

=> CREATE RESOURCE POOL catalog_pool memorysize '4G';
CREATE RESOURCE POOL

CPU Bound

Use the following commands to check and reduce your CPU usage.

  1. Check the CPU usage of your system. Specify the start time and end time to see CPU usage for a specific timeframe.
=> SELECT node_name, round(start_time, 'SS') as start_time, round(end_time, 'SS') as end_time, round(100 -
((idle_microseconds_end_value - idle_microseconds_start_value) /
(user_microseconds_end_value + nice_microseconds_end_value + system_microseconds_end_value
+ idle_microseconds_end_value + io_wait_microseconds_end_value + irq_microseconds_end_value
+ soft_irq_microseconds_end_value + steal_microseconds_end_value + guest_microseconds_end_value
- user_microseconds_start_value - nice_microseconds_start_value - system_microseconds_start_value
- idle_microseconds_start_value - io_wait_microseconds_start_value - irq_microseconds_start_value
- soft_irq_microseconds_start_value - steal_microseconds_start_value - guest_microseconds_start_value)
) * 100, 2.0) average_cpu_usage_percent
FROM dc_cpu_aggregate_by_second
where start_time between '2016-09-13 15:00:00-04' and '2016-09-13 16:00:00-04'
order by start_time, node_name;
 
     node_name      |     start_time      |      end_time       | average_cpu_usage_percent
--------------------+---------------------+---------------------+---------------------------
 v_vmartdb_node0001 | 2016-09-14 04:00:00 | 2016-09-14 04:00:01 | 98.53
 v_vmartdb_node0002 | 2016-09-14 04:00:00 | 2016-09-14 04:00:01 | 95.67
 v_vmartdb_node0003 | 2016-09-14 04:00:00 | 2016-09-14 04:00:01 | 96.97
  2. If the CPU usage is very high, assign a single core to Spread.

a. Identify the Spread daemon ID and Vertica process ID.

$ ps -ef|grep catalog
dbadmin  461884      1  0 Sep13 ?        00:00:07 /opt/vertica/spread/sbin/spread -c /home/dbadmin/VMartDB/v_vmartdb_node0001_catalog/spread.conf -D /opt/vertica/spread/tmp
dbadmin  461886      1  1 Sep13 ?        00:03:52 /opt/vertica/bin/vertica -D /home/dbadmin/VMartDB/v_vmartdb_node0001_catalog -C VMartDB -n v_vmartdb_node0001 -h 192.168.30.71 -p 5433 -P 4803 -Y ipv4
dbadmin  461902 461886  0 Sep13 ?        00:00:11 /opt/vertica/bin/vertica-udx-zygote 15 14 461886 debug-log-off /home/dbadmin/VMartDB/v_vmartdb_node0001_catalog/UDxLogs 60 16 0
dbadmin  509818 504522  0 05:56 pts/0    00:00:00 grep catalog

These results show that 461884 is the Spread daemon ID and 461886 is the Vertica process ID.

b. Using taskset, check the CPU affinity of the Spread daemon.

$ taskset -cp 461884
pid 461884's current affinity list: 0,1,2,3

c. Using taskset, check the CPU affinity of the Vertica process ID

$ taskset -cp 461886
pid 461886's current affinity list: 0,1,2,3

d. Assign core 0 to Spread and cores 1, 2, 3 to Vertica.

$ taskset -cp 0 461884
pid 461884's current affinity list: 0-3
pid 461884's new affinity list: 0

$ taskset -cp 1,2,3 461886
pid 461886's current affinity list: 0-3
pid 461886's new affinity list: 1-3 

For More Information
