Automatic Eviction of Unhealthy Nodes

To manage the health of the nodes in your cluster, Vertica performs regular health checks by sending and receiving "heartbeats." During a health check, each node in the cluster verifies read-write access to its catalog, catalog disk, and local storage locations ('TEMP, DATA', TEMP, DATA, and DEPOT). Upon verification, the node sends a heartbeat. If a node fails to send a heartbeat after five intervals (fails five health checks), then the node is evicted from the cluster.

You can control the time between each health check with the DatabaseHeartBeatInterval parameter. By default, DatabaseHeartBeatInterval is set to 120, which allows five 120-second intervals to pass without a heartbeat.

The amount of time allowed before an eviction is:

TOT = DHBI * 5

where TOT is the total time (in seconds) allowed without a heartbeat before eviction, and DHBI is equal to the value of DatabaseHeartBeatInterval.

If you set the DatabaseHeartBeatInterval too low, it can cause evictions in cases of brief node health issues. Sometimes, such premature evictions result in lower availability and performance of the Vertica database.

See Also

DatabaseHeartbeatInterval in General Parameters