What Should I do to Shut Down Vertica Node for Maintenance?

Posted June 14, 2018 by Soniya Shah, Information Developer

If you need to shut down a Vertica node for maintenance, follow this checklist.
Step Task Results
1 Verify that all cluster nodes are UP. $ /opt/vertica/bin/admintools -t view_cluster To avoid a long node recovery time after shutdown, if one or more nodes is DOWN, identify and restart them using the instructions in Restarting Vertica on a Host.
2 If you can’t restart a node:
  • Check if the AHM is being held too long: => SELECT current_epoch, ahm_epoch, last_good_epoch FROM system;
  • Check the number of delete vectors: => SELECT COUNT(*) FROM delete_vectors;
If the AHM has been held for a long time and the delete vector count is high and spread across multiple tables, node recovery could be slow due to replay deletes. In that case, follow the steps in the node recovery checklist. Suppose the following are true:
  • You are doing deletes and updates that affect more than 3% of your data.
  • Your node has been down for a long time.
  • You have a staging table that has a lot of updates and deletes. In this case, drop the table before recovering the node.
Your final option is to force the AHM to advance with this command: => SELECT MAKE_AHM_NOW(true); Then, when you recover the down node, Vertica recovers that node from scratch by copying the data from its buddy nodes.
3 Check node dependencies: => SELECT GET_NODE_DEPENDENCIES(); A clean node dependency lists (number of nodes + 1) lines. If the node dependencies are correct, go to Step 4. If the node dependencies are incorrect, rebalance the data in the cluster: => SELECT REBALANCE_CLUSTER(); If, after rebalancing the cluster, node dependencies are still incorrect, contact Vertica Support.
4 Back up your database to avoid any loss of data. $ vbr -t backup --config $FULLBAK_CONFIG For more information, see Backing Up the Database. Consider a hard-link backup to speed up the process, as described in Creating Hard-Link Local Backups. If the backup is successful, go to Step 5. If the backup does not complete, when the database is down, perform a cold backup or an offline backup. To do so, copy the catalog and data directories to another location. For Vertica 7.2.x and earlier: => SELECT name, catalogpath, bdbpath FROM vs_NODES; For Vertica 8.0.x and later: => SELECT name, catalogpath FROM vs_NODES;
5 Prepare to shut down the database.
a. View the maximum number of sessions: => SHOW CURRENT "MaxClientSessions"; To prevent additional users from connecting to the database, set MaxClientSessions to 0: => ALTER DATABASE <database_name>/< SET MaxClientSessions=0; Note: MaxClientSessions = 0 allows five dbadmin sessions.
b. The shutdown process executes the Tuple Mover. However, to control the shutdown process, run the Tuple Mover to move all projections from the WOS to the ROS: => SELECT DO_TM_TASK('moveout');If you omit the table name, the Tuple Mover moves everything out of the WOS.
c. Verify that Tuple Mover moved everything by querying the RESOURCE_USAGE system table to check for bytes used in WOS. => SELECT node_name, SUM(memory_inuse_kb) FROM resource_pool_status WHERE pool_name = 'wosdata' GROUP BY 1 ORDER BY 1 ; node_name | sum --------------------+----- v_vmart_db_node0001 | 0 v_vmart_db_node0002 | 0 v_vmart_db_node0003 | 0 v_vmart_db_node0004 | 0 v_vmart_db_node0005 | 0
d. Verify that mergeout is not running. Shutdown cannot occur until mergeout completes: => SELECT * FROM tuple_mover_operations WHERE is_executing; If there are mergeout operations running, wait until they complete, or cancel the mergeouts by closing the sessions, as described in Step 5f.
e. Query the SESSIONS system table to see which sessions are still running: => SELECT * FROM SESSIONS; If sessions don’t complete, proceed to Step 5f. If sessions are complete, proceed to Step 5g.
f. Close any open sessions: => SELECT CLOSE_ALL_SESSIONS(); or close specific sessions: => SELECT CLOSE_SESSION(‘session_id’); To close mergeout sessions, get the session_id from the TUPLE_MOVER_OPERATIONS system table.
g. Verify that the sessions are closed: => SELECT * FROM SESSIONS; If sessions are still open, return to Step 5b and continue. Otherwise, proceed to Step 5h.
h. If Steps 5b through 5g took a long time, new data loads or Tuple Mover operations may have started. If that’s the case, return to Step 5b. $ /opt/vertica/bin/admintools -t view_cluster
i. Move the ancient history mark (AHM) to avoid replay deletes. => SELECT MAKE_AHM_NOW(); Note: If a node is down, this step fails. => SELECT MAKE_AHM_NOW(‘true’); The above query advances the AHM but forces a recovery from scratch, as explained in Step 2.
6 Shut down the database: => SELECT SHUTDOWN();
7 Verify that the Vertica process is properly shut down on each node: $ for host in `grep -P "^v_" /opt/vertica/config/admintools.conf|awk '{print $3}'|awk -F, '{print $1}'`;do echo ---- $host ---- ; ssh $host "ps -ef | grep /opt/vertica/bin/vertica"; done If Vertica is properly shut down on each node, proceed to Step 11. Otherwise, continue to Step 8.
8 Connect to any node that still has the Vertica process running. Run the following command to see what the Vertica process is doing: $ tail –f vertica.log If the vertica.log shows activity, wait until it completes. If you can’t wait for the process to complete, collect vstack information to send to Vertica Technical Support, letting them know that shutdown process did not complete successfully. $ /opt/vertica/bin/vstack > /tmp/vstack_nodexx.log Then proceed to Step 9.
9 To stop the shutdown process, kill the Vertica process in admintools > Advanced > Killing a Vertica Process on host. If you shut down the database on two buddy nodes so that the database becomes unsafe, Vertica automatically shuts down on all the other nodes.
10 Verify that Vertica process stopped on all the nodes using the command in Step 7.
11 After you have performed the necessary maintenance, restart the database following the instructions in Restart Vertica on a Node. The checklist is complete.