What Should I do if my Node Recovery is Slow?

Posted June 14, 2018 by Soniya Shah, Information Developer

If you are running Vertica 7.2.x or later, perform recovery by table. For details, see Recovery by Table in the Vertica documentation. If you are running a Vertica version prior to 7.2.x, stop the ETL jobs and restart node recovery.
Step Task Results
1 Monitor the progress of recovery: => SELECT node_name, current_completed, is_running FROM RECOVERY_STATUS; If is_running = f, recovery has completed. Repeat this statement to see whether the current_completed value increases, which means that recovery is progressing. Is recovery progressing? If recovery is not progressing, go to Step 3. If recovery is progressing or complete, go to Step 2.
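The "repeat and compare" check in Step 1 can be scripted. The following is a minimal sketch, not a Vertica-supplied tool. It assumes you have already captured the current_completed value from two successive runs of the query and pass both readings as arguments. The function name recovery_trend is hypothetical.

```shell
#!/bin/sh
# recovery_trend PREVIOUS CURRENT   (hypothetical helper)
# Compares two successive current_completed readings.
# Prints "progressing" if the value increased, otherwise "not progressing".
recovery_trend() {
    previous=$1
    current=$2
    if [ "$current" -gt "$previous" ]; then
        echo "progressing"
    else
        echo "not progressing"
    fi
}
```

For example, `recovery_trend 120 175` prints `progressing`, while two identical readings print `not progressing`, which corresponds to moving on to Step 3.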
2 Did node recovery complete successfully? If yes, this is the end of your checklist. If no (recovery completed with errors), go to Step 6.
3 Is recovery slower than expected? If no, go to Step 4. If yes, recovery is slow:
  • Use iostat to check disk I/O. If there is a problem, change the disk I/O scheduler to deadline or noop.
  • Check whether the number of concurrently running queries is at or close to the maximum: => SELECT node_name, pool_name, max_concurrency, running_query_count FROM RESOURCE_POOL_STATUS;
  • If max_concurrency = running_query_count, the query load is too high:
    a. Increase MAXCONCURRENCY.
    b. Restart recovery.
    c. Go to Step 1.
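The concurrency comparison in Step 3 can be applied to the query output mechanically. The sketch below assumes the RESOURCE_POOL_STATUS rows are piped in as unaligned, pipe-delimited text in the column order used by the query above (for example, as produced by a client's unaligned, tuples-only output mode). The function name saturated_pools is hypothetical. Rows with an empty max_concurrency are skipped, on the assumption that an empty value means no limit is set.

```shell
#!/bin/sh
# saturated_pools   (hypothetical helper)
# Reads rows of the form
#   node_name|pool_name|max_concurrency|running_query_count
# on standard input and prints "node_name pool_name" for every pool whose
# running query count has reached (or exceeded) its maximum concurrency.
saturated_pools() {
    awk -F'|' '$3 != "" && $4 + 0 >= $3 + 0 { print $1, $2 }'
}
```

Any pool printed by this filter is one where the query load is at the limit, which is the condition under which the checklist suggests increasing MAXCONCURRENCY.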
4 Does recovery seem to be stuck on a particular table? If no, go to Step 6. If yes, check whether the node is recovering: => SELECT node_name, node_state FROM NODES; If node_state = RECOVERING, node recovery is in progress.
5 Check whether node is in the Historical phase of recovery: => SELECT node_name, recovery_phase, historical_completed, historical_total FROM RECOVERY_STATUS;
  • If historical_completed is less than historical_total, the node is in the Historical phase. Vertica is moving storage containers.
  • Repeat the previous statement until historical_completed = historical_total.
If Vertica is in the Historical phase of recovery, you need to wait until it completes.
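The Historical-phase test in Step 5 is a simple comparison of two columns. As a minimal sketch, assuming you pass the historical_completed and historical_total values from the query as arguments, it could look like this. The function name historical_phase_status is hypothetical.

```shell
#!/bin/sh
# historical_phase_status COMPLETED TOTAL   (hypothetical helper)
# Reports whether the node is still in the Historical phase, based on the
# historical_completed and historical_total values from RECOVERY_STATUS.
historical_phase_status() {
    completed=$1
    total=$2
    if [ "$completed" -lt "$total" ]; then
        echo "historical phase in progress: $completed of $total"
    else
        echo "historical phase complete"
    fi
}
```

Running `historical_phase_status 40 100` prints `historical phase in progress: 40 of 100`, in which case the only action is to wait and check again.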
6 Is the transaction stuck in replay deletes? If no, go to Step 7. If yes, do one of the following, depending on the size of your data:
Data < 1 TB:
  • Stop the node.
  • Run MAKE_AHM_NOW(true).
  • Restart node recovery.
  • Go to Step 2.
Data > 1 TB:
  • Create a new table.
  • Load the data from existing table.
  • Delete old table.
  • Restart recovery.
  • Go to Step 2.
7 Check whether recovery is waiting for a lock: => SELECT node_name, user_id, transaction_id, object_name, mode FROM DC_LOCK_REQUESTS; Is recovery waiting for a lock? If no, go to Step 6. If yes:
  • Ask user to release the table.
  • Resume recovery.
  • Go to Step 1.
8 Node recovery failed. Check the recovery errors: $ grep "Recovery Error" vertica.log Is the error a lock error? If no, contact Vertica Technical Support. If yes:
  • Check the transaction that is locking the recovery:
=> SELECT node_name, user_id, transaction_id, object_name, mode FROM DC_LOCK_REQUESTS;
  • If the transaction is the Tuple Mover, wait for it to complete.
  • If the transaction is a load, stop the load, restart recovery, and go to Step 1.
  • If you are running Vertica 7.2.x or later, perform recovery by table. See Recovery by Table in the Vertica documentation.
  • Otherwise, if you are running a version earlier than Vertica 7.2.x, contact Vertica Technical Support.
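The first decision in Step 8 (lock error or not) can be sketched as a small wrapper around the same grep command. This is an illustration only: it assumes that lock-related recovery errors contain the word "lock" somewhere on the matching log line, which is a guess about the log wording rather than a documented format, and a substring match can produce false positives. The function name classify_recovery_errors is hypothetical.

```shell
#!/bin/sh
# classify_recovery_errors LOGFILE   (hypothetical helper)
# Searches LOGFILE for "Recovery Error" lines and prints one of:
#   "no recovery errors", "lock error", or "other error".
# ASSUMPTION: a lock-related error mentions "lock" on the same line.
classify_recovery_errors() {
    logfile=$1
    # "|| true" keeps the script going when grep finds no match.
    errors=$(grep "Recovery Error" "$logfile" || true)
    if [ -z "$errors" ]; then
        echo "no recovery errors"
    elif printf '%s\n' "$errors" | grep -qi "lock"; then
        echo "lock error"
    else
        echo "other error"
    fi
}
```

A result of `other error` corresponds to the "contact Vertica Technical Support" branch, while `lock error` leads into the lock checks listed above. Always read the matching log lines yourself before acting on the classification.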

Learn More

Learn more about NODE_STATES in the Vertica Documentation.