What Should I do if my Node Recovery is Slow?

If you are running Vertica 7.2.x or later, perform recovery by table. For details, see Recovery By Table in the Vertica documentation.
If you are running a Vertica version prior to 7.2.x, stop the ETL jobs and restart node recovery.

Step Task Results
1 Monitor progress of recovery:
  => SELECT node_name, is_running, current_completed, current_total FROM RECOVERY_STATUS;

If is_running = f, recovery has completed.
Repeat this statement to see whether the current_completed value increases, which means that recovery is progressing.
Is recovery progressing?
  • If recovery is not progressing, go to Step 3.
  • If recovery is progressing or complete, go to Step 2.
2 Did node recovery complete successfully? If yes, this is the end of your checklist.
If no (recovery completed with errors), go to Step 6.
3 Is recovery slower than expected? If no, go to Step 4.
If yes, recovery is slow:
  • Use iostat to check disk I/O. If there is a problem, change the disk I/O scheduler to deadline or noop.
  • Check whether the number of concurrently running queries is at or close to the maximum:
    => SELECT node_name, pool_name, max_concurrency, running_query_count FROM RESOURCE_POOL_STATUS;
  • If max_concurrency = running_query_count, the query load is too high:
    a. Increase MAXCONCURRENCY.
    b. Restart recovery.
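As a rough illustration of Step 3a, the statements below check one pool and then raise its limit. The pool name general and the value 12 are assumptions chosen for the example, not recommendations; substitute the pool that is at its limit and a value that suits your workload.

```sql
-- Assumption: the saturated pool is named "general".
-- Confirm that running_query_count has reached max_concurrency.
SELECT node_name, pool_name, max_concurrency, running_query_count
FROM RESOURCE_POOL_STATUS
WHERE pool_name = 'general';

-- Raise the concurrency limit for that pool.
-- 12 is an illustrative value only.
ALTER RESOURCE POOL general MAXCONCURRENCY 12;
```

After changing the setting, rerun the first query to confirm the new max_concurrency value before restarting recovery.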
4 Does recovery seem to be stuck on a particular table? If no, go to Step 6.
If yes, check whether the node is recovering:
  => SELECT node_name, node_state FROM NODES;
If node_state = RECOVERING, node recovery is in progress.
5 Check whether the node is in the Historical phase of recovery:
  => SELECT node_name, recovery_phase, historical_completed, historical_total FROM RECOVERY_STATUS;
  • If historical_completed is less than historical_total, the node is in the Historical phase. Vertica is moving storage containers.
  • Repeat the previous statement until historical_completed = historical_total.
  • If the node is in the Historical phase of recovery, wait until the phase completes.
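To make the Historical-phase check in Step 5 easier to read across repeated runs, the query can also report a percentage. This is only a sketch: the historical_pct alias is introduced here for illustration, and the filter on is_running assumes you only want nodes that are still recovering.

```sql
-- Show Historical-phase progress as a percentage for recovering nodes.
-- NULLIF avoids division by zero when historical_total is 0.
SELECT node_name,
       recovery_phase,
       historical_completed,
       historical_total,
       ROUND(100.0 * historical_completed / NULLIF(historical_total, 0), 1) AS historical_pct
FROM RECOVERY_STATUS
WHERE is_running;
```

If historical_pct rises between runs, the node is still making progress and the guidance is simply to wait.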
6 Is the transaction stuck in replay deletes? If no, go to Step 7.
If yes, do one of the following, depending on the size of your data:
Data < 1 TB:
  • Stop the node.
  • Run MAKE_AHM_NOW(true).
  • Restart node recovery.
  • Go to Step 2.

Data > 1 TB:
  • Create a new table.
  • Load the data from the existing table.
  • Delete the old table.
  • Restart recovery.
  • Go to Step 2.
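The two branches of Step 6 might look like the following in vsql. The table names public.orders and public.orders_new are hypothetical, and the final rename is an assumption added so that existing queries keep working; the checklist itself only calls for creating, loading, and deleting.

```sql
-- Data < 1 TB: advance the Ancient History Mark.
-- The true argument lets the AHM advance while a node is down.
SELECT MAKE_AHM_NOW(true);

-- Data > 1 TB: rebuild the affected table.
-- Hypothetical names: public.orders (existing), public.orders_new (copy).
CREATE TABLE public.orders_new LIKE public.orders INCLUDING PROJECTIONS;
INSERT INTO public.orders_new SELECT * FROM public.orders;
COMMIT;

-- Delete the old table, then (optional assumption) reuse its name.
DROP TABLE public.orders CASCADE;
ALTER TABLE public.orders_new RENAME TO orders;
```

Verify row counts in the new table before dropping the old one, because DROP TABLE ... CASCADE also removes the old table's projections.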
7 Check whether recovery is waiting for a lock:
  => SELECT node_name, user_id, transaction_id, object_name, mode FROM DC_LOCK_REQUESTS;
If no, go to Step 8.

If yes:
  • Ask the user who holds the lock to release the table.
  • Resume recovery.
  • Go to Step 1.
8 Node recovery failed. Check recovery errors:
  $ grep "Recovery Error" vertica.log
Is the error a lock error? If no, contact Vertica Technical Support.
If yes:
  • Check the transaction that is locking the recovery:
    => SELECT node_name, user_id, transaction_id, object_name, mode FROM DC_LOCK_REQUESTS;
  • If the transaction is the Tuple Mover, wait for it to complete.
  • If the transaction is a load, stop the load, restart recovery, and go to Step 1.
  • If you are running Vertica 7.2.x or later, perform recovery by table. See Recovery By Table in the Vertica documentation.
  • Otherwise, if you are running a version earlier than Vertica 7.2.x, contact Vertica Technical Support.
Learn More

Learn more about NODE_STATES in the Vertica Documentation.