1 |
Monitor progress of recovery:
=> SELECT node_name, current_completed, is_running FROM RECOVERY_STATUS;
If is_running = f, recovery has completed.
Repeat this statement to see whether the current_completed value increases, which means that recovery is progressing.
Is recovery progressing? |
If recovery is not progressing, go to Step 3.
If recovery is progressing or complete, go to Step 2. |
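The progress check in Step 1 can be sketched as a small shell helper. This is a minimal sketch, not part of Vertica: the function name recovery_verdict is hypothetical, and it assumes you pass in the is_running value and two successive current_completed readings for a single node.

```shell
# Classify recovery from two successive samples of one node's RECOVERY_STATUS row.
# Arguments: is_running (t or f), previous current_completed, latest current_completed.
recovery_verdict() {
    is_running=$1
    prev=$2
    curr=$3
    if [ "$is_running" = "f" ]; then
        echo "complete"        # is_running = f: recovery has completed
    elif [ "$curr" -gt "$prev" ]; then
        echo "progressing"     # the counter increased between samples
    else
        echo "stalled"         # no movement: treat as not progressing
    fi
}

# Hypothetical usage with vsql (connection options omitted, node name is a placeholder):
#   q="SELECT current_completed FROM RECOVERY_STATUS WHERE node_name = 'your_node_name';"
#   prev=$(vsql -A -t -c "$q"); sleep 60; curr=$(vsql -A -t -c "$q")
#   recovery_verdict t "$prev" "$curr"
```

A result of "complete" or "progressing" corresponds to going to Step 2, and "stalled" to going to Step 3. The 60-second sampling interval in the comment is an arbitrary choice.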
2 |
Did node recovery complete successfully? |
If yes, this is the end of your checklist.
If no, recovery completed with errors. Go to Step 8. |
3 |
Is recovery slower than expected? |
If no, go to Step 4.
If yes, recovery is slow:
- Use iostat to check disk I/O. If there is a problem, change the disk scheduler to deadline or noop.
- Check to see if the number of concurrently running queries is at or close to the maximum number:
=> SELECT node_name, pool_name, max_concurrency, running_query_count from RESOURCE_POOL_STATUS;
- If max_concurrency = running_query_count, the query load is too high:
a. Increase MAXCONCURRENCY.
b. Go to Step 1. |
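The concurrency check in Step 3 can be sketched as a filter over the query output. This is a minimal sketch under stated assumptions: the function name saturated_pools is hypothetical, the input is expected as pipe-separated rows (the default unaligned vsql format) in the column order of the RESOURCE_POOL_STATUS statement above, and rows with an empty max_concurrency are skipped.

```shell
# Read "node_name|pool_name|max_concurrency|running_query_count" rows on stdin
# and print the pools whose running queries have reached the limit.
saturated_pools() {
    awk -F'|' '
        # Skip rows with no max_concurrency value; compare the rest numerically.
        $3 != "" && $4 + 0 >= $3 + 0 { print $1 ": " $2 " (" $4 "/" $3 ")" }
    '
}

# Hypothetical usage with vsql (connection options omitted):
#   vsql -A -t -c "SELECT node_name, pool_name, max_concurrency, running_query_count
#                  FROM RESOURCE_POOL_STATUS;" | saturated_pools
```

Any pool that is printed is a candidate for a higher MAXCONCURRENCY. The comparison uses >= rather than = so that a pool reported above its limit is also listed.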
4 |
Does recovery seem to be stuck on a particular table? |
If no, go to Step 6.
If yes, check whether the node is recovering:
=> SELECT node_name, node_state FROM NODES;
If node_state = RECOVERING, node recovery is in progress. Continue to Step 5. |
5 |
Check whether node is in the Historical phase of recovery:
=> SELECT node_name, recovery_phase, historical_completed, historical_total FROM RECOVERY_STATUS; |
- If historical_completed is less than historical_total, the node is in the Historical phase. Vertica is moving storage containers.
- Repeat the previous statement until historical_completed = historical_total.
If Vertica is in the Historical phase of recovery, you need to wait until it completes. |
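The Historical phase check in Step 5 can be sketched as a filter over the query output. This is a minimal sketch: the function name historical_progress is hypothetical, and the input is assumed to be pipe-separated rows in the column order of the RECOVERY_STATUS statement above.

```shell
# Read "node_name|recovery_phase|historical_completed|historical_total" rows on stdin
# and report, per node, whether the historical counters have caught up.
historical_progress() {
    awk -F'|' '
        $3 + 0 < $4 + 0 { print $1 ": historical phase in progress (" $3 "/" $4 ")"; next }
                        { print $1 ": historical phase done (" $3 "/" $4 ")" }
    '
}

# Hypothetical usage with vsql (connection options omitted):
#   vsql -A -t -c "SELECT node_name, recovery_phase, historical_completed, historical_total
#                  FROM RECOVERY_STATUS;" | historical_progress
```

Rerunning the pipeline at intervals shows whether the completed count is approaching the total.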
6 |
Is the transaction stuck in replay deletes? |
If no, go to Step 7.
If yes, do one of the following depending on the size of your data:
Data < 1 TB:
- Stop the node.
- Run MAKE_AHM_NOW(true).
- Restart node recovery.
- Go to Step 2.
Data > 1 TB:
- Create a new table.
- Load the data from existing table.
- Delete old table.
- Restart recovery.
- Go to Step 2.
|
7 |
Check whether recovery is waiting for a lock.
=> SELECT node_name, user_id, transaction_id, object_name, mode FROM DC_LOCK_REQUESTS; |
If no, go to Step 8.
If yes,
- Ask user to release the table.
- Resume recovery.
- Go to Step 1.
|
8 |
Node recovery failed. Check recovery errors.
$ grep "Recovery Error" vertica.log
Is the error a lock error? |
If no (the error is not a lock error), contact Vertica Technical Support.
If yes,
- Check the transaction that is locking the recovery:
=> SELECT node_name, user_id, transaction_id, object_name, mode FROM DC_LOCK_REQUESTS;
- If the transaction is the Tuple Mover, wait for it to complete.
- If the transaction is a load, stop the load, restart recovery, and go to Step 1.
- If running Vertica 7.2.x, perform recovery by table. See Recovery by Table in the Vertica documentation.
- Otherwise, if running a version earlier than Vertica 7.2.x, contact Vertica Technical Support.
|
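The log check in Step 8 can be sketched as two shell helpers built on the grep command above. This is a minimal sketch: the function names are hypothetical, and it assumes that lock-related recovery errors contain the word "lock" somewhere on the line, which you should confirm against your own vertica.log.

```shell
# Print every "Recovery Error" line from the given log file.
recovery_errors() {
    grep "Recovery Error" "$1"
}

# Count the "Recovery Error" lines that mention a lock.
# ASSUMPTION: lock errors include the word "lock" (matched case-insensitively).
lock_error_count() {
    recovery_errors "$1" | grep -ci "lock"
}

# Hypothetical usage:
#   recovery_errors vertica.log
#   lock_error_count vertica.log
```

A count of 0 suggests the errors are not lock errors. A nonzero count points to the lock checks in the "If yes" branch.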