What Should I do if the Database Performance is Slow?

Posted June 14, 2018 by Soniya Shah, Information Developer

Troubleshootusing the following checklist if your database performance is slow. Check if any of the following problems exist:
Step Task Results
1 Is the query performance slow? If the query performance is slow, review the Query Performance checklist. If the query performance is not slow, go to Step 2.
2 Is the entire database slow? If the whole database is slow, go to Step 3. If the whole database is not slow, your checklist is complete.
3 Check if all the nodes are UP. => SELECT node_name, node_address, node_state FROM nodes WHERE node_state != 'UP'; If there is any node DOWN,
  • Investigate why the node is down review the Node Down checklist.
  • Restart the node. $ admintools –t restart_nodes –d <database> -s <nodes_address>
If nodes restarted and the performance improved, your checklist is complete. If the node restarted and performance is still slow, go to Step 4. If the node did not restart, review the Node Down checklist.
4 Check if there are too many delete vectors. => SELECT count(*) FROM delete vectors; If there are more than 1000 delete vectors, review the Manage Delete Vectors checklist. If there are not too many delete vectors, go to Step 5.
5 Check if epochs are advancing. => SELECT current_epoch, ahm_epoch, last_good_epoch, designed_Fault_tolerance, current_fault_tolerance FROM system ; If epochs are not advancing, review the AHM not Advancing checklist. If epochs are advancing, go to Step 6.
6 Check if one node is slower than the others. Run a select statement for each node in the cluster and identify the there is a slower node. $ `grep -P "^v_" /opt/vertica/config/admintools.conf|awk '{print $3}'| awk -F, '{print $1}'`; do echo ----- $host -----; date ; vsql -h $host -c ";select /*+kV*/ 1 ;";date ; done If one node is slower than the others,
  • Investigate host performance issue.
  • Restart the Vertica process on that node.
Start: $ admintools –t restart_node –d <database> -s Stop: $ admintools –t stop_node –s <node_ip/Host slow> If all the nodes have similar performance, go to Step 7.
7 Check if the workload is balanced across all the nodes. => SELECT node_name,count(*) FROM dc_requests_issued WHERE time > sysdate() -1 group by 1 ORDER BY 1; If one node has a heavier workload, distribute the workload to all the nodes. Review the documentation on Connection Load balancing. If the workload is balanced, go to Step 8.
8 Check if there are resource rejections. => SELECT * FROM resource_rejections ORDER BY last_rejected_timestamp; If there are resource rejections, review the Query Performance checklist. If there are no significant resource rejections that justify slowness go to Step 9.
9 Check if there are sessions in queue. => SELECT * FROM resource_queues; If there are queries waiting for resources, go to Step 10.
10 Check if there are long-running sessions that are using too many resources. => SELECT r.pool_name, s.node_name AS initiator_node, s.session_id, r.transaction_id, r.statement_id, max(s.user_name) AS user_name, max(substr(s.current_statement, 1, 100)) AS statement_running, max(r.thread_count) AS threads, max(r.open_file_handle_count) AS fhandlers, max(r.memory_inuse_kb) AS max_mem, count(DISTINCT r.node_name) AS nodes_count, min(r.queue_entry_timestamp) AS entry_time, max(((r.acquisition_timestamp - r.queue_entry_timestamp))) AS waiting_queue, max(((clock_timestamp() - r.queue_entry_timestamp))) AS running_time FROM (v_internal.vs_resource_acquisitions r JOIN v_monitor.sessions s ON (((r.transaction_id = s.transaction_id) AND (r.statement_id = r.statement_id)))) WHERE (length(s.current_statement) > 0) GROUP BY r.pool_name, s.node_name, s.session_id, r.transaction_id, r.statement_id ORDER BY r.pool_name; If there is a statement running for too long and using a high proportion of the box resources, consider stopping the statement, => SELECT interrupt_statement(‘session_id’,’statement_id’); Upon statement cancellation, resources should be freed and performance should improve. If it does not improve, go to Step 11.. If the statement does not terminate properly, contact Vertica Technical Support.
11 Check if any transactions are waiting for locks. => SELECT * FROM locks where grant_timestamp is null; If transactions are waiting for locks, identify lock-holding sessions and consider wait for transaction to complete or cancel the session to free locks. => SELECT interrupt_stament(‘session_id’,’statement_id’); Upon statement completion or cancellation and lock release, performance should improve. If it does not improve, go to Step 12. If the session does not terminate properly, contact Vertica Technical Support reporting a hang session.
12 Check the catalog size in memory. => SELECT node_name,max(ts) as ts, max(catalog_size_in_MB) as catlog_size_in_MB FROM ( SELECT node_name,trunc((dc_allocation_pool_statistics_by_second."time")::TIMESTAMP, 'SS'::VARCHAR(2)) AS ts, sum((dc_allocation_pool_statistics_by_second.total_memory_max_value - dc_allocation_pool_statistics_by_second.free_memory_min_value))/(1024*1024) AS catalog_size_in_MB from dc_allocation_pool_statistics_by_second group by 1,2) foo group by 1 ORDER BY 1 limit 50; If catalog is larger than the 5% of memory in the host, resource pools should be adjust to free memory needed by the catalog as Vertica process is a risk of being terminated by the kernel with OOM. Contact Vertica Technical Support to debug catalog size growth and discuss alternatives to free memory to be used to allocated catalog. Alternatives are:
  • Adjust general pool to use less than 95%.
  • Create an additional Pool of size of the difference needed to accommodate catalog.
  • Adjust the METADATA resource pool to free memory for the catalog
In Many cases restarting the node could free memory used by catalog, debugging with support will help to determine the best course of action.
13 Check usage of resident and virtual memory and maps of memory created. => SELECT * FROM ( SELECT time, node_name, files_open, other_open,sockets_open,virtual_size,resident_size,thread_count,map_count, row_number() over (partition by node_name ORDER BY time::timestamp desc) as row FROM dc_process_info ) a where row <=3 ; If Virtual Memory or resident memory is high, monitor to see if the numbers lower. If the numbers do not lower, contact Vertica Technical support to debug the issue. Restarting the nodes should resolve the issue but a proper debugging should be done, follow Catalog Size Debugging checklist.

Learn More

Learn more about Connection Load Balancing in the Vertica Documentation.