Obtaining and installing your HP Vertica license may seem like tricky business. Especially if you have more than one. But the process need not be complicated or frustrating. For a Community Edition license, you don’t even need to go through any additional steps after installing Vertica. For Enterprise Edition or Flex Zone licenses, you’ll go through a step-by-step process in HP’s licensing portal called Poetic and then provide Vertica with the path to the license file you download. That’s it! You can even apply your license through the Vertica Management Console. To see the process in action, watch this video about obtaining and installing the different HP Vertica licenses.
This blog is just the first in a series that addresses frequently asked tech support questions. For now, we’ll talk about optimizing your database for deletion.
You may find that from time to time your recovery and query execution is slow due to high volumes of delete vectors. Occasionally, performing a high number of deletes or updates can negatively affect query performance and recovery due to delete replay.
Delete replay occurs when ROS containers are merged together. The data marked for deletion in each of the ROS containers needs to be remarked once the containers are merged. This process can hold up your ETL processes because the Tuple Mover lock (T lock) stays on until the replay deletes finish.
Luckily, optimizing your database for deletes can help speed up your processes. If you expect to perform a high number of deletes, first consider the reason for deletion. The following is a list of common reasons for high delete usage:
- You regularly delete historical data and upload new data at specific intervals
- You constantly update data or you want to delete data that was loaded my mistake
- You often delete staging tables
To optimize your database for deletion, follow the suggestions that correspond to your reason for deletion.
- If you regularly delete historical data to make room for newer data, use partitioning to chunk data into groups that will be deleted together. For example, if you regularly delete the previous month’s data, partition data by month. When you use partitioning, you can use the DROP_PARTITION function to discard all ROS containers that contain data for the partition. This operation removes historical data fast because no purging or replay deletes are involved.
- You may also want to delete a high volume of data because it was loaded by mistake or because you frequently update data (which involves frequently deleting data). In these cases, you may see a high volume of delete vectors. There are three good ways to prevent this:
- Create delete-optimized projections by using a high cardinality column at the end of the sort order. This helps the replay delete process quickly identify rows to be marked for deletion.
- Make sure your Ancient History Mark (AHM) is advancing and close to the Last Good Epoch (LGE) or Current Epoch. You may also want to periodically use the MAKE_AHM_NOW function to advance the ancient history mark to the greatest allowable value. When a mergeout occurs, all data that is marked for deletion before the AHM will be purged, minimizing the amount of replay deletes.
- Periodically check the number of delete vectors in your tables using the DELETE_VECTORS system table. The automatic Tuple Mover will eventually purge deleted data but if you find your tables have a large number of delete vectors, you can manually purge records using the PURGE_TABLE function.
- You may find that you frequently delete staging tables. To streamline this process, you can truncate the staging table instead of deleting it using the TRUNCATE TABLE function. Truncating a table will discard the ROS containers that contain the data instead of creating delete vectors, and thus is more efficient than table deletion.
Frequently deleting data is often a cause of slow query performance. Fortunately, you can optimize your database for deletions with these tips and avoid the headache.
Drop a partition:
=> SELECT current_epoch, ahm_epoch, last_good_epoch FROM SYSTEM;
Set AHM to greatest allowable value:
=> SELECT MAKE_AHM_NOW();
Determine number of delete vectors:
=> SELECT * FROM v_monitor.DELETE_VECTORS;
=> SELECT PURGE_TABLE(table_name);
HP Vertica 7 introduces a new installer that helps you configure your system for the best possible performance. How does it do this? Well, the new installer checks your systems and lists any changes that you need to make for optimum performance. Not only that, it gives you a direct link to the documentation page that will help you change your system appropriately. Once you change your system configuration, you can rerun the install and know that your system is optimized to run HP Vertica 7.
Check out the following video to see a demonstration of how to change typical settings for default installs on RedHat or CentOS systems.
With the HP Vertica 7, you can use Database Designer with Management Console. As in previous releases, you can still run Database Designer from Admin Tools, but its integration with Management Console offers an additional easy-to-use method for creating a database design.
Database Designer optimizes query performance and minimizes the disk storage that the database uses. It does this by analyzing your logical schema, sample data, and, optionally, your sample queries. Then, Database Designer creates a physical schema design (a set of projections) that can be deployed automatically or manually.
Check out the following demo to get started with the new Database Designer feature in Management Console.
* When using this new feature, remember that, to create the design, you must be a DBADMIN user or have the DBUSER role assigned to you with write access to the tables in your schema.
For more information, visit www.vertica.com/documentation.
With HP Vertica 7.0, Management Console offers a new way to visualize your query plans and get profile information about your queries. You can run EXPLAIN on a query using Management Console’s Query Plan Visualizer, and it provides you with a visual representation of the query plan with the lowest cost. Management Console highlights and links to key information in the output, so you can spot issues at a glance. When you profile a query, Management Console provides a graphical view of what resources were used when HP Vertica executed the query.
Our new video tutorial walks you through using Management Console to view query plans and profile data. See the video below, and make sure to check out the other video tutorials we offer here.
You can also read more about the feature in this blog post: Visualizing Your Query Plan with Management Console 7.