Vertica


Meet our Summer 2014 Interns!

Did your 2014 summer internship include rubber bulldozers, ice cream, and bumper boats? Were you able to develop and work on real projects for real customers, while eating free bagels and bananas? If not, then consider applying to HP Vertica for the summer of 2015.

Our 2014 interns had a great time scaling mountains (all right, it was a 635-foot hill) and building bridges (made out of toothpicks and gumdrops), while developing software that makes the HP Vertica database faster, more accurate, and more secure.

Our interns work closely with their mentors to solve hard problems and improve our product. Along the way, we encourage them to collaborate with cross-functional teams, attend technical talks that aren’t necessarily related to their projects, and create funny short videos about our database features. One of those videos appeared in EVP and GM Colin Mahony’s welcome presentation at the HP Vertica Big Data Conference!

Our interns tackled (and solved!) some interesting problems during the summer of 2014, including:

  • Improving trickle loading using Kestrel and Apache Storm.
  • Improving the encoding algorithm for Vmap data in a flex table.
  • Creating an R package for pattern mining.
  • Integrating HP Vertica with Apache Hadoop YARN.
  • Enhancing the documentation about database locks.
  • Implementing key-based client authentication for the HP Vertica Amazon Machine Image (AMI).
  • Adding features to and improving the performance of our test-tracking application.
  • Improving the scalability and performance of HP Vertica Database Designer.
  • Improving query optimizer plans for columns that are guaranteed to be unique.
  • Developing a tool that processes diagnostic information.

Everyone at HP Vertica works hard, but we like to have fun, too. We make sure to include the interns in our company outings and weekly gaming nights, but we also plan extra activities for them: hiking, mini-golf, volleyball, and tubing were some of this summer’s highlights. And our in-work and out-of-work activities this summer usually included copious consumption of ice cream.

Many of our best employees are former interns. So if you want to improve your technical skills, gain an understanding of our column-store database, make new friends, and have a lot of fun in the process, now is the best time to apply for an internship at HP Vertica.

What’s New in Dragline (7.1.0): Live Aggregate Projections

Live Aggregate Projections Video

 

HP Vertica 7.1 introduces live aggregate projections. A live aggregate projection is a projection that contains one or more columns of data that have been aggregated from a table.

If you frequently query data that requires aggregation, you could benefit from using a live aggregate projection. Because data in your live aggregate projection is aggregated at load time, rather than at the time you run a query, you can save time and resources. On subsequent data loads, HP Vertica updates the table and loads the aggregated values into the live aggregate projection. If you query the live aggregate projection any time after that, you’ll not only see the same results you would if you queried the data from the table and then aggregated it, but you’ll also use fewer resources in the process.
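As a minimal sketch of the idea, the statements below define a live aggregate projection that counts clicks per page per day. The table `clicks`, its columns, and the projection name `clicks_agg` are hypothetical names chosen for illustration; check the Live Aggregate Projection documentation for the requirements that apply to your version and anchor table.

```sql
-- Hypothetical anchor table that receives the raw click data.
CREATE TABLE clicks (
    user_id    INTEGER,
    page_id    INTEGER,
    click_time TIMESTAMP NOT NULL
);

-- Live aggregate projection: the COUNT(*) is maintained at load time,
-- grouped by page and by calendar day.
CREATE PROJECTION clicks_agg AS
    SELECT page_id,
           click_time::DATE AS click_date,
           COUNT(*)         AS num_clicks
    FROM clicks
    GROUP BY page_id, click_time::DATE;

-- If the table already held data when the projection was created,
-- refresh the projection so it catches up.
SELECT START_REFRESH();

-- Read the pre-aggregated values from the projection.
SELECT page_id, click_date, num_clicks
FROM clicks_agg
ORDER BY click_date, page_id;
```

The last query should return the same counts as grouping and counting the rows of `clicks` directly, without doing the aggregation work at query time.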

Check out this video to learn more about Live Aggregate Projections. Stay tuned for our next video, ‘Top-K Projections’.

See Also:

 Live Aggregate Projections with HP Vertica

Live Aggregate Projection documentation

Smart Grid Solution Demo

What’s New in Dragline (7.1.0): Using HP Vertica Pulse

Using HP Vertica Pulse on Vimeo

In our previous video, we showed you how to install HP Vertica Pulse, our add-on sentiment analysis package that allows you to analyze and extract the sentiment from text.

Take a look at this video to learn how to use and tune HP Vertica Pulse to work for your specific business needs.

You can download HP Vertica Pulse as an add-on package for your Enterprise Edition of HP Vertica at my.vertica.com.

HP Vertica Pulse documentation.

What’s New in Dragline (7.1.0): Installing HP Vertica Pulse

Installing HP Vertica Pulse on Vimeo

HP Vertica 7.1.0 introduces the general availability of HP Vertica Pulse, our add-on sentiment analysis package for HP Vertica. Pulse provides a suite of functions that allow you to analyze and extract the sentiment from text, directly from your HP Vertica database. For example, you can use HP Vertica Pulse to analyze sentiment from Tweets or online product reviews to get a feel for how satisfied your customers are with your products or services.

HP Vertica Pulse automatically discovers attributes included in text and scores them using a built-in system dictionary. You can tune user dictionaries to detect certain words or phrases, to determine how words are scored, and to filter out attributes that are of no interest to you. Because of this flexibility, you can tune HP Vertica Pulse to work for your specific business needs.

Currently, HP Vertica Pulse allows you to analyze English language text only. You can download HP Vertica Pulse as an add-on package for your Enterprise Edition or as a trial for your Community Edition, from my.vertica.com. Additionally, the Innovations section of the HP Vertica Marketplace offers a beta version of Pulse for Spanish only. Take a look at this video to learn how to install Pulse and stay tuned for our next video, ‘Using Pulse’.

HP Vertica Pulse documentation.

Obtaining and installing your HP Vertica license

Watch the video here!

Obtaining and installing your HP Vertica license may seem like tricky business, especially if you have more than one. But the process need not be complicated or frustrating. For a Community Edition license, you don’t even need to go through any additional steps after installing Vertica. For Enterprise Edition or Flex Zone licenses, you’ll go through a step-by-step process in HP’s licensing portal called Poetic and then provide Vertica with the path to the license file you download. That’s it! You can even apply your license through the Vertica Management Console. To see the process in action, watch this video about obtaining and installing the different HP Vertica licenses.
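If you prefer to apply a downloaded license from a SQL session rather than through the Management Console, a sketch along these lines may help. The file path is a hypothetical location; substitute the path where you saved the license file from the portal.

```sql
-- Install the license file (hypothetical path) as a superuser.
SELECT INSTALL_LICENSE('/tmp/vlicense.dat');

-- Show the terms of the license that is now installed.
SELECT DISPLAY_LICENSE();

-- Check whether current usage complies with the license terms.
SELECT GET_COMPLIANCE_STATUS();
```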

Useful links:
Poetic (HP’s Licensing for Software portal)
HP License Support Center

Tech Support Series: Optimizing for Deletes

This blog post is the first in a series that addresses frequently asked tech support questions. For now, we’ll talk about optimizing your database for deletion.

You may find that from time to time your recovery and query execution are slow due to a high volume of delete vectors. Performing a high number of deletes or updates can negatively affect query performance and recovery because of delete replay.

Delete replay occurs when ROS containers are merged together. The data marked for deletion in each of the ROS containers needs to be re-marked once the containers are merged. This process can hold up your ETL processes because the Tuple Mover lock (T lock) stays on until the replay deletes finish.

Luckily, optimizing your database for deletes can help speed up your processes. If you expect to perform a high number of deletes, first consider the reason for deletion. The following is a list of common reasons for high delete usage:

  • You regularly delete historical data and upload new data at specific intervals
  • You constantly update data or you want to delete data that was loaded by mistake
  • You often delete the contents of staging tables

To optimize your database for deletion, follow the suggestions that correspond to your reason for deletion.

  1. If you regularly delete historical data to make room for newer data, use partitioning to group data that will be deleted together. For example, if you regularly delete the previous month’s data, partition the data by month. You can then use the DROP_PARTITION function to discard all ROS containers that contain data for the partition. This operation removes historical data quickly because no purging or replay deletes are involved.
  2. You may also want to delete a high volume of data because it was loaded by mistake or because you frequently update data (which involves frequently deleting data). In these cases, you may see a high volume of delete vectors. There are three good ways to keep this under control:

    1. Create delete-optimized projections by using a high-cardinality column at the end of the sort order. This helps the replay delete process quickly identify rows to be marked for deletion.
    2. Make sure your Ancient History Mark (AHM) is advancing and close to the Last Good Epoch (LGE) or Current Epoch. You may also want to periodically use the MAKE_AHM_NOW function to advance the AHM to the greatest allowable value. When a mergeout occurs, all data that was marked for deletion before the AHM is purged, which minimizes the number of replay deletes.
    3. Periodically check the number of delete vectors in your tables using the DELETE_VECTORS system table. The automatic Tuple Mover will eventually purge deleted data, but if you find that your tables have a large number of delete vectors, you can manually purge records using the PURGE_TABLE function.
  3. You may find that you frequently delete all the data in staging tables. To streamline this process, truncate the staging table with the TRUNCATE TABLE statement instead of deleting its rows. Truncating a table discards the ROS containers that contain the data instead of creating delete vectors, and thus is more efficient than deleting the rows.
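The suggestions above can be sketched in SQL as follows. All object names here (`trade`, `trade_del_opt`, `trade_staging`, and their columns) are hypothetical, and the partition expression is only one possible way to group rows by month.

```sql
-- Suggestion 1: partition by month so a whole month can be dropped at once.
CREATE TABLE trade (
    trade_id INTEGER NOT NULL,
    symbol   VARCHAR(8) NOT NULL,
    tdate    DATE NOT NULL,
    price    NUMERIC(12,2)
)
PARTITION BY EXTRACT(YEAR FROM tdate) * 100 + EXTRACT(MONTH FROM tdate);

-- Discard July 2014 without creating delete vectors.
SELECT DROP_PARTITION('trade', 201407);

-- Suggestion 2: a delete-optimized projection. The high-cardinality
-- column (trade_id) comes last in the sort order.
CREATE PROJECTION trade_del_opt AS
    SELECT trade_id, symbol, tdate, price
    FROM trade
    ORDER BY symbol, tdate, trade_id
    SEGMENTED BY HASH(trade_id) ALL NODES;

-- Suggestion 3: empty a staging table by truncating it
-- rather than deleting its rows.
TRUNCATE TABLE trade_staging;
```

A new projection on a table that already contains data needs a refresh before the database can use it.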

 
Frequently deleting data is often a cause of slow query performance. Fortunately, you can optimize your database for deletions with these tips and avoid the headache.

How to:

Drop a partition:

=> SELECT DROP_PARTITION('table_name', partition_value);

Get epoch:

=> SELECT current_epoch, ahm_epoch, last_good_epoch FROM SYSTEM;

Set AHM to greatest allowable value:

=> SELECT MAKE_AHM_NOW();

Determine number of delete vectors:

=> SELECT * FROM v_monitor.DELETE_VECTORS;

Purge data:

=> SELECT PURGE_TABLE('table_name');
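For routine monitoring, the checks above can be combined into a short script. This is a sketch rather than a prescribed procedure: the `ahm_lag` alias and the table name `public.trade` are hypothetical, and the summary assumes the `schema_name`, `projection_name`, and `deleted_row_count` columns of the DELETE_VECTORS system table.

```sql
-- See how far the AHM lags behind the current epoch.
SELECT current_epoch,
       ahm_epoch,
       last_good_epoch,
       current_epoch - ahm_epoch AS ahm_lag
FROM v_monitor.system;

-- Summarize delete vectors per projection, largest first.
SELECT schema_name,
       projection_name,
       COUNT(*)               AS delete_vector_count,
       SUM(deleted_row_count) AS deleted_rows
FROM v_monitor.delete_vectors
GROUP BY schema_name, projection_name
ORDER BY deleted_rows DESC;

-- Purge a table that the summary flags. Only deleted data older
-- than the AHM is removed, so advance the AHM first if needed.
SELECT MAKE_AHM_NOW();
SELECT PURGE_TABLE('public.trade');
```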
