Vertica

Archive for the ‘Technical’ Category

Connecting HP Vertica to DbVisualizer



In our latest video entry, we show you how to connect the DbVis Software database tool, DbVisualizer, to your HP Vertica database. DbVisualizer is a database tool that allows you to manage and administer your database visually. To learn more about DbVisualizer, visit their webpage at dbvis.com. You can also go to vertica.com/documentation to find more HP Vertica documentation.

HP Vertica for SQL on Hadoop

HP Vertica for SQL on Hadoop from Vertica Systems on Vimeo

HP Vertica now offers a SQL on Hadoop license, which allows you to leverage Vertica’s powerful analytics engine to explore data in Hadoop Distributed File System (HDFS).

This offering is licensed on a per-node, per-year term with no data volume limits.

With your SQL on Hadoop license, you get access to proven enterprise features like:

  • Database designer
  • Management console
  • Workload management
  • Flex tables
  • External tables
  • Backup functionality

See our documentation on HP Vertica SQL on Hadoop for limitations.
To learn more about other HP Vertica licenses, view our Obtaining and Installing Your HP Vertica Licenses video or contact an HP Licensing center.

Using Location Data with HP Vertica Place

A couple of weeks ago, I went to my favorite museum in the Boston area to see a new exhibition. While I was there, I noticed a lot of visitors walking through the museum using self-guided tours on handheld devices. I knew that the museum rented such devices to visitors, but I hadn’t thought about how the museum could use these devices to gather information about visitor behavior. Because I’ve been working on HP Vertica’s geospatial functionality, HP Vertica Place, I wondered what kind of insight the museum might be able to gather from their visitors’ spatial relationship with the museum and its artwork.

HP Vertica Place leverages the performance and scale of the HP Vertica Analytics Platform and uses OGC-based SQL functions for computation on two-dimensional planar data. So I knew that HP Vertica Place could help me answer questions involving point-in-polygon tests, distance, and intersection, to name a few.

To test my hypothesis I needed a dataset. Because I couldn’t find one publicly available, I created my own dataset using a Python script.

Instead of trying to replicate the movement of thousands of people through a space of more than 500,000 square feet, I took a more micro approach. Museums are normally compartmentalized into rooms, so I chose to focus on a single room of a museum. In that room, I added six different works of art for visitors to view. Visitors could view the artworks in any order and for varying amounts of time. All visitors to the room are assumed to be using a handheld device that provides additional information about the artwork. The handheld device records each visitor’s location every second. This information is downloaded from the device after the visitor returns it and is then uploaded to an HP Vertica database.

Using the script I created, I generated a dataset of 2,000 visitors moving through the room from 9:00 am to 6:00 pm.
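The original script isn’t reproduced in this post, so here is only a minimal sketch of how such a generator might look. Everything in it is an assumption made for illustration: the rectangular viewing areas and their coordinates, the dwell-time range, and the function names are hypothetical, and the real dataset was almost certainly built differently.

```python
import random
from datetime import datetime, timedelta

# Hypothetical layout: each artwork's viewing area is an axis-aligned
# rectangle (xmin, ymin, xmax, ymax) in feet. The real room layout differs.
VIEWING_AREAS = {
    1: (0, 0, 10, 10),
    2: (15, 0, 25, 10),
    3: (30, 0, 40, 10),
    4: (0, 20, 10, 30),
    5: (15, 20, 25, 30),
    6: (30, 20, 40, 30),
}

OPENING = datetime(2014, 7, 2, 9, 0, 0)
OPEN_SECONDS = 9 * 60 * 60  # 9:00 am to 6:00 pm


def simulate_visitor(usr_id, rng, max_dwell=120):
    """Yield (usr_id, timestamp, x, y) once per second for one visitor."""
    # Enter at a random moment, leaving enough time to finish before closing.
    latest_start = OPEN_SECONDS - max_dwell * len(VIEWING_AREAS)
    now = OPENING + timedelta(seconds=rng.randrange(latest_start))
    # View a random subset of the artworks, in random order.
    artworks = rng.sample(sorted(VIEWING_AREAS), rng.randint(1, len(VIEWING_AREAS)))
    for art in artworks:
        xmin, ymin, xmax, ymax = VIEWING_AREAS[art]
        # Stay in this viewing area for a random number of seconds.
        for _ in range(rng.randint(5, max_dwell)):
            yield usr_id, now, rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)
            now += timedelta(seconds=1)


def generate_dataset(n_visitors=2000, seed=0):
    """Return a list of per-second location rows for all visitors."""
    rng = random.Random(seed)
    rows = []
    for usr_id in range(1, n_visitors + 1):
        rows.extend(simulate_visitor(usr_id, rng))
    return rows
```

Calling generate_dataset() returns one row per visitor per second, which could then be written to a CSV file and loaded into a table such as usr_data.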


This graphic shows how the room was laid out:
HP_Vertica_Place_blog_image

I then proposed four questions that can provide valuable information about the collection and how visitors engaged with the space:

  • Which work of art was the most popular?
  • How many people interacted with each artwork?
  • On average, how much time does a visitor spend viewing an artwork?
  • At the busiest times of day, how physically close are visitors to one another?

To find the answers to these questions I queried my database of location data and polygons representing the viewing area of each artwork.

While working through the first question, I discovered that I could use HP Vertica Place to write one query to answer my first two questions. Here’s the query I wrote:

SELECT pol_gid,
       COUNT(DISTINCT(usr_id)) count_user_visit
FROM
  (SELECT pol_gid,
          usr_id,
          COUNT(usr_id) user_points_in
   FROM
     (SELECT STV_Intersect(usr_id, g USING PARAMETERS INDEX='art_index')
             OVER(PARTITION BEST) AS (usr_id, pol_gid)
      FROM usr_data
      WHERE date_time BETWEEN '2014-07-02 09:30:20' AND '2014-07-02 17:05:00') AS c
   GROUP BY pol_gid,
            usr_id HAVING COUNT(usr_id) > 20) AS real_visits
GROUP BY pol_gid
ORDER BY count_user_visit DESC;
 pol_gid | count_user_visit
---------+------------------
       4 |             1663
       2 |             1558
       3 |             1483
       6 |             1272
       5 |             1261
       1 |              856
(6 rows)

From the results it’s clear that artwork number 4 is the most popular, but 2 and 3 are in close competition. However, artwork 1 isn’t drawing nearly as many visitors as we’d expect. Why? Is it not properly marketed? Or is the location a problem? The results of this query could help us address those types of questions.
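To make the logic of that query easier to follow, here is a plain-Python sketch of its two aggregation steps. It assumes the point-in-polygon matching has already been done, so the input is simply a list of (usr_id, pol_gid) pairs, one per location sample that fell inside a viewing area. The function name is hypothetical, and this is not how HP Vertica executes the query.

```python
from collections import Counter


def visitors_per_polygon(hits, min_points=20):
    """Count distinct visitors per polygon, most popular first.

    hits: iterable of (usr_id, pol_gid) pairs, one per location sample
    that fell inside a polygon.
    """
    # Inner GROUP BY: number of samples per (visitor, polygon) pair.
    points = Counter(hits)
    # HAVING COUNT(usr_id) > 20: keep only visitors who lingered for more
    # than min_points samples, then count them per polygon (outer GROUP BY).
    visits = Counter(pol for (usr, pol), n in points.items() if n > min_points)
    # ORDER BY count_user_visit DESC
    return visits.most_common()
```

Because the device records one point per second, the threshold treats a visitor as having really viewed an artwork only after more than 20 seconds inside its viewing area, which filters out people who merely walked through.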

Curators across the country would be thrilled to know how long, on average, visitors spend in the viewing area of each artwork in their collection. I used this query to calculate the average time spent viewing an artwork, in this case artwork number 4:

SELECT AVG(count_seconds)
FROM
  (SELECT usr_id, COUNT(*) count_seconds
   FROM usr_data
   WHERE STV_Intersect(g USING PARAMETERS index='art_index') = 4
   GROUP BY usr_id) foo;

       AVG
------------------
 328.536970933197
(1 row)

Because the handheld device records the visitor’s location each second, the number of recorded points that intersect an artwork’s viewing area equals the number of seconds the visitor spent there. From this, I can make a reasonable estimate of how long each visitor viewed a work.
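The same one-sample-per-second reasoning can be sketched in plain Python. This is an illustration only: it uses a simple ray-casting point-in-polygon test rather than HP Vertica’s spatial index, and the function names and the (usr_id, x, y) sample format are assumptions.

```python
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def average_dwell_seconds(samples, polygon):
    """Average seconds spent inside polygon, over visitors who entered it.

    samples: iterable of (usr_id, x, y), one per visitor per second.
    """
    seconds = Counter()
    for usr_id, x, y in samples:
        if point_in_polygon(x, y, polygon):
            seconds[usr_id] += 1  # one sample == one second
    if not seconds:
        return 0.0
    return sum(seconds.values()) / len(seconds)
```

As in the SQL query, visitors who never entered the viewing area are left out of the average rather than counted as zero seconds.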

The fourth question relates to my most common gripe when visiting a popular exhibition or artwork: how close are people standing to one another during the busiest time of day? For this query, I decided to look at artwork number 4 because it was the most popular in our previous query:

SELECT AVG(ST_Distance(foo.g,foo1.g))
FROM tmp foo,
     tmp foo1
WHERE foo.i!=foo1.i;
       AVG
------------------
 2.20523723570353
(1 row)

On average, during the busiest time of day, visitors viewing the most popular artwork are standing 2.2 feet apart from one another! This type of insight is fascinating. What if this artwork were placed in a larger room with a wider viewing angle? Would more people be able to get a more direct view of the artwork? These questions are difficult to answer without this type of data and analysis, but think about how having this data could improve the overall visitor experience.
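The self-join in the distance query pairs every point with every other point and averages the distances. A small Python sketch of that calculation, assuming planar (x, y) coordinates and a hypothetical function name, looks like this:

```python
from itertools import combinations
from math import hypot


def average_pairwise_distance(points):
    """Mean Euclidean distance over all pairs of distinct (x, y) points."""
    pairs = list(combinations(points, 2))
    if not pairs:
        raise ValueError("need at least two points")
    total = sum(hypot(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in pairs)
    return total / len(pairs)
```

The SQL version counts each pair twice, once in each direction, while this sketch counts each pair once. The mean is the same either way because distance is symmetric.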

Imagine that you’re a museum curator and learn that your most popular artwork isn’t the Monet that your members continually rave about in their visitor surveys. Instead, it’s actually the Renoir across from the Monet that generates the most traffic. Or, you discover that the buzz of a high-profile auction drove more traffic to your three Giacometti statues than the email marketing campaign you did the month prior.

This type of data is available in many different types of business scenarios, not just museums. Location data provides valuable insights into how people interact with spaces. HP Vertica can help you discover these insights. 

Do you want to try out this example on your own? Install HP Vertica 7.1.x and the HP Vertica Place package from your my.vertica page. Then, download the dataset and accompanying SQL file from our GitHub repository.

HP Women at Grace Hopper

A couple of weeks ago, from October 8 to 11, 2014, the Anita Borg: Grace Hopper Conference for Women in Technology took place in Phoenix, Arizona. A Platinum Sponsor of the event, HP sent many women from across the organization to attend the conference.

HP Vertica had a significant presence at the event. Bo Hyun Kim, an HP Vertica developer, was chosen to present a technical paper she authored on Sentiment Analysis and Big Data. Shilpa Lawande, now General Manager of HP Software’s Big Data Platform business unit, spoke at a breakfast given for technical women at HP. Seven HP Vertica women attended the conference to recruit and screen talented candidates for the growing HP Vertica team.

Bo Hyun’s Presentation on Sentiment Analysis

In March 2014, Shilpa Lawande reached out to the technical women of HP Vertica, encouraging volunteers to submit session proposals for the conference.

Bo Hyun Kim, of the HP Vertica Management Console team, accepted the challenge. She collaborated with Lina Chen in authoring a paper called “Lexicon-Based Sentiment Analysis Using the Most-Mentioned Word Tree.” After several months of waiting anxiously, Bo Hyun learned that her paper was accepted!

Bo Hyun’s presentation became part of a larger Data Science in Social Media Analysis presentation. On Friday, October 10, she presented to a room packed with technical women of all ages and backgrounds.

Bo Hyun rocked. She handled the presentation with good humor, poise, and confidence. The presentation was held in one of the large ballrooms, and Bo Hyun prepared well before standing in front of hundreds of bright, technical women from all over the world. She engaged the audience right away by stepping down from the podium and asking them questions related to her research, which focused on enhancing HP Vertica’s sentiment analysis package. Earlier this year she had proposed improving the performance of HP Vertica Pulse, and the proposal was accepted. After many nights of hard work, Bo Hyun created an informative presentation and made sure the audience members followed and understood each slide.

After the presentation ended, audience members asked her various questions about her work on sentiment analysis. Most were students seeking advice and guidance on sentiment analysis for their senior thesis research projects. Students also asked about the skill set required to work at a software engineering company. Bo Hyun did not have enough time to answer every question, so she handed out her business card so that students could contact her later. On her way out, Bo Hyun was stopped by the panel, who congratulated her on the talk. After the conference, Bo Hyun received many emails from eager, enthusiastic students who had completed the presentation questionnaire, and she was happy to be able to help them further.

HP VertiGals at Bo Hyun’s talk.

HP Technical Women’s Breakfast

On Thursday, October 9, conference attendees from HP were invited to a technical women’s breakfast. Shilpa Lawande, General Manager of HP Software’s Big Data Platform business unit, was the main speaker at the breakfast. Shilpa shared stories of her own journey as a woman in the male-dominated field of computer science. She spoke of the experience of being “the only woman in the room” throughout her career: during her undergraduate education in India, as an individual contributor at Oracle and Vertica, and in her leadership roles as VP of engineering and, most recently, General Manager of HP Software’s Big Data Platform business unit. Among the questions she answered for the attendees, Shilpa shared that her own personal role model is her mother, who raised her family while pursuing a career and always met every challenge with a smile.

Shilpa Lawande speaking at the HP Technical Women’s breakfast

Recruiting

Lina Chen, Beth Favini, Dan Huang, Bo Hyun Kim, Shilpa Lawande, Pratibha Rana, and Amy Miller were among the women from HP Vertica who staffed the HP recruiting booth in the career fair area of the conference. The staff met countless talented young candidates eager to learn more about HP and about Vertica. The recruiting team collected hundreds of resumes and even held several “on-the-spot” interviews.

The HP Recruiting Booth

One of the big attractions of the HP booth was the daily raffle. At the end of each of the three days of the college fair, an HP Slate was raffled off. The raffle was immensely popular, drawing visits (and repeat visits) from women across the conference.

Crowds gathering at our daily HP Slate raffle

The HP Vertica women also scoped out the surrounding scene. Companies like Google, LinkedIn, Facebook, and Pinterest brought incredible creativity and enthusiasm to their booths. The team gathered ideas and hopes to work with HP recruiting to make the HP recruiting booth even more eye-catching and memorable.

Summary

The conference was nothing short of a success for HP Vertica attendees, who gathered a large pool of candidates, in addition to cheering on both Bo Hyun and Shilpa at their presentations. And, last but not least, everyone got to know each other a bit better, laughed a lot, and enjoyed a sense of camaraderie, teamwork, and enthusiasm at being part of HP Vertica.

Connecting HP Vertica 7.x to Tableau Desktop 8.2

Connecting HP Vertica to Tableau Desktop from Vertica Systems on Vimeo.

Have you ever wanted to visualize the data in your HP Vertica Analytics Platform with graphs, tables, maps, or other formats? Tableau Desktop, the visualization tool from Tableau Software, lets you do just that in a couple of steps. Use the Tableau Desktop HP Vertica-specific ODBC driver connector to access your data from HP Vertica and create different views for visual analysis. Watch this video to learn how to connect HP Vertica to Tableau Desktop using both the HP Vertica-specific ODBC driver connector and the generic connector.

What’s New in Dragline (7.1.0): Resource Pool Routing

Resource Pool Routing from Vertica Systems on Vimeo.

HP Vertica 7.1.x introduces new features that allow you to dynamically reroute queries to secondary resource pools.

Use the new CASCADE TO parameter when creating or altering a resource pool to designate a secondary resource pool. When a query exceeds the RUNTIMECAP of the pool on which it started running, it can cascade to the designated secondary pool, which has a larger RUNTIMECAP, and execute there instead of causing an error.

Because grant privileges are not considered on secondary pools, you can use this functionality to designate secondary resource pools to which user queries can cascade, without giving users explicit permission to run queries on those pools.

Check out this video to learn more about dynamically rerouting queries.
For more information, see the documentation.
