Vertica

Archive for the ‘Vertica 7’ Category

Using Location Data with HP Vertica Place

A couple of weeks ago, I went to my favorite museum in the Boston area to see a new exhibition. While I was there, I noticed a lot of visitors walking through the museum using self-guided tours on handheld devices. I knew that the museum rented such devices to visitors, but I hadn’t thought about how the museum could use these devices to gather information about visitor behavior. Because I’ve been working on HP Vertica’s geospatial functionality, HP Vertica Place, I wondered what kind of insight the museum might be able to gather from their visitors’ spatial relationship with the museum and its artwork.

HP Vertica Place leverages the performance and scale of the HP Vertica Analytics Platform and uses OGC-based SQL functions for computation on two-dimensional planar data. So, I know that HP Vertica Place can help me answer a number of questions regarding point-in-polygon, distance, and intersection to name a few.

To test my hypothesis I needed a dataset. Because I couldn’t find one publicly available, I created my own dataset using a Python script.

Instead of trying to replicate the movement of thousands of people moving through a space with more than 500,000 square feet, I took a more micro approach. Museums are normally compartmentalized into rooms, so I choose to focus on a single room of a museum. In that room, I added six different works of art for visitors to view. Visitors could view the artworks in differing order and could also view the works of art for a varying amount of time. All visitors to the room are assumed to be using a handheld device providing them additional information about the artwork. The handheld device records each visitor’s location data every second. This information is then downloaded from the device after the visitor returns it and is then uploaded to an HP Vertica database.

Using the script I created, I generated a dataset of 2,000 visitors moving through the room from 9:00 am to 6:00 pm.


This graphic shows how the room was laid out:
HP_Vertica_Place_blog_image

I then proposed four questions that can provide valuable information about the collection and how visitors engaged with the space:

  • Which work of art was the most popular?
  • How many people interacted with each artwork?
  • On average, how much time does a visitor spend viewing an artwork?
  • At the busiest times of day, how physically close are visitors to one another?

To find the answers to these questions I queried my database of location data and polygons representing the viewing area of each artwork.

While working through the first question, I discovered that I could use HP Vertica Place to write one query to answer my first two questions. Here’s the query I wrote:

SELECT pol_gid,
       COUNT(DISTINCT(usr_id)) count_user_visit
FROM
  (SELECT pol_gid,
          usr_id,
          COUNT(usr_id) user_points_in
   FROM
     (SELECT STV_Intersect(usr_id, g USING PARAMETERS INDEX='art_index') OVER(PARTITION BEST) AS (usr_id,
                                                                                                  pol_gid)
      FROM usr_data
      WHERE date_time BETWEEN '2014-07-02 09:30:20' AND '2014-07-02 17:05:00') AS c
   GROUP BY pol_gid,
            usr_id HAVING COUNT(usr_id) > 20) AS real_visits
GROUP BY pol_gid
ORDER BY count_user_visit DESC;
 pol_gid | count_user_visit
---------+------------------
       4 |             1663
       2 |             1558
       3 |             1483
       6 |             1272
       5 |             1261
       1 |              856
(6 rows)

From the results it’s clear that artwork number 4 is the most popular, but 2 and 3 are in close competition. However, artwork 1 isn’t drawing nearly as many visitors as we’d expect. Why? Is it not properly marketed? Or is the location a problem? The results of this query could help us address those types of questions.

Curators across the country would be thrilled to know on average how long visitors spent in the viewing area of each artwork in their collection. I used this query to calculate the average time spent viewing an artwork:

SELECT AVG(count_seconds)
FROM
  (SELECT usr_id, COUNT(*) count_seconds
   FROM usr_data
   WHERE STV_Intersect(g USING PARAMETERS index='art_index') = 4
   GROUP BY usr_id) foo;

       AVG
------------------
 328.536970933197
(1 row)

I know that the handheld device records the visitor’s location each second, so I can infer the amount of time the average person intersects with the viewing area of an artwork. From this inference, I can make an accurate estimation about the length of time each visitor viewed a work.

The fourth question relates to my most common gripe when visiting a popular exhibition or artwork: how close are people to standing to one another during the busiest time of day? For this query, I decided to look at artwork number 4 because it was the most popular in our previous query:

SELECT AVG(ST_Distance(foo.g,foo1.g))
FROM tmp foo,
     tmp foo1
WHERE foo.i!=foo1.i;
       AVG
------------------
 2.20523723570353
(1 row)

On average during the busiest time of day, visitors viewing the most popular artwork are standing 2.2 feet apart from one another! This type of insight is fascinating. What if this artwork was placed in a larger room with a wider viewing angle? Would more people be able to get a more direct view of the artwork? These questions are difficult to answer without this type of data and analysis, but think about how having this data could improve the overall visitor experience.

Imagine that you’re a museum curator and learn that your most popular artwork isn’t the Monet that your members continually rave about in their visitor surveys. Instead, it’s actually the Renoir across from the Monet that generates the most traffic. Or, you discover that the buzz of a high-profile auction drove more traffic to your three Giacometti statues than the email marketing campaign you did the month prior.

This type of data is available in many different types of business scenarios, not just museums. Location data provides valuable insights into how people interact with spaces. HP Vertica can help you discover these insights. 

Do you want to try out this example on your own? Install HP Vertica 7.1.x and the HP Vertica Place package from your my.vertica page. Then, download the dataset and accompanying SQL file from our GitHub repository.

Connecting HP Vertica 7.x to Tableau Desktop 8.2

Connecting HP Vertica to Tableau Desktop from Vertica Systems on Vimeo.
Have you ever wanted to visualize your HP Vertica Analytics Platform with graphs, tables, maps, or other formats? The Tableau Desktop by Tableau Software visualization tool lets you do just that in a couple steps. Use the Tableau Desktop HP Vertica specific ODBC driver connector to access your data from HP Vertica and create different views for visual analysis. Watch this video to learn how to connect HP Vertica to Tableau Desktop using both the HP Vertica specific ODBC driver connector and the generic connector.

What’s New in Dragline (7.1.0): Resource Pool Routing

Resource Pool Routing from Vertica Systems on Vimeo.

HP Vertica 7.1.x introduces new features that allow you to dynamically reroute queries to secondary resource pools.

Use the new CASCADE TO parameter when creating or altering a resource pool to indicate a secondary resource pool to which queries can cascade and execute when they exceed the RUNTIMECAP of the pool on which they started running. This way, if a query exceeds its initial pool’s RUNTIMECAP, it can cascade to a designated secondary pool with a larger RUNTIMECAP instead of causing an error.

Because grant privileges are not considered on secondary pools, you can use this functionality to designate secondary resource pools where user queries can cascade to without giving users explicit permission to run queries on that pool.

Check out this video to learn more about dynamically rerouting queries.
For more information, see the documentation.

What’s New in Dragline (7.1.0): Projections with Expressions

Projections with Expressions Video

Projections with Expressions from Vertica Systems on Vimeo.

HP Vertica 7.1 introduces the ability to use expressions with projections.

When you create a projection, you now have the option to use an expression as a column definition. For example, you can declare a column that sums two variables. When you load data into a projection with expressions, HP Vertica automatically calculates the values according to the expressions and inserts the data into that new projection.
Once you create a projection with expressions, you can query the projection directly to access your pre-calculated data and eliminate resource-intensive computations at query time.

Watch this video to learn more about using expressions with projections.

Projections with Expressions documentation.

See also:
Live Aggregate Projections
Top-K Projections

What’s New in Dragline (7.1.0): Top-K Projections

Top-K Projections Video

Top K Projections from Vertica Systems on Vimeo.

HP Vertica 7.1 introduces Top-K projections. A Top-K projection is a type of live aggregate projection that returns the top k rows from a partition.

Top-K projections are useful when you want to retrieve the top rows from a group of frequently updated, aggregated data. For example, say you want to view a list of the 5 singers who have the most votes in a competition. This value will change every time a vote comes in. But since the data is aggregated as it is loaded into your table and Top-k projection, all you have to do is grab the top 5 rows to see the top 5 singers at any moment. Using a Top-k projection is more efficient than if you had to aggregate (count all the votes) every time you wanted to find the top 5 singers.

Check out this video to learn more about Top-k projections and stayed tuned for our next video ‘Projections with Expressions’.

Top-K Documentation

What’s New in Dragline (7.1.0): Installing HP Vertica Pulse

Installing HP Vertica Pulse Video

Installing Pulse from Vertica Systems on Vimeo.

HP Vertica 7.1.0 introduces the general availability of HP Vertica Pulse, our add-on sentiment analysis package for HP Vertica. Pulse provides a suite of functions that allow you to analyze and extract the sentiment from text, directly from your HP Vertica database. For example, you can use HP Vertica Pulse to analyze sentiment from Tweets or online product reviews to get a feel for how satisfied your customers are about your products or services.

HP Vertica Pulse automatically discovers attributes included in text and scores them using a built-in system dictionary. You can tune user-dictionaries to detect certain words or phrases, to determine how words are scored, and to filter out attributes that are of no interest to you. Because of this flexibility, you can tune HP Vertica Pulse to work for your specific business needs.

Currently, HP Vertica Pulse allows you to analyze English language text only. You can download HP Vertica Pulse as an add-on package for your Enterprise Edition or as a trial for your Community Edition, from my.vertica.com. Additionally, the Innovations section of the HP Vertica Marketplace offers a beta version of Pulse for Spanish only. Take a look at this video to learn how to install Pulse and stay tuned for our next video, ‘Using Pulse’.

HP Vertica Pulse documentation.

Get Started With Vertica Today

Subscribe to Vertica