Archive for the ‘Vertica 7’ Category

Partner Interview: Logi Analytics & Haven OnDemand

Logi interview

Last week I sat down with Steven Schneider from Logi Analytics to discuss their integration with the new Haven OnDemand.

HP: What is new with the Haven OnDemand integration with Logi Analytics?

Logi Analytics: So, with IDOL and its new web-services oriented architecture, as well as Vertica and its natural SQL layer, you can use Logi’s direct-connect model not only to query and access features, but also to integrate that data with other sources.

For example, with Logi you can query your data using Vertica, return the data to IDOL, and execute an operation such as natural language translation. Then, take the result and create a visualization. It’s really neat because now you can provide a unified visualization tier on top of both Vertica and IDOL, allowing people to interact with them in a business use-case type environment.

HP: That’s really cool, especially because getting Vertica and IDOL data to work with one another is so incredibly powerful.

Logi Analytics: And that’s kind of unique to us, which is why I get excited about it, you know? There are a lot of BI tools that can talk to SQL, and there are some application web environments that can talk to web services. However, doing both of those together and pulling them into one unified interface (and by unified I’m not just talking about two panels, but actually taking the results of one and feeding them into another) enables you to discover completely unique metrics. It’s pretty cool. Check out this video to see a great example of how you can use Logi to come up with a metric to predict churn based on data contained in both IDOL and Vertica independently. It’s pretty interesting and unique.

HP: How does Logi take advantage of Vertica and IDOL specifically?

Logi Analytics: We wanted to let Vertica do what it’s designed to do. Not only does Logi treat Vertica as a data source, but when the end user wants to do an aggregation or any kind of operation, Logi actually passes those operations back to Vertica. Other BI solutions will say, “Sure, we work with Vertica,” but they’re actually just pulling the data from Vertica and putting it into their own repository to work with it there. That kind of defeats the purpose of Vertica’s speed and high-performance analytics, doesn’t it? With those tools, you can’t actually execute a sequence of commands with IDOL or Vertica; you can only load from them initially. From then on, you’re forced to execute commands locally. It’s pretty limiting.

HP: With these new services (IDOL OnDemand, Vertica OnDemand) being available to a wider audience than ever before, what kind of new customers are you looking forward to working with?

Logi Analytics: So there are two kinds of avenues. First, there are startups and SMBs in the software industry that want to embed IDOL and/or Vertica as part of the apps they’re creating. They are trying to get their apps to market quickly and provide proofs of concept. That’s a big market for us, as we have over 600 OEM partners. These companies may not have had the resources to connect with HP Vertica or IDOL in the past, but Haven makes it more accessible.

Then there are medium sized companies that maybe aren’t enterprise-sized in their infrastructure, but are still quite complex and have a lot of data they can use – for instance, healthcare and insurance companies.

Regardless of a company’s size, Haven makes it easy to create proofs of concept and stand up apps without going through the classic corporate IT infrastructure deployment process. In development, you don’t need that. Teams can stand up and develop concurrently with the corporate IT process instead of sequentially.

HP: What’s the coolest use case you’ve encountered with Logi and Haven?

Logi Analytics: I would say Glassbeam, a machine data analytics company, is an interesting one. There’s a great video online that breaks it down. Glassbeam collects unstructured data from machines and converts it into structured data based on its own proprietary language, giving customers insight into the performance of their products. They use Logi as their analytics platform and Vertica as their database, allowing them to pass aggregations and filters back to Vertica, taking advantage of that deep integration we discussed earlier. Since their implementation, Glassbeam has been able to scale much faster.

HP: What are you most excited about for the future of Logi and Vertica?

Logi Analytics: We would love to extend our relationship with Haven OnDemand beyond Logi Info to Logi Vision, our visual analytics slice-and-dice product for business users. We’d also love to expand our role to facilitate the acquisition of data from places like social media, and analyze it from there, opening up our joint use case to many more organizations.


Using Location Data with HP Vertica Place

A couple of weeks ago, I went to my favorite museum in the Boston area to see a new exhibition. While I was there, I noticed a lot of visitors walking through the museum using self-guided tours on handheld devices. I knew that the museum rented such devices to visitors, but I hadn’t thought about how the museum could use these devices to gather information about visitor behavior. Because I’ve been working on HP Vertica’s geospatial functionality, HP Vertica Place, I wondered what kind of insight the museum might be able to gather from their visitors’ spatial relationship with the museum and its artwork.

HP Vertica Place leverages the performance and scale of the HP Vertica Analytics Platform and uses OGC-based SQL functions for computation on two-dimensional planar data. So, I knew that HP Vertica Place could help me answer a number of questions involving point-in-polygon relationships, distance, and intersection, to name a few.

To test my hypothesis I needed a dataset. Because I couldn’t find one publicly available, I created my own dataset using a Python script.

Instead of trying to replicate the movement of thousands of people through a space of more than 500,000 square feet, I took a more micro approach. Museums are normally compartmentalized into rooms, so I chose to focus on a single room of a museum. In that room, I placed six different works of art for visitors to view. Visitors could view the artworks in any order and for varying amounts of time. All visitors to the room are assumed to be using a handheld device that provides them additional information about the artwork. The handheld device records each visitor’s location every second. This information is downloaded from the device after the visitor returns it and is then uploaded to an HP Vertica database.

Using the script I created, I generated a dataset of 2,000 visitors moving through the room from 9:00 am to 6:00 pm.
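The post doesn’t show the table definitions, but the queries that follow suggest a layout along these lines. This is a sketch; the table and column names beyond those referenced in the queries (`usr_data`, `usr_id`, `g`, `date_time`, `art_index`) are assumptions:

```sql
-- Visitor location readings, one row per visitor per second.
CREATE TABLE usr_data (
    usr_id    INTEGER,      -- visitor identifier
    date_time TIMESTAMP,    -- time of the reading
    g         GEOMETRY      -- visitor's position as a point
);

-- Viewing areas for the six artworks (polygon table name is assumed).
CREATE TABLE art_polygons (
    gid INTEGER,            -- artwork/polygon identifier
    g   GEOMETRY            -- viewing-area polygon
);

-- Build the spatial index the queries reference by name.
SELECT STV_Create_Index(gid, g USING PARAMETERS index='art_index')
       OVER() FROM art_polygons;
```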

This graphic shows how the room was laid out:

I then proposed four questions that can provide valuable information about the collection and how visitors engaged with the space:

  • Which work of art was the most popular?
  • How many people interacted with each artwork?
  • On average, how much time does a visitor spend viewing an artwork?
  • At the busiest times of day, how physically close are visitors to one another?

To find the answers to these questions I queried my database of location data and polygons representing the viewing area of each artwork.

While working through the first question, I discovered that I could use HP Vertica Place to write one query to answer my first two questions. Here’s the query I wrote:

SELECT pol_gid,
       COUNT(DISTINCT(usr_id)) count_user_visit
FROM
  (SELECT pol_gid,
          usr_id,
          COUNT(usr_id) user_points_in
   FROM
     (SELECT STV_Intersect(usr_id, g USING PARAMETERS INDEX='art_index')
             OVER(PARTITION BEST) AS (usr_id, pol_gid)
      FROM usr_data
      WHERE date_time BETWEEN '2014-07-02 09:30:20' AND '2014-07-02 17:05:00') AS c
   GROUP BY pol_gid,
            usr_id
   HAVING COUNT(usr_id) > 20) AS real_visits
GROUP BY pol_gid
ORDER BY count_user_visit DESC;
 pol_gid | count_user_visit
---------+------------------
       4 |             1663
       2 |             1558
       3 |             1483
       6 |             1272
       5 |             1261
       1 |              856
(6 rows)

From the results it’s clear that artwork number 4 is the most popular, but 2 and 3 are in close competition. However, artwork 1 isn’t drawing nearly as many visitors as we’d expect. Why? Is it not properly marketed? Or is the location a problem? The results of this query could help us address those types of questions.

Curators across the country would be thrilled to know on average how long visitors spent in the viewing area of each artwork in their collection. I used this query to calculate the average time spent viewing an artwork:

SELECT AVG(count_seconds)
FROM
  (SELECT usr_id, COUNT(*) count_seconds
   FROM usr_data
   WHERE STV_Intersect(g USING PARAMETERS index='art_index') = 4
   GROUP BY usr_id) foo;

(1 row)

I know that the handheld device records the visitor’s location each second, so I can infer how long each visitor intersected with the viewing area of an artwork. From this inference, I can make an accurate estimate of the length of time each visitor viewed a work.

The fourth question relates to my most common gripe when visiting a popular exhibition or artwork: how close are people standing to one another during the busiest time of day? For this query, I decided to look at artwork number 4 because it was the most popular in the previous query:

SELECT AVG(ST_Distance(foo.g,foo1.g))
FROM tmp foo,
     tmp foo1
WHERE foo.i!=foo1.i;
(1 row)
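The `tmp` table used in the distance query above isn’t defined in the post. One plausible construction, restricting the location readings to points inside artwork 4’s viewing area during a busy window, might look like this (the column alias `i` matches the query above; the specific time window is a hypothetical stand-in for the busiest hour):

```sql
-- Sketch: points within artwork 4's viewing area during one busy hour.
CREATE LOCAL TEMPORARY TABLE tmp ON COMMIT PRESERVE ROWS AS
SELECT usr_id AS i,    -- visitor identifier, aliased as in the query above
       g               -- visitor's point location
FROM usr_data
WHERE STV_Intersect(g USING PARAMETERS index='art_index') = 4
  AND date_time BETWEEN '2014-07-02 14:00:00' AND '2014-07-02 15:00:00';
```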

On average during the busiest time of day, visitors viewing the most popular artwork are standing 2.2 feet apart from one another! This type of insight is fascinating. What if this artwork was placed in a larger room with a wider viewing angle? Would more people be able to get a more direct view of the artwork? These questions are difficult to answer without this type of data and analysis, but think about how having this data could improve the overall visitor experience.

Imagine that you’re a museum curator and learn that your most popular artwork isn’t the Monet that your members continually rave about in their visitor surveys. Instead, it’s actually the Renoir across from the Monet that generates the most traffic. Or, you discover that the buzz of a high-profile auction drove more traffic to your three Giacometti statues than the email marketing campaign you did the month prior.

This type of data is available in many different types of business scenarios, not just museums. Location data provides valuable insights into how people interact with spaces. HP Vertica can help you discover these insights. 

Do you want to try out this example on your own? Install HP Vertica 7.1.x and the HP Vertica Place package from your my.vertica page. Then, download the dataset and accompanying SQL file from our GitHub repository.

Connecting HP Vertica 7.x to Tableau Desktop 8.2

Connecting HP Vertica to Tableau Desktop from Vertica Systems on Vimeo.
Have you ever wanted to visualize your HP Vertica Analytics Platform data with graphs, tables, maps, or other formats? Tableau Desktop, a visualization tool from Tableau Software, lets you do just that in a couple of steps. Use Tableau Desktop’s HP Vertica-specific ODBC driver connector to access your data from HP Vertica and create different views for visual analysis. Watch this video to learn how to connect HP Vertica to Tableau Desktop using both the HP Vertica-specific ODBC driver connector and the generic connector.

What’s New in Dragline (7.1.0): Resource Pool Routing

Resource Pool Routing from Vertica Systems on Vimeo.

HP Vertica 7.1.x introduces new features that allow you to dynamically reroute queries to secondary resource pools.

Use the new CASCADE TO parameter when creating or altering a resource pool to designate a secondary resource pool. If a query exceeds the RUNTIMECAP of the pool on which it started running, it cascades to the secondary pool (typically one with a larger RUNTIMECAP) and continues executing there instead of returning an error.

Because grant privileges are not considered on secondary pools, you can use this functionality to designate secondary resource pools to which user queries can cascade without giving users explicit permission to run queries on those pools.
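A minimal sketch of the setup (the pool names and caps here are examples, not from the post): queries that exceed one minute on `user_pool` cascade to `long_pool`, which allows up to an hour.

```sql
-- Secondary pool with a generous runtime cap.
CREATE RESOURCE POOL long_pool RUNTIMECAP '1 hour';

-- Primary pool: queries exceeding 1 minute cascade to long_pool
-- instead of erroring out.
CREATE RESOURCE POOL user_pool RUNTIMECAP '1 minute' CASCADE TO long_pool;
```

You can also retrofit an existing pool with `ALTER RESOURCE POOL user_pool CASCADE TO long_pool;`.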

Check out this video to learn more about dynamically rerouting queries.
For more information, see the documentation.

What’s New in Dragline (7.1.0): Projections with Expressions

Projections with Expressions Video

Projections with Expressions from Vertica Systems on Vimeo.

HP Vertica 7.1 introduces the ability to use expressions with projections.

When you create a projection, you now have the option to use an expression as a column definition. For example, you can declare a column that sums two variables. When you load data into a projection with expressions, HP Vertica automatically calculates the values according to the expressions and inserts the data into that new projection.
Once you create a projection with expressions, you can query the projection directly to access your pre-calculated data and eliminate resource-intensive computations at query time.
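As a rough illustration of the summed-column example (the table and column names here are invented for the sketch):

```sql
-- Base table: price and tax are loaded separately.
CREATE TABLE orders (price NUMERIC(10,2), tax NUMERIC(10,2));

-- Projection with an expression column: total is computed at load time.
CREATE PROJECTION orders_total (price, tax, total)
AS SELECT price, tax, price + tax FROM orders;
```

Querying `orders_total` directly then returns the pre-calculated `total` column without recomputing the sum at query time.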

Watch this video to learn more about using expressions with projections.

Projections with Expressions documentation.

See also:
Live Aggregate Projections
Top-K Projections

What’s New in Dragline (7.1.0): Top-K Projections

Top-K Projections Video

Top K Projections from Vertica Systems on Vimeo.

HP Vertica 7.1 introduces Top-K projections. A Top-K projection is a type of live aggregate projection that returns the top k rows from a partition.

Top-K projections are useful when you want to retrieve the top rows from a group of frequently updated, aggregated data. For example, say you want to view a list of the five singers who have the most votes in a competition. This list changes every time a vote comes in. But because the data is aggregated as it is loaded into your table and Top-K projection, all you have to do is grab the top five rows to see the top five singers at any moment. Using a Top-K projection is more efficient than aggregating (counting all the votes) every time you want to find the top five singers.
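A sketch of the singing-competition example using the Top-K projection syntax, where `LIMIT k OVER (PARTITION BY ... ORDER BY ...)` keeps the top k rows per partition (the table and column names here are assumptions):

```sql
-- Base table of aggregated vote counts per singer per contest.
CREATE TABLE votes (contest_id INT, singer VARCHAR(100), vote_count INT);

-- Top-K projection: keep the 5 highest vote counts per contest.
CREATE PROJECTION top_singers
AS SELECT contest_id, singer, vote_count
   FROM votes
   LIMIT 5 OVER (PARTITION BY contest_id ORDER BY vote_count DESC);
```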

Check out this video to learn more about Top-K projections, and stay tuned for our next video, ‘Projections with Expressions’.

Top-K Documentation

What’s New in Dragline (7.1.0): Installing HP Vertica Pulse

Installing HP Vertica Pulse Video

Installing Pulse from Vertica Systems on Vimeo.

HP Vertica 7.1.0 introduces the general availability of HP Vertica Pulse, our add-on sentiment analysis package for HP Vertica. Pulse provides a suite of functions that allow you to analyze and extract the sentiment from text, directly from your HP Vertica database. For example, you can use HP Vertica Pulse to analyze sentiment from Tweets or online product reviews to get a feel for how satisfied your customers are about your products or services.

HP Vertica Pulse automatically discovers attributes included in text and scores them using a built-in system dictionary. You can tune user dictionaries to detect certain words or phrases, to determine how words are scored, and to filter out attributes that are of no interest to you. Because of this flexibility, you can tune HP Vertica Pulse to work for your specific business needs.

Currently, HP Vertica Pulse supports English-language text only. You can download HP Vertica Pulse as an add-on package for your Enterprise Edition or as a trial for your Community Edition. Additionally, the Innovations section of the HP Vertica Marketplace offers a beta version of Pulse for Spanish only. Take a look at this video to learn how to install Pulse, and stay tuned for our next video, ‘Using Pulse’.

HP Vertica Pulse documentation.

Get Started With Vertica Today

Subscribe to Vertica