Vertica

Author Archive

Whose Side Are You On? Using HP Vertica Pulse with College Basketball

Want to know what people across the nation are saying about your college basketball team? Want to know when and where those opinions change? You can do this and more with HP Vertica Pulse.

Visit our new blog to see how HP data scientist Manolo Garcia-Solaco used HP Vertica Pulse to analyze the sentiment of tweets when the Wisconsin Badgers faced the Duke Blue Devils in the finals of the NCAA basketball championship.

The HP Vertica Community has Moved!

Visit our new community site: https://community.dev.hp.com/.

The HP Vertica online community now has a new home. We’ve joined the Big Data and Analytics Community, part of the HP Developer Community, located at https://community.dev.hp.com/.

Why are we doing this?

We’ve joined the new community so that you’ll have a centralized place to go for all your big data questions and answers. Using the Big Data and Analytics Community, you can:

  • Connect with customers across all our Big Data offerings, including HP Vertica Enterprise and Community Editions, HP Vertica OnDemand, HP IDOL, and HP IDOL OnDemand.
  • Learn more about HP Haven, the HP Big Data Platform that allows you to harness 100% of your data, including business, machine, and human-generated data.
  • Browse blogs, forum posts, and best practices documents for all big data topics.

In short, the Big Data and Analytics Community provides you with one-stop shopping for product information, guidance on best practices, and solutions to technical problems.

What about the old content?

To preserve the rich exchange of knowledge in our previous community and forum, we migrated all of the content from our old forum to our new Big Data and Analytics location. All your questions and answers are accessible on the new forum.

Want a preview?

Here’s what the new community looks like:

[Image: annotated screenshot of the new community site]

We look forward to greeting you in our new space!

Index Data into Your HP Vertica Database with the New IDOL CFS Vertica Module

HP is pleased to announce the new IDOL CFS Vertica Module. The CFS Vertica module allows the HP Connector Framework Server (CFS) to index into an HP Vertica database.

The new indexing capability makes real integration between HP IDOL and HP Vertica possible, allowing you to use Vertica to perform analytics on data that has been indexed by IDOL. The CFS Vertica Module requires IDOL 10.9 or later and HP Vertica server version 7.1.x or later. In this blog, we’ll give you a high-level overview of how the new integration works by walking you through a simple example, described below.

Scenario:
Your organization has a large repository of documents, written by many different authors. You want to find the length of documents written by each individual author.

Using IDOL CFS with the HP Vertica Indexer
The power of IDOL allows CFS to process data it retrieves from connectors and index the information into HP Vertica. The process of getting data from a repository into HP Vertica can be broken down into the following five steps:

  1. Connectors scan files from repositories and send documents to CFS
  2. CFS performs pre-import tasks (optional)
  3. CFS uses KeyView to filter document content and extract sub-files
  4. CFS performs post-import tasks (optional)
  5. CFS indexes data into existing HP Vertica flex tables

[Image: diagram of the IDOL connector and CFS workflow]

Step 1: Connectors
IDOL provides many different connectors through which you can access data from different sources. For example, IDOL has a SharePoint connector, a social media connector, and an Exchange connector. The connectors scan and send files to CFS, where they are processed. By default, the files sent to CFS contain only metadata extracted from the repository. The files contain both the metadata and the file content only after the KeyView filtering step (step 3). As discussed later, you can configure this process with pre-import and post-import tasks.

Step 2: Pre-import tasks
You can also choose to run optional pre-import tasks on the metadata contained in the files before KeyView filtering takes place. In IDOL, import tasks help you manipulate incoming data from a repository to better suit your needs. For example, you can run a facial recognition import task. You can also run post-import tasks on the files after the KeyView filtering step, when the files contain both metadata and content (see step 4).

Step 3: KeyView
You might be wondering, what exactly is the KeyView step? In a nutshell, KeyView filters and extracts elements from the files and records you are retrieving. You can also use it to customize imports. For example, you can run a pre-import task that adds the AUTN_NO_FILTER field to the document. The AUTN_NO_FILTER field specifies that you do not want to extract document content. Because we set this field, CFS knows during the KeyView step not to extract the document content. This is the case for our example; to get the file size and author information we want, we need only the metadata associated with the documents. The metadata that is extracted is what will ultimately end up in our HP Vertica database.
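
The effect of the AUTN_NO_FILTER field can be pictured with a small toy model. This is only a sketch of the flow described above, not IDOL or CFS code: the function names, the sample document, the "author" and "filesize" field names, and the use of a Boolean value for AUTN_NO_FILTER are all assumptions made for illustration.

```python
# Toy model of the pre-import -> filter -> post-import flow.
# Nothing here is a real CFS or KeyView API; every name is hypothetical.

def pre_import_skip_content(doc):
    """Hypothetical pre-import task: flag the document so content is skipped."""
    doc = dict(doc)                # work on a copy, leave the input untouched
    doc["AUTN_NO_FILTER"] = True   # assumption: modeled here as a Boolean flag
    return doc

def filter_step(doc, file_text):
    """Stand-in for the filtering step: add content unless the flag is set."""
    doc = dict(doc)
    if not doc.get("AUTN_NO_FILTER"):
        doc["CONTENT"] = file_text
    return doc

def process(doc, file_text, pre_tasks=(), post_tasks=()):
    """Run optional pre-import tasks, the filter step, then post-import tasks."""
    for task in pre_tasks:
        doc = task(doc)
    doc = filter_step(doc, file_text)
    for task in post_tasks:
        doc = task(doc)
    return doc

# Metadata as a connector might send it (field names are made up).
metadata = {"DREREFERENCE": "doc-1", "author": "Ana", "filesize": 2048}

with_flag = process(metadata, "full text", pre_tasks=[pre_import_skip_content])
without_flag = process(metadata, "full text")

print("CONTENT" in with_flag)     # False: only metadata is kept
print("CONTENT" in without_flag)  # True: content was extracted
```

In this model, the flagged document carries only metadata through to indexing, which is all the author and file size example needs.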

Steps 4 and 5: Post-import tasks and Indexing
After CFS has processed the document and performed any post-import tasks (step 4), it automatically indexes the document (step 5). By default, CFS indexes the document into the index or indexes (separated by commas) specified by the IndexerSections parameter in the [Indexing] section of its configuration file. CFS can index into IDOL Server, IDOL OnDemand, and now, a Vertica database.

To have CFS index your information into Vertica, open the CFS configuration file and use the IndexerSections parameter to specify the section containing the indexing setting, as shown here:

[Image: the [Indexing] section of the CFS configuration file, showing the IndexerSections parameter]

Then, create a new section with the same name that you specified in the IndexerSections parameter:

[Image: the Vertica indexer section of the CFS configuration file]

Save and close the configuration file.
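
Because every name listed in IndexerSections must have a matching section, a quick consistency check can catch typos before you restart CFS. The following is a minimal sketch, assuming the configuration file is simple enough for Python's configparser to read. The section name "vertica" and the sample text are hypothetical; only IndexerSections, [Indexing], TableName, and myFlexTable come from this post.

```python
import configparser

# Hypothetical excerpt of a CFS configuration file. The section name
# "vertica" is an assumption; a real file contains many more settings.
SAMPLE = """
[Indexing]
IndexerSections=vertica

[vertica]
TableName=myFlexTable
"""

def indexer_settings(text):
    """Return the settings of every section named by IndexerSections."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    raw = cfg["Indexing"]["IndexerSections"]
    # IndexerSections may list several sections, separated by commas.
    names = [name.strip() for name in raw.split(",") if name.strip()]
    missing = [name for name in names if not cfg.has_section(name)]
    if missing:
        raise ValueError(f"IndexerSections names undefined sections: {missing}")
    # configparser lower-cases option names, so TableName becomes "tablename".
    return {name: dict(cfg[name]) for name in names}

settings = indexer_settings(SAMPLE)
print(settings)
```

The check raises a ValueError if IndexerSections names a section that does not exist in the file.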

The Vertica indexer is part of the CFS product. However, to use the Vertica indexer, you must have the Vertica ODBC drivers installed and configured on the same machine as CFS. This is necessary because CFS uses the ODBC connection to send JSON-formatted data to the existing HP Vertica flex table.

Creating HP Vertica Flex Tables
Since metadata is variable, you must have a destination that can handle variable data. HP Vertica flex tables (short for flexible tables) are tables designed especially for loading semi-structured data into your HP Vertica database and querying it, which makes them a perfect fit for use with IDOL CFS. Note that the flex table must already exist for CFS to insert the data into it. In our example, we’ve previously created a flex table called myFlexTable (see it listed under TableName in the Vertica indexer code example above). When we created the flex table, we included column definitions for data we want to retrieve, along with CFS data that is inserted automatically:

[Image: the CREATE FLEX TABLE statement for myFlexTable]

We also created a projection to make sure we view only the latest record for any given document:

[Image: the CREATE PROJECTION statement]
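
The idea behind "only the latest record for any given document" can be shown outside the database. The sketch below is a plain Python illustration of that selection logic, not the projection definition itself. It uses the DREREFERENCE and VERTICA_INDEXER_TIMESTAMP fields that CFS inserts automatically (described later in this post); the integer timestamps and the "filesize" field are assumptions for the example.

```python
def latest_records(rows):
    """Keep one row per DREREFERENCE: the one with the newest timestamp."""
    latest = {}
    for row in rows:
        key = row["DREREFERENCE"]
        current = latest.get(key)
        if (current is None
                or row["VERTICA_INDEXER_TIMESTAMP"] > current["VERTICA_INDEXER_TIMESTAMP"]):
            latest[key] = row
    return list(latest.values())

# Made-up rows: doc-1 was indexed twice, doc-2 once.
rows = [
    {"DREREFERENCE": "doc-1", "VERTICA_INDEXER_TIMESTAMP": 100, "filesize": 10},
    {"DREREFERENCE": "doc-2", "VERTICA_INDEXER_TIMESTAMP": 105, "filesize": 20},
    {"DREREFERENCE": "doc-1", "VERTICA_INDEXER_TIMESTAMP": 110, "filesize": 15},
]

for row in latest_records(rows):
    print(row["DREREFERENCE"], row["VERTICA_INDEXER_TIMESTAMP"], row["filesize"])
```

Each document appears once in the result, represented by its most recently indexed version.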

For more information about HP Vertica flex tables, see the documentation here.

When CFS indexes data to an HP Vertica flex table, it issues a COPY command using ODBC with the JSON formatted data:

[Image: the COPY statement issued by CFS]

Our JSON data might look like this:

[Image: sample JSON data]

Here’s where we can see the length of documents written by different authors. The file contains our expected metadata, like author and file size, but we also see some automatically inserted data like DREREFERENCE, VERTICA_INDEXER_TIMESTAMP, and VERTICA_INDEXER_DELETED. DREREFERENCE is a unique document ID used by IDOL. VERTICA_INDEXER_TIMESTAMP is a timestamp inserted by CFS in the JSON record sent to HP Vertica, which represents the time at which the information was indexed for Vertica. The timestamp is used to distinguish and sort different versions or changes of the JSON record. The VERTICA_INDEXER_DELETED field is a Boolean value that, if true, denotes that the document was deleted from the source repository. You can use this field to filter out deleted documents.
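
To make the role of VERTICA_INDEXER_DELETED concrete, here is a minimal sketch that reads a few JSON records, skips the deleted ones, and totals file size per author, as in our scenario. The records are invented for illustration: the "author" and "filesize" key names, the integer timestamps, and the one-record-per-line layout are assumptions, not the exact format CFS sends.

```python
import json
from collections import defaultdict

# Invented sample records, one JSON object per line.
RAW = """
{"DREREFERENCE": "doc-1", "author": "Ana", "filesize": 1200, "VERTICA_INDEXER_TIMESTAMP": 1, "VERTICA_INDEXER_DELETED": false}
{"DREREFERENCE": "doc-2", "author": "Ana", "filesize": 800, "VERTICA_INDEXER_TIMESTAMP": 2, "VERTICA_INDEXER_DELETED": false}
{"DREREFERENCE": "doc-3", "author": "Ben", "filesize": 500, "VERTICA_INDEXER_TIMESTAMP": 3, "VERTICA_INDEXER_DELETED": true}
{"DREREFERENCE": "doc-4", "author": "Ben", "filesize": 300, "VERTICA_INDEXER_TIMESTAMP": 4, "VERTICA_INDEXER_DELETED": false}
"""

def size_by_author(text):
    """Sum file sizes per author, ignoring records flagged as deleted."""
    totals = defaultdict(int)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("VERTICA_INDEXER_DELETED"):
            continue  # document was deleted from the source repository
        totals[record["author"]] += record["filesize"]
    return dict(totals)

totals = size_by_author(RAW)
print(totals)  # {'Ana': 2000, 'Ben': 300}
```

In the database itself you would express the same filter and aggregation in SQL against the flex table or its projection.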

Accessing Your Data in HP Vertica
Now that the data is in HP Vertica, we can access it as usual. To view the data in Vertica, query the projection we created earlier:

[Image: the SELECT statement against the projection]

Use the mapToString() function (with the __raw__ column of flexProjection) to inspect its contents in readable JSON text format. Notice that with this statement, we can see all the metadata that was extracted, even though we didn’t view it in our projection:

[Image: mapToString() example, part 1]

[Image: mapToString() example, part 2]

Using the new CFS Vertica module, you open up new possibilities for your data. You now have the ability to use all of the powerful IDOL features and integrate your data with HP Vertica for analysis. Stay tuned for more blogs about this new integration.

Learn more:
To read more about HP Vertica flex tables, see the flex table documentation.
If you are an IDOL customer, read more about IDOL CFS (password required).
See this post on our new community!

HP Vertica Best Practices: Resource Management

In a perfect world, every query you ever run would receive full attention from all system resources. And in a single user environment, when you are only running one query, this is in fact the case; the system can devote all its resources (CPU, memory, disk I/O, etc.) to your one query. But chances are, you are running more than one query at a time, maybe even more than you can count.

This means that queries have to share resources when they run. Since the performance of a given query depends on how many resources it has been allocated, it’s easy to see how things can get jammed up. Luckily for you, HP Vertica has a resource management feature that can help you allocate resources in a manner that suits your business needs.

Watch this video for an overview of HP Vertica resource management and learn how it can help you optimize your HP Vertica database for use with third-party tools.

For more information about HP Vertica resource management, visit our documentation.

Check out our other video about the resource pool cascade feature, which can further help you with resource management.

HP Vertica Best Practices: Native Connection Load Balancing

You may be aware that each client connection to a host in your HP Vertica cluster requires a small overhead in memory and processor time. For a single connection, this impact is minimal, almost unnoticeable. Now imagine you have many clients all connecting to the same host at the same time. In this situation, the compounded overhead can potentially affect database performance.

To limit the database performance consequences caused by multiple client connections, you might manually assign certain client connections to certain hosts. But this can become tedious and difficult as more and more client connections are added. Luckily, HP Vertica offers a feature that can do all this for you. It’s called native connection load balancing.

Native connection load balancing is available in HP Vertica 7.0 and later releases. It is a feature built into both the server and the client libraries that helps spread the CPU and memory overhead caused by client connections across the hosts in the database. When you enable native load balancing on the server and client, you won’t have to manually assign clients to specific hosts to reduce overhead.
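
To see why spreading connections matters, consider a toy model in which each new client is assigned to the next host in rotation. This only illustrates the balancing idea, using round-robin as an assumed example policy. It does not show how the HP Vertica server or client libraries implement the feature, and the host and client names are made up.

```python
from collections import Counter
from itertools import cycle

def assign_round_robin(hosts, clients):
    """Assign each client to the next host in rotation (toy model only)."""
    rotation = cycle(hosts)
    return {client: next(rotation) for client in clients}

hosts = ["host01", "host02", "host03"]          # hypothetical cluster hosts
clients = [f"client{i}" for i in range(7)]      # seven incoming connections

assignment = assign_round_robin(hosts, clients)
load = Counter(assignment.values())             # connections per host
print(load)
```

With seven clients and three hosts, no host ends up with more than one connection above any other, instead of one host carrying all seven.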

Watch this best practices video to learn more about HP Vertica native connection load balancing and how to enable and disable it on the server and client.

For more information, see Native Connection Load Balancing in our documentation.

HP Vertica Gives Back this Holiday Season

[Image: East End House Thanks]

This holiday season, four teams of HP Vertica employees and families made a trip to East End House in Cambridge, MA to help with the annual Thanksgiving Basket Giveaway. If this organization sounds familiar, you might have read our blog about our summer interns visiting the same location to work with students to build bridges made of toothpicks and gumdrops.

This time around, Vertica volunteers assisted with a program that provided food to individuals and families for Thanksgiving. On Monday, the team helped stuff hundreds of bags with donated goods like whole frozen turkeys, boxed stuffing, canned fruits and vegetables, potatoes, and even fresh kale. They bagged over 22 thousand pounds of fresh produce! All of these items were generously donated by individuals and companies. The following day, more Vertica volunteers helped distribute the (now overflowing) bags to over 1,200 families to enjoy this Thanksgiving.

The HP Vertica volunteers are pleased to know they contributed. In the words of Tim Severyn, East End House’s Director of Community Programs, “we couldn’t have done it without you.”

East Cambridge is thankful to have a community center that provides such a great service to local families and HP Vertica looks forward to working with it in the future!

Learn more about East End House and how you can give back to the community here: http://www.eastendhouse.org/get-involved

HP Vertica and Qlik Team Up to Hack for a Cause

On November 17, 2014, HP Vertica took part in the annual Qlik Hackathon in Orlando, Florida. The hackathon, which took place the day before the Qlik World Conference, brought together engineers, developers, and Qlik experts from all over the world. The challenge was simple: leverage features from Qlik and its partners to create an application to aid the United Nations in its commitment to maintain international peace and security.

[Image: Qlik Hackathon: Bringing people together]

Qlik and HP Vertica: The Perfect Fit
According to a Senior Political Affairs Officer at the United Nations Department of Political Affairs:

Internet information streams offer an instant snapshot, in real time, of the state of international affairs worldwide at any given moment. The patterns of nations’ Internet interaction with one another and their on-line articulation of emerging international issues frequently reveal diplomatic nuances and changing priorities. The synthesis and channeling of this information produces a powerful analytical tool and briefing resource, focusing not merely on content but also how the information is packaged and communicated – and to whom.

From the officer’s comment, we can see how companies like Qlik and HP Vertica are in a unique position to aid organizations like the United Nations in their goals.

Qlik and HP Vertica are a powerful combination because of Qlik’s ‘Direct Discovery’ feature, which is included in both the QlikView and Sense products. This feature enables business users to conduct associative analysis on large data sets that come directly from external big data sources, like HP Vertica. Users then have the ability to combine that big data with data stored in memory for rapid, intuitive visual analysis.

With the HP Vertica Analytics Platform, combined with the data visualization powers of Qlik products, the United Nations can take advantage of these massive internet information streams and apply the findings to important initiatives. The Qlik Hackathon provided a space for developers from all over the world to do just that.

Pre-Hackathon
Like all successful events, the Qlik Hackathon required preparation and collaboration. For weeks before the actual event, HP Vertica employees and partners were hard at work preparing.

HP Vertica employee Myles Collins worked with representatives from the United Nations, Qlik, Attivio, and Twitter to define goals and organize datasets for the hackers to use. The setup included accessing online data from various websites and Twitter feeds, retrieved by Attivio and Twitter, respectively. That data was then loaded into the HP Vertica Analytics Platform, which Myles installed on ‘on-metal’ servers provided by Rackspace.

A couple of weeks before the Hackathon, the partners met for a test hackathon to make sure everything ran smoothly. Once all the kinks were worked out, the hackathon was ready to begin.

The Hackathon Challenge
The morning of the hackathon, twenty teams of two, armed with resources from Qlik, HP Vertica, Attivio, Twitter, and Rackspace, geared up to create a web-based data visualization application using the Qlik Sense mashup API. This application would ultimately help the United Nations gather and explore data to help with their mission.

[Image: The hackers had only 8 hours to finish a challenge]

The teams worked for eight hours straight to complete one of two challenges. For the first challenge, hackers needed to develop a diplomatic monitoring and analytics tool to monitor online data from official government sources. For example, teams could use the tool to analyze how official bodies of government in different countries respond to certain topics.

The second challenge asked participants to create similar visuals, but for public sentiment regarding UN initiatives. For example, teams could use this tool to gauge public opinion on how the UN has handled a certain situation.

[Image: Hackers hard at work]

The applications produced during the hackathon are invaluable. They can help United Nations officers gauge sentiment among different groups of people in different countries, which can lead to appropriate crisis prevention or reaction initiatives.

Aftermath
In the aftermath of the hackathon, while the hackers rested, the partners were still hard at work judging the competition. The judges took many criteria into consideration, the foremost being: Does this address a United Nations need? The judges also based their decisions on technical criteria, innovation, ease of use, ability to leverage Qlik Sense’s Mashup API, and the “wow” factor. In the end, only three could win.

The winning teams, along with their Qlik Sense Mashups, are shown here:

[Image: Third Place: Team 5]

[Image: Second Place: Team 10]

[Image: First Place: Team 9]

HP supplied prizes to the winners, including HP 10 Plus Tablets, HP Chromebooks, and Beats Solo HP on-ear headphones.

The entire process was a wild success, not just for HP Vertica, but for the United Nations as well. Through the Qlik Hackathon, hackers not only got to experience the power of HP Vertica, but were able to apply their projects to an organization that strives for international peace and better standards of living.

Until next year, keep hacking!
