Vertica

Archive for February, 2014

Enter the Flex Zone – Modernizing the Enterprise Data Warehouse

I’ve had the privilege of attending The Data Warehousing Institute (TDWI) conference this week. The Las Vegas show is usually one of TDWI’s biggest gatherings. This year, about 600 of us gathered to talk about the latest and greatest in the data warehouse and business intelligence world. HP Vertica was a sponsor.
The latest buzz was around the many new data discovery tools that some vendors announced. Vendors recognize that most businesses hold a significant amount of undiscovered data. As data warehouse teams go merrily along delivering daily analytics, piles of dark data build up, and some of it might have value. Users are recognizing that this unexplored data could be a source of innovation, and that is spurring the development of a new breed of data discovery tools that let users build new views of structured, semi-structured, and unstructured data.

Of course, this is the very reason that we developed HP Vertica Flex Zone. The ability to ingest semi-structured data and explore it with current visualization tools is one of the key tenets of HP Vertica Flex Zone. With HP Vertica Flex Zone, you can use your existing business intelligence (BI) and visualization tools to visually explore and draw conclusions from data patterns across the full spectrum of structured and semi-structured data. Analysts, data scientists, and business users can now explore and visualize information without burdening your IT organization, or waiting for it to run the lengthy and costly ETL tools and processes typical of legacy databases and data warehouses.
Most attendees agreed that specialized data discovery tools should converge with standard analytical platforms in the coming months. Discovery should be as much a part of your business as daily analytics.

There were some first-rate executive sessions led by Fern Halper and Philip Russom, who talked about how analytics has transformed over the years. Analytics has become more mainstream and better understood by business users at large. As a result, innovation comes from delivering business intelligence to this new generation of information consumers.

The panel discussions and sessions focused very much on business value and put forth a call to action: innovate. Feed business users’ need for information that will help drive revenue, improve efficiency, and achieve compliance with regulations. It was clear that the data warehouse must be modernized (and that is happening today). Data warehouse pros aren’t satisfied with the daily static analytics they delivered in the past. They are looking for new data sources, including big data, and new-age data analytic platforms to help achieve their business goals.

Get started modernizing your enterprise data warehouse – evaluate HP Vertica 7 today.

Enter the Flex Zone – Flex Zone Demo

In December 2013, we introduced HP Vertica Flex Zone with the HP Vertica 7 “Crane” release. HP Vertica Flex Zone gives you the power to quickly and easily load, explore, analyze, and monetize emerging and rapidly growing forms of structured and semi-structured data, such as social media, sensor, log files, and machine data. You can use your favorite industry-standard business intelligence (BI) and visualization tools to explore Flex Zone data in HP Vertica without creating schemas upfront.

We based the HP Vertica Flex Zone demo video on a common real-life scenario from an HP Vertica Flex Zone customer. It shows how you can tackle the typical challenges of handling semi-structured and structured data from disparate sources, work with that data in an uncomplicated SQL environment, and, most importantly, get value out of all of your data.

Check out the HP Vertica Flex Zone demo video here:

Enter the Flex Zone – Under the Hood

With HP Vertica’s latest release (HP Vertica 7 “Crane”), we introduced HP Vertica Flex Zone, based on the patent-pending flex tables technology, which dynamically adapt to whatever schema is present in the data. Flex tables offer dramatic usability improvements over regular tables. In this post, we take a look under the hood and show how flex tables are similar to regular Vertica tables, with a little pinch of magic thrown in.

The Tour Starts with Storage
Let’s take a peek at a flexible table:

    => create flex table tweets();
    CREATE TABLE
    => \d tweets

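     Schema | Table  |    Column    |          Type          |  Size  | Default | Not Null | Primary Key | Foreign Key
    --------+--------+--------------+------------------------+--------+---------+----------+-------------+-------------
     public | tweets | __identity__ | int                    |      8 |         | t        | f           |
     public | tweets | __raw__      | long varbinary(130000) | 130000 |         | t        | f           |
    (2 rows)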

At the heart of the implementation is the simple idea that we will store an entire record in __raw__, a single wide varbinary column. Any record provided, no matter the schema, can be stored – assuming that the column is wide enough to hold the whole record. From the example above, you can see that the default max size is around 128K. We’ve found that this size is sufficient for most use cases, but you can stretch it up to 32 MB if necessary. For performance reasons, Vertica includes the __identity__ column as the column on which to sort and segment the data.
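To make the storage idea concrete, here is a minimal Python sketch of a toy flex table. It is not Vertica code. It assumes a 130,000-byte `__raw__` width (the default shown by `\d tweets`), stores each record whole as one byte string, and assigns an increasing `__identity__`. Rows are spread across nodes with `identity % num_nodes`, a hypothetical stand-in for Vertica's real segmentation hash. JSON text also stands in for Vertica's internal binary map format.

```python
import json
from dataclasses import dataclass, field

RAW_WIDTH = 130_000  # default __raw__ width reported by \d tweets, in bytes


@dataclass
class ToyFlexTable:
    """Toy model of a flex table: each record is stored whole in one wide binary column."""

    num_nodes: int = 3
    raw_width: int = RAW_WIDTH
    next_identity: int = 0
    # node index -> list of (__identity__, __raw__) rows
    nodes: dict = field(default_factory=dict)

    def insert(self, record: dict) -> int:
        # JSON stands in for the internal map format; any schema is accepted.
        raw = json.dumps(record, sort_keys=True).encode("utf-8")
        if len(raw) > self.raw_width:
            raise ValueError(
                f"record is {len(raw)} bytes; __raw__ holds at most {self.raw_width}"
            )
        identity = self.next_identity
        self.next_identity += 1
        # Hypothetical segmentation: Vertica uses its own hash, not a plain modulo.
        node = identity % self.num_nodes
        # Identities are assigned in increasing order, so each node's rows stay
        # sorted by __identity__ without extra work.
        self.nodes.setdefault(node, []).append((identity, raw))
        return identity


table = ToyFlexTable(num_nodes=2)
table.insert({"text": "hello", "user": {"id": 1}})
table.insert({"delete": {"status": {"id": 232734244999397376}}})
print({node: [i for i, _ in rows] for node, rows in table.nodes.items()})
# prints {0: [0], 1: [1]}
```

A record that is too wide is rejected outright here. Widening the column, as the paragraph above describes, corresponds to constructing the toy table with a larger `raw_width`.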

So what does the __raw__ column actually contain? Let’s take a peek:

    => copy tweets from 'tweets.json' parser fjsonparser();
     Rows Loaded
    -------------
             542
    (1 row)

    => select __raw__ from tweets limit 1;
    __raw__
    --------------------------------------------------------------------------------
    \001\000\000\000H\000\000\000\004\000\000\000\024\000\000\000&\000\000\0008\000\000\000@\000\000\0002327342449993973762327342449993973763829120138291201\004\000\000\000\024\000\000\000$\000\000\0008\000\000\000M\000\000\000delete.status.iddelete.status.id_strdelete.status.user_iddelete.status.user_id_str
    (1 row)

First, notice that this isn’t the JSON value that we supplied as input. When you load a flex table, the parser reads each input record, parses it into a collection of key/value pairs, and writes it into an internal map format. Assuming you query the data at least once, it is worth parsing upfront to avoid re-parsing with each query. Vertica also does a little in-record indexing to make finding field values efficient. Additionally, the storage is now agnostic to the input format, freeing you from having to specify the input format upfront or locking the table to a specific format. Finally, Vertica’s storage engine has more flexibility with compression algorithms for storing the data efficiently on disk.
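The binary dump above can be read as a small key/value map. The Python sketch below is our own reading of that dump, not a documented Vertica format, and the layout it uses is inferred:

- A header of two little-endian 32-bit integers: a value we assume is a format tag (1), followed by the length of the value block.
- A value block and then a key block. Each block holds a count, one start offset per entry (relative to the block), and the concatenated entries.

Keys are sorted, so a lookup can binary-search them. That is one plausible form of the in-record indexing described above. The `flatten` helper joins nested JSON keys with dots to mimic the `delete.status.id` style keys. Storing every non-object value as its JSON text is an assumption.

```python
import json
import struct
from bisect import bisect_left


def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested objects into dotted keys, keeping values as strings."""
    flat = {}
    for key, value in obj.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            # Assumption: scalars and arrays are stored as their JSON text.
            flat[name] = value if isinstance(value, str) else json.dumps(value)
    return flat


def _pack_block(items: list) -> bytes:
    """count, then one start offset per item (relative to the block), then the items."""
    offset = 4 + 4 * len(items)
    offsets = []
    for item in items:
        offsets.append(offset)
        offset += len(item)
    return struct.pack(f"<{1 + len(items)}I", len(items), *offsets) + b"".join(items)


def _unpack_block(blob: bytes, start: int, end: int) -> list:
    count = struct.unpack_from("<I", blob, start)[0]
    bounds = list(struct.unpack_from(f"<{count}I", blob, start + 4)) + [end - start]
    return [blob[start + bounds[i]:start + bounds[i + 1]] for i in range(count)]


def encode_map(record: dict) -> bytes:
    flat = flatten(record)
    keys = sorted(flat)  # sorted keys enable binary search at lookup time
    value_block = _pack_block([flat[k].encode("utf-8") for k in keys])
    key_block = _pack_block([k.encode("utf-8") for k in keys])
    # Assumed header: format tag 1, then the value block's length.
    return struct.pack("<II", 1, len(value_block)) + value_block + key_block


def lookup(blob: bytes, key: str):
    """Return the value for key, or None if the record has no such key."""
    _tag, value_len = struct.unpack_from("<II", blob, 0)
    values = _unpack_block(blob, 8, 8 + value_len)
    keys = _unpack_block(blob, 8 + value_len, len(blob))
    target = key.encode("utf-8")
    i = bisect_left(keys, target)
    return values[i].decode("utf-8") if i < len(keys) and keys[i] == target else None


record = {"delete": {"status": {"id": 232734244999397376, "id_str": "232734244999397376",
                                "user_id": 38291201, "user_id_str": "38291201"}}}
blob = encode_map(record)
print(lookup(blob, "delete.status.user_id"))  # 38291201
print(lookup(blob, "text"))                   # None
```

For this record, `encode_map` reproduces byte for byte the `__raw__` value printed earlier, which is what makes this reading plausible. It remains a reverse-engineering sketch, and other records or Vertica versions may use details it does not capture.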

It’s a Kind of Magic
So why can I run “select text from tweets” despite the “text” column not existing in the table? Looking at the query plan shows us what we would expect, namely we are reading the __raw__ column:

    => EXPLAIN SELECT text FROM tweets;

    Access Path:
    +-STORAGE ACCESS for tweets [Cost: 21K, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
    | Projection: public.tweets_super
    | Materialize: tweets.__raw__
    | Execute on: All Nodes

If you dig into the “GraphViz” (see http://www.graphviz.org/) query plan, you will discover that the actual query that is running is:

    SELECT MapLookup(__raw__,'text') FROM tweets;

Flex tables have a special parse rule that rewrites unknown column references into calls to the MapLookup function. The MapLookup function takes a key and returns the value, or null if the key does not exist in the map:

    => SELECT maplookup(__raw__,'id') FROM tweets limit 1;
         maplookup
    --------------------
     257000038482460672
    (1 row)

    => SELECT maplookup(__raw__,'nonexistentkey') FROM tweets limit 1;
     maplookup
    -----------

    (1 row)

This little piece of magic is what lets you write vanilla SQL queries against data with unknown or varying schema.
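The rewrite can be modelled in a few lines. This Python sketch is a hypothetical illustration, not Vertica's parser:

- `rewrite_column` turns any column name that is not a real column into a `MapLookup(__raw__, '<name>')` expression.
- `run_select` evaluates a select list over rows whose `__raw__` is represented as a plain dict. Like MapLookup, it returns `None` (SQL null) for missing keys.

```python
def rewrite_column(name: str, real_columns: set) -> str:
    """Mimic the flex-table parse rule: unknown column references become MapLookup calls."""
    if name in real_columns:
        return name
    quoted = name.replace("'", "''")  # escape quotes as a SQL string literal would
    return f"MapLookup(__raw__, '{quoted}')"


def map_lookup(raw: dict, key: str):
    """Return the value for key, or None (SQL null) when the key is absent."""
    return raw.get(key)


def run_select(columns: list, rows: list, real_columns: set) -> list:
    """Evaluate a select list; virtual columns are resolved through __raw__."""
    result = []
    for row in rows:
        out = []
        for name in columns:
            if name in real_columns:
                out.append(row[name])
            else:
                out.append(map_lookup(row["__raw__"], name))
        result.append(tuple(out))
    return result


REAL = {"__identity__", "__raw__"}
rows = [
    {"__identity__": 0, "__raw__": {"id": "257000038482460672", "text": "hello"}},
    {"__identity__": 1, "__raw__": {"delete.status.id": "232734244999397376"}},
]
print([rewrite_column(c, REAL) for c in ["text", "__identity__"]])
# ["MapLookup(__raw__, 'text')", '__identity__']
print(run_select(["__identity__", "text"], rows, REAL))
# [(0, 'hello'), (1, None)]
```

The second row has no `text` key, so it yields null instead of an error. This is why the same query works across records with different shapes.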

Maps to Treasure

Now that you know about the __raw__ column, we can play with a number of the map functions that come with flex tables. Here are a few examples:

Getting JSON back:

    => SELECT maptostring(__raw__) FROM tweets limit 1;
     maptostring
    ------------------------------------------------
    {
    "delete.status.id" : "230242346078457856",
    "delete.status.id_str" : "230242346078457856",
    "delete.status.user_id" : "524683721",
    "delete.status.user_id_str" : "524683721"
    }
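A rough Python analogue of MapToString helps show why every value comes back quoted. The sketch assumes the map is the flat dictionary of dotted keys and string values that the parser produced, and renders it as sorted JSON. The `map_to_string` name is hypothetical, and the exact whitespace Vertica emits may differ.

```python
import json


def map_to_string(flat_map: dict) -> str:
    """Render a flat key/value map as JSON text, one key per line, keys sorted."""
    # Every value in the map is already a string, so numbers come back quoted.
    return json.dumps(dict(sorted(flat_map.items())), indent=4, separators=(",", " : "))


print(map_to_string({
    "delete.status.user_id": "524683721",
    "delete.status.id": "230242346078457856",
}))
```

The result is valid JSON, so it can be handed back to any JSON consumer. The keys stay in their flattened, dotted form rather than being re-nested.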

Shredding a record into rows of key/value pairs:

    => SELECT mapitems(__raw__) OVER (partition auto) FROM (SELECT __raw__ FROM tweets limit 1) a;
               keys            |       values
    ---------------------------+--------------------
     delete.status.id          | 232734244999397376
     delete.status.id_str      | 232734244999397376
     delete.status.user_id     | 38291201
     delete.status.user_id_str | 38291201
    (4 rows)
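MapItems is a transform function: one input map fans out into one output row per key. The Python generator below sketches that fan-out. It works on records represented as dicts, which is an assumption; the real function reads the binary `__raw__` value. As we understand it, `OVER (PARTITION AUTO)` lets Vertica split the input across nodes and threads. That is safe because each record is shredded independently of the others.

```python
from typing import Iterable, Iterator, Tuple


def map_items(records: Iterable[dict]) -> Iterator[Tuple[str, str]]:
    """Shred each map into (key, value) rows, keys in sorted order within a record."""
    for raw in records:
        for key in sorted(raw):
            yield key, raw[key]


tweet = {
    "delete.status.user_id_str": "38291201",
    "delete.status.id": "232734244999397376",
    "delete.status.user_id": "38291201",
    "delete.status.id_str": "232734244999397376",
}
for key, value in map_items([tweet]):
    print(f"{key:<26}| {value}")
```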

Finding the keys of all records, similar to computing the contents of the keys table:

    => SELECT distinct key FROM (select mapkeys(__raw__) OVER (partition auto) as key from tweets) a ORDER BY key;
               key
    ---------------------------
     contributors
     coordinates
     coordinates.coordinates
     coordinates.type
     created_at
     delete.status.id
     delete.status.id_str
     delete.status.user_id
     …
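The query above boils down to a union of key sets across records. The Python sketch below does the same over records represented as dicts. It also counts how many records contain each key, on the assumption that per-key frequency is the kind of information a keys table holds. It does not attempt to reproduce Vertica's keys table exactly, and the sample records are made up for illustration.

```python
from collections import Counter


def key_frequencies(records):
    """Count, for every key, how many records contain it; return pairs sorted by key."""
    counts = Counter()
    for raw in records:
        counts.update(set(raw))  # count each key at most once per record
    return sorted(counts.items())


records = [
    {"created_at": "Mon Nov 11", "text": "go team", "coordinates.type": "Point"},
    {"delete.status.id": "232734244999397376", "delete.status.user_id": "38291201"},
    {"created_at": "Mon Nov 11", "text": "what a call"},
]
for key, freq in key_frequencies(records):
    print(key, freq)
```

Like the SQL version, this touches every record once. Frequencies are a cheap way to spot keys that are common enough to be worth promoting to real columns.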

Checking how close you are to overflowing the record size:

    => SELECT max(length(__raw__)) FROM tweets;
     max
    ------
     8675
    (1 row)
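That check can be turned into a small capacity calculation. The Python sketch below takes the longest `__raw__` value and the column width and reports the headroom. The 130,000-byte default comes from the `\d` output above, and the 32,000,000-byte ceiling is our reading of the "32 MB" maximum mentioned earlier. The `suggest_width` helper and its safety factor are hypothetical.

```python
DEFAULT_WIDTH = 130_000     # bytes, default __raw__ width shown by \d
MAX_WIDTH = 32_000_000      # bytes, assumed reading of the "32 MB" maximum


def headroom(max_len: int, width: int = DEFAULT_WIDTH) -> tuple:
    """Return (unused bytes, percent of the width used by the longest record)."""
    return width - max_len, 100.0 * max_len / width


def suggest_width(max_len: int, safety_factor: float = 4.0) -> int:
    """Hypothetical sizing rule: leave room for records safety_factor times the largest seen."""
    wanted = int(max_len * safety_factor)
    return min(max(wanted, DEFAULT_WIDTH), MAX_WIDTH)


free, used_pct = headroom(8675)
print(free, round(used_pct, 2))  # 121325 6.67
print(suggest_width(8675))       # 130000
print(suggest_width(60_000))     # 240000
```

With the 8,675-byte maximum from the query above, the largest tweet uses under 7% of the default width, so no widening is needed.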

Isn’t that a row store?

By storing the whole record in a single column, Vertica must read the entire record even if the query needs only a subset of the keys, which is the definition of a row-store database. Vertica’s flex tables perform reasonably well thanks to a solid executor and distributed execution, but Vertica’s columnar tables are orders of magnitude faster. As described in a previous post, you accelerate flex table performance by promoting important columns to real columns in the table. These hybrid tables offer the best of both worlds: a flexible schema thanks to the __raw__ column, and columnar access to the most important data. You can see this for yourself by creating a hybrid flex table:

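    bvandiver=> create flex table logs (time timestamp, sessionid varchar, component varchar);
    CREATE TABLE
    bvandiver=> \d logs
     Schema | Table |  Column   |          Type          |  Size  | Default | Not Null | Primary Key | Foreign Key
    --------+-------+-----------+------------------------+--------+---------+----------+-------------+-------------
     public | logs  | __raw__   | long varbinary(130000) | 130000 |         | t        | f           |
     public | logs  | "time"    | timestamp              |      8 |         | f        | f           |
     public | logs  | sessionid | varchar(80)            |     80 |         | f        | f           |
     public | logs  | component | varchar(80)            |     80 |         | f        | f           |
    (4 rows)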
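The trade-off in this section can be illustrated with a toy model. In the Python sketch below, which is hypothetical and not Vertica internals, declared columns are stored as separate lists while every record is also kept whole in a serialized `__raw__` value. This post does not say whether Vertica duplicates promoted keys in `__raw__`, so treat that part as an assumption.

The two scan paths differ in cost:

- Scanning a real column touches only that column's bytes.
- Scanning a virtual column must read and decode every full record, which is the row-store cost described above.

```python
import json


class ToyHybridTable:
    """Declared columns are stored columnar; the full record also lives in __raw__."""

    def __init__(self, real_columns):
        self.columns = {name: [] for name in real_columns}  # one list per real column
        self.raw = []  # one serialized record per row

    def load(self, record: dict) -> None:
        for name, values in self.columns.items():
            values.append(record.get(name))
        self.raw.append(json.dumps(record, sort_keys=True).encode("utf-8"))

    def scan(self, column: str):
        """Return (values, bytes_read) for a single-column query."""
        if column in self.columns:
            values = self.columns[column]
            read = sum(len(str(v).encode("utf-8")) for v in values if v is not None)
            return values, read
        values, read = [], 0
        for blob in self.raw:
            read += len(blob)  # the whole record is read to find one key
            values.append(json.loads(blob).get(column))
        return values, read


logs = ToyHybridTable(["time", "sessionid", "component"])
logs.load({"time": "2014-02-20 10:00:00", "sessionid": "s-17", "component": "web",
           "message": "user logged in", "latency_ms": 42})
print(logs.scan("sessionid"))   # (['s-17'], 4)
_, column_bytes = logs.scan("sessionid")
_, raw_bytes = logs.scan("latency_ms")
print(column_bytes < raw_bytes)  # True
```

Promoting a frequently queried key therefore shrinks its scan from "every byte of every record" to "only that key's bytes". In the real columnar engine, compression and encoding widen the gap further.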

Conclusion

Hopefully this whirlwind tour through the back end of flex tables has given you a better understanding of how it all works and the performance tradeoffs we made. As usual, we encourage you to download our Community Edition and give it a try!

Social Media Insights with HP Vertica Pulse

Get insight into what your customers are saying about your products and services with HP Vertica Pulse, one of the new innovations available on the recently announced HP Vertica Marketplace.

HP Vertica Pulse allows you to analyze Tweets automatically, scalably, and in-database so that you can:

  • understand the top-of-mind topics for your community and the sentiment associated with those topics
  • combine results with other business data already stored in Vertica
  • identify advocates and detractors
  • analyze how sentiment changes over time
  • view high-level aggregate results as well as low-level comments
  • customize results for your domain

Below are some visual examples of how HP Vertica Pulse can help you make meaning of your data. In this case, we used our free Social Media Connector on GitHub to collect and automatically load Tweets about some NFL teams over one weekend in November. After running Pulse on the data, we used Tableau to visualize the results. The visual on the left gives an aggregate view of the most popular topics. Although we collected Tweets about football teams, a number of other topics appear, such as “Tom Brady”, “Cam Newton”, and “call”. The size of each circle reflects the volume of Tweets, while the color indicates sentiment (green for positive, red for negative). As it turns out, the Patriots, 49ers, Jets, and Chiefs all lost that weekend.

The next visual shows an excerpt of a drill-down into one of the compelling results, namely the negative sentiment on ‘call’. This extract shows that there was a controversial call in the New England-Carolina game.

By using Pulse, and without actually watching the games, we were able to understand the most talked-about topics. We could drill down into the most compelling results to understand why the sentiment of a topic is negative, and take action if needed. For products and services, you could similarly see the aspects that matter most to your community. Negative sentiment could highlight a problem that needs attention.

You can try HP Vertica Pulse on Tweets that impact your business. To download it, visit the HP Vertica Community Marketplace and look for HP Vertica Pulse on the Innovations tab.

The HP Vertica Pulse download contains the following resources:

  • Installation files and documentation
  • Three instructional videos
  • A quick start guide
  • A Tableau template

These resources will help you to evaluate HP Vertica Pulse and make meaning of your Tweets as quickly and easily as possible.  When you create your Marketplace account, please review the Terms and Conditions, as they contain important information about the Innovations efforts.

We want to hear from you!  Post your questions, comments, and feedback to the HP Vertica Community.  Would you use HP Vertica Pulse to make an impact on your business?  What changes would you like to see?  We look forward to your feedback!

Enhancing Big Data Analytics with the HP Vertica Marketplace

At the O’Reilly Strata Conference, we will be demonstrating the just-announced HP Vertica Marketplace, an online destination for developers, HP Vertica users, and technology partners to create and share innovative big data analytics solutions built for the HP Vertica Analytics Platform. The marketplace features the HP Vertica Analytics Platform Community Edition, connectors and third-party extensions, business intelligence tools, ETL and data transformation products, and HP Vertica innovations for sentiment analysis, geospatial analysis, and distributed R.

Our partner ecosystem in particular is excited to share their complementary technologies in the marketplace to accelerate the development of end-to-end solutions based on HP Vertica. Logi Analytics, Looker, VoltDB, and a growing number of technology partners have their technologies featured in the marketplace, directly available for you to download today. These partnerships will allow HP Vertica users to find everything they need to create their big data analytics solutions in a single environment.

What our technology partners are saying
“Looker is excited to offer our modern analytics platform in the HP Vertica Marketplace,” said Keenan Rice, vice president of marketing and alliances for Looker. “Now HP Vertica customers—and anyone in their organizations—can experience the power and flexibility of exploring extremely large data sets, in full detail and in real time.”

“We’re pleased to take part in the HP Vertica Marketplace,” said Bruce Reading, president and CEO of VoltDB. “Our in-memory operational database is ideal for any organization grappling with high-velocity data that is mutating rapidly, is extremely write- and read-intensive and requires immediate analysis not possible with batch processing. The marketplace provides HP Vertica’s community of developers, partners and customers with an easy way to tap into the power and remarkable speed of VoltDB for transactional decision-making on huge, incoming real-time data streams.”

“Logi Analytics is thrilled to help deliver on the ‘n’ Apps HAVEn promise and launch its information platform on the HP Vertica Application Marketplace,” said Kevin Greene, VP Business Development & Channels, Logi Analytics. “Together, Logi Analytics and HP Vertica empower organizations and their people to collect, analyze and act on some of the largest and most diverse data sets in the world to put the power of information at everyone’s fingertips. We are pleased to partner with the HP Vertica team in creating this important new application marketplace.”

Contribute to the HP Vertica Marketplace today
Fostering an open big data analytics ecosystem is essential to bringing innovative products to our Vertica community. To build this diverse community, we are encouraging other developers and technology partners to join our developer program and post their solutions to the HP Vertica Marketplace. Joining is simple. Currently, we are sending out invitations to participate. If you’d like an invitation, send an email to vdz@hp.com and we’ll respond with instructions to join.

We look forward to building up our catalog and offering the HP Vertica community a variety of products in an easy-to-use environment. Be sure to visit the marketplace often and check out the latest and greatest big data analytics offerings.

The HP Vertica Marketplace in the news
HP Debuts App Store for Vertica—Datanami

HP creates marketplace for Vertica analytics platform—IT World Canada

Enter the Flex Zone: Taking HP Vertica Flex Zone on the Road

Howdy HP Vertica Community! A couple of weeks back, I was on the road meeting up with HP Vertica customers, partners, and big data geeks in Atlanta and New York City.

The main topic of interest with customers and partners was the changing landscape of big data. From the 1980s until about a decade ago, data came from more traditional sources such as enterprise CRM and ERP systems and data warehouses. This was your tried and tested structured data that fit nicely into tables and columns.
Over the past decade or so there has been a paradigm shift in the industry with the rise of “non-traditional” data sources such as web, social media, sensor data, etc. The semi-structured data that pours out of these sources does not necessarily play nice with old school data analytics platforms built to handle structured data.

The majority of customers and partners I heard from are struggling to analyze the large pool of semi-structured data that often sits in their data warehouse for years. Their common challenge was that it takes “forever” to load semi-structured data into their old analytics platforms. The ETL (Extract, Transform, Load) process they currently use entails importing the semi-structured data into the Hadoop Distributed File System (HDFS), transforming it into a simple delimited format, and then importing it into their data analytics platform. Before the import, however, database administrators need to write SQL table definitions with hundreds or thousands of columns, then parse, sort, and error-check the data and export it in the right order so it loads correctly. This manual, error-prone process takes hours to days, and the resulting delays lengthen the time to value from big data.

Luckily, HP Vertica product development talks to customers a lot more often than I do, and months ago they responded to this challenge by designing HP Vertica Flex Zone (data sheet: http://www.vertica.com/wp-content/uploads/2013/11/Flexzone-data-sheet1.pdf), a product built specifically for ingesting semi-structured and structured data FAST!

During this trip I had the chance to demo HP Vertica Flex Zone and illustrate how quick and easy it is to load, explore, and analyze structured and semi-structured data. Since most customers in the audience were also users of business intelligence (BI) and visualization tools, they were thrilled to see that HP Vertica Flex Zone works out-of-the-box with major BI tools.

In New York City, the HP Vertica channel sales boot camp event was attended by over 40 partners representing 11 companies. The training sessions ran all day and included product training, demos, “a day in the life of an HP Vertica rep” and A LOT of food! We had some great questions from partners and received some awesome feedback from our attendees:


“By far the best Vertica training I’ve seen so far”

“The Boot Camp was very good. It was one of the best I have taken in a long time. Don’t change anything”

“I absolutely loved the talks, presentations/demos and meeting key members of the HP Vertica team. Very excited about HP Vertica’s offering and its convincing value proposition.”

We are planning more of these boot camps around the country, so be on the lookout for one in your area.

If you want to see the power of HP Vertica Flex Zone for yourself, try it out for free with our HP Vertica Community Edition software (http://www.vertica.com/wp-content/uploads/2012/10/HP_Vertica_Community_Edition_DS_122113.pdf).
