Enter the Flex Zone – Under the Hood

With HP Vertica’s latest release (HP Vertica 7 “Crane”), we introduced HP Vertica Flex Zone, based on the patent-pending flex tables technology, which dynamically adapts to whatever schema is present in the data. Flex tables offer dramatic usability improvements over regular tables. In this post, we take a look under the hood and show how flex tables are similar to regular Vertica tables, with a little pinch of magic thrown in.

The Tour Starts with Storage
Let’s take a peek at a flexible table:

    => create flex table tweets();
    CREATE TABLE
    => \d tweets

 Schema | Table  |    Column    |          Type          |  Size  | Default | Not Null | Primary Key | Foreign Key
--------+--------+--------------+------------------------+--------+---------+----------+-------------+-------------
 public | tweets | __identity__ | int                    |      8 |         | t        | f           |
 public | tweets | __raw__      | long varbinary(130000) | 130000 |         | t        | f           |

(2 rows)

At the heart of the implementation is the simple idea that we will store an entire record in __raw__, a single wide varbinary column. Any record provided, no matter the schema, can be stored – assuming that the column is wide enough to hold the whole record. From the example above, you can see that the default max size is around 128K. We’ve found that this size is sufficient for most use cases, but you can stretch it up to 32 MB if necessary. For performance reasons, Vertica includes the __identity__ column as the column on which to sort and segment the data.
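
If you need more headroom than the default, the __raw__ column can be widened on an existing flex table. The statement below is a sketch of how that might look; the exact syntax, size limits, and the configuration parameter governing the default for new tables are worth verifying against your Vertica version’s documentation:

    => alter table tweets alter column __raw__ set data type long varbinary(1000000);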

So what does the __raw__ column actually contain? Let’s take a peek:

    => copy tweets from 'tweets.json' parser fjsonparser();
     Rows Loaded
    -------------
             542
    (1 row)

    => select __raw__ from tweets limit 1;
                                  __raw__
    -------------------------------------------------------------------------------
     \001\000\000\000H\000\000\000\004\000\000\000\024\000\000\000&\000\000\0008\000\000\000@\000\000\0002327342449993973762327342449993973763829120138291201\004\000\000\000\024\000\000\000$\000\000\0008\000\000\000M\000\000\000delete.status.iddelete.status.id_strdelete.status.user_iddelete.status.user_id_str
    (1 row)

First, notice that this isn’t the JSON value that we supplied as input. When you load a flex table, the parser reads each input record, parses it into a collection of key/value pairs, and writes it into an internal map format. Assuming you query the data at least once, it is worth parsing upfront to avoid re-parsing with each query. Vertica also does a little in-record indexing to make finding field values efficient. Additionally, the storage is now agnostic to the input format, freeing you from having to specify the input format upfront or locking the table to a specific format. Finally, Vertica’s storage engine has more flexibility with compression algorithms for storing the data efficiently on disk.
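
Because the stored map is format-agnostic, the same flex table can absorb data that arrives in other formats. As a quick sketch (the CSV file name here is hypothetical), delimited data could be loaded right alongside the JSON simply by picking a different parser:

    => copy tweets from '/path/to/tweets.csv' parser fdelimitedparser(delimiter=',');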

It’s a Kind of Magic
So why can I run “select text from tweets” despite the “text” column not existing in the table? Looking at the query plan shows us what we would expect, namely we are reading the __raw__ column:

    => EXPLAIN SELECT text FROM tweets;

    Access Path:
    +-STORAGE ACCESS for tweets [Cost: 21K, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
    | Projection: public.tweets_super
    | Materialize: tweets.__raw__
    | Execute on: All Nodes

If you dig into the “GraphViz” (see http://www.graphviz.org/) query plan, you will discover that the actual query that is running is:

     SELECT MapLookup(__raw__,'text') FROM tweets;

Flex tables have a special parse rule that rewrites unknown column references into calls to the MapLookup function. The MapLookup function takes a key and returns the value, or null if the key does not exist in the map:

    
   => SELECT maplookup(__raw__,'id') FROM tweets limit 1;
         maplookup
    --------------------
     257000038482460672
    (1 row)

   => SELECT maplookup(__raw__,'nonexistentkey') FROM tweets limit 1;
     maplookup
    -----------

    (1 row)

This little piece of magic is what lets you write vanilla SQL queries against data with unknown or varying schema.

Maps to Treasure

Now that you know about the __raw__ column, we can play with a number of the map functions that come with flex tables. Here are a couple of examples:

Getting JSON back:

    => SELECT maptostring(__raw__) FROM tweets limit 1;
                            maptostring
    -------------------------------------------------------------
     {
        "delete.status.id" : "230242346078457856",
        "delete.status.id_str" : "230242346078457856",
        "delete.status.user_id" : "524683721",
        "delete.status.user_id_str" : "524683721"
     }

Shredding a record into rows of key value pairs:

    => SELECT mapitems(__raw__) OVER (partition auto) FROM (SELECT __raw__ FROM tweets limit 1) a;

                keys            |       values
    ----------------------------+--------------------
     delete.status.id           | 232734244999397376
     delete.status.id_str       | 232734244999397376
     delete.status.user_id      | 38291201
     delete.status.user_id_str  | 38291201

    (4 rows)

Finding keys of all records, similar to computing contents of the keys table:

    
    => SELECT distinct key FROM (select mapkeys(__raw__) OVER (partition auto) as
       key from tweets ) a ORDER BY key;
             key
    -------------------------
     contributors
     coordinates
     coordinates.coordinates
     coordinates.type
     created_at
     delete.status.id
     delete.status.id_str
     delete.status.user_id
     …

Checking how close you are to overflowing the record size:

    => SELECT max(length(__raw__)) FROM tweets;
     max
    ------
     8675
    (1 row)

Isn’t that a row store?

By storing the whole record in a single column, Vertica must read the entire record even if the query needs only a subset of the keys – the definition of a row-store database. Vertica’s flex tables perform decently well thanks to Vertica’s solid executor and distributed execution, but Vertica’s columnar tables are orders of magnitude faster. As described in a previous post, you accelerate flex table performance by promoting important columns to be real columns in the table. These hybrid tables have the best of both worlds: flexible schema due to a __raw__ column and columnar access to the most important data. You can see this for yourself by creating a hybrid flex table:

    bvandiver=> create flex table logs (time timestamp, sessionid varchar, component varchar);
    CREATE TABLE
    bvandiver=> \d logs

Schema
Table
Column
Type
Size
Default
Not Null
Primary Key
Foreign Key
public
logs
__raw__
long varbinary(130000)
130000
t f
public
logs
"time"
timestamp
8
f f
public
logs
sessionid
varchar(80)
80
f f
public
logs
component
varchar(80)
80
f f

(4 rows)
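
Queries against a hybrid table can mix both kinds of columns. In the sketch below, component, sessionid, and time are real columns read from columnar storage, while severity is a hypothetical field (and 'Transaction' a hypothetical component value) that would be resolved through the __raw__ map:

    => select "time", sessionid, "severity" from logs where component = 'Transaction' order by "time" desc limit 10;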

Conclusion

Hopefully this whirlwind tour through the back end of flex tables has given you a better understanding of how it all works and the performance tradeoffs we made. As usual, we encourage you to download our Community Edition and give it a try!

Enter the Flex Zone – Exploring Delimited Data

Here at HP Vertica, we had to solve a technical challenge that many of you might be facing: data analysis from legacy products. When a customer reports an issue to customer support, we commonly ask for a diagnostic dump. This dump contains structured event data on what the database was doing at the time the problem occurred and provides a primary source of information for resolving the customer’s issue. The dump contains a collection of tables, written out in a simple delimited format similar to the output of a ‘SELECT * from T’ statement. Since the data size can be quite large (gigabytes), we load it into a (ha-ha) Vertica database for analysis. The complication is that the contents of this diagnostic dump vary by Vertica version. We therefore needed a way to ingest the data and make use of it despite the schema varying based on the source version.

 

A mostly-fixed but time-varying schema is a great use case for the newly released Vertica 7 (aka Crane) flex tables feature. With flex tables, we can load the data without fully defining the schema and query the part of the data we need. Indeed, reading our own diagnostic data was the first serious use of flex tables and served to refine our design for the feature. In this post, we will walk you through capturing, loading, and querying data from Vertica’s own DataCollector (DC) tables using Vertica’s flex tables. While previous posts have focused on loading JSON data, Vertica Flex tables support a number of different input formats, including delimited data.

A Whirlwind Tour of the DataCollector

HP Vertica’s Data Collector is our mechanism for capturing historical event data for internal database reporting and diagnostic analysis. Many of our system tables that display historical information, such as query_requests or load_streams, are actually views built on top of data collector tables. Because they change frequently and can be opaque to users outside of Vertica, we don’t document them. If you want to muck around, take a look at:

select data_collector_help();

In this example, we examine a couple of tables that track transactions on the Vertica cluster, the first of which is called DC_TRANSACTION_ENDS. This table has a row for every transaction that completes on each node in the cluster:

select * from dc_transaction_ends limit 4;
time | node_name | session_id | user_id | user_name | transaction_id | number_of_statements | is_committed | end_begin_time | epoch_close_time | end_epoch | is_ddl | ros_rows_written | dvros_rows_written | wos_rows_written | dvwos_rows_written

2014-01-22 00:32:08.025521-05 | initiator | localhost.localdoma-10040:0xb | 45035996273704962 | bvandiver | 45035996273704963 | 1 | t | 2014-01-22 00:32:08.016667-05 | | 1 | t | 0 | 0 | 0 | 0
2014-01-22 00:32:08.03535-05 | initiator | localhost.localdoma-10040:0xb | 45035996273704962 | bvandiver | 45035996273704964 | 1 | f | 2014-01-22 00:32:08.03491-05 | | 1 | t | 0 | 0 | 0 | 0
2014-01-22 00:32:09.163234-05 | initiator | localhost.localdoma-10040:0xf | 45035996273704962 | bvandiver | 45035996273704965 | 1 | f | 2014-01-22 00:32:09.163122-05 | | 1 | t | 0 | 0 | 0 | 0
2014-01-22 00:32:09.175018-05 | initiator | localhost.localdoma-10040:0xf | 45035996273704962 | bvandiver | 45035996273704966 | 1 | f | 2014-01-22 00:32:09.174786-05 | | 1 | t | 0 | 0 | 0 | 0

Queries against this table can reveal a number of interesting properties of the database workload, such as commit rate, rows per bulk load, DDL percentages, and so on. A companion table called DC_TRANSACTION_STARTS registers a row for each transaction that starts – it contains less interesting information because the database does not yet know the contents of the transaction. However, joining the two tables tells us the duration of each transaction.

Capturing a Data Dump

The simplest possible way to export the data is merely to capture the output of running a SQL query – and indeed we frequently need to consume this ad hoc format here at Vertica. From within vsql, Vertica’s command line client, you can do this with:

bvandiver=> \a
Output format is unaligned.
bvandiver=> \o /tmp/dce.txt
bvandiver=> select * from dc_transaction_ends;
bvandiver=> \o

We switch to unaligned output to avoid dumping a huge number of useless blank spaces into the output. An alternative is to merely capture the output with shell redirect:

/opt/vertica/bin/vsql -A -c "select * from dc_transaction_ends;" > /tmp/dce.txt

The contents of the file will look something like this:

time|node_name|session_id|user_id|user_name|transaction_id|number_of_statements|is_committed|end_begin_time|epoch_close_time|end_epoch|is_ddl|ros_rows_written|dvros_rows_written|wos_rows_written|dvwos_rows_written
2014-01-22 00:32:08.002989-05|initiator|localhost.localdoma-10040:0x3|45035996273704962|bvandiver|45035996273704961|1|f|2014-01-22 00:32:08.002313-05||1|f|0|0|0|0
2014-01-22 00:32:08.009437-05|initiator|localhost.localdoma-10040:0x3|45035996273704962|bvandiver|45035996273704962|1|t|2014-01-22 00:32:08.008547-05||1|f|0|0|0|0
2014-01-22 00:32:08.025521-05|initiator|localhost.localdoma-10040:0xb|45035996273704962|bvandiver|45035996273704963|1|t|2014-01-22 00:32:08.016667-05||1|t|0|0|0|0
2014-01-22 00:32:08.03535-05|initiator|localhost.localdoma-10040:0xb|45035996273704962|bvandiver|45035996273704964|1|f|2014-01-22 00:32:08.03491-05||1|t|0|0|0|0
2014-01-22 00:32:09.163234-05|initiator|localhost.localdoma-10040:0xf|45035996273704962|bvandiver|45035996273704965|1|f|2014-01-22 00:32:09.163122-05||1|t|0|0|0|0
2014-01-22 00:32:09.175018-05|initiator|localhost.localdoma-10040:0xf|45035996273704962|bvandiver|45035996273704966|1|f|2014-01-22 00:32:09.174

Exploring the data

How would you load this data into a regular SQL database? Well, you’d have to look at the input file, determine the column count and types, write a large CREATE TABLE statement, and then bulk load the data. With Vertica’s flex tables, you need to do only the following:

CREATE FLEX TABLE dce();
COPY dce FROM '/tmp/dce.txt' PARSER FDelimitedParser();

The FDelimitedParser is a flexible parser much akin to Vertica’s regular COPY parser. The delimiter defaults to ‘|’, which is why it need not be specified here (3 guesses why we picked that default…). Flex tables make it really easy to get the data into the database – something that comes in handy when customers are anxiously awaiting diagnosis! Since we never had to describe the schema, no matter which version of Vertica supplied the source data, the load still completes successfully.
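
If a dump arrives with a different separator, the same parser takes the delimiter as a parameter. As a sketch (the comma-separated file name below is hypothetical):

COPY dce FROM '/tmp/dce.csv' PARSER FDelimitedParser(delimiter=',');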

 

Queries are equally straightforward. Just pretend the columns exist! The following query shows the transactions that wrote the largest number of rows:

Query 1:

select sum(ros_rows_written + wos_rows_written) from dce group by transaction_id order by 1 desc limit 10;

Alternatively, this query computes commits per second over the interval captured:

Query 2:

select count(time) / datediff(ss,min(time::timestamp),max(time::timestamp)) from dce where is_committed = 't';

Much like many NoSQL solutions, flex tables require the type information to be supplied at query time, in this case in the form of casting the time column to a timestamp. In many cases, automatic casting rules convert the base varchar type to something appropriate, such as the sum() above casting its inputs to integers. You are not limited to trivial SQL; for example, you can load DC_TRANSACTION_STARTS and join the two tables to compute average transaction duration:

Query 3:

select avg(ends.end_time - starts.start_time) from
(select transaction_id,max(time::timestamp) as end_time from dce group by transaction_id) ends,
(select transaction_id,min(time::timestamp) as start_time from dcs group by transaction_id) starts
where ends.transaction_id = starts.transaction_id;

This query needs the subqueries because a row for every transaction exists on every participating node and we want to find the earliest and latest entry. When joining flex tables, you need to fully qualify all source columns with their originating table because either table could possibly have any column name you supply! To simplify, you can have Vertica build a view for you:

select compute_flextable_keys_and_build_view('dce');

Querying the view dce_view looks very much like querying the original Data Collector table. In fact, any of the SQL we just wrote against the flex tables is valid when run against the original source tables DC_TRANSACTION_ENDS and DC_TRANSACTION_STARTS! Accessing flex tables is identical to accessing regular tables–a true illustration of how flex tables meet the underlying promise of SQL databases. Users do not have to know the underlying data storage to derive value from the data.
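
The same helper also fills in a keys table (dce_keys), which is a quick way to see which columns a given dump actually contained and how often each appears. As a sketch (the keys-table column names, such as frequency and data_type_guess, are what we would expect and should be checked against your version):

select key_name, frequency, data_type_guess from dce_keys order by frequency desc;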

In search of performance

We get some really large diagnostic dumps and can’t always afford the performance degradation of unoptimized flex tables. Improving performance is particularly simple for tables from the Data Collector because we know upfront which columns are important predicate and join columns. Furthermore, these columns are present in most DC tables and are unlikely to vary. Thus, we use a hybrid flex table: specify some important columns and let the rest hang out in the “Flex Zone”. We use the following definition for most DC tables:

CREATE FLEX TABLE hybrid (
time timestamptz,
node_name varchar(100),
session_id varchar(100),
user_id int,
user_name varchar(100),
transaction_id int,
statement_id int,
request_id int
) order by transaction_id,statement_id,time segmented by hash(transaction_id) all nodes ksafe 1;
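
Loading the hybrid table is no different from loading the plain flex table: the parser fills the real columns from the same-name fields, and everything else still lands in __raw__. Something like the following (same file as above):

COPY hybrid FROM '/tmp/dce.txt' PARSER FDelimitedParser();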

In our performance evaluation shown in Figure 1, we compare running the above queries on an unoptimized flex table with no explicit columns against an optimized flex table with explicit columns. The optimized table has the same columns, sort order, and segmentation as the “hybrid” table shown above, and the experiment was run on 3 virtual machines hosted on an HP DL380 with 12 cores and 96GB of RAM. On the optimized table, Query 1 is an aggregation over an explicit column which appears in the sort order, leading to an efficient aggregation mechanism. Query 2 has a selective predicate column which is not explicit in the optimized table and thus doesn’t gain significant benefit. Query 3 derives significant benefit because the join columns are explicitly part of the sort order and segmentation expression for the optimized table, leading to a cluster-aware join which is very memory efficient. With our optimized table, we get the best of both worlds: speedy queries and flexible schema.


Figure 1: Performance comparison of sample queries run on the unoptimized flex table versus the optimized hybrid flex table. Query 1 is a simple aggregate query and Query 3 is a simple join. The graph shows that optimization sped up Query 1 by a factor of 17, and Query 3 by a factor of 121.

And in conclusion

Flex tables have changed our ingest logic from a complicated and brittle collection of static scripts to one simple Python script that handles it all. And I am much more willing to accept general query dumps from customers, since even if they are 300 columns wide I can still trivially pull them into Vertica for analysis. It all works because flex tables are not just about supporting JSON in the database, they are about changing the way we store and explore data in a relational database.

Welcome to the Flex Zone – Dynamic Schemas

Enter the Flex Zone

Much of the data analytics we perform occurs with data whose schema changes over time. Many organizations have developers who define the record types that are collected and analysts who comb through these records looking for important patterns. In today’s fast-changing business landscape, agility is the key to success: developers need to continually define new metrics for their analysts’ consumption. Typically, relational databases have been a significant impediment: schema changes are costly and the database data ingest logic is brittle. Instead of record changes being driven by developer needs, they involve careful coordination with database administrators against a backdrop of tweaking production systems. The flexible tables feature in Vertica 7.0 “Crane” seeks to address this challenge by allowing the table schema to vary from record to record. No database administrator involvement is necessary for day-to-day schema changes; the DBA need not get involved unless performance tuning is necessary.

A Web-driven database

For our example, we will consider a simple web service that serves a form and stores the results. While simplistic, this isn’t far off from serious web services, mobile games, smart meters, etc. There are two source files for this example, a web server flex-server.py and an HTML file form.html (store them in the same directory).

Getting Started

First, we need to create a table to store the form data. With Vertica Flex tables, we don’t even need to know what the structure of the data is upfront:

CREATE FLEX TABLE webreqs();

The web server, implemented in Python, is quite simple: it serves a form and stores the responses as JSON records.

import BaseHTTPServer, urlparse, json, datetime  # (the full imports and the SendToVertica class are in flex-server.py)

class FlexServer(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse.urlparse(self.path)
        # form submission?
        if url.path == '/data':
            v.send(json.dumps({"time":str(datetime.datetime.now()),
                               "src":self.client_address[0],
                               #"header":dict(self.headers.items()), # uncomment to get header info
                               "data":urlparse.parse_qs(url.query)})+'\n')
            self.send_response(302)
            self.send_header('Location',urlparse.urljoin(self.path,'/'))
            self.end_headers()
        else:
            self.send_response(200)
            self.send_header('Content-type','text/html')
            self.end_headers()
            with open("form.html","r") as src:
                self.wfile.write(src.read())
        return

v = SendToVertica()
from BaseHTTPServer import HTTPServer
server = HTTPServer(('localhost',8080),FlexServer)
server.serve_forever()

For brevity, we’ve left off the SendToVertica class, which streams the data into Vertica – it’s also quite simple and the full text is in flex-server.py.

The HTML file contains the form – it’s broken out into a separate file to make editing it easier. The initial form merely asks for the submitter’s name.

<html><head><title>Enter Info</title><body>
<form action="data">
Name:<input type="text" name="name">
<br><input type="submit" value="Submit">
</form></body></html>

Start the server (python flex-server.py) and point your browser at localhost:8080 – you should see something like:

[Screenshot: the form rendered in a browser]

Submit the form a couple of times. Then look at the server output – it might look something like this:

{"src": "127.0.0.1", "data": {"name": ["Ben"]}, "time": "2013-12-24 01:40:35.475723"}
{"src": "127.0.0.1", "data": {"name": ["Amy"]}, "time": "2013-12-24 01:40:39.293608"}

Vertica makes it easy to load this data into Flex tables – no table schema definition necessary and there’s already a parser for JSON data. The flex-server is running the following statement in order to load the data:

copy webreqs from stdin parser fjsonparser(flatten_arrays=true);

The flatten_arrays parameter may be new to you – we need it here because HTML form attributes can be many valued, so there’s an extra level of array we need to flatten out of the input.

Now we can write a query against the data as if the JSON features were columns in the table:

select "data.name.0" from webreqs;
 data.name.0
-------------
 Amy
 Ben
(2 rows)

select compute_flextable_keys_and_build_view('webreqs');
         compute_flextable_keys_and_build_view
--------------------------------------------------------
 Please see public.webreqs_keys for updated keys
 The view public.webreqs_view is ready for querying
(1 row)

select * from webreqs_view;
 data.name.0 |    src    |            time
-------------+-----------+----------------------------
 Ben         | 127.0.0.1 | 2013-12-24 01:40:35.475723
 Amy         | 127.0.0.1 | 2013-12-24 01:40:39.293608
(2 rows)

It’s Flex Time

But now for the main event: so far, loading the data was easy – but what if the developer decides the form will also have an additional vote field. What needs to change? Well, let’s change the HTML file to include the new form field:

<html><head><title>Enter Info</title><body>
<form action="data">
Name:<input type="text" name="name">
<br>Vote:<select type="select" name="vote">
<option>yes</option><option>no</option></select>
<br><input type="submit" value="Submit">
</form></body></html>

Submit the form a couple more times and look at the output – now there’s an additional key in the JSON!

{"src": "127.0.0.1", "data": {"vote": ["yes"], "name": ["Jen"]}, "time": "2013-12-24 01:40:57.785534"}
{"src": "127.0.0.1", "data": {"vote": ["no"], "name": ["Ian"]}, "time": "2013-12-24 01:41:01.867649"}

What database changes are necessary? None! The new fields are immediately available to query:

select sum(case when "data.vote.0" = 'yes' then 1 else 0 end) as yeses, count("data.vote.0") as total from webreqs;
 yeses | total
-------+-------
     1 |     2
(1 row)

But what about the entries submitted before the “vote” field existed?

select "data.name.0","data.vote.0" from webreqs;
 data.name.0 | data.vote.0
-------------+-------------
 Jen         | yes
 Ben         |
 Amy         |
 Ian         | no
(4 rows)

For fields that don’t exist in a particular record, you get SQL null. SQL operators do the “right thing” when they encounter nulls, such as count and sum above.
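
For instance, because count() skips nulls, comparing count(*) against a count of the vote key shows how many stored records predate the new field; with the four submissions above we would expect:

select count(*) as total, count("data.vote.0") as with_vote from webreqs;
 total | with_vote
-------+-----------
     4 |         2
(1 row)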

If you believe the schema has changed, you can update your keys and views to reflect the new reality by re-running the schematization tools:

select compute_flextable_keys_and_build_view('webreqs');
         compute_flextable_keys_and_build_view
--------------------------------------------------------
 Please see public.webreqs_keys for updated keys
 The view public.webreqs_view is ready for querying
(1 row)

select * from webreqs_view;
 data.name.0 |    src    |            time            | data.vote.0
-------------+-----------+----------------------------+-------------
 Jen         | 127.0.0.1 | 2013-12-24 01:40:57.785534 | yes
 Ben         | 127.0.0.1 | 2013-12-24 01:40:35.475723 |
 Amy         | 127.0.0.1 | 2013-12-24 01:40:39.293608 |
 Ian         | 127.0.0.1 | 2013-12-24 01:41:01.867649 | no
(4 rows)

Note that recomputing the view is not required to access the new fields; it is only needed if you want to access them through the view.

Try it yourself – edit the HTML form to have new fields and see how easy it is to query the resulting data in Vertica! You can also uncomment the header line in the server to get access to the header information (think cookies, sessionization, etc.).

Tuning for Performance

While flexible tables are great for exploration, we recommend that you do some tuning before moving into production. Luckily, tuning is easy and still preserves the flexible nature of the tables. The central mechanism is “promotion” of one or more columns to real table columns. To promote, select commonly used columns, particularly those used in joins or predicates and add them to your flex table. You can do this by using ALTER TABLE … ADD COLUMN … DEFAULT. For example, the time and vote columns might be of particular interest:

ALTER TABLE webreqs ADD COLUMN "time" TIMESTAMP NOT NULL DEFAULT time::timestamp;
ALTER TABLE webreqs ADD COLUMN "data.vote.0" VARCHAR(10) DEFAULT "data.vote.0"::varchar;

As part of adding the column, Vertica populates the column with any same-name data. Future loads will also load data into the newly added columns. Note that we have selected specific data types for the new columns. New data that doesn’t match the declared data type will result in nulls getting stored (or errors if the column is declared NOT NULL).
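
Once promoted, the column behaves like any other typed column. For example, a minimal sketch that buckets submissions by minute using the real timestamp column:

SELECT date_trunc('minute', "time") AS minute, count(*) FROM webreqs GROUP BY 1 ORDER BY 1;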

Don’t want to run ADD COLUMN yourself? No problem, because there’s a helper function to do this step for you:

select materialize_flextable_columns('webreqs');

The following columns were added to the table public.webreqs:
data.name.0
src
time
data.vote.0
For more details, run the following query:
SELECT * FROM v_catalog.materialize_flextable_columns_results WHERE table_schema = 'public' and table_name = 'webreqs';

There’s nothing special about these newly added columns – you could have done the same thing upfront:

CREATE FLEX TABLE webreqs (
"time" TIMESTAMP NOT NULL,
"data.vote.0" VARCHAR(10)
);

So if you know a little about your data and how it will be used, you can jump directly to the optimized approach.

Vertica’s Database Designer exists to optimize the physical design (storage layout) and it works great on promoted columns in flexible tables. A typical usage model is explore, promote, and optimize. A key thing to remember is that changes to the schema do not require promotion and optimization. If you add fields that dramatically change your query workload, then some additional performance optimization may be necessary.

Flexible tables from Vertica should dramatically simplify the process of adapting your schema to the needs of your changing business.

Enter the Flex Zone – An Exploration in Processing Twitter Data

With our HP Vertica 7 release, we announced HP Vertica Flex Zone, a new product offering that simplifies the way that you consume and then explore semi-structured data, such as Web logs, sensor data, and other emerging data types. In this blog post, our first “Flex Zone Friday” post, let’s look at how you can use HP Vertica Flex Zone to get a leg up on your latest data analysis problem, using Twitter data as the sample data type.

Tweet child of mine

For this example, let’s look at Twitter data. Tweets are an interesting source because the data volume is large, the tweet format is typically JSON, and many companies are mining tweet data for insights into how the community perceives their brand, product, or service. A tweet might look like the following:

{"id":257000038495043584,
 "text":"Let's play uno",
 "created_at":"Sat Oct 13 06:09:18 2012",
 "user":{"name":"scout",
         "lang":"en",
         "followers_count":521,...}
...}

It has the tweet text itself, with a lot of interesting metadata. Note that much of the metadata is stored in sub-structures like the “user” field.

You can acquire twitter data quite easily by making requests to the Twitter API. For the purposes of following along here, you can download a small sample of tweets: tweets.json.gz

Look Ma: No Columns!

Typically, loading this type of data into a relational database would have required transforming the raw data into something more easily consumable – or a pre-built custom Twitter data loader. With HP Vertica Flex Zone, you can skip the schema definition phase and just load JSON directly into the database, saving you hours of time:

CREATE FLEX TABLE tweets ();
COPY tweets FROM '/path/to/tweets.json.gz' GZIP PARSER fjsonparser();

As easy as “SELECT … FROM T”
Now we can just write SQL queries against the data, as if it were naturally loaded into columns. For example, to discover what the average tweet length by language is:

select "user.lang",avg(length(text))::int from tweets group by "user.lang" order by 2 desc;

 user.lang | avg
-----------+-----
 de        | 119
 ar        | 108
 es        |  80
 en        |  67
 it        |  66
 tr        |  66
 nl        |  64
 id        |  61
 fr        |  60
 ru        |  57
 ja        |  44
 pt        |  43
 ko        |  40
 th        |  22

From the SQL point of view, it doesn’t matter that the source data was JSON. The only concession is the column names themselves: we do need to double-quote the column names if they contain the ‘.’ character.

Let’s make a Schema

But how can you tell what columns are available? The HP Vertica Analytics Platform has a couple of mechanisms for discovering and making explicit the columns available.

select compute_flextable_keys_and_build_view('tweets');
        compute_flextable_keys_and_build_view
------------------------------------------------------
 Please see public.tweets_keys for updated keys
 The view public.tweets_view is ready for querying
(1 row)

The tweets_keys table now contains a row for each available column in the data. For example, to find all the user-related count columns:

select key_name from tweets_keys where key_name::varchar ilike 'user%count';
       key_name
-----------------------
 user.statuses_count
 user.listed_count
 user.friends_count
 user.followers_count
 user.favourites_count
(5 rows)

Armed with these new columns, we can compute some statistics about the followers in our dataset:

select min("user.followers_count"::int),avg("user.followers_count")::int,max
("user.followers_count"::int) from tweets;

 min | avg  |  max
-----+------+--------
   0 | 1968 | 253435
(1 row)

Clearly someone is tweeting to the void. And we probably have some observer bias in our small sample: there’s probably a correlation between tweeting frequency and follower count.
In addition to building the keys table, the function also whipped up a view that selects all available columns, which can be handy to see what is in the table. Tweets are quite large:

select * from tweets_view limit 1;
contributors | created_at | entities.hashtags | entities.urls | entities.user_mentions | favorited | id | id_str | in_reply_to_screen_name | in_reply_to_status_id | in_reply_to_status_id_str | in_reply_to_user_id | in_reply_to_user_id_str | retweet_count | retweeted | source | text | truncated | user.contributors_enabled | user.created_at | user.default_profile | user.default_profile_image | user.description | user.favourites_count | user.follow_request_sent | user.followers_count | user.following | user.friends_count | user.geo_enabled | user.id | user.id_str | user.is_translator | user.lang | user.listed_count | user.location | user.name | user.notifications | user.profile_background_color | user.profile_background_image_url | user.profile_background_image_url_https | user.profile_background_tile | user.profile_image_url | user.profile_image_url_https | user.profile_link_color | user.profile_sidebar_border_color | user.profile_sidebar_fill_color | user.profile_text_color | user.profile_use_background_image | user.protected | user.screen_name | user.statuses_count | user.time_zone | user.url | user.utc_offset | user.verified | coordinates | geo | place | user.profile_banner_url | delete.status.id | delete.status.id_str | delete.status.user_id | delete.status.user_id_str | retweeted_status.contributors | retweeted_status.created_at | retweeted_status.entities.hashtags | retweeted_status.entities.urls

A key differentiator of HP Vertica Flex Zone is that you can visualize this semi-structured data during data exploration with any of your preferred BI and visualization tools. These tools work against the generated views, which expose the metadata about the “flexible table” in a manner that the tools already understand. For example, we used Tableau to generate a report against the tweets_view.

And now for something completely different

It turns out that the tweet stream includes two types of records: tweets and deletes. Deletes are revocations of tweets and they have the following structure:

{
 "delete":{
  "status":{
   "user_id_str":"228092142",
   "id_str":"218710262428073984",
   "id":218710262428073984,
   "user_id":228092142
  }
 }
}

Our previous load actually loaded both record types into the same table. How did the previous queries work? Well, if you reference a column that doesn’t exist in a record, you get null. We can use this to find out how many deletes we loaded:

select count(*) from tweets where "delete.status.id" is not null;
 count
-------
   118
(1 row)

Lots yet to come

We haven’t talked about a number of key things like column promotion (performance!), dynamic schema, or substructure. Here’s a teaser about substructure — namely the JSON input is quite complex and the HP Vertica Analytics Platform loaded it all. You can descend into the nested structures and get the most out of the data. The following is a query that computes the top hashtags in the last month:

select values,count(*) from
    (select mapitems("entities.hashtags") over ()
     from tweets
     where created_at::timestamp > now() - interval '1 month') a
where keys::varchar like '%text'
group by values
order by count desc
limit 10;

             values               | count
----------------------------------+-------
 10ThingsThatGetsOnMyNerves       |     2
 Massa                            |     1
 ความในใจของติ่ง                    |     1
 Up                               |     1
 RTしてくれた人全員フォローする       |     1
 Turn                             |     1
 50ThingsAboutMyBoyfriend         |     1
 askenchancer                     |     1
 always                           |     1
 kaburrr                          |     1

Note: If you got recent data from Twitter, this will work without modification – if you used our data set, use interval '3 years' (or more).

We encourage you to give HP Vertica Flex Zone a test drive and actively participate in our community site! Stay tuned for our next Flex Zone Friday blog post.

A Feather in Vertica’s CAP

In this post, I attempt to relate Vertica distributed system properties to the well known CAP theorem and provide a fault tolerance comparison with the well known HDFS block storage mechanism.

The CAP theorem, as originally presented by Brewer @ PODC 2000 reads:

The CAP Theorem

It is impossible for a web service to provide the following three
guarantees:

  • Consistency
  • Availability
  • Partition-tolerance

The CAP theorem is useful from a system engineering perspective because distributed systems must pick 2/3 of the properties to implement and 1/3 to give up. A system that “gives up” on a particular property makes a best effort but cannot provide solid guarantees. Different systems choose to give up on different properties, resulting in different behavior when failures occur. However, there is a fair amount of confusion about what the C, A, and P actually mean for a system.

  • Partition-tolerance – A network partition results in some node A being unable to exchange messages with another node B – more generally, the inability of the nodes to communicate. Systems that give up on P assume that all messages are reliably delivered without fail and nodes never go down. In pretty much any context in which the CAP theorem is invoked, the system in question supports P.
  • Consistency – For these types of distributed systems, consistency means that all operations submitted to the system are executed as if in some sequential order on a single node. For example, if a write is executed, a subsequent read will observe the new data. Systems that give up on C can return inconsistent answers when nodes fail (or are partitioned). For example, two clients can read and each receive different values.
  • Availability – A system is unavailable when a client does not receive an answer to a request. Systems that give up on A will return no answer rather than a potentially incorrect (or inconsistent) answer. For example, unless a quorum of nodes are up, a write will fail to succeed.

Vertica is a stateful distributed system and thus worthy of consideration under the CAP theorem:

  • Partition-tolerance – Vertica supports partitions. That is, nodes can fail or messages can fail to be delivered and Vertica can continue functioning.
  • Consistency – Vertica is consistent. All operations on Vertica are strongly ordered – i.e., there is a singular truth about what data is in the system and it can be observed by querying the database.
  • Availability – Vertica is willing to sacrifice availability in pursuit of consistency when failures occur. Without a quorum of nodes (over half), Vertica will shut down since no modification may safely be made to the system state. The choice to give up availability for consistency is a very deliberate one and represents cultural expectations for a relational database as well as a belief that a database component should make the overall system design simpler. Developers can more easily reason about the database component being up or down than about it giving inconsistent (dare I say … “wrong”) answers. One reason for this belief is that a lack of availability is much more obvious than a lack of consistency. The more obvious and simplistic a failure mode is, the easier integration testing will be with other components, resulting in a higher quality overall system.

In addition to requiring a quorum of up nodes, each row value must be available from some up node, otherwise the full state of the database is no longer observable by queries. If Vertica fully replicated every row on every node, the database could function any time it had quorum: any node can service any query. Since full replication significantly limits scale-out, most users employ a replication scheme which stores some small number of copies of each row – in Vertica parlance, K-Safety. To be assured of surviving any K node failures, Vertica will store K+1 copies of each row. However, it’s not necessary for Vertica to shut down the instant more than K nodes fail. For larger clusters, it’s likely that all the row data is still available. Data (or Smart) K-Safety is the Vertica feature that tracks inter-node data dependencies and only shuts down the cluster when node failure actually makes data unavailable. This feature achieves a significant reliability improvement over basic K-Safety, as shown in the graph below.

The key reason Data K-Safety scales better is that Vertica is careful about how it arranges the replicas to ensure that nodes are not too interdependent. Internally, Vertica arranges the nodes in a ring and adjacent nodes serve as replicas. For K=1, if node i fails, then nodes i-1 and i+1 become critical: failure of either one will bring down the cluster. The key takeaway is that for each node that fails, a constant number (2) of new nodes become critical, whereas in the regular K-Safety mechanism, failure of the Kth node makes all N-K remaining nodes critical! While basic K=2 safety initially provides better fault tolerance, the superior scalability of Data K=1 Safety eventually dominates as the cluster grows in size.

Here we can draw an interesting comparison to HDFS, which also provides high availability access to data blocks in a distributed system. Each HDFS block is replicated and by default stored on three different nodes, which would correspond to a K of 2. HDFS provides no coordination between the replicas of each block: the nodes are chosen randomly (modulo rack awareness) for each individual block. By contrast, Vertica storing data on node i at K=2 would replicate that data on nodes i+1 and i+2 every time. If nodes 3, 6, and 27 fail, there is no chance that this brings down a Vertica cluster. What is the chance that it impacts HDFS? Well, it depends on how much data is stored – the typical block size is 64MB. The graph below presents the results of simulated block allocation on a 100 node cluster with replication factor of 3, computing the probability of a random 3-node failure making at least one block unavailable.

Assuming that you’re storing 50TB of data on your 100 node cluster, the fault tolerance of HDFS should be the same as a basic K=2 Vertica cluster – namely, if any 3 nodes fail, some block is highly likely to be unavailable. Data K-Safety with K=1 provides better fault tolerance in this situation. And here’s the real kicker: at K=1, we can fit 50% more data on the cluster due to less replication (two copies of each row instead of three, so the same raw capacity holds half again as much data)!

This comparison is worth a couple extra comments. First, HDFS does not become unavailable if you lose a single block – unless it’s the block your application really needs to run. Second, nodes experience correlated failures, which is why HDFS is careful to place replicas on different racks. We’ve been working on making Vertica rack-aware and have seen good progress. Third, the model assumes the mean-time-to-repair (MTTR) is short relative to the mean-time-to-failure (MTTF). In case of a non-transient failure, HDFS re-replicates the blocks of the failed node to any node that has space. Since Vertica aggressively co-locates data for increased query performance, it uses a more significant rebalance operation to carefully redistribute the failed node’s data to the other nodes. In practice, the recovery or rebalance operation is timely relative to the MTTF.

In conclusion, Vertica uses a combination of effective implementation and careful data placement to provide a consistent and fault tolerant distributed database system. We demonstrate that our design choices yield a system which is both highly fault tolerant and very resource efficient.

Notes:

  • The CAP theorem was proved by Lynch in 2002 in the context of stateful distributed systems on an asynchronous network.

 

Are You Ready for the Data Race?

A few hardy souls pressed against the tide of humanity heading home after work last Tuesday to gather at a nondescript loft in downtown Boston. We carefully looked left and right before dodging in past the bouncer to join a select crowd in their new favorite adrenaline-pumping sport… Tweet Racing.

In Tweet Racing, each participant carefully selects a twitter search term for the race, betting on the term they hope the Twitterverse will smile upon. Thrown into the cage and subjected to Vertica’s live twitter sentiment analysis code, the terms dueled for an hour. There are no rules in Tweet Racing – anything goes. We watched as the participants encouraged their Twitter followers to tweet for their terms or brutally tweet down others.

In the end, we even learned a few things. People don’t feel very strongly about kittens on a Tuesday evening.  However, “skrillex” is fairly popular, but it’s hard to beat “jolie” right after her Oscar Night poses.

 

But we weren’t there just to watch the races. New Blood Boston hosted the Vertica Engineering team for a discussion about “Big Data” and how the Vertica Analytics Platform is a natural fit for many of the data problems facing start-ups today.

At the event, we showed how Vertica can blaze through anything from clickstream data with funnel analysis to graph problems like k-core and counting triangles – problems that may not initially appear to be database problems. We demonstrated what makes Tweet Racing possible in Vertica – the extensibility of the platform and its applicability to things outside the usual scope of the traditional SQL database.

But mostly, we were there to share our passion for Vertica and the engineering challenges that go into making it the industry’s most powerful, extensible analytics database.

Missed the New Blood Boston event? Check out our Vertica Community Edition to test drive Vertica and experience the thrills first hand!

How to Make Pig for Hadoop SQL

A few weeks back, my co-worker Rajat Venkatesh gave a little talk on his work with the Vertica Connector for Hadoop.  When he displayed a Pig program, the de facto programming language for Hadoop, I noted that it looks a lot like SQL.  And also, that it shouldn’t be too hard to convert a Pig program to SQL, allowing us to run Pig on Vertica directly!

So, I spent a few hours over the last week and whipped up an engine for running Pig programs on Hadoop and Vertica.  I’m not trying to start up a Hadoop vs. database war, or even “perfume” Pig (sorry… bad joke).  I just wanted to make life easier for those users who are combining Hadoop and Vertica to get the most out of their data.

The core idea of the Vertica Pig/SQL conversion engine is to rewrite the data flow described by the Pig program into a set of nested SQL queries that produce the query answer. The conversion engine uses the regular PigScriptParser, looks at the LogicalPlan, and transforms each Operator into a SQL statement that implements its functionality.

The Vertica Connector for Hadoop is used to move data from HDFS into Vertica and from Vertica back into HDFS.  We run a mixture of Pig scripts and SQL scripts in order to create tables in Vertica, move the data from Hadoop to Vertica, compute the result, and move it back to Hadoop.  The hope is to seamlessly take advantage of Vertica to run your Pig program faster.

Here’s a fairly trivial Pig script example:

x = LOAD 'foo.log' USING PigStorage('|') as (a:int,b:int,c:int);
y = FILTER x BY a > 5;
z = FOREACH y GENERATE b*c as f;
z2 = DISTINCT z;
z3 = FOREACH z2 GENERATE f, f*f as g;
z4 = ORDER z3 by g;
STORE z4 INTO 'zzz';

And here is the corresponding code run by the Vertica conversion engine:

Script: schema refresh(SQL)[ErrorsOK]
DROP SCHEMA squeal_PigLatin_simple_pig CASCADE;
CREATE SCHEMA squeal_PigLatin_simple_pig;
Script: sql setup(SQL)
CREATE TABLE squeal_PigLatin_simple_pig.hdfs_L3H9215_verticacorp_com_9935_user_bvandiver_simple_foo_log
(
a int,
b int,
c int,
d varchar(65000)
);
Script: test sql(SQL)
SELECT * FROM (SELECT * FROM (SELECT * FROM (SELECT f as f,f * f as g FROM (SELECT DISTINCT * FROM (SELECT b * c as f FROM (SELECT * FROM squeal_PigLatin_simple_pig.hdfs_L3H9215_verticacorp_com_9935_user_bvandiver_simple_foo_log as x WHERE a > 5) AS y) AS z) as z2) AS z3 ORDER BY g) AS z4 LIMIT 3) AS z5
Script: load3(PIG)
x = LOAD 'hdfs://L3H9215.verticacorp.com:9935/user/bvandiver/simple/foo.log' USING PigStorage('|') AS (a:int,b:int,c:int,d:chararray);
STORE x into '{squeal_PigLatin_simple_pig.hdfs_L3H9215_verticacorp_com_9935_user_bvandiver_simple_foo_log(a int,b int,c int,d varchar(65000))}' using com.vertica.pig.VerticaStorer('localhost','verticadb502','5935','bvandiver','""');
Script: populate vdata4(SQL)
CREATE TABLE squeal_PigLatin_simple_pig.vdata AS SELECT * FROM (SELECT * FROM (SELECT * FROM (SELECT f as f,f * f as g FROM (SELECT DISTINCT * FROM (SELECT b * c as f FROM (SELECT * FROM squeal_PigLatin_simple_pig.hdfs_L3H9215_verticacorp_com_9935_user_bvandiver_simple_foo_log as x WHERE a > 5) AS y) AS z) as z2) AS z3 ORDER BY g) AS z4 LIMIT 3) AS z5
Script: store vdata5(PIG)
vdata = LOAD 'sql://{SELECT * FROM squeal_PigLatin_simple_pig.vdata}' using com.vertica.pig.VerticaLoader('localhost','verticadb502','5935','bvandiver','""');
STORE vdata INTO 'hdfs://L3H9215.verticacorp.com:9935/user/bvandiver/simple/zzz' USING org.apache.pig.builtin.PigStorage();

While this example only shows filter, distinct, order, and foreach, the conversion engine currently supports limit, join, sample, and group as well.
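
As a sketch of how one of the other operators might translate, a Pig group such as g = GROUP y BY a; cnt = FOREACH g GENERATE group, COUNT(y); would come out as roughly the following SQL (the exact query the engine emits may differ):

SELECT a, COUNT(*) AS cnt
FROM (SELECT * FROM squeal_PigLatin_simple_pig.hdfs_L3H9215_verticacorp_com_9935_user_bvandiver_simple_foo_log AS x WHERE a > 5) AS y
GROUP BY a;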

There are a number of cool tricks we can do to improve performance. One trick is that we’re already taking advantage of the Vertica Optimizer to make data-aware decisions. For example, the Optimizer will choose join order based on the data actually loaded, instead of relying on the programmer to know what the best order was (and specify it correctly!).

Additionally, we can take advantage of the Vertica Database Designer. Our Database Designer automatically chooses a good data layout and compression strategy for storing the data in Vertica. The Designer uses the schema, the queries, and a sample of the data as its input – all of which the conversion engine can provide programmatically. Finally, you can leave the original data in Vertica, so the second time you run the script, you avoid paying the cost of transferring the data.

What’s the take-away? With the Vertica Pig/SQL conversion engine, dislike of writing SQL is not a reason to throw away your database. If scalability is your major concern, I’d encourage you to see how well Vertica scales up to the challenge of large data.

For an off-hours, couple-day hack, the conversion engine is pretty feature-rich! Today, the conversion engine doesn’t support all of Pig (for instance cogroup or user-defined functions), but this is simply a matter of time and effort. We would love to hear from our user community on the interest in using the Vertica Pig/SQL conversion engine if it were a real product. If so, we will improve it and release it as an open source download.

Thoughts?
