A few days ago, Lakshmikant Shrinivas and I gave an invited talk at BIRTE 2011. Rather than yet another talk on Vertica 101, we chose to talk about Social Graph Analytics, a topic we are truly very excited about because it is something we are learning from our customers! And of course, it is uber cool to talk about Cityville and Zynga these days! We presented the Zynga usecase – how to find the influencers among the active users of a game and several other graph problems Vertica has very successfully solved with SQL. More about these in future blog posts.
The most fun part of this talk (according to the audience) was that it was multi-threaded – we weaved a real-time demo through the talk. The demo was in two parts. The first part calculated an approximate 4-core of a graph of 90M nodes and 405M edges representing active users and their interactions in a game. A k-core of a graph picks out a subgraph where every node has at least k neighbors in the subgraph – some research shows that the set of influencers in a social network is often a 3 or 4-core. The demo was run on 4 blades of the HP/Vertica Analytics System with 12 cores and 96GB RAM each. The animation below show an approximate visualization of the actual graph with 10K nodes and 45K edges (we could not find any graph visualizing tool that could draw a graph of 405M edges!) and how the algorithms iteratively reduces it to find the 4-core. The actual computation (which cannot be visualized) took about 1 minute to process 1TB of in-game event data to compute the initial graph of 90M active users and only 4.5 minutes to run 8 iterations resulting in the 4-core of 34K users!
The second part showed how this graph of influencers could be used to do A/B testing for in-game product placement. We simulated giving a group of users from the graph of influencers , Coke, and a group of randomly chosen users, Pepsi, and loading their in-game interactions data into Vertica, every 15 seconds. In the 5-minutes or so it took us to describe the setup, you could see how the relative penetration of the two products changed in real-time. This was really a demo of Vertica’s awesome real-time analytics engine – we loaded data continuously at the rate of 40GB/minute (on 4 nodes with 2 copies of the data), which translates to 20TB/hr on the full rack HP/Vertica Analytics System. That’s fast, eh?!!
This little demo and what our customers are doing in the real world, makes for a very convincing case that SQL can be used to express many practical graph problems. Vertica’s MPP architecture and sorted projections provide a highly scalable and fast engine for iterative graph algorithms. Such problems are relevant not only to the Zyngas and Facebooks of the world but to any company that has a connected user community. From online forums to friend-and-family call logs, social networks are everywhere and finding influencers is the key to monetizing them. Gartner believes that social CRM will be a $1B market by end of 2012!
We hope that our talk planted a small seed for the database research community to more formally explore SQL solutions to graph problems! Watch for more blogs from Vertica on solving graph problems with SQL.