Using Vertica as a Structured Data Repository for Apache Hadoop
The Hadoop MapReduce implementation can be a powerful tool for running complex procedural algorithms on structured data stored in a distributed collection of relational databases. If the data being processed resides in Vertica databases, Hadoop developers can crunch more data, on less hardware, faster than with MySQL, Postgres, or any other DBMS.
Vertica has implemented a version of the Cloudera DBInputFormat interface that makes it easy for Hadoop developers to push Map operations down to Vertica databases in parallel. Developers specify parameterized queries, and each query returns pre-aggregated data for one mapper. Hadoop Reduce operations can use the same interface to stream data into Vertica for ongoing reporting and analysis by end users. In summary, developers can split Map operations into pieces that run inside Vertica and pieces that run inside Hadoop, making the best use of the processing power and expressive elegance of both tools.
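The push-down idea can be illustrated with a small, self-contained sketch. It is not the Vertica connector or Hadoop itself: Python's built-in `sqlite3` module stands in for the database, the "mappers" run one after another instead of in parallel, and the `clicks` table, its columns, and every function name are hypothetical, chosen only for illustration. What it shows is the pattern described above: one parameterized query, one parameter set per mapper, aggregation done inside the database, and a reduce step that merges the pre-aggregated rows.

```python
import sqlite3
from collections import Counter


def make_demo_db():
    """Build a tiny in-memory table (hypothetical schema, stand-in for a real database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clicks (user_id INTEGER, region TEXT, amount INTEGER)")
    rows = [(i, "east" if i % 2 else "west", i * 10) for i in range(1, 9)]
    conn.executemany("INSERT INTO clicks VALUES (?, ?, ?)", rows)
    return conn


# A single parameterized query. Each mapper receives its own (low, high) pair,
# so the database performs the aggregation for that mapper's slice of the data.
QUERY = """
SELECT region, SUM(amount)
FROM clicks
WHERE user_id BETWEEN ? AND ?
GROUP BY region
"""


def map_split(conn, params):
    """Simulate one mapper: run the query and return pre-aggregated (key, value) rows."""
    return conn.execute(QUERY, params).fetchall()


def reduce_totals(pairs):
    """Simulate the reduce step: merge the partial sums produced by all mappers."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_job(conn, splits):
    """Run every split in turn (a real cluster would run them in parallel) and reduce."""
    pairs = []
    for params in splits:
        pairs.extend(map_split(conn, params))
    return reduce_totals(pairs)


if __name__ == "__main__":
    conn = make_demo_db()
    # Two mappers, each covering half of the user_id range.
    print(run_job(conn, [(1, 4), (5, 8)]))
```

Because each mapper receives at most one row per region instead of every raw row, far less data has to leave the database, which is the point of pushing the aggregation down.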
Why does Vertica make such a good database for Hadoop?
Vertica - Hadoop Applications