This blog post was authored by Steve Sarsfield.
Hadoop and HDFS are capable of storing massive volumes of data, but performing analytics on Hadoop can be challenging. Despite its apparent low cost, Hadoop is best suited for data lake and data science solutions, where the number of concurrent analytical users is low. In our own internal benchmarks, analytical performance degrades steeply as concurrency increases. Our tests have also shown that many SQL-on-Hadoop solutions lack depth in their analytical functions.
This is why it’s great to see Vertica working in partnership with Cisco to develop a reference architecture and prove that Cisco hardware works well with Vertica. We’ve been working with the Cisco team to test Cisco UCS® Integrated Infrastructure for Big Data and Analytics with Vertica. We’ve tested Vertica on both the Cisco UCS C240 and C220 M5 Rack Servers, which offer fast memory, networking, and processing power, plus large, fast disk drives to get the analytics job done. You can read more about the servers here.
What are the advantages of Cisco and Vertica?
According to Cisco, the benchmarks were impressive, so count analytical performance as one advantage. Notably, Cisco was able to run benchmarks at 1 TB and 10 TB, and is working on the 100 TB TPC-DS benchmark. Cisco told us that Vertica was the only engine they tested that ran all 99 TPC-DS queries, a standard test for our industry, without modification; this confirms our own internal tests.
While SQL is the primary query and analysis language, Vertica also supports Java, Python, R, and C. The analytical functions are extensive and include time-series, data prep, geospatial, and other advanced functions. What’s more, Vertica can perform in-database machine learning, including linear regression, logistic regression, SVM, k-means, Naive Bayes, and more, to deliver predictive analytics. You no longer need to copy data out of your data warehouse when you want to do predictive analytics. Vertica and the Cisco infrastructure work together to create an extremely efficient MPP architecture.
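To make the in-database machine learning idea more concrete, here is a minimal Python sketch that builds the SQL for training and applying a linear regression model inside the database. It assumes Vertica's `LINEAR_REG` and `PREDICT_LINEAR_REG` functions; the model, table, and column names are hypothetical, and the exact parameters should be checked against the documentation for your Vertica version.

```python
def train_linear_regression(model, table, response, predictors):
    """Build a statement that trains a linear regression model in-database.

    Assumes Vertica's LINEAR_REG(model, input_table, response, predictors).
    All names passed in are illustrative, not from a real schema.
    """
    predictor_list = ", ".join(predictors)
    return (
        f"SELECT LINEAR_REG('{model}', '{table}', "
        f"'{response}', '{predictor_list}');"
    )


def predict_linear_regression(model, table, predictors):
    """Build a statement that scores rows with a previously trained model."""
    predictor_list = ", ".join(predictors)
    return (
        f"SELECT PREDICT_LINEAR_REG({predictor_list} "
        f"USING PARAMETERS model_name='{model}') AS predicted "
        f"FROM {table};"
    )


# Hypothetical churn example: table and column names are assumptions.
train_sql = train_linear_regression(
    "churn_model", "customer_history", "churn_score",
    ["support_calls", "monthly_usage"],
)
predict_sql = predict_linear_regression(
    "churn_model", "customer_current", ["support_calls", "monthly_usage"]
)
print(train_sql)
print(predict_sql)
```

Both statements run where the data already lives, so no rows need to be exported to a separate modeling environment.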
Reach Out and Analyze your Hadoop Data
In a modern enterprise architecture, gaining access to data that’s stored in multiple locations is crucial to delivering analytics. Vertica allows you to perform analysis and do JOINs on data that is sitting in HDFS (Apache Hadoop) with data that is stored in the database. Whether the data is stored in your database or in ORC or Parquet format, Vertica can analyze it without first copying it.
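As a rough illustration of this in-place analysis, the following Python sketch builds the two SQL statements involved: an external table definition over Parquet files in HDFS, and a JOIN against a regular database table. The table names, column names, and HDFS path are all hypothetical, and the `CREATE EXTERNAL TABLE ... AS COPY FROM ... PARQUET` form is assumed from Vertica's external table syntax, so verify it against your version before relying on it.

```python
def external_table_ddl(table, columns, hdfs_glob, fmt="PARQUET"):
    """Build a Vertica external table definition over files in HDFS.

    columns is a list of (name, sql_type) pairs. All names used here
    are illustrative assumptions, not taken from a real schema.
    """
    if fmt not in ("PARQUET", "ORC"):
        raise ValueError("fmt must be 'PARQUET' or 'ORC'")
    col_defs = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} ({col_defs}) "
        f"AS COPY FROM '{hdfs_glob}' {fmt};"
    )


def join_query(db_table, ext_table, key):
    """Build a JOIN between a database table and an external table."""
    return (
        f"SELECT c.*, l.* FROM {db_table} c "
        f"JOIN {ext_table} l ON c.{key} = l.{key};"
    )


# Hypothetical example: customer data in the database, logs in HDFS.
ddl = external_table_ddl(
    "device_logs",
    [("customer_id", "INT"), ("event_time", "TIMESTAMP"),
     ("message", "VARCHAR(1024)")],
    "hdfs:///data/logs/*.parquet",
)
query = join_query("customers", "device_logs", "customer_id")
print(ddl)
print(query)
```

The external table only records where the files live and how to read them, so the log data stays in HDFS while the JOIN treats it like any other table.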
Not long ago, I spoke to a database administrator who was serving up analytics for customer experience management. The DBA wanted to improve customer support by leveraging both customer data (data warehouse) and machine log data (Hadoop) to gain better knowledge of customers and devices. The goal was to improve service quality, reduce churn, and deliver incremental revenue growth. With a single Vertica license, he was able to meet those needs easily. An added benefit was that he did not have to move the data from Hadoop to the database. Vertica’s external data features allowed him to analyze data in place without moving large volumes of data. It’s this flexibility that turns out to be one of the most powerful features of Vertica.