Modern analytic databases such as Vertica often need to process a myriad of workloads, ranging from the simplest primary-key lookup to complex analytical queries that involve dozens of large tables and the joins between them. Different types of load jobs (such as batch ETL jobs and near-real-time trickle loads) keep the data up to date in an enterprise data warehouse (EDW). An enterprise-class database like Vertica must therefore have a robust yet easy-to-use mixed-workload management capability.
This blog is just the first in a series that addresses frequently asked tech support questions. For now, we’ll talk about optimizing your database for deletion.
In this, the second of the multi-part “de-mythification” series, I’ll address another common misconception in the Big Data marketplace today: that there are only two types of data an enterprise must deal with for Big Data analytics, structured and unstructured, and that unstructured data is somehow structure-free.
In the first of this multi-part series, I’ll address one of the most common myths my colleagues and I have to confront in the Big Data marketplace today: the notion of “real-time” data visibility. Whether it’s real-time analytics or real-time data, the same misconception always seems to come up. So I figured I’d address this, define what “real-time” really means, and provide readers some advice on how to approach this topic in a productive way.
ROLLUP is a very common Online Analytic Processing (OLAP) function and is part of ANSI SQL. Many customers use ROLLUP to write reports that automatically perform sub-total aggregations across multiple dimensions at different levels in one SQL query.
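To make the sub-total behavior concrete, here is a minimal Python sketch that emulates what a query grouped with `GROUP BY ROLLUP(region, state)` would return. The `sales` rows, the column names, and the `rollup` helper are hypothetical and exist only for illustration; this is not how Vertica implements ROLLUP internally.

```python
from collections import defaultdict

def rollup(rows, dims, measure):
    """Emulate GROUP BY ROLLUP(dims): sum `measure` for every prefix of dims.

    A rolled-up dimension is represented by None, much as SQL shows NULL
    in the sub-total and grand-total rows.
    """
    totals = defaultdict(int)
    for row in rows:
        # level == len(dims) is the detail row; level == 0 is the grand total.
        for level in range(len(dims), -1, -1):
            key = tuple(row[d] for d in dims[:level]) + (None,) * (len(dims) - level)
            totals[key] += row[measure]
    return dict(totals)

# Hypothetical sample data.
sales = [
    {"region": "East", "state": "MA", "amount": 100},
    {"region": "East", "state": "NY", "amount": 50},
    {"region": "West", "state": "CA", "amount": 70},
]

result = rollup(sales, ["region", "state"], "amount")
for key, total in result.items():
    print(key, total)
```

The result holds three detail rows, one sub-total per region, and one grand total, all produced in a single pass over the data.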
When I’m on a flight sitting next to someone, and we’re making polite conversation, often the question comes up: “What do you do?” In these situations, I have to assess whether the person works in the IT industry or is otherwise familiar with the lingo. If not, my stock response is “I fix databases.” This usually warrants a polite nod, and then we both go back to sleep. This over-simplified explanation generally suffices, but in truth, it is wholly inadequate. The truth of the matter is that my job is to ensure that databases don’t get broken in the first place; more specifically, a Vertica database. Because our clients have different, complex goals in mind, they sometimes configure their systems incorrectly for the kind of work they’re doing. I’m constantly looking for ways to empower clients to understand which problems to look for before they become bigger ones.
With the explosion of data volumes that all enterprises are capturing, new technologies such as Vertica offer a solution to non-expert users who need to analyze and monetize their Big Data. If you are a non-expert user, the Database Designer (DBD) module in Vertica can help you choose a physical database design that minimizes storage footprint while optimizing the performance of the input query workload. The DBD can recommend good physical designs as quickly as possible using minimal computing resources.
I have just come back from a business trip to China, where I visited several large Chinese telecom customers to talk about the recent big Vertica win at Facebook. Two questions these customers constantly asked me were: What’s the future of MPP databases? Will Hadoop become the one database that rules the whole analytic space?
The answer is YES if it is the right kind of tree. Here “tree” refers to a common data structure that consists of parent-child hierarchical relationships, such as an org chart. Traditionally, this kind of hierarchical data structure can be modeled and stored in tables but is usually not simple to navigate and use in a relational database (RDBMS). Some other RDBMSs (e.g., Oracle) have a built-in CONNECT BY clause that can be used to find the level of a given node and navigate the tree. However, if you take a close look at its syntax, you will realize that it is quite complicated and not at all easy to understand or use.
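As a rough illustration of what “finding the level of a given node” means for parent-child data, the Python sketch below stores a tiny org chart as child-to-parent pairs (the shape such data typically takes in a table) and computes each node’s depth. The `org` data and the `node_levels` helper are hypothetical; this shows the general idea only and is not the approach the post describes for Vertica.

```python
def node_levels(parent_of):
    """Return {node: level}, where the root is level 1 and each child is parent + 1.

    `parent_of` maps every node to its parent; the root maps to None.
    Assumes the data forms a tree (no cycles).
    """
    levels = {}

    def level(node):
        # Memoize so each node's level is computed only once.
        if node not in levels:
            parent = parent_of[node]
            levels[node] = 1 if parent is None else level(parent) + 1
        return levels[node]

    for node in parent_of:
        level(node)
    return levels

# Hypothetical org chart stored as child -> parent rows.
org = {
    "ceo": None,
    "vp_eng": "ceo",
    "vp_sales": "ceo",
    "engineer": "vp_eng",
}

levels = node_levels(org)
for name, depth in levels.items():
    print(name, depth)
```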
With Vertica’s latest release (Vertica 7 “Crane”), we introduced Vertica Flex Zone, based on the patent-pending flex tables technology, which dynamically adapts to whatever schema is present in the data. Flex tables offer dramatic usability improvements over regular tables. In this post, we take a look under the hood and show how flex tables are similar to regular Vertica tables, with a little pinch of magic thrown in.