Vertica

Archive for May, 2014

Introducing HP Vertica “Dragline”

Today, we announced “Dragline,” the code name for the latest release of the HP Vertica Analytics Platform. Focused on the strategic value of all data to every organization, “Dragline” includes a range of industrial-strength features befitting its code name for serious Big Data initiatives.

Our data sheet provides detailed feature descriptions on this release’s full range of capabilities and benefits, so we’ll just focus on three top features that are sure to capture your attention (after all, they came highly requested from our growing customer base).

By the way, learn about these and all of the new “Dragline” features in our upcoming Webinar.


“Project Maverick” – Speed in All Directions

Danger Zone

Speed is a given … but what about in every possible direction? “Project Maverick” has multiple elements including fast, self-service analytics about discrete individuals or devices. Live Aggregate Projections, a key, new feature of “Project Maverick,” accelerates the speed and performance of these live lookups by up to 10x – more frequent or highly concurrent queries on an individual customer basis — by computing metrics on the data as it arrives for targeted and personalized analytics without programming accelerator layers.

For telecommunications companies as well as utilities and energy providers who, due to deregulation, are pursuing smart metering initiatives to differentiate from competitors, this capability is hot. With Live Aggregate Projections, these providers can deliver smart metering reports that educate their customers on consumption rates vis a vis their neighbors in promoting a greener planet and establishing a tighter relationship with their provider. Interested in learning how HP Vertica easily chews through the enormous volume and scale of smart meter data and the Internet of Things? Check out our newly published white paper, including detailed benchmarks.


Mixed Dynamic Workload Management – Make ’em All Happy

Coffee

Another major advancement of “Project Maverick” is Dynamic Mixed Workload Management. Commonly found in many data warehouse technologies and requested by some of our largest customers, this powerful new feature identifies and adapts to varying query complexities — simple and ad-hoc queries as well as long-running advanced queries — and dynamically assigns the appropriate amount of resources to meet the needs of all data consumers.

So, now, you can preserve your precious hardware and system resources, while pleasing even your most demanding internal and external constituents – from analysts who seek ad-hoc insights to data scientists who manage complex analytics to business executives who need customized views or dashboards on Monday mornings.


Cost-Optimized Storage – Don’t Break the Bank

pig

Hadoop is commonly used as a data lake to store bulk data. That’s why with each release, we offer tighter integration that delivers the most open SQL on Hadoop. “Dragline” builds on that strong heritage by helping you to manage multiple storage tiers for cost-effective analytics. You can perform advanced SQL queries on bulk data stored in HDFS using the HP Vertica Analytics Platform without moving the data or using any connectors. You can move data into HP Vertica when your organization requires faster performance in-depth analytics.

HP Vertica supports all of the major Hadoop distributors but the power of MapR and HP Vertica on the same cluster is something special. Join our upcoming Webinar with MapR to get the full details behind this solution and to build your business case for SQL on Hadoop.

Try HP Vertica 7 today and stay tuned for more blog posts and materials to support this release.

Work hard, have fun and make a difference!

JaminBlogPic

My name is Jaimin and I work as a Software Engineer in the Distributed Query Optimizer Team at HP Vertica. I wanted to share with you what I think makes Vertica the best place to work! I will explain the kind of impact you can make as an employee/intern at HP Vertica, while sharing my personal experiences.

As a student, I researched many companies I might want to work for to get a better understanding of the everyday life of software engineers. However, what I was most interested in learning about was the kinds of things engineers might do that went above and beyond the normal day-to-day stuff.

Is writing code something unique to the job?

No! Right?

As Software Engineers, we write code, develop algorithms, and implement them. But here at HP Vertica, we do lots of other things besides simply writing code.

Go above and beyond!

Vertica is different from other companies as far as normal day-to-day stuff goes.

Let me ask you this question: How many new graduates would you guess could get a chance to file a patent within their first 6 months of joining a company? How many would get chance to write a paper within first six months? Not a lot, right?

In my experience at HP Vertica, I’ve seen that just about all new graduate engineers file at least one patent in their first year at work. This speaks to the fact that the work we do here at Vertica is completely innovative. Our projects have a huge business impact.

Be the captain of your ship!

Vertica offers engineers incredible opportunities! All you have to do is be willing to accept them. One of the best things about HP Vertica is that you work in an environment where other engineers are smarter than you! You’ll find yourself constantly challenged to learn new, interesting, and exciting things. You’ll get better exposure and, more importantly, you have a massive role to play in the company’s growth and development.

Something else that’s unique about HP Vertica—the projects you work on as an intern become part of the shipping product! As a result, you’ll get the chance to see your code in action and sometimes you can learn what customers have to say about your feature in particular. You won’t be allowed to sit idle for a minute because we have a very short release cycle. This will keep you on your toes and encourage you to think something new day in and day out.

Here, engineers are not forced to work on this and that—they have a great deal of autonomy and frequently get to choose the things they work on. If you have an idea you think can help improve the product, you are encouraged to see it through. And, you’ll also get a chance to participate in various technical events that take place within HP and submit your ideas.

Taking initiative is always encouraged and you’ll be expected to make, discuss, and defend your design decisions with your mentors instead of just following directions. You’ll also be able to learn about the complexities of building a database and how we achieve the performance advantages in HP Vertica.

It is also easy to move between the teams. It is entirely up to you and the only question is what you want to do?

Share and gain knowledge!

Knowledge Sharing is another important thing at Vertica. We do a lunch talk where we discuss any new paper related to database systems. Every now and then people from various teams give tech talks so that each team is aware of what people in other groups are doing.

As a fresh graduate before joining Vertica, I did not have any experience working on a database optimizer product, though I had worked a bit on optimizations when I took a compiler class. Because of the great culture and environment at Vertica, I didn’t find the transition difficult at all. Sometimes it was challenging, but it allowed me to learn a lot by working with incredibly smart people at the company while working on challenging projects (I wonder how many people have the opportunity to work on the design and implementation of queries involving Set Operators during their first year of work).

Have fun!

We frequently unwind doing fun things at work, including watching the Olympics games or other sporting events during lunch, or playing table-tennis and board games when we can. Vertica provides a lot of flexibility and it comes with huge responsibility. You’re expected to get your work done on time—if you do that, no one will have any problem with having a little fun. Interns also go on outdoor field trips, including horseback riding, hiking to Blue Hills, going for a movie, participating in a bocce tournament, and water activities such as motor boat racing. Once, we went to the Boston Harbor and tried to learn how to sail a boat from one of our in-house experts in Vertica.

We are looking for people to join Vertica! Do you have any interest in being challenged in an innovative design environment? Then apply today!

How to make ROLLUP fly in Vertica?

ROLLUP screenshot

ROLLUP is a very common Online Analytic Processing (OLAP) function and is part of ANSI SQL. Many customers use ROLLUP to write reports that automatically perform sub-total aggregations across multiple dimensions at different levels in one SQL query. The following is an example:

Select DEPT_CODE, STORE_CODE, SUM(SALE_TOT_QTY) As TOTAL_SALES
From AGG_CUST_MONTHLY A INNER JOIN DIM_DATE B
ON ( A.DATE_KEY = B.DATE_KEY )
WHERE B.DATE_KEY BETWEEN 20120301 AND 20120331
GROUP BY ROLLUP(DEPT_CODE, STORE_CODE);

Note: ROLLUP syntax is not supported in the current version of Vertica 7.0. This
is used only as an illustration. See blog text for more details

The business question the above query intends to answer is: For the month of March, 2014, show the total sales across both departments (dept_code) and stores (store_code) to generate sub-total/total sales figures at three levels:

  1. At the detailed level of each pair of (dept_code, store_code)
  2. At the level of only (dept_code) but aggregating across all stores
  3. Finally a grand sales total aggregating across all departments and stores

Assuming performance is not an issue, clearly ROLLUP is a very powerful and useful function that can do a lot of data crunching in the background and deliver a quite complex report.

Unlike the more familiar GROUP BY SQL function in which the column order is not material, ROLLUP works by creating subtotals that “roll up” from the most detailed level to a grand total and requires an ordered list of grouping expressions to be supplied as arguments. First ROLLUP calculates the standard aggregate values specified in the standard GROUP BY (without ROLLUP) and then ROLLUP moves from right to left through the list of grouping columns, starting with the lowest level of aggregation to create progressively higher-level subtotals. This process continues until all grouping columns have been consumed and aggregations computed. The operator essentially “rolls up” its grouping operations, culminating in a grand total in the end.

If N is the number of grouping columns, ROLLUP function will create (N+1) levels of subtotals (including a final grand total). The column order in a ROLLUP operator is critical since it works by removing the left most column at each step. For example:

ROLLUP(A, B, C) creates 4 groups: (A, B, C), (A, B), (A), ().

Note there are NO groups such as (A, C) and (B, C).

Unfortunately the current version of Vertica 7.0 does not support ROLLUP. So what is the alternative or workaround for the existing Vertica customers? Will the workaround perform on a large data set? Fortunately, the answers to both questions are yes. However it does take some ingenuity in physical design and SQL re-write to make ROLLUP work and perform in the currently supported Vertica releases (version 5.1 and later).

The key considerations are to create and pre-load a local temp table which is sorted on ALL the ROLLUP columns in the same exact order, rewrite SQL at each level using GROUP BY into a sub-query and finally UNION ALL of these sub-queries together. By applying this strategy to the ROLLUP SQL posted at the beginning of this blog, we arrive at the following alternative/workaround:

DROP TABLE IF EXISTS AGG_TEMP CASCADE;

CREATE LOCAL TEMPORARY TABLE AGG_TEMP
ON COMMIT PRESERVE ROWS
AS
/*+ DIRECT */
( Select DEPT_CODE, STORE_CODE, SUM(SALE_TOT_QTY) As TOTAL_SALES
From AGG_CUST_MONTHLY A INNER JOIN DIM_DATE B
ON ( A.DATE_KEY = B.DATE_KEY )
WHERE B.DATE_KEY BETWEEN 20140301 AND 20140331
GROUP BY DEPT_CODE, STORE_CODE )
ORDER BY DEPT_CODE,
STORE_CODE
SEGMENTED BY HASH (DEPT_CODE, STORE_CODE) ALL NODES;

SELECT ANALYZE_STATISTICS(‘AGG_TEMP’);

SELECT DEPT_CODE, STORE_CODE, SUM(TOTAL_SALES)
FROM AGG_TEMP
GROUP BY DEPT_CODE, STORE_CODE
UNION ALL
SELECT DEPT_CODE, NULL As STORE_CODE, SUM(TOTAL_SALES)
FROM AGG_TEMP
GROUP BY DEPT_CODE
UNION ALL
SELECT NULL As DEPT_CODE, NULL As STORE_CODE, SUM(TOTAL_SALES)
FROM AGG_TEMP;

The fact that temp table AGG_TEMP is sorted on the ROLLUP columns (dept_code, store_code) is important for performance because this means that all the GROUP BY sub-queries at individual levels will be able to exploit the highly efficient GROUPBY PIPELINED operator (explicit in the query’s explain plan) in Vertica to get the job done quickly and accurately.

A nice property of GROUPBY PIPELINED operator (in contrast to the more resource intensive GROUPBY HASH) is that its performance is largely independent of the actual table size. As a result this workaround can easily scale up to include tables with 10+B or even 100+B rows without any noticeable drop in query performance (as measured in first fetch time).

To conclude I would like to mention that there has been serious work going on in Vertica Engineering to implement ROLLUP (and many other OLAP functions) in the up-coming Vertica releases. Stay tuned!

Po Hong is a senior pre-sales engineer in HP Vertica’s Corporate Systems Engineering (CSE) group with a broad range of experience in various relational databases such as Vertica, Neoview, Teradata and Oracle.

Meet The Team: Patrick Day

Pat day

Welcome back to another edition of our “Meet the Team” feature! This week we sat down with our inside sales manager Patrick Day, and talked about everything from what it’s like to manage the badass sales team we have here in Cambridge MA, to ballroom dancing.

HP Vertica 7 Installer Changes for Optimum Performance

Installing Vertica title pic

HP Vertica 7 introduces a new installer that helps you configure your system for the best possible performance. How does it do this? Well, the new installer checks your systems and lists any changes that you need to make for optimum performance. Not only that, it gives you a direct link to the documentation page that will help you change your system appropriately. Once you change your system configuration, you can rerun the install and know that your system is optimized to run HP Vertica 7.

Check out the following video to see a demonstration of how to change typical settings for default installs on RedHat or CentOS systems.

https://vimeo.com/95906728

Vertica on MapR SQL-on-Hadoop – join us in June!

We’ve been working closely with MapR Technologies to bring to market our industry-leading SQL-on-Hadoop solution, and on June 3, 2014 will be jointly delivering a live webinar which will feature this joint solution and related use cases. To register and learn how you can enjoy the benefits of a SQL-on-Hadoop analytics solution that provides the highest-performing, tightly-integrated platform for operational and exploratory analytics, click here.

This joint solution is a unified, integrated solution that reduces complexity and costs by running a single cluster for both HP Vertica and Hadoop. It tightly integrates HP Vertica’s 100% ANSI SQL, high-performance Big Data analytics platform with the MapR enterprise-grade Distribution for Apache Hadoop, providing customers and partners with the highest-performing, most tightly-integrated solution for operational and exploratory analytics with the lowest total cost of ownership (TCO).

This solution will also be presented live by HP Vertica and MapR executives at HP Discover on June 11, 2014. For more information, visit the HP Discover website.

In addition, a specially-optimized version of the MapR Sandbox for Hadoop is now available in the HP Vertica Marketplace. To download this and other add-ons for the HP Vertica Analytics platform, click here.

 

Inside the Secret world of the Workload Analyzer

WLA image

When I’m on a flight sitting next to someone, and we’re making polite conversations, often the question comes up “what do you do?” In these situations, I have to assess whether the person works in the IT industry or is otherwise familiar with the lingo. If not, my stock response is “I fix databases”. This usually warrants a polite nod, and then we both go back to sleep. This over-simplified explanation generally suffices, but in truth, it is wholly inadequate. The truth of the matter is that my job is to ensure that databases don’t get broken in the first place; more specifically – an HP Vertica database. But our clients have different, complex goals in mind, they sometimes configure their systems incorrectly for the kind of stuff they’re doing. I’m constantly looking for ways to empower clients to understand problems to look for before they become bigger problems.

That’s why I’m a huge fan of the HP Vertica Workload Analyzer (also referred to as WLA). The WLA is a great tool in the war against problems. The WLA’s goal in life (if it has one) is to find and report on problems – 20 specific ones to be exact. If you are using Management Console, the WLA runs once per day and produces a report. This report indicates actions to take, like tables that should have statistics updated, or queries that aren’t performing adequately. This report (actually just entries in a table) is reported in the Management Console in a window called “Tuning Recommendations”.

But you don’t need Management Console to take advantage of the Workload Analyzer – you can run it manually by running the following command:

SELECT ANALYZE_WORKLOAD(”) ;

Interestingly, even though it’s called “Workload Analyzer”, the actual command is ANALYZE_WORKLOAD. You could toss this into CRON and run it multiple times per day if you’d like, but it’s probably not necessary.

The output of this command comes back to your screen – it reports on all the various “problems” that it is programmed to identify and returns them. It also creates entries in two separate tables V_MONITOR.TUNING_RECOMMENDATIONS and TUNING_RECOMMENDATION_DETAILS.

As I mentioned earlier, there are 20 queries that WLA runs in order to look for problems. These queries are called rules. WLA rules can be disabled or enabled, and many of them have parameters that can be modified- increasing or decreasing the scope of the rule. These rules have clever names like sys_r1 and sys_r13. It’s ok though – WLA doesn’t allow you to run rules directly, but I’ll break down the most useful rules, and how they can be modified for maximum effect.

sys_r1: Stale and Missing Statistics

sys_r1 looks for tables that have stale or missing statistics. It recommends updating statistics on those tables. sys_r5 and sys_r6 find queries that have reported JOIN_SPILLED or GROUP BY SPILLED events. These are indications that there is no good projection design which would enable these queries to operate at maximum efficiency. These three rules generate the most typical output I see at client sites.

sys_r16 and ssys_r20: System Utilization

sys_r16 and sys_r20 are rules that check for system utilization. sys_r16 checks for a CPU utilization that exceeds a 95% threshold for more than 20 minutes. sys_r20 looks for memory utilization of 99% for more than 10 minutes.

Many of the other rules cover very specific edge-case scenarios, and are unlikely to fire in most client sites and we hope to cover those in future blogs. For now, let’s stick to this handful, and talk about how we can modify these rules, and add our own rules!

Altering Rules

What you may not know about the WLA is that you can modify certain rules. That is, you can change some parameter settings such as threshold values that the rule uses to find problems. Let’s take rules sys_r16 and sys_r20. Each looks for CPU and memory utilization in your system. The default thresholds might be too high for your liking–but that’s ok, you can adjust them.

dbadmin=> ALTER TUNING RULE sys_r16 SET USAGE_THRESHOLD_PCT=80;

ALTER TUNING RULE

dbadmin=> ALTER_TUNING_RULE sys_r16 SET DURATION_MIN=10;

ALTER TUNING RULE

Now this rule will fire under less extreme circumstances. Instead of looking for a CPU usage of 95% for 20 minutes, it will now report on a CPU usage of 80% for 10 minutes. All the tuning rule parameters (if applicable) are defined in another system table called VS_TUNING_RULE_PARAMETERS. You can use that table to determine which parameters are available to which rules. You can tweak them if you desire.

Furthermore, you can disable or enable rules with the following commands:

dbadmin=> ALTER_TUNING_RULE sys_r16 disable;

ALTER TUNING RULE

dbadmin=> ALTER_TUNING_RULE sys_r16 enable;

ALTER TUNING RULE

Creating your own rules

The coolest part of the WLA is that you can actually create your own rules. The first step is to write your rule. This is just a regular SQL query whose purpose it is to find a problem. The example I’m going to include here is not currently a rule, but it would make a great addition – finding too many delete vectors. This rule has a threshold defined as a percentage of delete vectors as compared to the number of rows in a table. When this percent exceeds 20%, this rule will fire.

CREATE TUNING RULE dv_rule (DELETE_VECTORS, PURGE, dv_threshold=20) AS SELECT CURRENT_TIME AS time, ‘Delete vectors account for ‘ || round(deleted_row_count/row_count, 2)*100::char(4) || ‘% of rows for projection ‘ || ps.projection_name AS observation_description

, ps.anchor_table_id AS table_id

, ps.anchor_table_name AS table_name

, s.schema_id AS table_schema_id

, ps.projection_schema AS table_schema

, NULL AS transaction_id

, NULL AS statement_id

, (SELECT current_value || ‘%’ FROM VS_TUNING_RULE_PARAMETERS

WHERE tuning_rule = ‘dv_rule’ AND parameter = ‘dv_threshold’)

AS tuning_parameter

, ‘run purge operations on tables with delete vectors’ AS tuning_description

, ‘SELECT PURGE_PROJECTION(”’ || ps.projection_name || ”’);’ AS tuning_command

, ‘LOW’ as tuning_cost

FROM delete_vectors dv

JOIN projection_storage ps ON dv.projection_name = ps.projection_name

JOIN schemata s ON s.schema_name = dv.schema_name

WHERE ps.row_count > 0

AND deleted_row_count / row_count >

(SELECT current_value/100::numeric FROM VS_TUNING_RULE_PARAMETERS

WHERE tuning_rule = ‘dv_rule’

AND parameter = ‘dv_threshold’) ;

The first line sets up my rule. I give it a name “dv_rule” and I define some parameters. The first two are like labels. They are arbitrary. Technically, these are referred to as the “observation type” and the “tuning type.” You can see examples by looking in the TUNING_RECOMMENDATON_DETAILS table. All remaining parameters are optional — and I can optionally define tuning parameters. Here, I’ve defined my 20% threshold for delete vectors. This query follows the order of the columns in the TUNING_RECOMMENDATION_DETAILS table. If you compare the SELECT list, and the columns in that table, you’ll note the similarities.

My query actually references the VS_TUNING_RULE_PARAMETERS table in order to get my % threshold. This creates a bit of a chicken and egg problem – the query won’t work since the parameter doesn’t yet exist in this table. So, the first thing you have to do is create the rule, which also creates the parameter. Then you can modify the query as needed. If you need to drop the query, you can do so with the following command:

dbadmin=> DROP TUNING RULE dv_rule ;

DROP TUNING RULE

Because you might go through this process several times, I recommend putting your work in to a file so you can reference it easily.

And that’s it! That’s everything you need to know in order to create and manage WLA rules. Not only can you use these rules to find problems in your HP Vertica cluster, you could use them to look for issues in your own data.

In the community forums, I’d like to continue the discussion – come up with your own rules, and post them. Who knows – the good ones we might include in a future HP Vertica release!

Get Started With Vertica Today

Subscribe to Vertica