Vertica

Archive for the ‘Uncategorized’ Category

The Top Five Ways to Botch a Vertica Evaluation


In the years I’ve been working with Vertica and other large-scale data technologies, I’ve been party to a large number of technology evaluations. Most businesses are familiar with these – they’re often called a “proof of concept”, “proof of value”, or “pilot”. Technology evaluations are a key part of the technology selection process, wherein the business identifies a set of criteria that the candidate technology must meet (or exceed). These evaluations are tightly scoped efforts, sponsored by company leadership, with clear-cut input data, test scenarios, and defined metrics for measuring success.

At least, that’s the theory.

While some evaluations are very much the way I describe them above, many aren’t. Many fail to demonstrate measurable value, and can actually muddy the waters around technology selection – exactly the opposite of what they’re supposed to do. While there are all manner of things that can go wrong when evaluating a big data platform, I’ve seen organizations struggle with specific areas when conducting a Vertica evaluation. Here are the top five.

Mistake number 5: Don’t talk with any Vertica people at all

We’ve all bought cars, and we’ve all had to deal with car dealers. For many of us, talking with sales people leaves a bad taste in our mouths. This is unfortunate, because there is unique value in talking with the right sales team. A skilled sales executive will know how to work with an organization’s leadership to map technology to strategy – which greatly increases the likelihood that an investment in that technology will pay off. A skilled presales engineer will know how to deploy the technology in ways that fit a particular business and its use cases – which can accelerate the project and mitigate the risk of failure. Moreover, these teams accumulate knowledge of best (and worst) practices, and can be a powerful source of guidance. By ignoring sales people, organizations run the risk of repeating mistakes made by others – and possibly selecting the wrong technology for their needs.

Mistake number 4: Use 100% IT-defined success criteria

First, I have to say that I have nothing but respect for IT teams. I worked in various IT departments for many years before moving to the vendor side of the world. In my experience, they’re incredibly hard working, talented folks. But the people working in the technology trenches tend to think about the technology, not why it’s there. Rather than thinking of that Oracle operational store as “a key resource for business stakeholders to optimize day to day decisions,” they tend to think of it as “an Oracle database that needs to stay up at all times or the CEO will call the CIO and complain.”

This shapes expectations. And when it’s time to select new technology, IT will focus on the things it cares about – SQL completeness, availability, fault tolerance, backup and recovery, and so forth. I’ve seen evaluations where the IT team made its “wish list” of criteria and the vendor demonstrated every single one of them, only to see another technology get chosen – because the test criteria didn’t matter to the business stakeholders.

Mistake number 3: Never, ever run the Database Designer

The other mistakes discussed here are pretty much technology agnostic – they can be issues in all sorts of evaluations. This one, however, is specific to Vertica. That’s because the Vertica team re-invented storage as opposed to borrowing somebody else’s storage engine and bolting on column-like features. While this bit is somewhat longer than the others, it bears reading because it is often the moment when the light bulb goes on for the database folks as to why Vertica has done so well in the market in recent years.

When a user creates a table in Vertica, two things happen:

  1. A logical table is created. This is the structure that all users will query, insert into, update, and delete from. However, it is just a stub.
  2. A superprojection is created. The superprojection is identical to the logical table, but it is the actual storage structure for the data. It applies rules for data distribution, sorting, and encoding – all part of the “secret sauce” behind Vertica’s performance and scalability. The superprojection is required because Vertica is a database – we need a spot where data can land immediately in ACID-compliant form. (A short sketch of both steps follows below.)
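
To make these two steps concrete, here’s a minimal sketch. The table, columns, and data are hypothetical, and I’m assuming the v_catalog.projections system table available in 7.x:

    -- Create the logical table; Vertica generates the superprojection for us
    -- (at creation or on first load, depending on version).
    CREATE TABLE sales (
        sale_id   INT,
        region    VARCHAR(32),
        sale_date DATE,
        amount    NUMERIC(12,2)
    );

    -- The superprojection appears in the catalog alongside any
    -- projections added later.
    SELECT projection_name, is_super_projection
    FROM v_catalog.projections
    WHERE anchor_table_name = 'sales';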

But the beauty of the Vertica storage engine is that additional projections can be created, and they don’t all require every column. This is why we built our own engine from the ground up: Vertica maintains a loose coupling between the logical data model and the physical storage of that data. Additional projections can use fewer columns, other sort orders, different distribution keys, other forms of compression, and so on to deliver maximum performance – as in the sketch below. And when a query is submitted, the database decides which set of projections will make that query perform best.
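
For instance, a query-specific projection might carry only the columns a hot query touches, with a sort order and encoding chosen to match. A sketch, continuing the hypothetical sales table from above:

    CREATE PROJECTION sales_by_region (
        region ENCODING RLE,  -- run-length encoding suits a low-cardinality sort key
        sale_date,
        amount
    ) AS
    SELECT region, sale_date, amount
    FROM sales
    ORDER BY region, sale_date
    UNSEGMENTED ALL NODES;

    -- If the table already holds data, refresh the new projection:
    SELECT REFRESH('sales');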

To make projections easier for our users to leverage, we’ve created a tool which is included with Vertica, called the Database Designer. This is unique in the industry as far as I know. A user only needs to create the desired tables and load a modest amount of data, then package up their queries and pass them to the Database Designer. The Database Designer will then test the queries and write SQL to create a set of optimized projections. In this way, the Database Designer can make just about anyone as effective as a skilled DBA when it comes to performance tuning.
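
Recent versions also expose the Database Designer through SQL functions, roughly as sketched below (function signatures vary by version, the file paths here are hypothetical, and 7.x users can instead drive the Designer from Management Console or admintools):

    -- Create a design, point it at the tables and a file of representative
    -- queries, then generate and deploy optimized projections.
    SELECT DESIGNER_CREATE_DESIGN('eval_design');
    SELECT DESIGNER_ADD_DESIGN_TABLES('eval_design', 'public.sales');
    SELECT DESIGNER_ADD_DESIGN_QUERIES('eval_design',
                                       '/home/dbadmin/eval_queries.sql', 'true');
    SELECT DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY('eval_design',
           '/home/dbadmin/design_out.sql', '/home/dbadmin/deploy_out.sql');
    SELECT DESIGNER_DROP_DESIGN('eval_design');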

Unfortunately, much of the market doesn’t understand Vertica and projections. So I often walk into conversations where the technology team has been told – usually by another vendor – that projections are “cheating” because they optimize performance. And so the business decides to deliberately avoid using the Database Designer. This is like declaring that breathing during a foot race is cheating, and insisting the runners hold their breath to slow the faster ones down and give the slower ones a chance of winning. I think I’m being generous when I call this a bad idea.

Mistake number 2: Don’t take it seriously

Sometimes, the technology team already knows which technology they want. And the technology evaluation is just a rubber stamp – the outcome is predetermined, and the team just needs the window dressing to make it look like they evaluated other vendors. This is a bad idea for two reasons. First, even if it’s all about putting a rubber stamp on a predetermined choice, it’s still a new use case for the technology. So the team has to plan to mitigate risk. And a well-executed technology evaluation is one good way to mitigate risk. Second, going into an evaluation having already chosen the technology will put blinders on the team – rather than looking for unique ways in which new technologies can be applied, the focus instead is on doing things very much the way they’ve been done before.

A few years ago, I was managing a field engineering team when we found ourselves in one of these evaluations. The company had clearly already chosen another vendor, but because they were already (happily) using Vertica, a technology evaluation was required. The company didn’t take the evaluation very seriously, and despite the fact that our team executed flawlessly, the company went with its original choice. They didn’t pay attention to the fact that the Vertica team started (and finished) the evaluation within seven days – about how long it took the other vendor just to pack its equipment and prepare it for shipping to the customer. They didn’t want to see the findings our team uncovered highlighting revenue opportunities hidden within the data. They selected the other vendor, as they’d planned all along. After six months of trying to implement it, the folks who had selected it were looking for new jobs. Moreover, most of the data science team quit in frustration. So in one fell swoop, they significantly damaged their analytics capabilities.

So take it seriously, even if the choice seems predetermined.

Mistake number 1: Do an unrealistic test

One way to create an unrealistic test is to fit the test to the conditions, rather than the conditions to the test. The most frequent mistake here is using Vertica Community Edition, which is limited to three nodes and a terabyte of data, and then forcing the data used in the test to fit that limit. This is a bad idea for several reasons. First, the benefits of a distributed computing technology like Vertica don’t really show up at a terabyte. While you can run queries on the data, old-school strategies such as indexing can make it look as though row-oriented databases perform just as well. Second, it means “chopping down” the data to fit the one-terabyte threshold. This often results in artificial data, which brings with it all sorts of problems. The biggest problem is that it may no longer let you derive the insights that address the problems you’re trying to solve.

So test with a realistic volume of data. What is “realistic”? It’s relative, but it should be more than a token amount of data. Don’t feel compelled to limit your evaluation to a terabyte just because you want to run Vertica CE. This often goes hand in hand with mistake number 5 (don’t talk to any Vertica people). Don’t worry about talking with Vertica folks! We’re a friendly bunch with a commitment to the success of our customers, and we’re happy to set you up with an evaluation license that fits your data – so you don’t have to cram the data to fit the license.

Finally, there’s another way in which we see unrealistic evaluations. Particularly when the evaluation is driven by the IT team (see mistake number 4), the use case is often “run our existing queries faster”. While this is helpful, it is not what keeps the executive leadership awake at night. What does? Fraud detection, personalized marketing, optimized business operations, new data products, and so forth. Note that “run our queries faster” did not appear on that list. So make the test realistic by asking bigger questions. What can’t the company do today because it can’t cope with big data? Why does it matter? These are the use cases that translate a technology evaluation into terms leadership can understand: how is this technology going to enable the strategy of the business?

So there, in a nutshell, are the problems we see most often in Vertica evaluations. We do a lot of these, and we’re quite good at them. So don’t hesitate to let us know when you want to try it out, so we can help you avoid the pitfalls and tie the technology to your strategy. If you’d like to talk with our team, click here to arrange a conversation.

HP Women at Grace Hopper

A couple of weeks ago, the Anita Borg Institute’s Grace Hopper Celebration of Women in Computing took place in Phoenix, Arizona, from October 8–11, 2014. A Platinum Sponsor of the event, HP sent many women from across the organization to attend the conference.

HP Vertica had a significant presence at the event. Bo Hyun Kim, an HP Vertica developer, was chosen to present a technical paper she authored on Sentiment Analysis and Big Data. Shilpa Lawande, now General Manager of HP Software’s Big Data Platform Business unit, spoke at a breakfast given for technical women at HP. Seven of our own HP Vertica women attended the conference with the purpose of recruiting and screening talented candidates for the growing HP Vertica team.

Bo Hyun’s Presentation on Sentiment Analysis

In March 2014, Shilpa Lawande reached out to the technical women of HP Vertica, encouraging volunteers to submit session proposals for the conference.

Bo Hyun Kim, of the HP Vertica Management Console team, accepted the challenge. She collaborated with Lina Chen in authoring a paper called “Lexicon-Based Sentiment Analysis Using the Most-Mentioned Word Tree.” After several months of waiting anxiously, Bo Hyun learned that her paper was accepted!


Bo Hyun’s presentation became part of a larger Data Science in Social Media Analysis presentation. On Friday, October 10, she presented to a room packed with technical women of all ages and backgrounds.

Bo Hyun rocked – she handled the presentation with good humor, poise, and confidence. The presentation was held in one of the large ballrooms, and Bo Hyun prepared herself well before standing in front of hundreds of bright, technical women from all over the world. She engaged the audience right away by stepping down from the podium and asking them questions related to her research, which focused on enhancing HP Vertica’s sentiment analysis package. She had proposed improving the performance of HP Vertica Pulse earlier this year, and the proposal was accepted. After many nights of hard work, Bo Hyun created a presentation that was both informative and educational, and she made sure the audience followed and understood each slide.

After the presentation ended, audience members asked her various questions about her work on sentiment analysis. Most were students seeking advice and guidance about sentiment analysis for their senior thesis research projects. Students also asked about the skill set required to work at a software engineering company. Bo Hyun did not have enough time to answer all their questions, so she handed out her business card so that the students could contact her later and ask as many questions as they wanted. On her way out, Bo Hyun was stopped by the panel, who congratulated her on the talk. After the conference, she received many emails from eager, enthusiastic students who had completed the presentation questionnaire, and she was happy to be able to help them further.



HP VertiGals at Bo Hyun’s talk.

HP Technical Women’s Breakfast

On Thursday, 10/9, conference attendees from HP were invited to a technical women’s breakfast. Shilpa Lawande, General Manager of HP Software’s Big Data Platform business unit, was the key speaker. Shilpa shared stories of her own journey as a woman in the male-dominated field of computer science. She spoke of the experience of being “the only woman in the room”, from her undergraduate education in India, to her years as an individual contributor at Oracle and Vertica, to her leadership roles as VP of Engineering and, most recently, General Manager of HP Software’s Big Data Platform business unit. Among the questions she answered for the attendees, Shilpa shared that her personal role model is her mother, who raised her family while pursuing a career, and always met every challenge with a smile.


Shilpa Lawande speaking at the HP Technical Women’s breakfast

Recruiting

Lina Chen, Beth Favini, Dan Huang, Bo Hyun Kim, Shilpa Lawande, Pratibha Rana, and Amy Miller were among the women from HP Vertica who staffed the HP recruiting booth located in the career fair area of the conference. The staff met countless talented young candidates eager to learn more about HP and about Vertica. The recruiting team collected literally hundreds of resumes and even held several “on-the-spot” interviews.


The HP Recruiting Booth

One of the big attractions of the HP booth was the daily raffle. At the end of each of the three days of the college fair, an HP Slate was raffled off. The raffle was immensely popular, drawing visits (and repeat visits) from women across the conference.


Crowds gathering at our daily HP Slate raffle

The HP Vertica women also scoped out the surrounding scene. Companies like Google, LinkedIn, Facebook, and Pinterest brought incredible creativity and enthusiasm to their booths. The team gathered ideas and hopes to work with HP recruiting to make the HP booth even more eye-catching and memorable than ever.

Summary

The conference was nothing short of a success for HP Vertica attendees, who gathered a large pool of candidates, in addition to cheering on both Bo Hyun and Shilpa at their presentations. And, last but not least, everyone got to know each other a bit better, laughed a lot, and enjoyed a sense of camaraderie, teamwork, and enthusiasm at being part of HP Vertica.

Connecting HP Vertica 7.x to Tableau Desktop 8.2

Connecting HP Vertica to Tableau Desktop from Vertica Systems on Vimeo.

Have you ever wanted to visualize your HP Vertica Analytics Platform data with graphs, tables, maps, or other formats? Tableau Desktop, a visualization tool from Tableau Software, lets you do just that in a couple of steps. Use the HP Vertica-specific ODBC driver connector to access your data from HP Vertica and create different views for visual analysis. Watch this video to learn how to connect HP Vertica to Tableau Desktop using both the HP Vertica-specific ODBC driver connector and the generic connector.

What’s New in Dragline (7.1.0): Resource Pool Routing

Resource Pool Routing from Vertica Systems on Vimeo.

HP Vertica 7.1.x introduces new features that allow you to dynamically reroute queries to secondary resource pools.

Use the new CASCADE TO parameter when creating or altering a resource pool to designate a secondary resource pool to which queries can cascade. If a query exceeds the RUNTIMECAP of the pool on which it started running, it cascades to the designated secondary pool – typically one with a larger RUNTIMECAP – and continues executing there instead of causing an error.

Because grant privileges are not considered on secondary pools, you can use this functionality to designate secondary resource pools to which user queries can cascade without giving users explicit permission to run queries on those pools.
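
Here’s what that looks like in SQL – a minimal sketch with hypothetical pool names and caps:

    -- A secondary pool with a generous runtime cap:
    CREATE RESOURCE POOL batch_pool RUNTIMECAP '1 HOUR';

    -- Queries on this pool that run past five minutes cascade to
    -- batch_pool instead of erroring out:
    CREATE RESOURCE POOL interactive_pool RUNTIMECAP '5 MINUTES'
        CASCADE TO batch_pool;

    -- Users only need to be assigned the starting pool:
    ALTER USER analyst RESOURCE POOL interactive_pool;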

Check out this video to learn more about dynamically rerouting queries. For more information, see the documentation.

All Aboard the Modern Data Architecture Roadshow!

Fresh on the heels of HP and Hortonworks’ partnership, HP is sponsoring the upcoming #Hadoop and the Modern Data Architecture roadshow. We are proud to be Gold sponsors of the event in Tysons Corner, VA on September 23rd, 2014. The day-long workshop will focus on gaining insight into the business value derived from Hadoop, understanding the technical role of Hadoop within your data center, and looking at the future of Hadoop as presented by the builders and architects of the platform.

Our HP team at the event will focus on HAVEn, our big data analytics solution, and we’re sending some of our top experts from Vertica and Autonomy to answer any questions you might have. Vertica is HP’s next-generation analytics platform, focused on handling structured data. The HP Vertica engine is helping data analysts all over the world perform ad hoc queries in record time. Within HAVEn, it is paired with HP Autonomy, which handles the unstructured-data analysis portion of your big data needs.


  • Hadoop/HDFS: Catalogue massive volumes of distributed data.
  • Autonomy IDOL: Process and index all information.
  • Vertica: Analyze at extreme scale in real-time.
  • Enterprise Security: Collect & unify machine data.
  • nApps: Powering HP Software + your apps.

Come by the conference tomorrow to learn all about how HP HAVEn and Hortonworks work together to meet your Big Data needs. Get more information here.

We look forward to seeing you there!

That Giant Sucking Sound is Your Big Data Science Project


Vertica recently hosted its second annual Big Data Conference in Boston, Massachusetts. It was very well attended, with over eight hundred folks and about two hundred companies represented. We at Vertica love these events for a couple of reasons – first, because our customers, happy with a sound product, tend to be our best spokespeople, and second, because it’s a chance for us to learn from them.

In one of the sessions, the presenter asked the audience how many of them had Hadoop installed today. Almost all the hands went up. This wasn’t too surprising given that the session was on Hadoop and Vertica integration. Then the presenter asked how many of those folks had actually paid for Hadoop. Most of the hands went down. Then the presenter asked how many of those folks felt that they were getting business value out of their investment. Only two or three hands stayed up. This was eye-opening for us at HP, and it was surprising to the audience as well. Everyone seemed to think they were doing something wrong with Hadoop that was causing them to miss out on the value.

Over the next few days, I made a point of tracking down folks in the audience to get their thoughts on what the issues were; since most were Vertica customers, I knew many of them already. I thought it would be helpful to identify the signs of a big data science project – a project where a team has installed something like Hadoop and is experimenting with it in the hope of achieving new analytic insights, but isn’t on a clear path to deriving value from it. Some clear themes emerged, and they align with what my colleagues in the industry and I have been observing over the last few years. So, without further ado, here are the top five signs that you may have a big data science project in your enterprise:

    1. The project isn’t tied to business value, but has lots of urgency. Somebody on the leadership team went to a big data presentation and has hit the panic button. As a result, the team rushes ahead and does…something. And maybe splinters into different teams doing different things. We all know how well this will turn out.
    2. The technologies were chosen primarily because they beef up resumes. There’s so much hype around big data and the shortage of people with relevant skills that salaries are inflated. And in the face of a project with high urgency, nobody wants to stand still. So download some software! That open source stuff is great, right? While it’s generally true that multiple technologies can solve the same big data problems, some will fit with the business more readily than others. Maybe they’re easier to deploy. Maybe they don’t require extensive skill retooling for the staff. Maybe the TCO is better. Those are all good things to keep in mind during technology selection. But selecting technology for “resume polishing”? Not so much.
    3. The project is burdened with too much process. Most organizations already have well-defined governance processes in place for technology projects. And, so the reasoning goes, big data is basically just a bunch more of the same data and same old reporting & analysis. So when it’s time to undertake a highly experimental big data analytics project which requires agility and adaptability, rigid process usually results in a risk-averse mindset where failure at any level is seen as a bad thing. For projects like these, failure during the experimentation isn’t just expected, it’s a critical part of innovation.
    4. The “can’t-do” attitude. It’s been a well understood fact of life for decades that IT departments often feel under siege – the business always asks for too much, never knows what it wants, and wants it yesterday. As a result, the prevailing attitude in many IT teams today is to start by saying “no”, and then line up a set of justifications for why radical change is bad.
    5. The decision-making impedance mismatch. Sometimes, organizations need to move fast to develop their insights. Maybe it’s driven by the competition, or maybe it’s driven by a change in leadership. And…then they move slooooowly, and miss the opportunity. Other times, the change represents a big one with impact across the company, and requires extensive buy-in and consensus. And…then it moves at a breakneck pace and causes the organization to develop antibodies and reject the project.

So if your organization has one or more big data projects underway, ask whether it suffers from any of these issues. If so, you may have a big data science project on your hands.

No limits: How Big Data changes competition

Data drives the bottom line, and technology is no longer limiting your competitors.

This post is condensed from a full article in the latest issue of Discover Performance, HP Software’s hub for IT thought leadership.

Business technology has always been a world of give and take. The more you ask for, the longer you wait. As technology improves, we compromise less—and in the case of Big Data, we can’t afford to compromise at all.

Today’s Big Data analytics platforms are making it possible for organizations to give the business everything: all the data, from all sources, in all formats, in real time, without limits. It’s a novel idea for most organizations, but it’s in the DNA of young, agile companies. This new breed of business is killing the competition by holding technology to the highest possible standard and putting data at the top of the value pyramid.

To compete, the rest of the market will need to act urgently to change their data ideologies and reject limitations as they store and explore data, and serve analytics insights to the business.

Competing with the new natives

“Leading companies today are changing the user experience while it is happening,” says HP Vertica VP Joy King. King says Twitter, as an example, is using real-time analysis of user demographics and usage trends to deploy new features and UI variations on the fly to limited “cohort” populations. The result is that people who use Twitter differently get a different experience – immediately.

“Compare a company using that approach to a company that’s relying on a report that comes once a week or once a month,” King says. “Who do you think will win?”

To stay on top of the new competitive pace set by the data-native enterprise, join Discover Performance, and get all our Big Data insights in your inbox.
