In the years I’ve been working with Vertica and other large-scale data technologies, I’ve been party to a great many technology evaluations. Most businesses are familiar with these – they’re often called a “proof of concept”, a “proof of value”, or a “pilot”. Technology evaluations are a key part of the technology selection process, wherein the business identifies a set of criteria which the candidate technology must meet (or exceed). These evaluations are tightly scoped operations, sponsored by company leadership, with clear-cut input data, test scenarios, and defined metrics for measuring success.
At least, that’s the theory.
While some evaluations are very much the way I describe them above, many aren’t. In fact, many evaluations fail to demonstrate measurable value, and can even muddy the waters around technology selection – exactly the opposite of what they’re supposed to do. While there are all manner of things that can go wrong when evaluating a big data platform, I’ve seen organizations struggle with specific areas when conducting a Vertica evaluation. Here are the top five.
Mistake number 5: Don’t talk with any Vertica people at all
We’ve all bought cars, and have had to deal with car dealers. For many of us, talking with sales people can leave a bad taste in our mouths. This is unfortunate, because there is unique value to be found in talking with the right sales team. A skilled sales executive will know how to work with an organization’s leadership to map technology to strategy – which greatly increases the likelihood that an investment in that technology will pay off. A skilled presales engineer will know how to deploy the technology in ways that fit a particular business and its use cases – which can accelerate the project and mitigate the risk of failure. Moreover, these teams accumulate knowledge of best (and worst) practices, and can be a powerful source of guidance. By ignoring sales people, organizations run the risk of repeating mistakes made by others – and possibly of selecting the wrong technology for their needs.
Mistake number 4: Use 100% IT-defined success criteria
First, I have to say that I have nothing but respect for IT teams. I worked in various IT departments for many years before moving to the vendor side of the world. In my experience, they’re incredibly hard working, talented folks. But the people working in the technology trenches tend to think about the technology, not why it’s there. Rather than thinking of that Oracle operational store as “a key resource for business stakeholders to optimize day to day decisions,” they tend to think of it as “an Oracle database that needs to stay up at all times or the CEO will call the CIO and complain.”
This shapes expectations. And when it’s time to select new technology, IT will focus on the things it cares about – SQL completeness, availability, fault tolerance, backup and recovery, and so forth. I’ve seen evaluations where the IT team made their “wish list” of criteria, and the vendor demonstrated every single one of them, only to see another technology get chosen – because the test criteria didn’t matter to the business stakeholders.
Mistake number 3: Never, ever run the Database Designer
The other mistakes discussed here are pretty much technology agnostic – they can be issues in all sorts of evaluations. This one, however, is specific to Vertica. That’s because the Vertica team re-invented storage as opposed to borrowing somebody else’s storage engine and bolting on column-like features. While this bit is somewhat longer than the others, it bears reading because it is often the moment when the light bulb goes on for the database folks as to why Vertica has done so well in the market in recent years.
When a user creates a table in Vertica, two things happen:
- A logical table is created. This is the structure that all users will query, insert into, update, and delete from. It is just a stub, however.
- A super-projection is created. The super-projection is identical to the logical table; however, it is the actual storage structure for the data. It applies rules for things like data distribution, sort order, and encoding – all part of the “secret sauce” behind Vertica’s performance and scalability. The super-projection is required because Vertica is a database – we need a spot where data can go, in an ACID-compliant form, immediately.
But the beauty of the Vertica storage engine is that additional projections can be created, and they don’t all require every column. This is why we built our own engine from the ground up: Vertica establishes a loose coupling between the logical data model and the physical storage of that data. Additional projections can use fewer columns, different sort orders, different distribution keys, or different forms of compression to deliver maximum performance. And when a query is submitted, the database decides which set of projections will make that query perform best.
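To make this concrete, here’s a small sketch. The table and projection below are invented for illustration (the names and columns are hypothetical), but the general shape follows Vertica’s CREATE PROJECTION DDL:

```sql
-- Creating the table also creates its super-projection behind the scenes.
CREATE TABLE clicks (
    user_id     INT,
    page_url    VARCHAR(500),
    click_time  TIMESTAMP,
    revenue     NUMERIC(10,2)
);

-- An additional, query-specific projection: fewer columns, a sort order
-- chosen for time-range scans, and data segmented across nodes by user.
CREATE PROJECTION clicks_by_time (user_id, click_time, revenue)
AS SELECT user_id, click_time, revenue
   FROM clicks
   ORDER BY click_time, user_id
   SEGMENTED BY HASH(user_id) ALL NODES;
```

A query filtering on a time range over those three columns can be answered entirely from `clicks_by_time`, while a `SELECT *` falls back to the super-projection – the optimizer makes that choice per query.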
To make projections easier for our users to leverage, we’ve created a tool which is included with Vertica, called the Database Designer. This is unique in the industry as far as I know. A user only needs to create the desired tables and load a modest amount of data, then package up their queries and pass them to the Database Designer. The Database Designer will then test the queries and write SQL to create a set of optimized projections. In this way, the Database Designer can make just about anyone as effective as a skilled DBA when it comes to performance tuning.
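For the curious, a Database Designer run can also be driven directly in SQL via Vertica’s DESIGNER_* meta-functions. The outline below is a sketch from memory – the design name, table, and file paths are hypothetical, and the exact function signatures vary by Vertica version, so check the documentation for your release before relying on it:

```sql
-- Create a design, register a table and a file of representative queries,
-- then generate (but review before deploying) the optimized projection DDL.
SELECT DESIGNER_CREATE_DESIGN('eval_design');
SELECT DESIGNER_ADD_DESIGN_TABLES('eval_design', 'public.clicks');
SELECT DESIGNER_ADD_DESIGN_QUERIES('eval_design',
       '/home/dbadmin/eval_queries.sql', 'false');
SELECT DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY('eval_design',
       '/home/dbadmin/design_projections.sql',  -- generated CREATE PROJECTION DDL
       '/home/dbadmin/deploy.sql',
       'true',    -- analyze statistics first
       'false');  -- don't auto-deploy; inspect the DDL
SELECT DESIGNER_DROP_DESIGN('eval_design');
```

The point for an evaluation is simply that this workflow exists and is cheap to run – there’s no good reason to skip it.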
Unfortunately, much of the market doesn’t understand Vertica and projections. So I often walk into conversations where the technology team has been told – usually by another vendor – that projections are “cheating” because they optimize performance. And so the business deliberately avoids running the Database Designer. This is like declaring that breathing during a foot race is cheating, and making the faster runners hold their breath to give the slower ones a chance of winning. I think I’m being generous when I call this a bad idea.
Mistake number 2: Don’t take it seriously
Sometimes, the technology team already knows which technology they want. And the technology evaluation is just a rubber stamp – the outcome is predetermined, and the team just needs the window dressing to make it look like they evaluated other vendors. This is a bad idea for two reasons. First, even if it’s all about putting a rubber stamp on a predetermined choice, it’s still a new use case for the technology. So the team has to plan to mitigate risk. And a well-executed technology evaluation is one good way to mitigate risk. Second, going into an evaluation having already chosen the technology will put blinders on the team – rather than looking for unique ways in which new technologies can be applied, the focus instead is on doing things very much the way they’ve been done before.
A few years ago, I was managing a field engineering team when we found ourselves in one of these evaluations. The company had clearly already chosen another vendor, but because they were already using Vertica (happily), a technology evaluation was required. The company didn’t take the evaluation very seriously, and despite the fact that our team executed flawlessly, they went with their original choice. They didn’t pay attention to the fact that the Vertica team started (and finished) the evaluation within seven days – which was how long it took the other vendor just to pack their equipment and prepare it for shipping to the customer. They didn’t want to see the findings our team uncovered highlighting revenue opportunities hidden within the data. They selected the other vendor, as they’d planned all along. And after six months of trying to implement it, the folks who had selected the other vendor were looking for new jobs. Moreover, most of the data science team quit in frustration. So in one fell swoop, they significantly damaged their analytics capabilities.
So take it seriously, even if the choice seems predetermined.
Mistake number 1: Do an unrealistic test
One way to create an unrealistic test is to fit the test to the conditions, rather than the conditions to the test. The most frequent mistake here is using Vertica Community Edition – which is limited to three nodes and a terabyte of data – and then forcing the test data to fit that limit. This is a bad idea for several reasons. First, the benefits of a distributed computing technology like Vertica don’t really show up at a terabyte. You can certainly run queries on the data, but old-school strategies such as indexing can make it look as though row-oriented databases perform just as well. Second, it means “chopping down” the data to make it fit the one-terabyte threshold. This often results in artificial data, which brings all sorts of problems with it. The biggest problem is that the data may no longer yield the insights that solve the problems you’re actually trying to solve.

So test with a realistic volume of data. What is “realistic”? It’s relative, but it should be more than a token amount. Don’t feel compelled to limit your evaluation to a terabyte just because you want to run Vertica CE. This often goes hand in hand with mistake number 5 (don’t talk to any Vertica people). Don’t worry about talking with Vertica folks! We’re a friendly bunch with a commitment to the success of our customers, and we’re happy to set you up with an evaluation license that fits your data – so you don’t have to cram the data to fit the license.
Finally, there’s another way in which we see unrealistic evaluations. Particularly when the evaluation is driven by the IT team (see Mistake Number 4), the use case is often “run our existing queries faster”. While this is helpful, this is not what keeps the executive leadership awake at night. What keeps them awake? Fraud detection, personalized marketing, optimized business operations, new data products, and so forth. Note that the phrase “run our queries faster” did not appear on that list. So make the test realistic by asking bigger questions. What can’t the company do today because it can’t cope with big data? Why does it matter? These are the use cases which take a technology evaluation and translate it into terms leadership can understand – how is this technology going to enable the strategy of the business?
So there, in a nutshell, are the problems we see the most often in Vertica evaluations. We do a lot of these, and are quite good at it. So don’t hesitate to let us know when you want to try it out so we can help you avoid the pitfalls, and tie the technology to your strategy. If you’d like to talk with our team, click here to arrange a conversation.