Races can be won or lost with fractions of seconds separating first, second, and third.

Many factors determine who stands on the podium, awarded with points and champagne, and who returns to the pit on a recovery truck.

One factor that plays a critical role in helping Sam or Mitch win that race is optimal tuning of the vehicle. With everything happening from Free Practices, Qualifying, Superpole, and the ePrix races taking place in a single day – or more precisely, within just eight hours – time is critical.

Here’s what happens for the Jaguar Racing team: From the moment the car stops in the pit, and the umbilical cord gets connected between the car’s data logger and trackside computer, it’s a race to transfer data into Vertica. We’re not talking about a couple of sensor readings here. Each car during a single race could easily generate 500,000,000 data points, which adds in Jaguar Racing’s case to the hundreds of billions of data points already stored.

Thankfully with Vertica, data starts landing in seconds, with all the race data loaded in just a couple of minutes.

That is when crunching the numbers begins. Race Engineers need to quickly glean insights to the data just captured, along with all that historic data – insights that will help them finely adjust the car so Sam and Mitch can take the advantage on the track. Time is of the essence, as the engineers may have just tens of minutes before the car is back on track for the start of the next session.
This is why Jaguar Racing chose Vertica as their analytics database. Renowned for its impressive query performance over huge volumes of data, Vertica and its in-database functions outperform and outnumber any of the competition.

However, there is always room for improvement. Just as with optimally tuning Formula E cars being a continually evolving process, the same applies to Vertica.

Case-in-point: the APPROXIMATE_PERCENTILE function. This function computes the approximate percentile of an expression over a group of rows. Taking as parameters a single integer or float column, and the required percentile (0 to 1) the function returns a single float value.

As expected, when Jaguar Racing first tried this out, it could return that result in a couple of seconds. The problem was that they did not want to run with just one percentile, rather they wanted the 1%, 5%, 25%, median, 75%, 95%, and 99% percentiles. Sure, Vertica could handle seven calls to the APPROXIMATE_PERCENTILE function, but as they were to discover, each one added approximately 2 seconds to the query execution.

As I mentioned, time is critical at track side. Waiting 14 seconds for a query to complete was just not going to cut it for them.

Bring in Vertica Engineering. Within just a couple of days, this team had designed and built a Python UDx (User Defined eXtension) that replaced the original in-database function with one that can accept as many percentiles as you want in a single call to the function. And rather than returning a single float value, it now returns an array of floats – one for each percentile. What’s more, although increasing the runtime from 2 seconds for a single percentile to 3 seconds for generating the seven percentiles, this was still a huge improvement on running the function seven times.

Vertica Engineering saw the potential benefit to other Vertica customers. So, this was not left as a standalone UDx, but was included as part of the next release of Vertica, bringing the function to everyone.