Vertica Blog

Under the Hood

Detect Outliers Using Isolation Forest

The Isolation Forest (IForest) algorithm is a powerful and scalable algorithm for identifying outliers in your data. It uses a unique approach which focuses directly on characterizing outliers instead of normal data points. Specifically, IForest focuses on the two quantitative properties of outliers: they are few and different. A scalable implementation of the Isolation Forest algorithm is available...

When the typewriter is broken: Vertica’s Event-Based Windows to the rescue

Those of you who read my blog post “Take nothing but memories. Leave nothing but footprints” may recall my passion for using Raspberry Pi computers to capture IoT data from aeroplane transponders and GPS satellites, then feeding that streaming data into Apache Kafka to integrate it with Vertica and perform near real-time analytics and...

What’s needed for a happy software dev team in data analytics?

For an organization to excel at data analytics, the IT team needs to coordinate a number of different disciplines and personnel with experience in those disciplines. This usually includes data analysts, data engineers, and, increasingly, data scientists. The data engineering discipline is sometimes thought of as the plumbing that, like pipes in a house, delivers...

VerticaPy Unify 2022 Sessions

Vertica Unify 2022 is a great time to learn about Vertica, its new features, and best practices. To complement the many great presentations at Vertica Unify 2022 both in Boston and Paris, I’m very excited to present two sessions: one on VerticaPy best practices, and another general session on VerticaPy and its features. VerticaPy is...

No need to extract data from your database to do your analytics!

While taking some long-awaited and, IMHO, well-deserved R&R in the sun, I heard that ping notification coming from my backpack and, not being one who can ignore such things, reached for my iPad. With my OOTO email response already set, I should be able to ignore most incoming messages, texts and otherwise, safe in...

All-new Vertica Advisor report designed to improve health of customers’ analytics

As Vertica’s Vice President for Customer Success, I wanted to let you know that the Vertica Advisor report has been recently updated to help Vertica customers understand their database and usage, and to get the most from their analytical investments. If you are not familiar with the Vertica Advisor report, it is a Database Health...

New VerticaPy Features Added – Just in Time for the Holidays

The team behind VerticaPy wishes you happy holidays and a happy new year from Boston, and from Grenoble in the French Alps! Make 2021 your year to operationalize Machine Learning at Scale with Vertica and Python. VerticaPy is a Python library that exposes scikit-like functionality for conducting data science projects in your Vertica database....

Extracting Log Data into Vertica With Regex Parsing

Formatted text such as system logs holds a huge amount of actionable data if you can extract and discover the content. If there is a pattern to the lines, then you can build a Vertica regex parser to extract fields and contents into Vertica for query and analysis at scale. Vertica’s regex parser uses Perl-like...

Take Nothing but Memories. Leave Nothing but Footprints

For those who have read my previous blog postings, attended one of our Big Data & Machine Learning Meetups, or have met me at one of the many trade shows or conferences over the years, you will be all too aware of my love for tracking aircraft using a Raspberry Pi, Kafka and Vertica. For...

Vertica outperforms Redshift and Snowflake in third-party benchmark study

Vertica proven to be faster, more affordable, and more scalable in terms of concurrency in third-party benchmark over Amazon Redshift and Snowflake

Choosing a cloud-based big data analytics platform for your organization’s analytical needs is seldom an easy task, and it can’t be the first step on your big data journey. Earlier steps should include...

“awk” sed Grep, “whoami?”. Time CC if bash makes head or tail and sort > plain.txt and make us API

Things to do whilst confined to barracks due to COVID-19

Those who know me well will know that, having worked in IT for four decades, I can be a bit of a Luddite when it comes to new-fangled technology. Give me a character-based, green-screen PuTTY application over any pointy-clicky interface. Let me use vsql...

How can you get the value out of Hadoop that you were promised?

Since Hadoop’s initial release 14 years ago, untold volumes of data have been stored in HDFS (Hadoop Distributed File System). Spread across a virtual landscape of data-inspired organizations, those data lakes are wide, and deep. Companies have made tremendous investments in Hadoop over the years, and data continues to pour into their data lakes. You...