This fascinated me. Badr created a simple machine learning model, then tested it on individual wines. I love this because it shows a practical application of my favorite SQL database, Vertica (AND IT’S ABOUT WINE).
The purpose of the study was to determine which components make a wine good. Badr built this model using Kaggle data. The author of the data was a French chemist who measured the chemical composition of 5,000 wines. The chemical components can be found in the Kaggle dataset. You can use this wine analyzer model to test new, specific wines.
There are three impartial factors of a “good” wine.
- Zero faults: no visual defects (like grapes) and absolutely no bad odors.
- Good equilibrium among the chemical components: a good balance between three variables (tannin, acidity, and smoothness).
- Fine length in the mouth: the taste of the wine stays in the mouth for a while, which is related to the quality of the grapes.
For ratings to be comparable, two conditions should also hold:
- The wines are of the same type (red, white, or rosé).
- The wines are tasted by an impartial wine expert.
We can create two types of models for this study:
- Regression: the model predicts the rating (between 0 and 10).
- Classification: we consider a wine good when its rating is greater than 6.5, so we create another variable, good = 1 if rating > 6.5 and 0 otherwise. This approach is less precise than regression.
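To make the classification target concrete, here is a minimal Python sketch of the good = 1 if rating > 6.5 rule. It is only an illustration, not Badr's Vertica code: the `Wine` class, the `to_label` function, and the sample ratings are all made up for this example.

```python
from dataclasses import dataclass

GOOD_THRESHOLD = 6.5  # cutoff from the text: a wine is good when rating > 6.5


@dataclass
class Wine:
    name: str      # hypothetical label, for illustration only
    rating: float  # expert rating between 0 and 10


def to_label(rating: float, threshold: float = GOOD_THRESHOLD) -> int:
    """Return 1 if the rating is strictly greater than the threshold, else 0."""
    if not 0 <= rating <= 10:
        raise ValueError(f"rating must be between 0 and 10, got {rating}")
    return 1 if rating > threshold else 0


# Made-up ratings, not taken from the Kaggle data.
wines = [Wine("A", 5.0), Wine("B", 6.5), Wine("C", 7.0), Wine("D", 9.0)]
labels = {w.name: to_label(w.rating) for w in wines}
print(labels)  # {'A': 0, 'B': 0, 'C': 1, 'D': 1}
```

Wines C and D both end up with good = 1 even though their ratings are two points apart. That lost detail is why the classification target is less precise than the regression target.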
Badr used the Kaggle data related to Vinho Verde, a Portuguese wine. However, because the author of the data didn’t include tannins among the features, the predictive accuracy of the model will stay quite weak: tannin is one of the main variables. That’s why Badr pointed out that a very good business understanding is important, even before data ingestion.
Thanks, Badr, for helping us create a systematic approach to identifying good-quality wine with machine learning in Vertica.
Check out the full recording here: LINK TO THE VIDEO.
Also, we decided to put this model into action at a Partner event. Join Vertica, CB Technologies, Inc., & Tableau Software at 4PM on May 30th at JM Cellars in Seattle, WA. We will be talking about big data, analytics, and machine learning while tasting delicious wine!