
Event Recap: Beer & Analytics - November 2022

  • November 10, 2022

I was honored to be asked to present at Continuus’s Alteryx-sponsored Beer + Analytics event earlier in November. By “honored” I mean stoked. Like, seriously geeked out at the thought of talking about two of my favorite things.

The presentation I gave was a demo of how we might use Alteryx to build a tool that makes beer recommendations. Since we didn’t have a history of which beers our event attendees like, the workflow started with a file containing more than 5,500 detailed beer reviews and focused on two lines of analysis to categorize them:

1) Analyzing a written description field and

2) Doing some cluster analysis on fields with scores

Taken together, the two analyses can make for some pretty ‘dead on’ suggestions when the user inputs a beer they know they like.

But I’m getting ahead of myself.

221207_Beer+Analytics_1

Sample snapshot of the beer reviews file.

The first thing we analyzed was the copy in the ‘description’ field by doing some text analysis. We removed words that wouldn’t add any value to the categorization of the beers, but that were common in text description fields, such as ‘beer’, ‘brewed’, and ‘ale’.
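For anyone curious what that word removal looks like outside of Alteryx, here is a minimal Python sketch. It is not what the workflow runs: the stop list (beyond ‘beer’, ‘brewed’, and ‘ale’), the function name `keywords`, and the two sample descriptions are all made up for illustration.

```python
import re
from collections import Counter

# Assumed stop list: the three words named above plus a few common English words.
STOP_WORDS = {"beer", "brewed", "ale", "a", "an", "and", "the", "with", "of", "is", "this"}

def keywords(description):
    """Lowercase the text, split it into words, and drop the stop words."""
    words = re.findall(r"[a-z']+", description.lower())
    return [w for w in words if w not in STOP_WORDS]

# Made-up descriptions standing in for the 'description' field.
descriptions = [
    "This ale is brewed with roasted malt and a hint of chocolate.",
    "A hoppy beer with citrus and pine notes.",
]

# Count how often each remaining word appears across all descriptions.
counts = Counter(w for d in descriptions for w in keywords(d))
print(counts.most_common(5))
```

The words left over after filtering are the ones that can actually tell one beer apart from another.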

221207_Beer+Analytics_2

Building a word cloud showed us which words were used most often and therefore might not add value to the categorization of the beers.

Doing some Topic Modeling, we found that categorizing beers into four groups provided enough difference between the groups to be useful. Based on the most prevalent words in each group’s beer descriptions, we named the categories Traditional, Dark, IPA and Other, then determined which category had the highest score for each beer, calling this our Top Profile.
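Picking the Top Profile boils down to "take the category with the highest score." A small Python sketch of that step, assuming each beer already has one score per category; the beer names and score values here are invented, not taken from the review file.

```python
PROFILES = ["Traditional", "Dark", "IPA", "Other"]

def top_profile(scores):
    """Return the name of the profile with the highest topic score."""
    return max(PROFILES, key=lambda name: scores[name])

# Hypothetical topic scores; in the workflow these come out of the Topic Modeling step.
beers = {
    "Example Stout": {"Traditional": 0.10, "Dark": 0.70, "IPA": 0.05, "Other": 0.15},
    "Example Hazy": {"Traditional": 0.15, "Dark": 0.05, "IPA": 0.60, "Other": 0.20},
}

top_profiles = {name: top_profile(scores) for name, scores in beers.items()}
print(top_profiles)
```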

 

221207_Beer+Analytics_3

The keywords prevalent for Group 3 showed these are likely dark beers.

Pretty good so far! But if we’ve split our 5,500 beers into four groups, that averages out to about 1,375 beers per group, which isn’t exactly useful when trying to make a single recommendation. To get more specific, we then focused on our additional beer data.

Each beer in this review file contained scores for characteristics such as body, bitterness, sweetness and maltiness. We then used a K-Centroids Cluster Analysis tool to partition the beers into clusters based on those scores.

Here’s the point where I reassure you not to be intimidated by some of these terms. A centroid is the center point of a cluster: for the beers in that cluster, what is the average or mean value for, as an example, sourness? And from there, what is the difference between that cluster’s mean and the overall mean for all the beers in our file? We’re just grouping like items here, beers that are similar in certain aspects.
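If you’d like to see the grouping idea in code, below is a bare-bones k-means sketch in plain Python. It is an illustration only and not the code inside the Alteryx tool: the starting centroids are naively taken from the first `k` beers, and the (bitterness, sweetness) scores are made up.

```python
from statistics import mean

def distance_sq(a, b):
    """Squared straight-line distance between two score tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Mean of each characteristic across the given beers."""
    return tuple(mean(column) for column in zip(*points))

def k_means(points, k, iterations=20):
    centroids = list(points[:k])  # naive start: the first k beers (an assumption)
    labels = []
    for _ in range(iterations):
        # Assign every beer to its nearest centroid.
        labels = [min(range(k), key=lambda c: distance_sq(p, centroids[c])) for p in points]
        # Move each centroid to the mean of the beers assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = centroid(members)
    return labels, centroids

# Made-up (bitterness, sweetness) scores for six beers.
points = [(80, 20), (30, 70), (85, 25), (25, 75), (78, 18), (28, 72)]
labels, centroids = k_means(points, k=2)
print(labels)
print(centroids)
```

The bitter beers end up sharing one label and the sweet beers the other, which is all "grouping like items" really means.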

221207_Beer+Analytics_4

This is telling us that beers in cluster 1 have a mean sour score that is 0.3 lower than the mean for all beers. Those beers are likely less sour than average, whereas beers in cluster 10 are much more sour than average.
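The comparison described here, each cluster’s mean against the overall mean, is simple arithmetic. A rough Python sketch with invented sour scores (the real output may be on a different scale than these made-up numbers):

```python
from statistics import mean

# Made-up (cluster number, sour score) pairs.
beers = [(1, 10), (1, 14), (1, 12), (10, 60), (10, 64)]

# Mean sour score across every beer in the file.
overall_mean = mean(score for _, score in beers)

# Collect the sour scores belonging to each cluster.
by_cluster = {}
for cluster, score in beers:
    by_cluster.setdefault(cluster, []).append(score)

# Negative means less sour than average, positive means more sour.
difference = {c: mean(scores) - overall_mean for c, scores in by_cluster.items()}
print(overall_mean, difference)
```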

If this is all still a bit over your head, rest assured. Alteryx’s K-Centroids Cluster Analysis tool has the R code embedded in it, so it’s doing all the heavy lifting for you. I certainly didn’t write it!

Once we append our cluster number to the profile groups we identified through text analysis, we get a combined grouping that is much more specific.

221207_Beer+Analytics_5

There are more commonalities between beers in the same profile and cluster than between beers in the same profile but different clusters. So the IPA 6’s are much more like one another than they are like the IPA 2’s.
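As a quick sketch of how profile and cluster combine into one label such as "IPA 6", assuming each beer record already carries both values (the field names `top_profile`, `cluster`, and `group`, and the beers themselves, are made up):

```python
from collections import Counter

# Hypothetical records with the Top Profile and cluster number already appended.
beers = [
    {"name": "Example IPA A", "top_profile": "IPA", "cluster": 6},
    {"name": "Example IPA B", "top_profile": "IPA", "cluster": 6},
    {"name": "Example IPA C", "top_profile": "IPA", "cluster": 2},
    {"name": "Example Porter", "top_profile": "Dark", "cluster": 6},
]

# Build the combined label, e.g. "IPA 6".
for beer in beers:
    beer["group"] = f"{beer['top_profile']} {beer['cluster']}"

# How many beers fall into each combined group.
group_sizes = Counter(beer["group"] for beer in beers)
print(group_sizes)
```

Each combined group is a lot smaller than a profile on its own, which is what makes it useful for recommendations.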

This information is great, but what do we do with it? Well, we can build another workflow using some of Alteryx’s interface tools. Using the file we just created with the profile and cluster values appended to it, we can give users a drop-down from which to select a beer they know they like. The workflow then filters the full beer list by the profile and cluster values of the selected beer to provide an output of beers in the same groupings.
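The filtering step could look something like this in Python. It is a sketch under assumptions rather than the Alteryx workflow itself: the function name `recommend`, the field names, and the beers are invented, and leaving the selected beer out of its own recommendations is my own choice.

```python
def recommend(beers, liked_name):
    """Return names of beers sharing the liked beer's profile and cluster."""
    liked = next(b for b in beers if b["name"] == liked_name)
    return [
        b["name"]
        for b in beers
        if b["name"] != liked_name  # assumption: don't recommend the beer itself
        and b["top_profile"] == liked["top_profile"]
        and b["cluster"] == liked["cluster"]
    ]

# Hypothetical beer list with profile and cluster values appended.
beers = [
    {"name": "Example IPA A", "top_profile": "IPA", "cluster": 6},
    {"name": "Example IPA B", "top_profile": "IPA", "cluster": 6},
    {"name": "Example IPA C", "top_profile": "IPA", "cluster": 2},
    {"name": "Example Porter", "top_profile": "Dark", "cluster": 6},
]

suggestions = recommend(beers, "Example IPA A")
print(suggestions)
```

Here the drop-down selection is just the `liked_name` argument; anything sharing both its profile and its cluster comes back as a suggestion.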

 
