Explaining Machine Learning to Fido 3.0
In the first (and second) posts in this series, we walked through how we’d build a model to predict the likelihood of someone receiving surgery. We had data, and in that data we could see whether each individual received surgery or not - we had an outcome variable. We used the data to train an algorithm. In machine learning parlance, we worked through a supervised learning problem.
But what if we didn’t have an outcome to forecast, just data? We couldn’t predict surgery, but there’s plenty of insight we might want to draw out, including:
● Identifying similar treatment patterns
● Segmenting claims into more defined categories
● Grouping classes together (e.g. ACL, MCL -> knee)
We can avoid manual work, and avoid injecting our own subjectivity, by using the tools of machine learning to tease this information out. In the jargon, these use cases are called - take a moment, see if you can guess - unsupervised learning problems. Unsupervised learning algorithms do not require outcome variables in the data. A common unsupervised learning approach is clustering, in which we train an algorithm to group similar observations in a dataset together.
To illustrate how this can work, I’ve mocked up some data of popular beers; assume this is a sample from a larger dataset.
We can use a popular algorithm called K-means to put these beers into clusters. “K” is the number of clusters that we want to find. There is no “right answer” in clustering, and while there are metrics that can guide us, we have to choose a number before training the algorithm. The K we choose - let’s use 3 - sets the number of cluster centers, called centroids. In the image below (from this excellent interactive resource), I’ve placed the centroids randomly.
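One of the metrics that can guide the choice of K is inertia, the total squared distance from each point to its centroid. A common heuristic is to fit the model at several values of K and look for the “elbow” where inertia stops falling quickly. The sketch below uses scikit-learn on random stand-in data (the feature names are my assumption; the actual beer sample isn’t shown in the post).

```python
# Sketch of the "elbow" heuristic for choosing K, using scikit-learn.
# The data here is random stand-in data, NOT the post's beer sample.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))  # 60 beers x 4 features (say: cost, calories, ABV, sodium)

inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_  # within-cluster sum of squared distances

# Inertia always falls as K grows; the "elbow" is where the curve flattens.
for k, val in inertias.items():
    print(k, round(val, 1))
```

Inertia alone can’t pick K for us - it keeps shrinking all the way to one point per cluster - which is why the choice stays a judgment call.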
Each data point - each beer - is simply assigned to the closest centroid. The algorithm then moves each centroid to the mean of the points assigned to it.
And then reassigns the data points to the closest centroid.
It repeats this process - moving each centroid to the mean of its assigned points, reassigning the data points - until the centroids stop moving. That’s all there is to the algorithm.
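Because the loop really is that simple, it fits in a few lines of NumPy. This is a minimal sketch of the assign-then-update cycle described above, not a production implementation (for instance, it doesn’t guard against a centroid losing all its points):

```python
# Minimal k-means sketch: assign points to the nearest centroid,
# move each centroid to the mean of its points, repeat until stable.
import numpy as np

def kmeans(X, k, init=None, seed=0, max_iter=100):
    rng = np.random.default_rng(seed)
    # Start from k random data points unless starting centroids are given.
    centroids = init if init is not None else X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```

In practice you’d reach for a library implementation (e.g. scikit-learn’s `KMeans`), which also handles restarts from multiple random initializations.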
We can look at the average values of our clusters to parse what our k-means algorithm found. Our first cluster is differentiated by higher cost beers, our second cluster by lower calories and ABV, and our third cluster by higher sodium.
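Computing those per-cluster averages is a one-liner once the labels are attached to the data. The numbers below are made up for illustration - they are not the post’s actual beer data - but the pattern (high-cost, low-calorie/ABV, high-sodium clusters) mirrors the result described above:

```python
# Sketch of profiling clusters by their average feature values.
# All numbers are invented for illustration only.
import pandas as pd

beers = pd.DataFrame({
    "beer":     ["Heineken", "Kirin", "Bud Light", "Budweiser", "Coors"],
    "cost":     [1.50, 1.70, 0.75, 0.80, 0.85],
    "calories": [150, 145, 110, 145, 140],
    "abv":      [5.0, 5.0, 4.2, 5.0, 5.0],
    "sodium":   [10, 12, 8, 17, 18],
    "cluster":  [0, 0, 1, 2, 2],   # labels from a fitted k-means model
})

# Average feature values per cluster reveal what drives each grouping.
profile = beers.groupby("cluster")[["cost", "calories", "abv", "sodium"]].mean()
print(profile)
```

Reading the table row by row is how we’d name the clusters: the highest-cost row, the lowest-calorie row, the highest-sodium row.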
Applying the cluster labels to our samples, we find Heineken and Kirin in the high-cost cluster, both imported beers. Bud Light is tagged as a light beer, while our domestic lagers fall into the high-sodium cluster.
With a larger sample of beers, we would find many more potential clusters. But we can also appreciate that the larger the sample of beers - the larger the sample of anything - the more challenging identifying those natural groupings would be.
And so we can stop for today with our better beer taxonomy. But more than that, you are now equipped to think about what clustering does, and how it can help you understand your data.