New Daily Resource: Batter vs Pitcher Clustering Analysis
I have developed a new script that I am pretty excited about, and I think people will find it useful for analyzing daily hitter matchups (mainly DFS, but valuable for season-long start/sit decisions as well). Here is the breakdown of what it is, how to use it, and where to find it!
The Idea
As you know, every day is different for a baseball hitter depending on the pitcher they are facing that day. Bryce Harper can be a must-play one day if he’s facing a bad righty like Mitch Keller, but then the next day if he comes up against a nasty lefty like Carlos Rodon, we probably don’t want to pay the exorbitant price for him.
We like to look at hitter splits (vs. RHP and LHP) when making decisions about hitters on a given day, and that’s all very useful - but I wanted to take it one step further.
The idea is, we want to find pitchers that are most similar to each other (based on handedness and the types and frequencies of the pitches they throw), and then see how each hitter has done against that larger group of pitchers. This splits the difference between BvP and just splits.
Looking at a hitter vs. an individual pitcher is far too narrow - it doesn’t give us nearly enough data to rely on.
Looking at a hitter vs. a pitcher’s handedness is often too broad - it lumps hundreds and hundreds of pitchers together.
We get a little bit of both here. We see how a hitter has performed against a larger group of pitchers that are similar to the pitcher he is facing today.
An Example
Byron Buxton and the Minnesota Twins face Madison Bumgarner today. Bumgarner’s arsenal looks like this (the data used is 2021 to current day).
Hand: L
Cutter: 37% Usage, 87 MPH
Four-Seam: 31% Usage, 91 MPH
Curveball: 20% Usage, 78 MPH
Changeup: 9% Usage, 84 MPH
Sinker: 3% Usage, 90 MPH
So we take this data, and then use k-means clustering to find all of the other pitchers in the league with arsenals similar to Bumgarner’s.
We get a list of 52 pitchers that cluster together with Bumgarner (these lists will be longer for right-handed pitchers since there are more of them in the league). Some names on the list:
Martin Perez, Hyun-Jin Ryu, Jon Lester, Jordan Montgomery, Marco Gonzales, Patrick Corbin, Tyler Anderson, Joey Lucchesi, Ty Blach, Brett Anderson, Framber Valdez
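For anyone curious what the clustering step looks like, here is a minimal sketch. It is not the actual script, and it rests on several assumptions:

- Only Bumgarner's usage numbers come from this post. The other pitchers ("Lefty A" through "Lefty E") are made up.
- It clusters on pitch usage only. Velocity could be added as extra, rescaled features.
- Handedness is handled by clustering lefties and righties separately.
- It uses a simple deterministic seeding so the toy result is reproducible. A library such as scikit-learn would normally do this with randomized k-means++ seeding.
- The choice of 3 clusters is arbitrary.

```python
# Usage share per pitch type, in this order (shares sum to 1.0 per pitcher).
PITCH_TYPES = ["cutter", "four_seam", "curveball", "changeup", "sinker", "slider"]

# Only Bumgarner's row comes from the article; the other rows are hypothetical.
ARSENALS = {
    "Bumgarner": [0.37, 0.31, 0.20, 0.09, 0.03, 0.00],
    "Lefty A":   [0.30, 0.35, 0.18, 0.12, 0.05, 0.00],
    "Lefty B":   [0.00, 0.25, 0.05, 0.20, 0.45, 0.05],
    "Lefty C":   [0.02, 0.20, 0.03, 0.25, 0.45, 0.05],
    "Lefty D":   [0.00, 0.55, 0.05, 0.05, 0.00, 0.35],
    "Lefty E":   [0.00, 0.50, 0.10, 0.05, 0.00, 0.35],
}


def sq_dist(a, b):
    """Squared Euclidean distance between two usage vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point that is farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: assign each point to its nearest centroid, move each
    centroid to the mean of its members, and repeat until labels stop changing."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = mean_vector(members)
    return labels


names = list(ARSENALS)
labels = kmeans([ARSENALS[n] for n in names], k=3)
cluster_of = dict(zip(names, labels))

# Pitchers that land in the same cluster as today's starter.
comps = [
    n for n in names
    if n != "Bumgarner" and cluster_of[n] == cluster_of["Bumgarner"]
]
print(comps)
```

In this toy data, the cutter-heavy "Lefty A" is the only pitcher that groups with Bumgarner, while the sinker-heavy and slider-heavy lefties form their own clusters. With real data the number of clusters is a judgment call, and it directly controls how long each list of comparable pitchers ends up being.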
Then we look at every plate appearance Byron Buxton has had against that long list of pitchers, and we find that he has done the following against them since the start of 2021:
PA: 34
Brl%: 30.8%
K%: 20.6%
AVG: .424
SLG: 1.182
HR: 7
FB%: 34.6%
34 PA is far from a sample we want to put our trust in, so in this case we would want to take that line with a big grain of salt. However, there are plenty of hitters today with samples well above 200 PA.
Another example: Freddie Freeman faces Zach Plesac today. Here’s what he’s done against righties that cluster with Plesac:
PA: 266
Brl%: 11.8%
K%: 9.8%
AVG: .346
SLG: .596
HR: 12
FB%: 26.6%
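The stat lines in these examples come from pooling every plate appearance a hitter has had against the pitchers in a cluster. Here is a rough sketch of that aggregation, again not the actual script. The plate appearance log, the pitcher names, and the event labels are all hypothetical. I am also assuming that Brl% and FB% are measured per batted ball and K% per plate appearance.

```python
from collections import namedtuple

# One row per plate appearance (hypothetical schema, not Baseball Savant's).
PA = namedtuple("PA", "pitcher event barrel fly_ball", defaults=(False, False))

TOTAL_BASES = {"single": 1, "double": 2, "triple": 3, "home_run": 4}
NOT_AN_AT_BAT = {"walk", "hit_by_pitch", "sac_fly", "sac_bunt"}
NO_BATTED_BALL = {"strikeout", "walk", "hit_by_pitch"}


def safe_div(num, den):
    """Return num / den, or 0.0 when there is nothing in the denominator."""
    return num / den if den else 0.0


def stat_line(log, cluster):
    """Pool a hitter's plate appearances against every pitcher in `cluster`."""
    rows = [pa for pa in log if pa.pitcher in cluster]
    at_bats = sum(pa.event not in NOT_AN_AT_BAT for pa in rows)
    batted = sum(pa.event not in NO_BATTED_BALL for pa in rows)
    return {
        "PA": len(rows),
        "Brl%": safe_div(sum(pa.barrel for pa in rows), batted),
        "K%": safe_div(sum(pa.event == "strikeout" for pa in rows), len(rows)),
        "AVG": safe_div(sum(pa.event in TOTAL_BASES for pa in rows), at_bats),
        "SLG": safe_div(sum(TOTAL_BASES.get(pa.event, 0) for pa in rows), at_bats),
        "HR": sum(pa.event == "home_run" for pa in rows),
        "FB%": safe_div(sum(pa.fly_ball for pa in rows), batted),
    }


# Made-up log for one hitter; "Righty X" is outside the cluster and is ignored.
log = [
    PA("Lefty A", "home_run", barrel=True, fly_ball=True),
    PA("Lefty A", "strikeout"),
    PA("Lefty B", "single"),
    PA("Lefty B", "walk"),
    PA("Lefty A", "out", fly_ball=True),
    PA("Lefty B", "double", barrel=True),
    PA("Righty X", "home_run", barrel=True, fly_ball=True),
    PA("Lefty A", "out"),
    PA("Lefty B", "strikeout"),
    PA("Lefty A", "out"),
]

line = stat_line(log, cluster={"Lefty A", "Lefty B"})
for stat, value in line.items():
    print(stat, value if isinstance(value, int) else round(value, 3))
```

On this toy log the hitter comes out with 9 PA, a .375 AVG and an .875 SLG. The home run off "Righty X" is left out because that pitcher is not in the cluster.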
How To Access the Data
I have automated this entire process thanks to Baseball Savant and Python. There is a Google Sheet that will hold this data every day, refreshed every morning with the new matchups. Here is what it looks like:
Here is the link to the live file. If you’re a member of this SubStack, your Google Account has already been granted access.
Here is the free sample of it. It has all of today’s data, but this one won’t be updated daily.
The live version of the file that is updated daily will only be available to paid members of this SubStack. I think it’s a great resource for DFS and season-long players alike, so subscribe for just $5/month if you haven’t already! My content will include tons of MLB offseason content and 2023 preparation, as well as fantasy football weekly notes - so you’re getting more than just the next 3 months of baseball with your subscription.