Projection Clustering - Hitters

I use a clustering algorithm to group hitters by their five-category projections to spot similar hitters up and down the draft board.

Mar 05, 2024

"Why draft _____ when you can draft _____ way later?"

It’s a common trope in the fantasy game, and at this point it’s mostly used to make fun of people rather than to make a serious point.

But it’s still:

Slightly useful
Really fun

So that’s what we’re here to do today, and the methodology I use is unique.

A clustering algorithm can be defined as:

a way for a computer to group similar things together
Example:
Imagine you have a bunch of different fruits, and you want to organize them by type. A clustering algorithm would look at the characteristics of each fruit, like its color, shape, and size, and then put them into groups like "apples," "bananas," and "oranges." In the same way, clustering algorithms can be used to group data points in a dataset based on their features. This can help us find patterns and make sense of large amounts of data.

We can treat hitters like fruits and use their projections to cluster them together. Since we’re mostly concerned with standard roto leagues, we’ll use:

R, HR, RBI, SB, AVG

I used seven clusters just to separate hitters off a bit. We’ll go through each one and then find some of these similarly projected hitters that are separated quite a ways in drafts.
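For the curious, here’s a rough sketch of what this kind of clustering could look like. It assumes k-means on z-scored projections, which is one common choice and not necessarily the exact algorithm behind this post. The hitters and numbers are made up, `k=3` is used only because the toy data has three obvious groups, and the farthest-point starting centers are a simplification to keep the example deterministic.

```python
from statistics import mean, pstdev

# Hypothetical projections (R, HR, RBI, SB, AVG); not real player data.
projections = {
    "Hitter A": (110, 35, 100, 30, 0.290),
    "Hitter B": (105, 33, 98, 28, 0.285),
    "Hitter C": (95, 40, 105, 2, 0.250),
    "Hitter D": (92, 38, 100, 3, 0.245),
    "Hitter E": (70, 8, 50, 35, 0.260),
    "Hitter F": (68, 7, 48, 38, 0.255),
}

def standardize(rows):
    """Z-score each column so AVG (around .250) counts as much as R (around 90)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against a constant column
    return [tuple((v - m) / s for v, m, s in zip(row, mus, sds)) for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two stat lines."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_centers(points, k):
    """Deterministic start: first point, then the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=50):
    """Plain k-means: assign each point to its nearest center, then re-average."""
    centers = init_centers(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(mean(col) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's old center
        if new_centers == centers:
            break
        centers = new_centers
    return labels

labels = kmeans(standardize(list(projections.values())), k=3)
for name, label in zip(projections, labels):
    print(name, "-> cluster", label)
```

The z-scoring step matters: without it, the categories with the biggest raw numbers (R and RBI) would dominate the distance and batting average would barely register.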

The Clusters

Here’s a summary of each cluster with the average stats from each:

So the algorithm doesn’t put them in any particular order. Let’s just pick through them and see what we can find.
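A per-cluster summary of average stats can be built by grouping hitters by their cluster label and averaging each category. The sketch below shows one way to do that; the rows are made-up placeholders, not the projections used in this post.

```python
from collections import defaultdict
from statistics import mean

CATEGORIES = ("R", "HR", "RBI", "SB", "AVG")

# Hypothetical (cluster label, projection) rows; not real player data.
rows = [
    (0, (90, 28, 88, 4, 0.262)),
    (0, (86, 30, 92, 2, 0.254)),
    (1, (70, 8, 50, 35, 0.260)),
    (1, (66, 10, 54, 31, 0.250)),
]

def cluster_averages(rows):
    """Group stat lines by cluster label and average each category."""
    grouped = defaultdict(list)
    for label, stats in rows:
        grouped[label].append(stats)
    summary = {}
    for label, members in sorted(grouped.items()):
        avgs = [mean(col) for col in zip(*members)]
        summary[label] = {"n": len(members), **dict(zip(CATEGORIES, avgs))}
    return summary

summary = cluster_averages(rows)
for label, row in summary.items():
    print(label, row)
```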

Cluster 2 - The Five Category Studs

There’s a huge gap between Ronald Acuna Jr. and Trea Turner (projections don’t consider the recent injuries, by the way), but the algorithm decided not to put Acuna in his own cluster. What we have here are solid to great contributors in all five categories. None of them go outside of the top 10 picks on average, so kudos to the algo for that one!

Cluster 2 Replacements

This is the section where I’ll be talking about the “why draft BLANK when BLANK is cheaper” or whatever. For this cluster, there’s nothing here since we’re all in the first round. But it’s not crazy to think Trea Turner could do the same thing as Carroll this year!
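The “replacement” idea can be expressed as a simple search: find pairs of hitters in the same cluster whose draft cost is far apart. Here is a minimal sketch of that search. The names, cluster labels, ADP values, and the 50-pick `min_gap` threshold are all hypothetical, chosen just to show the mechanics.

```python
from itertools import combinations

# Hypothetical hitters: (name, cluster label, ADP). Values are made up.
hitters = [
    ("Hitter A", 5, 18.0),
    ("Hitter B", 5, 95.0),
    ("Hitter C", 5, 30.0),
    ("Hitter D", 7, 60.0),
    ("Hitter E", 7, 210.0),
]

def replacement_pairs(hitters, min_gap=50.0):
    """Return (early pick, late pick, ADP gap) for same-cluster pairs at least min_gap apart."""
    pairs = []
    for a, b in combinations(hitters, 2):
        if a[1] != b[1]:
            continue  # different clusters, so not similar projections
        early, late = sorted((a, b), key=lambda h: h[2])
        gap = late[2] - early[2]
        if gap >= min_gap:
            pairs.append((early[0], late[0], gap))
    return sorted(pairs, key=lambda p: p[2], reverse=True)

pairs = replacement_pairs(hitters)
for early, late, gap in pairs:
    print(f"Why draft {early} when you can have {late} {gap:.0f} picks later?")
```

Sharing a cluster only means the projections are broadly similar, so any pair this turns up still needs the eye test.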

Cluster 5 - Studs Minus Steals

Ohtani must have just barely missed the cluster 2 cut there. This is largely a list of round 1-2 hitters. They all have elite R+HR+RBI projections, and I’m guessing that was the main input in the model that made this determination. Kyle Schwarber is the cheapest (ADP-wise) of the bunch, impressing the algorithm so much with his 99-41-95 that it included him even with the .224 batting average.

In a roto league, these hitters are absolute studs for your fantasy team as long as you’re getting steals elsewhere. They all pair extremely well with a high steals batter like Corbin Carroll or Trea Turner. That’s one of my favorite ways to start a draft this year, Carroll+Riley or Turner+Olson.

Cluster 5 Replacements

Kyle Schwarber does a pretty good Pete Alonso impression a few rounds later
Austin Riley/Rafael Devers aren’t much different than Aaron Judge if Judge misses a few weeks like he usually does

Cluster 7 - Steals & Something Else

We have a group of steals sources here, although the range is pretty wide, from 34 projected steals down to 13. But they all give you some steals without killing you in multiple other categories.

There is a lot of variety in this cluster. You’d be hard-pressed to find two players more different than Elly De La Cruz and Steven Kwan, so this could probably be split into two different clusters based on power output.

This player type is the one I want to hit often in the middle of the draft. The studs are gone by round three or four, and ideally, we want to be continually adding incrementally to our steals count while filling in other needs as well.

Getting a steals and homers player like Jazz Chisholm pairs really nicely with a steals and batting average guy like Nico Hoerner. You can also balance a very low batting average player (Daulton Varsho?) with a high batting average like Steven Kwan.

None of these players are great standalone bats, but they gel very well together if you do it right.

Cluster 7 Replacements

Jarren Duran projects a lot like guys you’ll find in rounds 4-5, and that’s without even considering the power potential he has. He’s a great value and a staple on my teams this year.
I also can’t help but pick out Trevor Story and Daulton Varsho. They both go really late and are just a lucky .250 batting average away from performing like players that go much, much earlier in the drafts.
Ezequiel Tovar looks a lot like Xander Bogaerts and Dansby Swanson…
Zack Gelof could put up a season that looks like a lot of the years we’ve gotten from Francisco Lindor

Cluster 1 - Steals & Shakiness

Here’s your Esteury Ruiz cluster. He is unlike anybody else, but the general rule from his profile follows here.

If you’re in desperate need of steals, you can catch up with some of these names, but they are most likely going to hurt you in multiple other categories.

It’s tough to project rookies. Most of the time, the projections on them are very light. We see a handful of those names here with Jackson Chourio, Parker Meadows, Noelvi Marte, Wyatt Langford, and Jackson Holliday all showing up here. They all stole bags in the minors, and there’s a high correlation between minor league and Major League steals, so we project them to steal bases in the Majors.

The other translations are tougher. There’s a much weaker correlation between minor league and Major League output for home runs and batting average, so we cannot be confident at all that the homers will be here for these names this year.
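If you want to check a minors-to-Majors relationship like this yourself, the usual tool is the Pearson correlation coefficient. Below is a minimal sketch of the calculation. The sample numbers are invented purely to exercise the function and say nothing about the real strength of any of these relationships.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: +1 is a perfect positive linear relationship, 0 is none."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical steals per 600 PA for the same five players at each level (made up).
minors_sb = [45, 30, 12, 25, 8]
majors_sb = [38, 22, 9, 20, 5]

print(round(pearson_r(minors_sb, majors_sb), 3))
```

Running the same function on home run rate or batting average at each level would show how much less reliably those stats carry over.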

There’s a big difference between the young guys here and the veterans. We know that Brandon Marsh, Brendan Donovan, Starling Marte, and Tim Anderson will do little else other than steal bases, so the preference goes to the young guys from this cluster.

Cluster 1 Replacements

It’s a cluster of cheap hitters, but super late round picks like Parker Meadows, Tommy Pham, and Maikel Garcia could be solid contributors across the board if things go right for them. I like all three of those names this year.
Jose Siri looks like a legit 25-25 threat. He doesn’t play shortstop, but you could definitely do the “WHY DRAFT ANTHONY VOLPE WHEN YOU CAN HAVE JOSE SIRI 100 PICKS LATER”
Tim Anderson can do a darn good Jeff McNeil or Steven Kwan impression. They all fit the mold of low power, higher runs, higher batting average, and Anderson is the cheapest one of the bunch.

Cluster 0 - Counting Stats Non-Studs

Remember that when computers start counting, they start at zero!

I couldn’t come up with a very catchy name here. This is a huge cluster (40 hitters), so there’s not one common thread running through it. But the general rule that I spot here is that these are hitters that help your team in R+HR+RBI, but hurt you in steals.

Overall, though, this is a safe cluster of hitters. I don’t see anybody here I’d vehemently avoid on my fantasy teams, and there are some young guys with huge upside (Harris, Henderson, etc.).

Cluster 0 Replacements

Rhys Hoskins is a poor man’s Pete Alonso
Teoscar Hernandez in the Dodgers lineup could give you what Adolis Garcia does this year given the declining steals on Garcia
Jake Burger and Josh Jung in the corner infield spot project mighty similarly to the more expensive Triston Casas and Spencer Torkelson (although I want two of those four names to fill 1B/3B/CIF)
Speaking of Triston Casas, I don’t think it’s crazy to think he could have a Matt Olson type season. There won’t be nearly as many RBI opportunities for him, but the raw power is similar.
It’s hard to ever know what to expect from Marcell Ozuna, but the 30-homer, 86-RBI projection has him looking pretty similar to guys like Mike Trout and Anthony Santander (the DH only eligibility, age, and inconsistencies hurt him significantly though)

Cluster 3 - We’ve Got Some Problems

These are guys who project very poorly in multiple categories. That sounds like a cluster of hitters you don’t want, but that is not the case with all of them. We have a lot of catchers here, so an adjustment has to be made there. We also have a lot of solid hitters with playing time concerns. There are plenty of power-hitting injury risks (Stanton, Bryant, Lowe, Muncy), and plenty of players that could be quite good but are in platoons or uncertain situations (CES, Suwinski, Kelenic, Rooker, Wallner).

Mixed bag of players here. It would have been helpful to include PAs in the algorithm, which I didn’t do, and that’s why you see potential studs like Jordan Walker, CES, and James Outman lining up in the same cluster as pretty bad fantasy hitters like Keibert Ruiz, Austin Hays, Orlando Arcia, and Alex Verdugo.
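To see why leaving PAs out lumps those players together, here is a small sketch of one possible fix: converting counting stats to a per-600-PA pace before clustering. This is not what was done for this post, the 600 PA baseline is an arbitrary assumption, and both stat lines are made up.

```python
# Hypothetical projections: (PA, R, HR, RBI, SB, AVG). Values are made up.
part_time = (300, 40, 15, 45, 5, 0.250)
full_time = (600, 80, 30, 90, 10, 0.250)

def per_600_pa(row, baseline=600):
    """Scale counting stats to a common playing-time baseline; AVG is already a rate."""
    pa, r, hr, rbi, sb, avg = row
    scale = baseline / pa
    return (r * scale, hr * scale, rbi * scale, sb * scale, avg)

print(per_600_pa(part_time))
print(per_600_pa(full_time))
```

On raw totals the part-timer looks like a much worse hitter, but on a per-PA basis the two are the same player. Clustering on the scaled numbers (or adding PA as its own input) would separate talent from playing time.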

Cluster 3 Replacements

WHY DRAFT LUIS ARRAEZ WHEN YOU CAN HAVE JUNG HOO LEE 100 PICKS LATER? I put that in capitals to make light of the overly simplistic idea here, but actually I agree pretty hard with that one.
With a little bit of improvement, Jack Suwinski could put up a very good homer total with a good supply of steals, and that could have him looking a lot like an Adolis Garcia or Royce Lewis much later in the drafts.

Cluster 4 - They Suck

Don’t get me wrong, these guys are all at least 250 times better than me at baseball, but they aren’t cutting it for fantasy purposes.

It’s the same as cluster 3 but with less skill and/or more playing time questions. We have mostly catchers and utility-type players in this cluster.

But this is where a lot of the post-hype prospect names live. I can see a world where the Rookie of the Year Award race includes Kyle Manzardo, Junior Caminero, Pete Crow-Armstrong, and Jordan Lawlar more than it includes Jackson Holliday, Evan Carter, and Wyatt Langford. And the former names there are free in drafts while the latter are quite pricey.

The Shea Langeliers projection to me looks like what another down year for Salvador Perez could look like, so I like him as an emergency catcher option in the final rounds.

So there you go, I’m not sure if this was the most helpful piece, but I enjoyed it.

MLB Data Warehouse
