One of the most useful algorithms in the machine learning world is that of data clustering. Clustering, in this case, is defined as:
"grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters)."
There are all kinds of different ways to apply this to fantasy sports. I wanted to see if we could cluster hitters together based on their projected statistical outputs in 2021. This provides a bit of a shortcut to see which players are similar in their fantasy contributions, and could even point out some undervalued players.
Hitter Clusters
Using Python and aggregate projections (the averages of multiple projection systems), I grouped all hitters projected for 400 or more at-bats into seven clusters. Here is a quick overview of what the clusters look like.
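For readers curious what that step might look like in code, here is a minimal sketch. The article does not say which clustering algorithm was used, so k-means is an assumption, as are the feature set (R, HR, RBI, SB, AVG) and the choice to standardize each stat first. The stat lines are projections quoted elsewhere in this article; the at-bat totals and the "Hypothetical Bench Bat" row are made-up placeholders, and the toy example uses three clusters instead of seven because the sample is so small.

```python
import random
from statistics import mean, pstdev

# (name, AB, R, HR, RBI, SB, AVG). The R/HR/RBI/SB/AVG values are projections
# quoted in this article; the AB values are hypothetical placeholders.
hitters = [
    ("Jose Abreu", 600, 87, 33, 111, 2, .280),
    ("Matt Olson", 560, 82, 36, 98, 2, .246),
    ("Eugenio Suarez", 570, 87, 38, 103, 3, .254),
    ("Tommy Edman", 540, 73, 13, 55, 15, .268),
    ("Lorenzo Cain", 480, 67, 10, 45, 15, .270),
    ("Raimel Tapia", 500, 69, 10, 53, 17, .280),
    ("Bo Bichette", 590, 92, 24, 85, 21, .284),
    ("Ozzie Albies", 600, 92, 24, 84, 15, .282),
    ("Starling Marte", 570, 85, 19, 70, 24, .278),
    ("Hypothetical Bench Bat", 250, 30, 6, 28, 3, .240),  # fails the AB filter
]


def standardize(rows):
    """Convert each stat column to z-scores so no single stat dominates."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]


def dist2(a, b):
    """Squared Euclidean distance between two stat lines."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Plain k-means: assign to nearest center, recompute centers, repeat."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: dist2(p, centers[c])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [mean(col) for col in zip(*members)]
    return labels


# Keep only hitters projected for 400 or more at-bats, then cluster them.
eligible = [h for h in hitters if h[1] >= 400]
features = standardize([h[2:] for h in eligible])
labels = kmeans(features, k=3)

for (name, *_), label in zip(eligible, labels):
    print(f"cluster {label}: {name}")
```

Cluster numbers from k-means are arbitrary, so the labels carry no ranking on their own; the descriptions below come from looking at each group's average stats after the fact.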
Cluster 1
These ended up being the worst players. They have the latest average ADP, and are tied for the lowest tier in average runs, homers, and RBI. This cluster does have a higher batting average than many of the other clusters. Examples: Tommy Edman, Lorenzo Cain, David Fletcher, Starlin Castro
Cluster 2
These players are in the middle of the pack in runs, homers, and RBI but are the worst in steals and batting average. Examples: Salvador Perez, Jared Walsh, Hunter Dozier, Adam Duvall
Cluster 3
You see a ton of homers and RBI in this group, but almost no steals and a below-average batting average. Examples: Eloy Jimenez, Nolan Arenado, Matt Olson, Max Muncy
Cluster 4
High in steals while being average or better in a few other categories as well. Examples: Trea Turner, Ozzie Albies, Bo Bichette, Tommy Pham
Cluster 5
Elite steals without much else to speak of. Examples: Adalberto Mondesi, Victor Robles, Leody Taveras, Myles Straw
Cluster 6
Elite players. Examples: Ronald Acuna Jr., Fernando Tatis Jr., Mookie Betts
Cluster 7
A mixed bag, these are the players that didn't cluster well with the rest of the league. Examples: Luis Robert, Javier Baez, Brandon Lowe, Ramon Laureano, Ian Happ
Now that we have some data on which players are similar, we can take a look into the different tiers and spot players that project very similarly but are separated in drafts.
Stud Clones
We'll start at the top and look into cluster six, which had all of the best fantasy players in the game. Interestingly, some names with much later ADPs ended up in the grouping.
George Springer: If you take what Springer has done over the last two seasons and pace it out to a full year, you have a guy who would score 111 runs, hit 44 bombs, drive in 107 runs, and steal six bases, all while hitting .284. That's not far off from what we're expecting out of Juan Soto this year. Now, of course, that is likely a "best case" scenario, but Springer's recent production has gone a little bit under the radar. His ADP near 50 right now doesn't seem to match up with his upside.
Marcell Ozuna: In the same vein as Springer, Ozuna has been on a 94-run, 37-homer, 116-RBI, 10-steal, .272 pace over the last two seasons. The guy has just been awesome for fantasy purposes, and his cost is only slightly higher than Springer's, with an average ADP of 45.
Corey Seager: He has the lowest stolen-base projection of all players in this cluster, but the production from his last two seasons has been elite. The pace he's been on since 2019 is 97 runs, 29 homers, 97 RBI, and a .291 batting average, but with just three steals. He's potentially one of the most valuable bats in the whole league in runs, homers, and RBI, and he's going in the third or fourth round in most drafts.
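The "pace" figures quoted for Springer, Ozuna, and Seager come from stretching recent totals out to a full season. The exact method isn't spelled out here, so the sketch below assumes a simple scale-by-games approach with a 162-game season. The function name and the sample totals are hypothetical placeholders, not any player's real numbers.

```python
def pace_to_full_season(totals, games_played, season_games=162):
    """Scale counting stats from games_played up to a full season.

    Only counting stats (R, HR, RBI, SB) should be passed in. Batting average
    is a rate, so it carries over unchanged and is not scaled.
    """
    scale = season_games / games_played
    return {stat: round(value * scale) for stat, value in totals.items()}


# Hypothetical totals over 81 games (half of a 162-game season).
sample_totals = {"R": 50, "HR": 20, "RBI": 45, "SB": 5}
paced = pace_to_full_season(sample_totals, games_played=81)
print(paced)  # {'R': 100, 'HR': 40, 'RBI': 90, 'SB': 10}
```

Scaling by plate appearances instead of games would be a reasonable alternative, since it accounts for days off and lineup position.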
Power Specialist Clones
Jose Abreu is near the top in ADP in this HR/RBI specialist cluster. He is going at pick 38 on average with a projection of 87 runs, 33 homers, 111 RBI, 2 steals and a .280 batting average.
The thing is, if you're willing to give up a little bit of batting average, the Abreu-type player is pretty darn replaceable. Here are some comparisons.
Name | ADP | R | HR | RBI | SB | AVG
Jose Abreu | 38 | 87 | 33 | 111 | 2 | .280
Luke Voit | 61 | 86 | 34 | 94 | 1 | .264
Gleyber Torres | 68 | 83 | 30 | 89 | 6 | .272
Michael Conforto | 73 | 91 | 29 | 90 | 7 | .264
Eugenio Suarez | 73 | 87 | 38 | 103 | 3 | .254
Matt Olson | 75 | 82 | 36 | 98 | 2 | .246
Paul Goldschmidt | 95 | 89 | 28 | 84 | 4 | .275
Matt Chapman | 115 | 90 | 33 | 91 | 1 | .251
Mike Moustakas | 119 | 76 | 33 | 95 | 3 | .255
Clearly, Abreu outpaces everybody in projected RBI and batting average, but Chapman and Moustakas are basically a poor man's Abreu, and they go 70-plus picks later. Compare those two with names like Voit and Torres, who go roughly 50 to 60 picks ahead of them, and the gap in projected production all but disappears. Strip off the names and you would never guess those players would be separated by so much in the draft.
Goldschmidt, Chapman, and Moustakas seem undervalued here.
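One way to put a number on "looks like Abreu but costs less" is to measure how far each hitter's projected line sits from Abreu's and set that next to the ADP gap. The sketch below uses the projections from the table above. The distance measure (Euclidean distance on z-scored stats, with all five categories weighted equally) is my assumption for illustration, not necessarily how the clusters were built.

```python
from math import sqrt
from statistics import mean, pstdev

# (name, ADP, R, HR, RBI, SB, AVG), taken from the table above.
rows = [
    ("Jose Abreu", 38, 87, 33, 111, 2, .280),
    ("Luke Voit", 61, 86, 34, 94, 1, .264),
    ("Gleyber Torres", 68, 83, 30, 89, 6, .272),
    ("Michael Conforto", 73, 91, 29, 90, 7, .264),
    ("Eugenio Suarez", 73, 87, 38, 103, 3, .254),
    ("Matt Olson", 75, 82, 36, 98, 2, .246),
    ("Paul Goldschmidt", 95, 89, 28, 84, 4, .275),
    ("Matt Chapman", 115, 90, 33, 91, 1, .251),
    ("Mike Moustakas", 119, 76, 33, 95, 3, .255),
]

# Per-stat mean and spread across this group, used to put stats on one scale.
cols = list(zip(*[r[2:] for r in rows]))
mus = [mean(c) for c in cols]
sds = [pstdev(c) for c in cols]


def zscores(line):
    """Standardize one stat line against the group."""
    return [(v - m) / s for v, m, s in zip(line, mus, sds)]


target_name, target_adp, *target_line = rows[0]
target_z = zscores(target_line)

comps = []
for name, adp, *line in rows[1:]:
    distance = sqrt(sum((a - b) ** 2 for a, b in zip(zscores(line), target_z)))
    comps.append((name, adp - target_adp, round(distance, 2)))

# Smallest distance first: the closest statistical matches to Abreu.
comps.sort(key=lambda c: c[2])
for name, picks_later, distance in comps:
    print(f"{name}: {picks_later} picks later, distance {distance}")
```

A small distance paired with a large ADP gap is the profile to look for. Weighting the categories differently (for example, caring less about batting average) would change the ordering.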
Average / Steal Clones
As I have discussed before, there really is no good way to be competitive in steals that doesn't involve spending at least one very high draft pick to address the need. However, there are players who project pretty similarly despite going at very different spots in the draft.
The name that stands out here is Tommy Edman. His ADP sits at 129 and his projection is 73 runs, 13 homers, 55 RBI, 15 steals, and a .268 batting average. Admittedly, I think that batting average projection is really low. However, let's go with it.
We look in the same cluster and find these names, all with very similar projections.
Name | ADP | R | HR | RBI | SB | AVG
Jean Segura | 185 | 72 | 14 | 64 | 10 | .282
Andrew Benintendi | 223 | 70 | 15 | 63 | 12 | .261
Raimel Tapia | 250 | 69 | 10 | 53 | 17 | .280
Lorenzo Cain | 255 | 67 | 10 | 45 | 15 | .270
Kolten Wong | 306 | 68 | 10 | 54 | 15 | .265
Amed Rosario | 327 | 54 | 10 | 48 | 12 | .276
You shouldn't forgo drafting someone just because a similar player is available in a later round. You need a bunch of players on your team, so it's fine to take a handful of guys who project the same. But does it really seem like Raimel Tapia and Tommy Edman should be separated by 120 picks? Doesn't Lorenzo Cain feel awfully cheap as a guy who can steal bases, score runs, and keep you afloat in batting average? It seems that way to me.
"Little Bit Of Everything" Clones
The next names on my knock list are Bo Bichette and Ozzie Albies. Here's how they project this year:
Name | ADP | R | HR | RBI | SB | AVG
Bo Bichette | 24 | 92 | 24 | 85 | 21 | .284
Ozzie Albies | 35 | 92 | 24 | 84 | 15 | .282
What they really have going for them is the upside. They are both super young players with great prospect pedigrees and massive ceilings. It wouldn't be surprising in the slightest to see those two guys be top-20 fantasy hitters with ease. However, if we're just considering median projections, there are a lot of guys who look pretty darn similar:
Name | ADP | R | HR | RBI | SB | AVG
Starling Marte | 51 | 85 | 19 | 70 | 24 | .278
Charlie Blackmon | 89 | 88 | 24 | 81 | 5 | .292
Jose Altuve | 98 | 93 | 22 | 73 | 10 | .283
Alex Verdugo | 129 | 83 | 17 | 63 | 8 | .288
Tommy Pham | 134 | 75 | 19 | 67 | 18 | .268
Michael Brantley | 151 | 75 | 17 | 75 | 5 | .295
More so than in the previous cases, it still makes sense to attack Bichette and Albies and then take some of these undervalued names later as well, because of the game-breaking upside those two offer if they make the strides forward that we all suspect they're capable of making.
However, if you really like Albies because he's solid across the board, it's good to be aware of Altuve there going 60 picks later and looking pretty similar by the projections.
HR + SB Clones
The most valuable fantasy players are often the guys that are well above average in both homers and steals. That is pronounced this year with Ronald Acuna Jr. (projected for 39 homers and 29 steals) and Fernando Tatis Jr. (36 and 26) atop most draft boards. There's certainly no way to replace those guys later in the draft, but there are some guys that have both the pop and the speed to flirt with 30/30 while not having premium draft costs.
Some of those names by the projections are Luis Robert (36 ADP, 28 HR, 23 SB), Randy Arozarena (57 ADP, 24 HR, 19 SB), Trent Grisham (78 ADP, 24 HR, 16 SB), Tim Anderson (42 ADP, 22 HR, 17 SB), and the lone man outside of the top 100, Byron Buxton (114 ADP, 25 HR, 18 SB). If you're going hunting for upside, the power/speed combination is a good thing to set your sights on.
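For anyone who wants to rank these power/speed targets with a single number, here is a small sketch using the ADP, HR, and SB figures quoted above. It uses the power-speed number, which is the harmonic mean of homers and steals and so rewards balance between the two. Choosing that metric, and using pick 100 as the cutoff for a "late" target, are my assumptions for illustration.

```python
# (name, ADP, projected HR, projected SB), as quoted in the paragraph above.
players = [
    ("Luis Robert", 36, 28, 23),
    ("Tim Anderson", 42, 22, 17),
    ("Randy Arozarena", 57, 24, 19),
    ("Trent Grisham", 78, 24, 16),
    ("Byron Buxton", 114, 25, 18),
]


def power_speed(hr, sb):
    """Power-speed number: harmonic mean of HR and SB (needs hr + sb > 0)."""
    return 2 * hr * sb / (hr + sb)


# Best combined power/speed projection first.
ranked = sorted(players, key=lambda p: power_speed(p[2], p[3]), reverse=True)
for name, adp, hr, sb in ranked:
    print(f"{name}: ADP {adp}, power-speed {power_speed(hr, sb):.1f}")

# Power/speed targets still available after pick 100.
late = [name for name, adp, hr, sb in players if adp > 100]
print(late)  # ['Byron Buxton']
```

By this measure Buxton's projection sits right alongside players going 40 to 60 picks earlier, which is the same kind of gap the cluster comparisons above are meant to surface.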