Fantasy baseball is, most of the time, a category-centered game. Unlike fantasy football, where all you want is raw points, the roto category format of baseball separates players by the specific ways they can help your fantasy team. Some hitters will boost your team's batting average while dragging down your home run pace, and vice versa.
Because of this, it is important to have a good feel for how different players are likely to help your team. You cannot just sort by projected points and draft accordingly; you have to be careful in allocating your draft capital. The best approach is to attack every category and try to end up with a well-balanced team. This matters most in rotisserie leagues, where it's almost impossible to win if your team is drawing dead in a category or two.
With baseball's rich data, we can use a data science technique called clustering to separate these players for us. Clustering takes a data table and sorts its rows (in this case, hitters) into groups based on the columns we choose. It "clusters together" similar rows, showing which hitters are most alike.
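For the curious, here is a minimal sketch of one common clustering method, k-means (Lloyd's algorithm), in plain Python. The article doesn't say which algorithm or software produced its clusters, so treat k-means, the `kmeans` function, and its parameters as illustrative assumptions rather than the exact method used here. The usage lines feed it barrel rate and strikeout rate for four hitters taken from the tables later in this piece.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Group points (lists of numbers) into k clusters with Lloyd's algorithm.

    Returns (labels, centroids), where labels[i] is the cluster index of points[i].
    This is an illustrative sketch, not necessarily the method used in the article.
    """
    rng = random.Random(seed)
    # Start from k distinct points chosen at random as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
    return labels, centroids

# Usage: (Brl%, K%) pairs copied from the tables in this article.
players = ["Myles Straw", "Isiah Kiner-Falefa", "Joey Gallo", "Patrick Wisdom"]
points = [[1, 19], [2, 13], [19, 35], [16, 41]]
labels, centroids = kmeans(points, k=2)
for name, label in zip(players, labels):
    print(f"{name}: cluster {label}")
```

Because k-means starts from randomly chosen centroids, different seeds can produce different groupings on real data, so it's common to run it several times and keep the tightest result.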
The Process
For this analysis, I chose four metrics: barrel rate, strikeout rate, ground-ball rate, and sprint speed (measured in feet per second). I think these metrics give us a general picture of how a given player will help a fantasy team. A high-barrel, high-strikeout player, for example, will add home runs to your team while typically bringing down your batting average. It is important to choose metrics that aren't strongly correlated with one another. That is why I did not also include something like exit velocity: barrel rate already captures it pretty well.
Using these four data points, I clustered every fantasy-relevant, qualified 2021 hitter into five clusters, which I'll describe here:
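Two practical details sit behind this step. First, the four metrics live on very different scales: strikeout rates in this article's tables span 11% to 41%, while sprint speeds span only about 5.4 feet per second, so a distance-based method would barely notice speed unless each column is rescaled. Converting every column to z-scores (mean 0, standard deviation 1) is a common fix. Second, checking whether two metrics are "strongly correlated" usually means computing a Pearson correlation. The article doesn't say whether or how its columns were scaled, so the helpers below (`zscores`, `pearson`) are an assumed, illustrative sketch.

```python
from statistics import mean, pstdev

def zscores(values):
    """Rescale a column to mean 0 and population standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def pearson(xs, ys):
    """Pearson correlation, computed as the average product of paired z-scores."""
    return mean(a * b for a, b in zip(zscores(xs), zscores(ys)))

# Brl% and sprint speed (ft/s) for the ten hitters in Clone Search #3,
# Tyler O'Neill through Sam Hilliard, copied from the tables in this article.
brl = [18, 17, 19, 14, 16, 20, 18, 16, 16, 15]
speed = [29.7, 26.1, 27.4, 28.1, 28.6, 28.2, 26.5, 24.9, 28.1, 28.9]

print([round(z, 2) for z in zscores(speed)])
print(round(pearson(brl, speed), 2))
```

A correlation near +1 or -1 between two candidate metrics suggests one of them adds little new information, which is the logic behind leaving exit velocity out.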
Cluster 1: Average Brl%, High K%, High GB%, Average to High Speed
example players: Adalberto Mondesi, Randy Arozarena, Javier Baez
Cluster 2: Low Brl%, Low K%, High GB%, High Speed
example players: Trea Turner, Starling Marte, Myles Straw
Cluster 3: High Brl%, High K%, Low GB%, Average Speed
example players: Fernando Tatis Jr., Shohei Ohtani, Bryce Harper, Salvador Perez
Cluster 4: Low Brl%, Low K%, Average GB%, Low Speed
example players: Nolan Arenado, Jesse Winker, Anthony Rizzo
Cluster 5: Mixed bag. Lots of the studs are here, but so are some very boring players and some total duds.
example players: Jose Ramirez, Kyle Tucker, Luis Robert, Christian Walker, Didi Gregorius, Rougned Odor
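The High/Average/Low labels above describe each cluster's typical hitter relative to the whole pool. One plausible way to produce labels like these (not necessarily how they were produced here) is to take each cluster's centroid in z-score units and bucket every metric with a cutoff. The `describe_centroid` function and its 0.5 cutoff are hypothetical, and the centroid in the usage line is made up purely to show the output format.

```python
FEATURES = ("Brl%", "K%", "GB%", "Speed")

def describe_centroid(z_centroid, cutoff=0.5):
    """Turn a cluster centroid (in z-score units) into a High/Average/Low summary.

    cutoff is a hypothetical threshold: a metric more than `cutoff` standard
    deviations above the pool mean is "High", that far below is "Low".
    """
    words = []
    for feature, z in zip(FEATURES, z_centroid):
        if z > cutoff:
            level = "High"
        elif z < -cutoff:
            level = "Low"
        else:
            level = "Average"
        words.append(f"{level} {feature}")
    return ", ".join(words)

# A made-up centroid, used only to show the output format.
print(describe_centroid([0.1, 1.2, 0.9, 0.3]))
```

Labels like "Average to High Speed" or "Mixed bag" suggest a human also looked at the spread within each cluster, which a single cutoff can't capture.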
The end goal here is to find players with very similar profiles but very different ADPs. We want to locate some players going very late who can contribute to your fantasy team in the same way (if not as forcefully) as a player you may have missed out on earlier in the draft.
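The clone searches below rely on cluster membership plus judgment. A more mechanical version of the same idea is a nearest-neighbor search: standardize the four metrics, measure each player's distance from a model player, and keep only those drafted much later. This sketch is an assumption, not the method behind the tables; `find_clones`, the Euclidean distance, and the 50-pick `min_adp_gap` are illustrative choices. For brevity it standardizes over only the seven Clone Search #1 rows from this article rather than the full pool of qualified hitters.

```python
import math
from statistics import mean, pstdev

# (ADP, Brl%, K%, GB%, sprint speed in ft/s), copied from the Clone Search #1 tables.
PLAYERS = {
    "Jose Ramirez": (4.6, 11, 14, 36, 28.2),
    "Ozzie Albies": (19.8, 9, 19, 32, 28.4),
    "Trevor Story": (40.1, 10, 23, 37, 28.7),
    "Bryan Reynolds": (100, 10, 19, 40, 28.7),
    "Willy Adames": (140, 11.4, 28, 37, 28.0),
    "Austin Hays": (206, 9, 20, 43, 27.4),
    "Max Kepler": (280, 11, 20, 38, 27.4),
}

def standardize(rows):
    """Z-score each metric column (everything after ADP) across the pool."""
    names = list(rows)
    columns = list(zip(*(rows[n][1:] for n in names)))
    scaled = []
    for col in columns:
        mu, sigma = mean(col), pstdev(col)
        scaled.append([(v - mu) / sigma for v in col])
    # Reassemble one tuple of z-scores per player.
    return {n: tuple(col[i] for col in scaled) for i, n in enumerate(names)}

def find_clones(model, rows, min_adp_gap=50, top_n=3):
    """Return up to top_n (name, distance) pairs for players whose ADP is at
    least min_adp_gap picks later than the model's, closest profile first."""
    z = standardize(rows)
    model_adp = rows[model][0]
    candidates = [
        (name, math.dist(z[model], z[name]))
        for name in rows
        if name != model and rows[name][0] - model_adp >= min_adp_gap
    ]
    return sorted(candidates, key=lambda pair: pair[1])[:top_n]

clones = find_clones("Jose Ramirez", PLAYERS)
for name, distance in clones:
    print(f"{name}: distance {distance:.2f}")
```

With a real dataset you would standardize across every qualified hitter, and you might weight the metrics differently depending on which categories you're chasing.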
Clone Search #1: The Five-Category Stud
This is the most exciting player to try to locate. Everybody wants to draft the guy in the 10th round that will be going in the second round next year.
Model player: Jose Ramirez
Player | ADP | Brl% | K% | GB% | Speed |
Jose Ramirez | 4.6 | 11% | 14% | 36% | 28.2 |
Clone Group #1: Expensive
Player | ADP | Brl% | K% | GB% | Speed |
Ozzie Albies | 19.8 | 9% | 19% | 32% | 28.4 |
Trevor Story | 40.1 | 10% | 23% | 37% | 28.7 |
Bryan Reynolds | 100 | 10% | 19% | 40% | 28.7 |
Albies is a second-round pick in most leagues. It would not be super surprising to see him really push into the first round next year, as you can see the very strong all-around profile here. The barrels and strikeouts are steps below Ramirez, but as a whole, this is a really impressive collection of numbers for a guy that is still just 25 years old.
As for Story, he was a first-round pick a year ago, and now he's falling into the third and fourth rounds. The reason for that is the departure from Coors Field, which is quite likely to hurt the batting average and homer count to some extent, but a new city isn't taking away this guy's power and speed combination. There's some value here.
Reynolds is faster than he's given credit for! You don't find many guys with a double-digit barrel rate and a strikeout rate under 20%. It is a bit deflating to draft a player from such a pitiful offense, but Reynolds posted strong numbers across the board last year, and he deserves some recognition as a fantasy near-stud.
Other notables in the cluster: Francisco Lindor, Daulton Varsho, Austin Meadows
Clone Group #2: Cheap
Player | ADP | Brl% | K% | GB% | Speed |
Willy Adames | 140 | 11.4% | 28% | 37% | 28.0 |
Austin Hays | 206 | 9% | 20% | 43% | 27.4 |
Max Kepler | 280 | 11% | 20% | 38% | 27.4 |
Yes, it is crazy to say that Max Kepler is in any way like Jose Ramirez, but I bet you didn't realize he posted an 11% barrel rate last year with 10 steals. The guy has real pop and some sneaky speed.
My favorite of this group, and someone I think could take a huge step in 2022, is Adames. With the Brewers, he slugged .521 with a great 20.6 PA/HR and an acceptable 25.5% strikeout rate. He only stole five bases, but four of them came with the Brewers, and he clearly has the speed to elevate that part of his game.
Then there's Austin Hays, who at this point I think you would have to call "post-post-hype". The Orioles moving the left-field fence back hurts his power upside, which was limited to begin with, so I wouldn't expect much from Hays this year. Still, he offers some upside, with above-average marks in everything we're looking at here.
More notables in the cluster: Robbie Grossman, Gleyber Torres, LaMonte Wade Jr.
Clone Search #2: The Power Specialist, No Huge Holes
These are the guys who will be elite contributors in homers and RBI and just "fine" in the other three standard categories (runs, steals, and batting average).
Model player: Bryce Harper
Player | ADP | Brl% | K% | GB% | Speed |
Bryce Harper | 11 | 18% | 23% | 42% | 27.8 |
Harper's strikeout rate and sprint speed are better than those of the average player in this cluster, so he is not a perfect model for what we're trying to do here. However, I wanted to use first-round players as reference points. Let's look.
Clone Group #1: Expensive
Player | ADP | Brl% | K% | GB% | Speed |
Austin Riley | 55 | 13% | 26% | 39% | 27.8 |
George Springer | 57 | 15% | 23% | 33% | 28.4 |
Brandon Lowe | 84 | 14% | 27% | 35% | 27.6 |
Giancarlo Stanton | 97 | 16% | 27% | 46% | 24.7 |
Kyle Schwarber | 124 | 18% | 27% | 39% | 26.8 |
Riley, Lowe, Stanton, and Schwarber all posted much worse strikeout rates earlier in their careers. Each has made more contact in recent years, and that has helped lift their batting averages. Springer and Lowe might even chip in 10 steals or so. Provided these players stay healthy (a big if for Springer and Stanton), they can be relied on for plenty of homers without cratering your team anywhere else.
Other notables in the cluster: Pete Alonso, Rhys Hoskins, Mitch Haniger
Clone Group #2: Cheap
Player | ADP | Brl% | K% | GB% | Speed |
Joey Votto | 164 | 17% | 24% | 33% | 25.1 |
Hunter Renfroe | 176 | 14% | 24% | 33% | 27.3 |
Jorge Soler | 196 | 13% | 24% | 41% | 26.9 |
Josh Donaldson | 221 | 17% | 21% | 44% | 24.5 |
Brandon Belt | 231 | 17% | 27% | 29% | 25.6 |
More guys with high barrel rates and middling strikeout rates. Renfroe and Soler both made real strides last year in making more contact without losing much power in the process, and both remain quite cheap for 2022, at least for now. Donaldson, Votto, and Belt can certainly hit the long ball, but their ages and health histories raise questions. Votto has stayed on the field, but it's hard to see Donaldson or Belt playing 150 games. Either way, these are names that can catch your team up in homers in the middle of the draft without crushing your soul in batting average.
Other cheap notables in this cluster: Kyle Lewis, Sean Murphy, Evan Longoria
Clone Search #3: The Power Specialist - All Or Nothing
Model player: Tyler O'Neill
Player | ADP | Brl% | K% | GB% | Speed |
Tyler O'Neill | 51 | 18% | 31% | 36% | 29.7 |
O'Neill isn't the best model player here, because he will contribute in steals while the rest of these names are unlikely to do so. Still, I wanted someone near the top 50.
Clone Group
Player | ADP | Brl% | K% | GB% | Speed |
Franmil Reyes | 132 | 17% | 32% | 46% | 26.1 |
Joey Gallo | 185 | 19% | 35% | 36% | 27.4 |
Matt Chapman | 188 | 14% | 33% | 34% | 28.1 |
Adam Duvall | 221 | 16% | 31% | 31% | 28.6 |
Bobby Dalbec | 231 | 20% | 35% | 37% | 28.2 |
Miguel Sano | 266 | 18% | 35% | 39% | 26.5 |
Luke Voit | 282 | 16% | 31% | 41% | 24.9 |
Patrick Wisdom | 321 | 16% | 41% | 32% | 28.1 |
Sam Hilliard | 358 | 15% | 37% | 44% | 28.9 |
If you're a seasoned fantasy baseball player, none of these names will surprise you. These are the classic swing-for-the-fences guys who will smash homers at a high rate but will absolutely murder your team in batting average (and OBP in most cases). It is tough to draft any of these guys, but if you start your offense off with a couple of batting average studs, maybe you can take one of these names to boost your homer count.
Clone Search #4: Steals!
This one is tougher since I used sprint speed instead of actual stolen bases. There is much more to stealing bases than just being fast. But we'll do our best.
Model player: Whit Merrifield
Player | ADP | Brl% | K% | GB% | Speed |
Whit Merrifield | 31 | 3.5% | 14% | 42% | 28.6 |
In recent years, steals have come down. The Dee Gordon type (a leadoff hitter who steals tons of bases while doing very little else) has largely disappeared. Nowadays, the guys stealing a bunch of bases are actually among the league's best overall hitters. That makes steals pretty tough to find once the first few rounds are gone, but Merrifield is a good model nonetheless. You can see that he can't be counted on for much besides steals (Adalberto Mondesi would have been another good choice). We'll try to locate a few of these types who go much later in the draft, in case you miss out on steals early on.
Clone Group
Player | ADP | Brl% | K% | GB% | Speed |
Adalberto Mondesi | 54 | 13% | 32% | 44% | 28.5 |
Randy Arozarena | 57 | 8% | 28% | 49% | 28.8 |
Jazz Chisholm | 75 | 9% | 29% | 49% | 29.1 |
Myles Straw | 135 | 1% | 19% | 41% | 29.3 |
Akil Baddoo | 162 | 9% | 27% | 40% | 28.9 |
Amed Rosario | 168 | 3% | 20% | 51% | 29.5 |
Harrison Bader | 242 | 7% | 22% | 44% | 29.5 |
Jo Adell | 244 | 9% | 23% | 48% | 29.9 |
Isiah Kiner-Falefa | 277 | 2% | 13% | 54% | 28.0 |
Garrett Hampson | 285 | 5% | 24% | 42% | 29.9 |
The guys here who might actually contribute in homers: Mondesi, Arozarena, Chisholm, Baddoo, and Adell. That makes Baddoo and Adell the only ones going outside the top 100, and there are plenty of fair questions about whether they can stay in the lineup all season in 2022.
Steals are tough to find, so I really suggest you invest in that category early. That makes guys like Starling Marte much more valuable despite his age and lack of power.
Clone Search #5: Batting Average
Much like with steals, you really don't want to depend on David Fletcher types (tons of contact, no power whatsoever, and limited steals) to get your team over the hump in batting average. Slotting in a guy who is only going to hit a handful of homers is a big anchor on your team. So we'll focus on guys like Edman.
Model player: Tommy Edman
Player | ADP | Brl% | K% | GB% | Speed |
Tommy Edman | 83 | 4% | 14% | 46% | 28.9 |
He actually hasn't posted great batting averages recently, which is a different story, but this is the profile we're looking for: players who don't strike out and who keep the ball out of the air. Ground balls and line drives are best for batting average unless the hitter is absolutely crushing the ball, which doesn't happen in this cluster. Let's see it.
Clone Group
Player | ADP | Brl% | K% | GB% | Speed |
DJ LeMahieu | 117 | 4% | 14% | 52% | 26.5 |
Jake Cronenworth | 120 | 7% | 14% | 43% | 28.5 |
Alex Verdugo | 154 | 7% | 16% | 50% | 27.0 |
Tyler Stephenson | 159 | 5% | 19% | 50% | 26.9 |
Yuli Gurriel | 199 | 4% | 11% | 42% | 27.0 |
Charlie Blackmon | 250 | 7% | 16% | 47% | 27.7 |
J.P. Crawford | 305 | 3% | 17% | 47% | 28.1 |
Yandy Diaz | 385 | 7% | 16% | 52% | 26.5 |
I left out the Fletcher types, but if you really just need batting average with no regard for power, you could also look into Raimel Tapia, Nick Madrigal, Adam Frazier, Fletcher, or Jeff McNeil.
The most interesting bat here to me is Cronenworth, who showed a stretch of real power last season (it's certainly possible for a guy to go from 10-15 homers one year to 20-25 the next). Next would be Stephenson, who should get more starts at catcher this year and could also fill in at first base if Joey Votto were to get hurt. J.P. Crawford could also be more than a batting average guy if he's leading off; a dozen steals and 80+ runs scored are reasonable targets.
There are lots of interesting applications of clustering analysis in the fantasy sports world, and I'd welcome suggestions for similar ideas to explore in future posts. Hit me up on Twitter with feedback!