One of the most commonly used techniques in the data science world is clustering. Wikipedia defines clustering as:
grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters)
In baseball data analytics, we can use this in several ways. My goal with this analysis was to see if I could cluster the league's pitches by their acceleration in three dimensions, and then check how the pitches in those clusters performed. The application would be to get an idea of which pitchers have the "stuff" to be more or less successful than what we have seen from them in the past. Statcast captures the forward, horizontal, and vertical acceleration of each pitch in feet per second squared, determined at the point where the ball is 50 feet from home plate. These three numbers encompass most of what we want to look at when judging how a pitch moves. The forward acceleration is essentially a stand-in for velocity, the horizontal acceleration is the side-to-side movement, and the vertical acceleration is the up-and-down movement. This can help us isolate the best pitches in the league by how they come out of a pitcher's hand.
Pitch Type Breakdown
Before getting started, here is the breakdown of how often each pitch type has been thrown over the last three seasons:
Pitch Name | % Thrown |
4-Seam Fastball | 35.5% |
Slider | 17.4% |
Changeup | 10.9% |
Sinker | 9.3% |
Curveball | 8.5% |
2-Seam Fastball | 8.2% |
Cutter | 6.1% |
Knuckle Curve | 2.4% |
Splitter | 1.5% |
Four-Seam Fastballs
For context, here is the acceleration breakdown of the Major League four-seamer by percentile. In the case of forward acceleration, 28.65 is the median and 26.82 is the 25th percentile, which means 25% of the values fall below that number and 75% of the values fall above it. This will be more useful when we show the clusters a bit later.
What the clustering does is group all these fastballs together based on which other fastballs they are most like. If a pitcher throws a pretty consistent fastball in terms of how it moves, almost all of their fastballs will fall into the same cluster. After we have these clusters, we can check how each cluster performed and then determine which type of movement on a fastball is most effective.
Percentile | Forward | Horizontal | Vertical |
25% | 26.82 | 5.98 | -17.09 |
50% | 28.65 | 9.01 | -14.91 |
75% | 30.50 | 11.96 | -12.90 |
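The rows in a percentile table like this are ordinary quartiles of each acceleration column. Here is a minimal sketch using only Python's standard library. The sample readings are made up for illustration and are not taken from the data set behind this article.

```python
from statistics import quantiles

# Hypothetical forward-acceleration readings for a handful of four-seamers.
forward_accel = [25.1, 26.8, 27.9, 28.6, 29.4, 30.5, 31.2]

# n=4 asks for the three cut points that split the data into quarters.
# method="inclusive" treats the list as the full population, not a sample.
q25, q50, q75 = quantiles(forward_accel, n=4, method="inclusive")

print(f"25%: {q25:.2f}  50%: {q50:.2f}  75%: {q75:.2f}")
```

The middle value is the median, so half of the readings fall on either side of it.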
Using five clusters, here's what we get.
Cluster | Average Forward | Average Horizontal | Average Vertical |
1 | 31.00 | 10.90 | -13.07 |
2 | 29.99 | 5.62 | -12.85 |
3 | 29.63 | 15.31 | -16.59 |
4 | 25.96 | 3.31 | -17.29 |
5 | 26.49 | 9.50 | -16.60 |
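The write-up does not name the algorithm that produced these clusters. K-means is one common choice when the number of clusters is fixed in advance, so the sketch below assumes it. It is a bare-bones version of Lloyd's algorithm on (forward, horizontal, vertical) tuples, with invented pitches and two clusters to keep it short. The function name `kmeans` and its parameters are illustrative.

```python
import math
import random

def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct pitches
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each pitch joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # nothing moved, so the assignment is stable
        centroids = new_centroids
    return centroids, labels

# Hypothetical (forward, horizontal, vertical) accelerations for six pitches.
pitches = [
    (31.0, 10.9, -13.1), (30.8, 11.2, -13.0), (31.3, 10.5, -12.9),
    (26.0, 3.3, -17.3), (25.8, 3.6, -17.1), (26.2, 3.0, -17.5),
]
centroids, labels = kmeans(pitches, k=2)
for pitch, label in zip(pitches, labels):
    print(label, pitch)
```

The centroids returned here play the same role as the cluster averages in the table above. On a full season of pitches, a library implementation such as scikit-learn's `KMeans` would be the more practical route.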
Cluster 1
Highest velocity, average to high horizontal movement, low downwards movement
Cluster 2
High velocity, low horizontal movement, low downwards movement
Cluster 3
Average velocity, highest horizontal movement, high downwards movement
Cluster 4
Lowest velocity, low horizontal movement, high downwards movement
Cluster 5
Low velocity, average horizontal movement, high downwards movement
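One way to turn cluster averages into descriptions like these is to compare each average against the quartiles from the percentile table. The rule below is an assumption for illustration: anything outside the 25th to 75th percentile band is "low" or "high", and everything else is "average". The descriptions above read more like judgment calls, so a strict rule will not reproduce every one of them.

```python
# 25th and 75th percentiles for four-seam fastballs, from the percentile table.
QUARTILES = {
    "forward": (26.82, 30.50),
    "horizontal": (5.98, 11.96),
    "vertical": (-17.09, -12.90),
}

def describe(axis, value):
    """Label a cluster average as low, average, or high for one axis."""
    low_cut, high_cut = QUARTILES[axis]
    if axis == "vertical":
        # More negative vertical acceleration means more downward movement,
        # so the comparison is flipped for this axis.
        if value < low_cut:
            return "high"
        if value > high_cut:
            return "low"
        return "average"
    if value < low_cut:
        return "low"
    if value > high_cut:
        return "high"
    return "average"

# Cluster 4 averages from the cluster table.
cluster_4 = {"forward": 25.96, "horizontal": 3.31, "vertical": -17.29}
labels = {axis: describe(axis, value) for axis, value in cluster_4.items()}
print(labels)  # {'forward': 'low', 'horizontal': 'low', 'vertical': 'high'}
```

For cluster 4 the rule agrees with the description given. For cluster 1, the vertical average of -13.07 lands just inside the middle band, so the rule would call it "average" rather than "low".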
Now we can take each cluster of fastballs and see how they performed. I've checked five metrics:
1. Whiff Rate (number of swings and misses divided by total pitches thrown)
2. Called Strike + Whiff Rate (called strikes plus swings and misses divided by total pitches)
3. Slugging Percentage Against
4. Average Exit Velocity
5. Median Exit Angle
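The first two metrics are simple ratios over every pitch in a cluster. Here is a small sketch of the arithmetic as defined in the list above. The outcome labels are invented for the example and are not Statcast's exact description strings.

```python
from collections import Counter

# Hypothetical outcomes for ten pitches that landed in one cluster.
outcomes = [
    "ball", "called_strike", "swinging_strike", "foul", "in_play",
    "ball", "swinging_strike", "called_strike", "ball", "in_play",
]

counts = Counter(outcomes)
total = len(outcomes)

# Whiff rate: swings and misses divided by total pitches.
whiff_rate = counts["swinging_strike"] / total

# CSW rate: called strikes plus swings and misses, divided by total pitches.
csw_rate = (counts["called_strike"] + counts["swinging_strike"]) / total

print(f"Whiff%: {whiff_rate:.1%}  CSW%: {csw_rate:.1%}")
```

Because both rates share the same denominator, CSW% can never be lower than Whiff% for the same set of pitches.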
Here are the results:
Cluster | Whiff% | CSW% | SLG | Exit Velo | Exit Angle |
1 | 10.1% | 27.6% | .452 | 88.9 | 25 |
2 | 10.2% | 27.1% | .419 | 87.8 | 25 |
3 | 8.6% | 27.3% | .456 | 89.3 | 20 |
4 | 7.9% | 26.4% | .476 | 87.9 | 17 |
5 | 7.2% | 26.6% | .533 | 89.7 | 20 |
Cluster two appears to be the winner, with the highest whiff rate and the lowest slugging percentage against, but cluster one is right there with it. Cluster five is the clear loser here.
So which pitchers are in these clusters? Here are the top ten names in terms of the number of fastballs thrown that ended up in each cluster:
Cluster One:
Justin Verlander, Gerrit Cole, Trevor Bauer, Zack Wheeler, Reynaldo Lopez, Nick Pivetta, Robbie Ray, Lucas Giolito, JA Happ, Shane Bieber
Cluster Two:
Jacob deGrom, Vince Velasquez, Walker Buehler, Tyler Glasnow, Dylan Bundy, Sean Doolittle, Chad Green, Emilio Pagan, Mike Clevinger, John Means
Cluster Three:
Max Scherzer, Tyler Mahle, Aaron Nola, James Paxton, Luis Castillo, Gerrit Cole, Charlie Morton, Chris Sale, Richard Rodriguez, Caleb Smith
Cluster Four:
Zack Greinke, Brad Keller, Clayton Kershaw, Brent Suter, Max Fried, Mike Minor, Trevor Williams, Anibal Sanchez, Antonio Senzatela, Spencer Turnbull
Cluster Five:
Jon Lester, Julio Teheran, Matthew Boyd, Homer Bailey, Madison Bumgarner, Lance Lynn, Rich Hill, Jake Odorizzi, Jon Gray, Kyle Freeland
Admittedly, these top-10 lists aren't very informative because they just show guys who have thrown a ton of fastballs over the last three years. Any pitcher who debuted in 2020 really did not have a chance to show up here.
I went ahead and sliced the data up to show what percent of each pitcher's fastballs ended up in each cluster. What we are looking for is surprising names who threw most of their fastballs in cluster one or two, because those are the most effective fastballs. Some names that stood out:
Ian Anderson (93%), Brendan McKay (87%), Dinelson Lamet (83%), Frankie Montas (83%), Triston McKenzie (83%), Shohei Ohtani (79%), Tony Gonsolin (77%), Zac Gallen (70%)
Doing the same with clusters four and five, here are some names that threw a lot of "bad" fastballs that you might not have expected:
Kwang-Hyun Kim (99%), Adam Wainwright (96%), Max Fried (91%), Lewis Thorpe (88%), Justus Sheffield (88%), Mike Soroka (85%), Joe Musgrove (84%), Dakota Hudson (84%), Spencer Turnbull (79%), Sonny Gray (73%)
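The difference between the top-ten lists and these percentages is counts versus shares. A short sketch with made-up pitchers shows both calculations side by side. The `assignments` list is a hypothetical stand-in for one (pitcher, cluster) pair per fastball.

```python
from collections import Counter

# Hypothetical (pitcher, cluster) pairs, one per fastball thrown.
assignments = [
    ("Pitcher A", 1), ("Pitcher A", 1), ("Pitcher A", 2), ("Pitcher A", 5),
    ("Pitcher B", 4), ("Pitcher B", 5), ("Pitcher B", 5),
    ("Pitcher C", 2), ("Pitcher C", 2),
]

# Raw counts: who threw the most fastballs that landed in cluster 2?
# This favors pitchers who simply threw a lot of fastballs.
cluster_2_counts = Counter(p for p, c in assignments if c == 2)
print(cluster_2_counts.most_common(10))

# Shares: what fraction of each pitcher's fastballs landed in cluster 1 or 2?
totals = Counter(p for p, _ in assignments)
good = Counter(p for p, c in assignments if c in (1, 2))
good_share = {p: good[p] / totals[p] for p in totals}

for pitcher, share in sorted(good_share.items(), key=lambda kv: -kv[1]):
    print(f"{pitcher}: {share:.0%}")
```

With real data it would probably make sense to require a minimum number of pitches before ranking by share, since a pitcher with only a handful of fastballs can post an extreme percentage.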
Sliders
Summary table for sliders:
Percentile | Forward | Horizontal | Vertical |
25% | 21.61 | 1.51 | -32.92 |
50% | 23.17 | 3.21 | -30.19 |
75% | 24.82 | 5.61 | -27.45 |
Here's how each cluster looks:
Cluster | Average Forward | Average Horizontal | Average Vertical |
1 | 25.53 | 1.81 | -27.19 |
2 | 22.93 | 4.63 | -30.78 |
3 | 22.67 | 8.31 | -32.32 |
4 | 21.51 | 1.56 | -30.69 |
5 | 23.80 | 12.94 | -32.83 |
Cluster One:
High velocity, low horizontal movement, low vertical movement
Cluster Two:
Average velocity, average to high horizontal movement, average vertical movement
Cluster Three:
Average velocity, high horizontal movement, high vertical movement
Cluster Four:
Low velocity, low horizontal movement, average vertical movement
Cluster Five:
Average to high velocity, very high horizontal movement, high vertical movement
Here are the results of the sliders in these clusters:
Cluster | Whiff% | CSW% | SLG | Exit Velo | Exit Angle |
1 | 17.2% | 29.3% | .361 | 85.5 | 12 |
2 | 16.3% | 31.6% | .367 | 85.5 | 13 |
3 | 15.4% | 33.2% | .335 | 84.5 | 18 |
4 | 16.7% | 31.2% | .402 | 86.0 | 13 |
5 | 15.3% | 34.2% | .267 | 82.1 | 17 |
Cluster five is the clear winner and cluster four the clear loser. The biggest difference between those two clusters is horizontal movement, where they sit at opposite ends of the spectrum.
Let's see the most frequent names in each cluster.
Cluster One
Jacob deGrom, Clayton Kershaw, Zack Wheeler, Justin Verlander, Robbie Ray, Amir Garrett, Miles Mikolas, Jon Gray, Sam Gaviglio, Edwin Diaz
Cluster Two
Brad Keller, Chris Archer, Jack Flaherty, Zack Greinke, Mike Minor, Trevor Williams, Gerrit Cole, Clayton Kershaw, Masahiro Tanaka, Tyson Ross
Cluster Three
Jhoulys Chacin, Chris Sale, Jakob Junis, CC Sabathia, Andrew Miller, Steve Cishek, Marcus Stroman, Luis Severino, Sergio Romo, Trent Thornton
Cluster Four
Patrick Corbin, Matthew Boyd, Shane Bieber, Carlos Carrasco, Dylan Bundy, Jaime Barria, Robbie Ray, Caleb Smith, Luis Cessa, Kyle Freeland
Cluster Five
Chaz Roe, Mike Clevinger, Jakob Junis, Adam Ottavino, Brad Hand, Trevor Bauer, Sonny Gray, Kyle Crick, Paul Fry, Collin McHugh
No surprise to see slider god Chaz Roe leading the charge in cluster five, but pretty weird to see Corbin and Bieber showing up in cluster four among the worst-performing sliders in the game.
It should be noted that only 7% of sliders ended up in the elite cluster five. 13% of them ended up in cluster two, and then clusters one, three, and four all had 25-27%.
Here are some of the more surprising names that showed up in cluster five with the best sliders in the game:
Shohei Ohtani (57.4%), Jordan Yamamoto (52.5%), Jakob Junis (52%), Max Fried (39.5%)
Cluster one had the highest whiff rate while maintaining a solid slugging percentage. Here are some of the surprising names from that cluster:
Jake Odorizzi (77%), Trevor Williams (76%), Steven Brault (71%), Tejay Antone (62%)
Here are more names from that less successful cluster four:
Kevin Gausman (82%), Randy Dobnak (74%), Caleb Smith (74%), Carlos Carrasco (69%), Luis Castillo (65%)
No one cluster really stands out in terms of swinging-strike rate here, which might suggest that getting whiffs on a slider is more about deception and pitch selection than the actual movement of the pitch. It's obviously not true that Shane Bieber's slider was really in the same class as Randy Dobnak's. Location and pitch selection mean a heck of a lot with breaking pitches, and this analysis does not capture either of those things.
Curveballs
Summary table for curveballs:
Percentile | Forward | Horizontal | Vertical |
25% | 19.90 | 3.73 | -41.71 |
50% | 21.53 | 6.13 | -39.11 |
75% | 23.21 | 8.57 | -35.93 |
Here's how each cluster looks:
Cluster | Average Forward | Average Horizontal | Average Vertical |
1 | 22.80 | 2.20 | -37.47 |
2 | 23.55 | 11.41 | -38.51 |
3 | 19.87 | 7.89 | -36.91 |
4 | 23.13 | 6.36 | -42.03 |
5 | 18.97 | 3.54 | -37.49 |
Cluster One:
Average velocity, low horizontal movement, low to average vertical movement
Cluster Two:
High velocity, very high horizontal movement, average vertical movement
Cluster Three:
Low velocity, average to high horizontal movement, low vertical movement
Cluster Four:
High velocity, average horizontal movement, high vertical movement
Cluster Five:
Low velocity, low horizontal movement, low to average vertical movement
Here are the results of the curveballs in these clusters:
Cluster | Whiff% | CSW% | SLG | Exit Velo | Exit Angle |
1 | 14.5% | 30.3% | .345 | 85.8 | 9 |
2 | 14.0% | 33.1% | .313 | 83.4 | 11 |
3 | 10.1% | 32.2% | .415 | 84.9 | 15 |
4 | 12.2% | 29.8% | .336 | 85.8 | 5 |
5 | 9.7% | 31.9% | .469 | 86.4 | 14 |
Clusters one and two are the winners with whiff rates soaring above the rest. Here are the top ten names in each cluster.
Cluster One:
Domingo German, Ivan Nova, Jordan Lyles, Blake Snell, Andrew Heaney, Nick Anderson, Keone Kela, Alex Young, Matt Magill, Matt Barnes
Cluster Two:
Charlie Morton, Jose Berrios, Sonny Gray, Adam Wainwright, Rich Hill, Stephen Strasburg, Corey Kluber, Ryan Pressly, Framber Valdez, Aaron Sanchez
Cluster Three:
Jon Lester, Madison Bumgarner, Zack Greinke, Noe Ramirez, Kyle Hendricks, Jerry Blevins, Rick Porcello, Hyun-Jin Ryu, Ryne Harper, Ryan Yarbrough
Cluster Four:
Miles Mikolas, Tyler Skaggs, Max Fried, Mike Fiers, Tyler Glasnow, Jordan Lyles, Gio Gonzalez, Justin Verlander, Sean Newcomb, Merrill Kelly
Cluster Five:
Jose Quintana, Clayton Kershaw, Drew Smyly, Andrew Heaney, Marco Gonzales, Eric Lauer, Jordan Zimmermann, Kyle Gibson, Wei-Yin Chen, Patrick Corbin
You really want to be in cluster one or two here. Here are some of the interesting names that showed up with really high percentages of their curveballs falling into those clusters.
Jesus Luzardo (93%), Sonny Gray (93%), Ian Anderson (84%), Jordan Montgomery (81%), Kyle Wright (77%), Brandon Woodruff (74%), Dinelson Lamet (69%), Griffin Canning (68%), Sandy Alcantara (68%), Michael Lorenzen (62%)
Cluster five is the one to avoid, and here are some of those names:
Anibal Sanchez (89%), Patrick Corbin (88%), Reynaldo Lopez (81%), Drew Smyly (80%), Yusei Kikuchi (70%), Clayton Kershaw (70%), Sean Manaea (54%), Dylan Bundy (49%), Zach Plesac (45%), Andrew Heaney (44%)
Takeaways
- Ian Anderson, Shohei Ohtani, Frankie Montas, Dinelson Lamet, and Jesus Luzardo have awesome fastballs and very good secondary breaking pitches to go with them
- Kwang-Hyun Kim, Max Fried, and Justus Sheffield may have limited upside due to pretty unimpressive fastballs
- Jakob Junis has one of the best sliders in the league
- Zack Greinke and Patrick Corbin really do not have much in terms of "stuff" and will continue to have to rely on location, deception, and experience