Pitcher Predictability Analysis for Fantasy Baseball

3 years ago by Jon Anderson



There are different ways to have success as a pitcher. You can have wipe-out stuff, pinpoint command, superior deception, or an arsenal of different pitch types that make it tough for a hitter to see what is coming next. The best pitchers will have all of those things going for them.

For this fantasy baseball analysis, I wanted to focus on that last point - an unpredictable arsenal. Most starters have at least three pitches in their arsenal, but plenty of pitchers throw just two different pitches more than 80% of the time, making them quite predictable. On the flip side, you have some guys with five or six different pitches that they are comfortable throwing - that makes them very tough to predict.

It is definitely not true that you need a deep arsenal to have success in the league. If you have two great pitches with great command, that is more than enough to succeed in the big leagues. It does seem true that having a deep pitch arsenal gives a pitcher a wider safety net. Some nights you just don't have the feel for certain pitches, and having a few extra to fall back on can really help a pitcher avoid those disastrous outings.


The Process

I used a Python script for this in conjunction with the Baseball Savant pitch-by-pitch dataset. This dataset is a massive table with one row for every single pitch thrown in Major League Baseball games. Every data point you could think of is in this dataset, and it opens the door for limitless analysis with a coding language like Python.

My script works like this:

Pick an individual pitcher
Isolate that pitcher from the dataset, giving us a table of all of their pitches thrown
Order the table chronologically
Skip forward to the pitcher's 251st pitch of the year
From then on, for every pitch, look at the count and the handedness of the batter being faced
Look back to the past times the pitcher has been in that exact situation (say an 0-2 count against a lefty)
Find the pitcher's most commonly thrown pitch in that situation
Predict that he will throw that pitch
Compare that prediction to what the pitcher actually threw
Calculate what percent of the time the prediction was correct
Repeat for every single pitcher that threw at least 1,500 pitches last year
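
For readers who want to see the loop in code, here is a minimal sketch of the steps above in plain Python. It is not the script from the Colab notebook. The function name `predictability_score`, the tuple layout of each pitch, and the edge-case handling are all assumptions made for illustration: ties go to the pitch type seen first, situations with no history yet are skipped, and the first 250 pitches still count toward the history even though no predictions are made on them.

```python
from collections import Counter, defaultdict

WARMUP_PITCHES = 250  # predictions start with the 251st pitch


def predictability_score(pitches, warmup=WARMUP_PITCHES):
    """Return the share of pitches predicted correctly for one pitcher.

    `pitches` is a chronologically ordered list of tuples
    (balls, strikes, batter_hand, pitch_type). This layout is a
    hypothetical stand-in for the real pitch-by-pitch table.
    """
    # situation -> how often each pitch type has been thrown in it so far
    history = defaultdict(Counter)
    correct = 0
    attempts = 0

    for i, (balls, strikes, batter_hand, pitch_type) in enumerate(pitches):
        situation = (balls, strikes, batter_hand)
        seen = history[situation]

        # Only predict after the warm-up, and only if the pitcher has
        # been in this exact situation before (assumption: skip otherwise).
        if i >= warmup and seen:
            prediction = seen.most_common(1)[0][0]
            attempts += 1
            correct += prediction == pitch_type

        # Record the actual pitch so later predictions can use it.
        seen[pitch_type] += 1

    return correct / attempts if attempts else None


# Tiny made-up example with a 2-pitch warm-up instead of 250.
sample = [
    (0, 0, "L", "FF"),
    (0, 0, "L", "FF"),
    (0, 0, "L", "FF"),  # predicted FF, threw FF -> correct
    (0, 0, "L", "SL"),  # predicted FF, threw SL -> wrong
    (0, 2, "L", "SL"),  # first 0-2 count vs. a lefty -> skipped
    (0, 0, "L", "FF"),  # predicted FF, threw FF -> correct
]
print(round(predictability_score(sample, warmup=2), 3))  # 0.667
```

In this sketch the history keeps growing as the season goes on, so a pitcher who changes his pitch mix midseason will look less predictable for a while. Whether the original script updates its history the same way is not something this example can confirm.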

Long story short, we tried to predict every pitch type based on what the pitcher threw in similar situations in the past, and then looked at which pitchers we were right and wrong about most often. If you are interested in seeing the Python script I wrote, you can view that here on Google Colab, a Python notebook environment hosted on the Google Cloud. Just collapse the "Setup" code blocks with the arrow at the top and then look at the "Example of how it works" block.

The Results

Here is the results table, which includes every pitcher we looked at. The "score" is just the percent of the time the prediction was right, so the lower the score, the less predictable the pitcher was.

Cleveland's Aaron Civale proved to be the league's least predictable pitcher last year. He threw six different pitches at a clip above 10% (cutter, four-seamer, curveball, splitter, slider, and sinker). Baseball Savant has a nice feature that shows how a pitcher distributes his pitches across all different counts. You can check out Civale's plot here for a visual of just how tough he was to predict.

The rest of the top five perfectly shows that this is not a foolproof way of finding good pitchers. Mike Foltynewicz, Merrill Kelly, Jon Lester, and Matt Harvey were four of the worst pitchers in the entire league last year despite their unpredictability. All of those names threw at least four pitches over 10% of the time. The problem was just the small detail of their pitches not being very good. It's quite possible that the deep arsenal is the reason these pitchers are even staying in the league. If Foltynewicz had been out there throwing only his four-seamer and slider last year, he probably would not have been able to stay in the rotation as long as he did.

If you sort the score column the other way around, you'll see the most predictable pitchers. It turns out that there are some pretty good pitchers on this side of things.

Logan Gilbert had a pretty successful rookie year despite coming in "last place" in this analysis. Maybe we should not judge rookies the same way as veterans here, since it would have taken some time for the hitters to learn his arsenal and to potentially capitalize on the limitations, but nonetheless, we proceed. Gilbert threw a four-seamer 61.5% of the time and a slider 23.9%, leaving just 14.6% of the time dedicated to his changeup and curveball. It was pretty easy for a hitter to just expect a fastball in neutral and advantageous counts, and then sit on the slider when they were behind. But this is a good example that just because a hitter knows what pitch is coming, that doesn't mean they'll be able to hit it.

Triston McKenzie, Carlos Rodon, Robbie Ray, Clayton Kershaw, and Trevor Rogers also show up here in the top 15. These pitchers all fit pretty much the same mold as Gilbert, throwing a fastball a high percentage of the time to get ahead, and then using a breaking ball to attempt to put hitters away.

Here's a table you can use to search for a pitcher's name and see their arsenal breakdown. I limited this to only pitchers that threw at least 1,500 pitches last year to save some memory.
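
For anyone who wants to build a similar breakdown themselves, the sketch below shows one way to turn a list of pitch types into usage shares, count how many pitches clear the 10% mark mentioned above, and apply the 1,500-pitch minimum. The function names, the input format, and the sample pitch counts are all hypothetical and chosen for illustration. This is not the code behind the table.

```python
from collections import Counter

MIN_PITCHES = 1500   # same minimum used for the table
USAGE_CUTOFF = 0.10  # "thrown more than 10% of the time"


def arsenal_breakdown(pitch_types):
    """Map each pitch type to its share of all pitches, most used first."""
    counts = Counter(pitch_types)
    total = sum(counts.values())
    return {pitch: n / total for pitch, n in counts.most_common()}


def arsenal_depth(breakdown, cutoff=USAGE_CUTOFF):
    """Count the pitch types whose usage share is above the cutoff."""
    return sum(1 for share in breakdown.values() if share > cutoff)


def build_table(pitches_by_pitcher, min_pitches=MIN_PITCHES):
    """Build breakdowns only for pitchers who threw enough pitches."""
    return {
        name: arsenal_breakdown(pitch_types)
        for name, pitch_types in pitches_by_pitcher.items()
        if len(pitch_types) >= min_pitches
    }


# Made-up pitcher: 1,000 pitches across four pitch types.
example = ["FF"] * 600 + ["SL"] * 250 + ["CH"] * 120 + ["CU"] * 30
breakdown = arsenal_breakdown(example)
print(breakdown)                 # usage share for each pitch type
print(arsenal_depth(breakdown))  # 3 pitch types above 10%
```

A pitcher with a low prediction score will usually show a high `arsenal_depth` here, but the two are not the same thing: the score also depends on how evenly he mixes pitches within each count.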

Conclusions

I think the biggest conclusion here is that we can overestimate the importance of arsenal depth. The quality of the pitches turns out to be much, much more vital to success than the quantity of them. Most fantasy players probably understood that before, but I think this table really drives it home. There does not seem to be any correlation whatsoever between depth of arsenal and success on the field.

That said, I don't think it's proper to throw this out entirely. While the Aaron Civale types (pretty bland "stuff" wise, but quite successful on the mound) are somewhat rare, they can exist because of this. Civale makes his living by throwing a bunch of different pitches in all different counts and locating them well. A young pitcher coming into the league and having five pitches in his bag gives him a lot of different things to try out and more options to work with as he figures out what works at the big league level and what doesn't.

This could be good "tiebreaker" material in the fantasy baseball world. Despite what we see above, I would still rather roll the dice on a pitcher with five pitches than on one with three if everything else is more or less equal.

Thanks for reading. If anybody has any questions about Python or requests for the full data extracts, I can be reached on Twitter @JonPGH.
