Stats to Understand - Pitching
I go through my favorite and most-cited pitcher statistics, explaining what they are, what's good, what's bad, and how to use them.
Podcast Version of This
Check out the feed on Apple Podcasts
We are about to start a new MLB season, which means I will be writing about baseball stats daily very soon. Since I will be talking a ton about certain statistics, I figured it would be a good idea to put out some content about each of those statistics so people know what I’m talking about.
There will also be a podcast episode on this if you prefer to ingest the information that way. These are the stats that will show up in the notes, and they are the stats more prominently displayed on the main MLB Tableau dashboard, which you can view here. The 2024 version will be for paid subs only, though.
K-BB%
Starting it off very simply, but it’s the most important single stat we can look at (ERA indicator stats like SIERA would be a bit better, but I don’t know how to calculate that one exactly so it doesn’t make it onto the dashboard).
You know what it is already. K% and BB% are two of the three inputs into ERA indicator stats like FIP, xFIP, and SIERA (the third is home runs allowed, which each calculation handles differently).
K% = Strikeouts / Batters Faced
BB% = Walks / Batters Faced
The idea at the core of this is that there are three things a pitcher controls all by himself: strikeouts, walks, and homers. No fielder behind the pitcher has anything to do with any of those three events, so it really strips out a lot of the randomness (FIP = Fielding Independent Pitching, SIERA = Skill-Interactive ERA).
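K-BB% is just K% minus BB%, using the two formulas above. Here is a minimal sketch of that arithmetic; the function name and the season line in the example are made up for illustration, not taken from the dashboard.

```python
def k_minus_bb_pct(strikeouts: int, walks: int, batters_faced: int) -> float:
    """Return K-BB% in percentage points (K% minus BB%)."""
    if batters_faced <= 0:
        raise ValueError("batters_faced must be positive")
    k_pct = 100 * strikeouts / batters_faced   # K% = Strikeouts / Batters Faced
    bb_pct = 100 * walks / batters_faced       # BB% = Walks / Batters Faced
    return k_pct - bb_pct


# Hypothetical season line (not a real pitcher): 200 K, 50 BB, 800 batters faced.
example_k_bb = k_minus_bb_pct(200, 50, 800)
print(f"K-BB%: {example_k_bb:.2f}%")
```

Both rates share the same denominator, so the result is the same as (Strikeouts - Walks) / Batters Faced.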
At the end of each of these, I will give you the “distribution” numbers, which will show you what is good and what is bad. Anything below the 25th percentile should be considered “bad”, anything above the 75th percentile should be considered “good”, and I’ll give you the minimums and maximums as well so you know where the limits lie (using 2023 data from pitchers with at least six starts).
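If you want to build the same kind of summary from your own data, here is a rough sketch using Python's standard library. The input numbers are hypothetical, the quartile method ("inclusive") is my assumption since the text doesn't say which one is used, and the `higher_is_better` flag is an illustrative addition for stats where a low number is the good end.

```python
from statistics import mean, quantiles


def summarize(values):
    """Return min, 25th percentile, average, 75th percentile, and max."""
    # Assumption: inclusive quartiles (linear interpolation between data points).
    p25, _median, p75 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "p25": p25, "avg": mean(values),
            "p75": p75, "max": max(values)}


def grade(value, summary, higher_is_better=True):
    """Label a value 'good', 'bad', or 'middle' using the 25th/75th cutoffs."""
    if value > summary["p75"]:
        return "good" if higher_is_better else "bad"
    if value < summary["p25"]:
        return "bad" if higher_is_better else "good"
    return "middle"


# Hypothetical K-BB% values for nine pitchers (not real 2023 data).
sample = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0, 26.0]
summary = summarize(sample)
print(summary, grade(25.0, summary), grade(11.0, summary))
```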
Distribution
Min: 4.8%
25th: 14.9%
Avg: 17.2%
75th: 19.9%
Max: 35.6%
2023 Leaders & Losers:
1. deGrom 36%
2. Strider 29%
3. Skubal 27%
1. Luis Ortiz 2.8%
2. Wainwright 3.1%
3. Hudson 3.1%
Strike%
Strike% = (Whiffs + Called Strikes + Fouls) / Pitches Thrown
This takes things to a more granular level because it’s a pitch-level stat. What I mean by that is that total pitches thrown is the denominator, meaning the stat changes with every pitch thrown. K% and BB%, by contrast, are batter-level stats: they only change when a plate appearance ends.
For reference, Gerrit Cole threw 3,281 pitches last year and faced 821 batters. That’s about four pitches per batter, so we get data roughly four times faster for Strike% than for K%. This means the number stabilizes more quickly, making it more powerful in small samples. It is also predictive of K%. So the best use of Strike% is to look at pitchers in small samples and get a general idea of what their K% is likely to be in the future.
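The arithmetic behind that is simple enough to check. In the sketch below, the Cole totals come from the paragraph above, while the 25-batter start is a hypothetical number I picked to show how the sample sizes compare.

```python
# Gerrit Cole's season totals as quoted in the text.
pitches = 3281
batters_faced = 821

pitches_per_batter = pitches / batters_faced
print(f"Pitches per batter: {pitches_per_batter:.2f}")

# Hypothetical single start of 25 batters faced (illustrative, not from the text).
batters_in_start = 25
k_pct_observations = batters_in_start  # K% gets one data point per batter
strike_pct_observations = round(batters_in_start * pitches_per_batter)  # one per pitch
print(k_pct_observations, strike_pct_observations)
```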
One very important note is that the way I calculate Strike% is different from how other sources do it. A lot of websites count balls in play as strikes. I don’t, because a batted ball is… not a strike. A batted ball is also a bad thing a good portion of the time. If I went out there and pitched, I’d probably put up a very high Strike% under that definition, because I think I’d be able to get the ball near the strike zone and hitters would just be teeing off on me. So why should we have a stat that makes me look good when I’m doing terribly?
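To make the difference between the two conventions concrete, here is a small sketch. The function and its `count_bip` flag are hypothetical names for illustration, and the 100-pitch outing is invented.

```python
def strike_pct(whiffs, called_strikes, fouls, pitches,
               balls_in_play=0, count_bip=False):
    """Strike% as a percentage of pitches thrown.

    count_bip=False follows the definition in this article.
    count_bip=True mimics sources that count balls in play as strikes.
    """
    if pitches <= 0:
        raise ValueError("pitches must be positive")
    strikes = whiffs + called_strikes + fouls
    if count_bip:
        strikes += balls_in_play
    return 100 * strikes / pitches


# Hypothetical 100-pitch outing: 12 whiffs, 17 called strikes, 19 fouls, 17 balls in play.
article_version = strike_pct(12, 17, 19, 100)
other_version = strike_pct(12, 17, 19, 100, balls_in_play=17, count_bip=True)
print(article_version, other_version)
```

Same outing, two very different numbers, which is why Strike% figures from different sites shouldn't be compared directly.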
Distribution
Min: 43.6%
25th: 46.7%
Avg: 47.9%
75th: 49.2%
Max: 54.5%
2023 Leaders & Losers:
1. Strider 54.5%
2. Skubal 53.2%
3. Ryan 52.6%
1. Wainwright 39.3%
2. Keller 39.7%
3. Cessa 39.7%
SwStr%
SwStr% = Whiffs / Pitches Thrown
This takes Strike% and takes out called strikes and foul balls, so it’s just the percentage of your pitches that get a swing and a miss. It is significantly more predictive of K% than Strike%, so it’s the best one to use if you don’t have enough data to trust K% yet.
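One way to see how SwStr% sits inside Strike% is to break Strike% into its three pieces. This is just a sketch with made-up counts; the function name and dictionary keys are my own labels.

```python
def strike_components(whiffs, called_strikes, fouls, pitches):
    """Split Strike% into whiff, called-strike, and foul shares of all pitches."""
    if pitches <= 0:
        raise ValueError("pitches must be positive")
    return {
        "swstr_pct": 100 * whiffs / pitches,  # SwStr% = Whiffs / Pitches Thrown
        "called_pct": 100 * called_strikes / pitches,
        "foul_pct": 100 * fouls / pitches,
        "strike_pct": 100 * (whiffs + called_strikes + fouls) / pitches,
    }


# Hypothetical 100-pitch outing: 12 whiffs, 17 called strikes, 19 fouls.
parts = strike_components(12, 17, 19, 100)
print(parts)
```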
Distribution
Min: 5.4%
25th: 11.8%
Avg: 12.7%
75th: 14.0%
Max: 21.7%
2023 Leaders & Losers:
1. deGrom 21.7%
2. Strider 20.5%
3. Glasnow 17.2%
1. Wainwright 5.4%
2. Woodford 6.4%
3. Cessa 6.9%
Ball%
Ball% = Called Balls / Pitches Thrown
Note that it’s pitches called a ball, not pitches in the strike zone. So if you throw a pitch in the zone that gets called a ball, it still hurts your Ball%. This is one I did not hear people talking about, but I picked it up last year because I really liked the idea of viewing Strike%, Ball%, and BIP% all in concert (more on that in a bit).
Much like SwStr% and K%, Ball% has a strong relationship with BB%. That’s incredibly obvious, but it’s still true.
Distribution
Min: 29.1%
25th: 34.0%
Avg: 35.1%
75th: 36.5%
Max: 41.6%
2023 Leaders & Losers:
1. Kirby 29.1%
2. Varland 30.9%
3. Littell 30.9%
1. Keller 45.3%
2. Kopech 41.8%
3. Snell 41.6%
BIP%
BIP% = Balls in Play / Pitches Thrown
This is just the third piece of the pie here. Every pitch must be one of three things:
Strike
Ball
BIP
That means Strike% + Ball% + BIP% will equal 100%. It doesn’t correlate super well with anything, and I will talk about it much less often than these other stats, but I just include it so you can see it alongside the other two pieces of the pie.
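Here is a minimal sketch of that three-way split, starting from a pitch-by-pitch log. The outcome labels are hypothetical (a real data feed will have its own names), and the ten-pitch log is invented.

```python
from collections import Counter

# Hypothetical outcome labels mapped to the three buckets described above.
BUCKETS = {
    "whiff": "strike",
    "called_strike": "strike",
    "foul": "strike",
    "ball": "ball",
    "in_play": "bip",
}


def pitch_mix(outcomes):
    """Return Strike%, Ball%, and BIP% for a list of pitch outcome labels."""
    if not outcomes:
        raise ValueError("need at least one pitch")
    counts = Counter(BUCKETS[outcome] for outcome in outcomes)
    total = len(outcomes)
    return {bucket: 100 * counts[bucket] / total
            for bucket in ("strike", "ball", "bip")}


# Invented ten-pitch log.
log = ["ball", "called_strike", "foul", "whiff", "ball",
       "in_play", "foul", "ball", "whiff", "in_play"]
mix = pitch_mix(log)
print(mix, sum(mix.values()))
```

Because every pitch lands in exactly one bucket, the three percentages always add up to 100.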
You’d think I’d use a pie graph at some point on the dashboard to show these three, but I had a professor in grad school that basically said if you ever show a pie chart in public you should kill yourself, so I use a scatter plot with colored dots, much more professional.
It does make for a pretty picture:
It’s generally a good thing to have a low BIP% because balls in play are dangerous. But it doesn’t stand alone well, because some of the lowest BIP% pitchers are there because they throw so many balls (Edward Cabrera). So this is the stat I use the least of all of these.
Distribution
Min: 10.8%
25th: 15.8%
Avg: 17.4%
75th: 18.8%
Max: 22.4%
2023 Leaders & Losers:
1. Snell 12.8%
2. Strider 13.4%
3. E Cabrera 13.7%
1. Freeland 22.4%
2. Gomber 21.8%
3. Syndergaard 21.7%
GB%
GB% = Ground Balls Generated / Balls In Play
You know about this one, but I do have some stuff to add. GB% has very little to say about ERA by itself, but there’s a small relationship there:
And if you zoom in on the top half of pitchers (say, pitchers with a K-BB% above 17%), you do tend to see lower ERAs among the high ground ball pitchers.
This is because home runs result in earned runs [almost] every single time, and you cannot give up a home run if the ball is hit on the ground.
So I much prefer K-BB% and the like, but I do like to see higher GB% as well.
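One thing to watch with GB% is that the denominator changes: it's balls in play, not pitches thrown like the stats above. A quick sketch with invented numbers (the function name is mine):

```python
def ground_ball_pct(ground_balls: int, balls_in_play: int) -> float:
    """GB% = Ground Balls Generated / Balls In Play, as a percentage."""
    if balls_in_play <= 0:
        raise ValueError("balls_in_play must be positive")
    return 100 * ground_balls / balls_in_play


# Hypothetical pitcher: 90 ground balls on 200 balls in play.
example_gb = ground_ball_pct(90, 200)
print(f"GB%: {example_gb:.1f}%")
```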
Distribution
Min: 25.6%
25th: 37.0%
Avg: 42.7%
75th: 45.9%
Max: 62.7%
2023 Leaders & Losers:
1. Webb 62.7%
2. Fried 59.2%
3. Cobb 57.8%
1. Cortes 26.1%
2. Javier 26.1%
3. E Perez 27.5%
Stuff+
This is a much more complicated explanation. I handled it last year here, so you can check that out if you need to.
Basically, Stuff+ just takes the full movement of the pitches and grades each one (each individual pitch) on a scale that centers around 100 (100 being the average pitch of that pitch type) based on what kind of movement profile usually works well and which does not.
It has something to say about K%, but not as much as SwStr%.
It has nothing at all to say about walk rates or the quality of contact allowed. It’s most useful when you have just a couple or a handful of starts from a new pitcher. Stuff+ stabilizes almost immediately because it’s just about the movement of an individual pitch. So you can know at least something about a pitcher after one start, whereas before we would have to wait three to five starts.
The way you get one number for a pitcher is to just take the average of all of their pitches, but you can and should break it down by pitch type as well. The “Arsenals” tab of the main dashboard shows this, and there are a few other tabs of the dashboard centered around these pitch modeling data.
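As a rough illustration of that roll-up, here is a sketch that averages per-pitch grades both overall and by pitch type. The grades and pitch-type codes are invented, not output from the actual model.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (pitch type, Stuff+ grade) pairs for six pitches; not real model output.
graded_pitches = [("FF", 110), ("FF", 104), ("SL", 125),
                  ("SL", 119), ("CH", 90), ("FF", 106)]


def overall_stuff(pitches):
    """Pitcher-level Stuff+: the plain average of every pitch's grade."""
    return mean(score for _, score in pitches)


def stuff_by_type(pitches):
    """Average Stuff+ for each pitch type."""
    grouped = defaultdict(list)
    for pitch_type, score in pitches:
        grouped[pitch_type].append(score)
    return {pitch_type: mean(scores) for pitch_type, scores in grouped.items()}


overall = overall_stuff(graded_pitches)
by_type = stuff_by_type(graded_pitches)
print(overall, by_type)
```

Averaging over every pitch means the overall number is automatically weighted by usage, so a great pitch that is rarely thrown won't move it much.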
But when I have a full season’s worth of data, I don’t care much about Stuff+. I would rather just use SIERA, K-BB%, SwStr%, Ball%, etc.
My Stuff+ data does not match what you find elsewhere (FanGraphs, The Athletic), because it’s a different model from the great Drew Haugen.
Distribution
Min: 49
25th: 89
Avg: 103*
75th: 118
Max: 168
*That average is three points higher because I’m only using pitchers that made six starts, so that sample is a bit better than average.
2023 Leaders & Losers:
1. Glasnow 168
2. Bobby Miller 153
3. Ohtani 142
1. Josiah Gray 49
2. Gomber 51
3. Stripling 52
Location+
This is similar to Stuff+, but it’s only about where the pitches are located. Movement has nothing to do with it, it’s only about locations.
Here’s a heat map I made:
It’s not the greatest work, but you can see that the best locations as judged by the model are around the corners of the strike zone. Anything way out of the strike zone or right down the middle does not grade well, and yes it does distinguish between right-handed and left-handed batters.
It is strongly correlated with BB%.
It is much more tightly centered around 100. Most of the values you see will be between 90 and 110, so one point of movement makes a big difference.
Distribution
Min: 92
25th: 99
Avg: 101
75th: 103
Max: 109
2023 Leaders & Losers:
1. Stripling 109
2. Kirby 109
3. Lopez 107
1. Kopech 92
2. Senga 93
3. Medina 94
xwOBA Allowed
expected weighted on-base average
I don’t talk much about this one because it’s not a very sticky stat. But it gives you a good idea of what kind of contact a pitcher has allowed in the past. A high number is bad, and a low number is good. If you give up a 120mph batted ball at 25 degrees, that generates an xwOBA of around 2.0. A swing and a miss will be zero. So you just take the average of all pitches and get a final number.
The league average is around .320, the worst values will be above .380, and the best values will be under .250. But regression to the mean is to be expected.
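Following the description above, the final number is just an average of per-pitch values. Here is a sketch under that assumption; the eight values are placeholders (0.0 for a whiff, 2.0 for a crushed ball, and made-up numbers in between), and the tier labels simply reuse the rough benchmarks from the text.

```python
def xwoba_allowed(per_pitch_values):
    """Average a list of per-pitch xwOBA values into one number."""
    if not per_pitch_values:
        raise ValueError("need at least one pitch")
    return sum(per_pitch_values) / len(per_pitch_values)


def xwoba_tier(value):
    """Rough tiers from the text: under .250 is best, above .380 is worst."""
    if value < 0.250:
        return "best"
    if value > 0.380:
        return "worst"
    return "in between"


# Placeholder per-pitch values, not real data.
values = [0.0, 0.0, 2.0, 0.4, 0.0, 0.3, 0.0, 0.1]
result = xwoba_allowed(values)
print(f"{result:.3f}", xwoba_tier(result))
```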
And that is it. Check out the podcast version of this for further explanation and detail.