If you've ever wondered why I spend so much time talking about swinging strikes (or contact rate, which is basically the same idea), this is why:
2005-2008 data, based on 567 pitchers who threw at least 100 innings in a season. As expected, there is a very strong correlation between missing bats and racking up strikeouts. I've looked at this sort of thing before, and it comes as no surprise.
However, the correlation isn't 1 (or -1, as it were), and what's always interested me is how certain pitchers can exceed their expected strikeout rate, while other pitchers undershoot. For example, last year AJ Burnett struck out 24.2% of the batters he faced even though, based on his contact rate, we would've expected him to come in at 22.1%. This isn't just an anomaly. It does appear to be at least somewhat within the pitcher's control.
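For anyone who wants to play along at home, here is a rough sketch of how an expected K% can be backed out of contact rate with a simple least-squares line. The numbers in `SEASONS` are made up for illustration and are not real pitcher data, the names `fit_line` and `expected_k` are hypothetical, and a straight-line fit is an assumption about the method rather than a description of exactly what went into the chart.

```python
# Hypothetical (contact rate, K%) pairs, one per pitcher season.
# These are invented for illustration and are NOT real pitcher data.
SEASONS = [
    (0.76, 0.240),
    (0.78, 0.215),
    (0.80, 0.190),
    (0.82, 0.175),
    (0.84, 0.150),
    (0.86, 0.125),
]


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def expected_k(contact, slope, intercept):
    """Expected K% for a given contact rate, read off the fitted line."""
    return slope * contact + intercept


slope, intercept = fit_line(SEASONS)

# K% minus expected K% for each season: positive means the pitcher
# struck out more batters than his contact rate alone would suggest.
residuals = [k - expected_k(c, slope, intercept) for c, k in SEASONS]

for (contact, k), diff in zip(SEASONS, residuals):
    print(f"contact={contact:.2f}  K%={k:.3f}  K%-expK%={diff:+.4f}")
```

The residual in the last column is the K% minus expK% difference discussed in the rest of the post.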
This is a chart showing 283 matched pairs of consecutive pitcher seasons with 100+ innings pitched (2005-2008). On the x axis is the difference between K% and expected K% in one year, while on the y axis is the same difference in the year following. What you see is that, though the correlation isn't as strong as in the first chart, it's still very much significant. It's clear that, though swinging strikes are important, they aren't the only factor when it comes to generating strikeouts.
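A minimal sketch of that year-to-year check, assuming you already have each pitcher's K% minus expK% difference for two consecutive seasons. The values in `PAIRS` are invented placeholders rather than the 283 real matched pairs, and `pearson_r` is just a hand-rolled helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical matched pairs: (difference in year n, difference in year n+1),
# both in percentage points. Invented for illustration, NOT real data.
PAIRS = [
    (2.1, 1.4),
    (-1.8, -0.6),
    (0.5, 1.0),
    (-3.0, -2.2),
    (1.2, -0.3),
    (-0.4, 0.2),
]

year_one = [a for a, _ in PAIRS]
year_two = [b for _, b in PAIRS]

r = pearson_r(year_one, year_two)
print(f"year-to-year r = {r:.3f}")
```

If the difference were pure noise, you'd expect r to hover around zero across a large sample of pairs; a clearly positive r is what suggests some of it carries over from one season to the next.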
Based on some preliminary investigation, the following factors are correlated to the difference between K% and expK%:
Called strike%, curveball% and changeup% have the strongest correlations among those listed. That is, pitchers who throw a lot of curveballs or get a lot of called strikes may be able to exceed their expected strikeout rate, while pitchers who throw a lot of changeups may tend to fall short of it.
There's a lot more work to be done on this matter, though. Just maybe not by me.
In case you're curious, here are the pitchers who, between 2005 and 2008, showed the biggest differences between K% and expK%.
Top Five
1) Erik Bedard (+4.4%, three-year average)
2) Mike Mussina (+4.3%, four-year average)
3) Josh Beckett (+3.8%, four-year average)
4) Esteban Loaiza (+3.5%, two-year average)
5) Curt Schilling (+3.5%, two-year average)
Bottom Five
1) Runelvys Hernandez (-4.6%, two-year average)
2) Brandon Backe (-4.4%, two-year average)
3) Ramon Ortiz (-3.7%, three-year average)
4) Kelvim Escobar (-3.5%, two-year average)
5) Brian Burres (-3.3%, two-year average)
So far in 2009, the biggest positive differences belong to Tim Lincecum, Justin Verlander, Josh Beckett, Zack Greinke, and Jon Lester, while the biggest negative differences belong to Trevor Cahill, Micah Owings, Ryan Dempster, Francisco Liriano, and Armando Galarraga.
Can you run the comparison between curveball% and called strike%
My theory is that those two are highly similar.
Matthew - August 16, 2009
Yeah, this harkens back to our conversation earlier in the season.
It would be ideal to separate out the power curves like Felix’s that are similar to vertical sliders and seem to get more swinging strikes, as opposed to the loopy curves thrown by Bedard, Mussina, and Beckett.
I’d be curious to see called strike % correlated to curveball horizontal and vertical movement as well as velocity.
abender20 - August 16, 2009
For velocity, r = 0.0525
Jeff Sullivan - August 17, 2009
Believe it or not
The correlation sucks.
Jeff Sullivan - August 17, 2009
Yay for science!
Matthew - August 17, 2009
Now if only I had more trust in Fangraphs' reported pitch type percentages
Jeff Sullivan - August 17, 2009
I would guess that Wandy Rodriguez is on the plus side too, given his curve.
abender20 - August 16, 2009
Out of curiosity...
what is the correlation between K% and ERA, FIP, or tRA?
The reason I ask, is that I ran this once and got a negative number. Didn’t seem to make sense.
PLU Tim - August 16, 2009
The more Ks, the lower the ERA/FIP/tRA
ergo, negative correlation.
Matthew - August 16, 2009
Oops...let me rephrase...
I was getting a positive correlation between the two. Which seemed counter intuitive. It was a very small constant, but positive nonetheless.
PLU Tim - August 17, 2009
I'm getting r = -0.6360
Jeff Sullivan - August 17, 2009
Yeah, there's no way it would be positive
unless some weird shit was going on with pitchers who’d thrown a couple of innings. Did you use an IP cutoff PLU Tim?
marc w - August 17, 2009
I can't recall...
I know that I did use some point of reference. I didn’t allow any schmuck into the sample. I know that it was less than Jeff’s and included relief pitchers. I wonder if the relief pitchers just threw the entire sample off because they are statistically volatile by nature.
PLU Tim - August 17, 2009
Adding relievers shouldn't matter. I'd just check it again, because there are counterintuitive results
that are interesting and point out something new and exciting, and then there are counterintuitive results that don’t make sense and point out errors.
Especially the FIP thing… I mean, you see the equation for FIP; HOW can that be positively correlated to K%?
marc w - August 17, 2009
Well..when I did this..
was like 2-5-3 years ago so it was based on ERA. FIP wasn’t terribly “mainstream” as far as advanced metrics go at the time.
If I used FIP the result would likely make sense. ERA has enough noise in it to screw everything up anyways.
PLU Tim - August 17, 2009
Considering the lowest three year average is -2.1 and the highest is 2.0
Couldn’t we just do a general +/- 2% to the expectancy and call it good?
The Typical Idiot Fan - August 16, 2009
But that makes for a fairly large swing
The average number of batters faced in a season (for qualified starters over 2007-2008) is 833 (per FanGraphs TBF numbers).
So +/- 2% for 833 batters faced works out to +/- 16.66 K/Season.
Your swing there is 33.32 K/Season, a certainly not insignificant amount, and one you probably don’t want to apply to every pitcher when predicting K rate.
Robert Lintott - August 17, 2009
Actually I miscalculated that
Jeff Sullivan - August 17, 2009
There, that's fixed
I accidentally reported average K-expK/StDev, as opposed to average K-expK.
Jeff Sullivan - August 17, 2009
any way you can post p-value's on your graphs?
i always wonder about level of significance.
sirbrianwilson - August 16, 2009
Please capitalize properly.
Matthew - August 16, 2009
Wow.
Any way you can post p-value’s on your graphs? I always wonder about level of significance.
From one interested fan to another.
sirbrianwilson - August 17, 2009
Somebody please remind me how to do this in Excel
Jeff Sullivan - August 17, 2009
Significance F for Contact% and K% is 1.4E-172
Significant!
The other chart is from a sheet I have at home.
Jeff Sullivan - August 17, 2009
I think I just figured out multiple regression
Jeff Sullivan - August 17, 2009
This is so much fun
Jeff Sullivan - August 17, 2009
Congrats...
Now tell me what you get….
Run tRA against Swinging Strike%, Ground Ball%, and the average Tuesday temperature in Dublin, Ireland.
PLU Tim - August 17, 2009
I have never understood wanting P values on charts that are obviously ludicrously significant.
Graham MacAree - August 17, 2009
Not so much for the first chart.
The second one though…
sirbrianwilson - August 17, 2009
Yeah, that wasn't a commentary on your request
It just tends to be a knee-jerk reaction no matter what data are presented.
Graham MacAree - August 17, 2009
I agree. I've presented at many a conference where
I showed numerous charts that look like the K%/Contact% one above. It never fails…some old codger in the back is concerned about whether the p-value is .01 or .001.
sirbrianwilson - August 17, 2009
I can get that to you after work
Jeff Sullivan - August 17, 2009
Rad.
sirbrianwilson - August 17, 2009
5.01E-25
Jeff Sullivan - August 17, 2009
Thanks, Jeff.
sirbrianwilson - August 17, 2009
Easy...
Tools → Add Ins → Analysis Toolpak
Once that is done..
Tools → Data Analysis → ANOVA: Single Factor
Plug in the cells and go.
PLU Tim - August 17, 2009
I was just looking at Pujols' numbers to see
what a perfect hitter’s contact% looks like and indeed, he is awesome. One thing surprised me though which was he has a lot of infield fly balls. I don’t know why this is surprising but when I think of the type of guys who hit fly balls, usually I think of guys like Jose Lopez and not amazing hitters.
Edgar for Pres - August 16, 2009
Pujols gets a lot of backspin on his swings
When he misses them, he often pops it up; when he doesn’t….
vivaelpujols - August 17, 2009
What is the correlation between tRA and K-expK?
Dewey N - August 16, 2009
I'll use FIP because it's easier
The r value for FIP and K-expK is -0.3948. However, we’d expect a correlation like that, because pitchers with a higher K-expK will generally have a higher K%, which improves their FIP.
Jeff Sullivan - August 17, 2009
Yeah that's what I was expecting
Dewey N - August 17, 2009
What's the standard deviation of the K-Rate predicting error?
vivaelpujols - August 17, 2009
2.2%
Jeff Sullivan - August 17, 2009
This is fantastic. I'd love to see more graphs like this.
They really help the statistical layman get a sense of the math behind a lot of the conclusions about sustainability and sample size that the LL authors come to.
Decatur - August 17, 2009
By the way
As one might expect, strikeout rate in year x is a slightly better predictor of strikeout rate in year x+1 than swinging strikes in year x.
Jeff Sullivan - August 17, 2009
It'll be multivariate
I’d bet that you could figure out y+1 K% with some permutation of y contact% and zone% more accurately than with y K%
Graham MacAree - August 17, 2009
Probably but that's over my head
Jeff Sullivan - August 17, 2009