
Lookout Landing

On Pitcher Contact Rate And Strikeouts

If you've ever wondered why I spend so much time talking about swinging strikes (or contact rate, which is basically the same idea), this is why:

[Chart: strikeout rate (K%) plotted against contact rate]

2005-2008 data, based on 567 pitchers who threw at least 100 innings in a season. As expected, there is a very strong correlation between missing bats and racking up strikeouts. I've looked at this sort of thing before, and it comes as no surprise.

However, the correlation isn't 1 (or -1, as it were), and what's always interested me is how certain pitchers can exceed their expected strikeout rate, while other pitchers undershoot. For example, last year AJ Burnett struck out 24.2% of the batters he faced even though, based on his contact rate, we would've expected him to come in at 22.1%. This isn't just an anomaly. It does appear to be at least somewhat within the pitcher's control.
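
To make "expected K%" concrete, here is a minimal sketch. It assumes expK% comes from a simple least-squares line fit of K% on contact rate, which is a guess, since the fitting method isn't spelled out above. The pitcher seasons in it are made-up placeholders, not the 2005-2008 sample, and `fit_line` and `expected_k` are hypothetical helper names.

```python
# Illustrative sketch only: the (contact%, K%) values are placeholders,
# not the 567 pitcher seasons behind the chart.

def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical pitcher seasons: contact rate and strikeout rate, in percent.
contact = [75.0, 78.0, 80.0, 82.0, 85.0, 88.0]
k_pct = [24.0, 21.5, 19.0, 17.5, 14.0, 12.0]

slope, intercept = fit_line(contact, k_pct)

def expected_k(contact_pct):
    """Expected K% for a given contact rate under the fitted line."""
    return slope * contact_pct + intercept

# K% minus expK%: positive means more strikeouts than contact rate predicts.
diffs = [k - expected_k(c) for c, k in zip(contact, k_pct)]

# The Burnett figures quoted above work the same way: 24.2% actual
# against 22.1% expected.
burnett_diff = 24.2 - 22.1
```

More contact means fewer strikeouts, so the fitted slope comes out negative, and the differences from a least-squares line always average out to zero across the sample.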

[Chart: K% minus expected K% in one season plotted against the same difference in the following season]

This chart shows 283 matched pairs of consecutive pitcher seasons with 100+ innings pitched (2005-2008). On the x-axis is the difference between K% and expected K% in one year, while on the y-axis is the same difference in the following year. What you see is that, though the correlation isn't as strong as in the first chart, it's still very much significant. It's clear that, though swinging strikes are important, they aren't the only factor when it comes to generating strikeouts.
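
The matched-pairs idea can be sketched roughly like this. The pitcher names and differences are placeholders, and `matched_pairs` and `pearson_r` are hypothetical helpers written for illustration, not the spreadsheet steps actually used.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical (pitcher, season) -> K% minus expK%, for qualifying seasons only.
diff_by_season = {
    ("Pitcher A", 2005): 2.0,
    ("Pitcher A", 2006): 1.5,
    ("Pitcher A", 2008): 1.0,   # no 2007 entry, so 2006 -> 2008 is not a pair
    ("Pitcher B", 2006): -1.0,
    ("Pitcher B", 2007): -2.0,
    ("Pitcher C", 2007): 0.5,
    ("Pitcher C", 2008): -0.5,
}

def matched_pairs(diffs):
    """Pair each season with the same pitcher's next season, when it exists."""
    pairs = []
    for (name, year), value in sorted(diffs.items()):
        following = diffs.get((name, year + 1))
        if following is not None:
            pairs.append((value, following))
    return pairs

pairs = matched_pairs(diff_by_season)
year_one = [first for first, _ in pairs]
year_two = [second for _, second in pairs]
r = pearson_r(year_one, year_two)
print(f"{len(pairs)} pairs, r = {r:.3f}")
```

A pitcher with a gap year contributes nothing across the gap, which is presumably why 567 seasons shrink to 283 pairs.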

Based on some preliminary investigation, the following factors are correlated with the difference between K% and expK%:

  • First-pitch strike% (positive correlation)
  • Zone% (positive correlation)
  • Fastball% (positive correlation)
  • Fastball velocity (positive correlation)
  • Curveball% (positive correlation)
  • Changeup% (negative correlation)
  • Called strike% (positive correlation)

Called strike%, curveball% and changeup% have the strongest correlations among those listed. That is, pitchers who throw a lot of curveballs or get a lot of called strikes may be able to exceed their expected strikeout rate, while pitchers who throw a lot of changeups may fall short of it.
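
A check like this one boils down to correlating each factor with the K% minus expK% difference and ranking by strength. Here's a small sketch with placeholder numbers chosen only to mirror the signs listed above. The column names are hypothetical, and none of the values come from the real data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical pitcher seasons; every number here is a placeholder.
diff = [2.1, -1.4, 0.3, -2.0, 1.2]  # K% minus expK%
factors = {
    "called_strike_pct": [19.0, 15.5, 17.0, 15.0, 18.0],
    "curveball_pct": [22.0, 5.0, 12.0, 8.0, 15.0],
    "changeup_pct": [4.0, 18.0, 10.0, 20.0, 9.0],
}

# Correlate each factor with the difference, then rank by absolute strength.
correlations = {name: pearson_r(values, diff) for name, values in factors.items()}
ranked = sorted(correlations.items(), key=lambda item: abs(item[1]), reverse=True)

for name, r in ranked:
    print(f"{name}: r = {r:+.3f}")
```

Ranking by absolute value matters because a strongly negative factor like changeup% is just as informative as a strongly positive one.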

There's a lot more work to be done on this matter, though. Just maybe not by me.

In case you're curious, here are the pitchers who, from 2005 to 2008, showed the biggest differences between K% and expK%.

Top Five

1) Erik Bedard (+4.4%, three-year average)
2) Mike Mussina (+4.3%, four-year average)
3) Josh Beckett (+3.8%, four-year average)
4) Esteban Loaiza (+3.5%, two-year average)
5) Curt Schilling (+3.5%, two-year average)

Bottom Five

1) Runelvys Hernandez (-4.6%, two-year average)
2) Brandon Backe (-4.4%, two-year average)
3) Ramon Ortiz (-3.7%, three-year average)
4) Kelvim Escobar (-3.5%, two-year average)
5) Brian Burres (-3.3%, two-year average)

So far in 2009, the biggest positive differences belong to Tim Lincecum, Justin Verlander, Josh Beckett, Zack Greinke, and Jon Lester, while the biggest negative differences belong to Trevor Cahill, Micah Owings, Ryan Dempster, Francisco Liriano, and Armando Galarraga.


Comments

Can you run the comparison between curveball% and called strike%

My theory is that those two are highly similar.

Yeah, this harkens back to our conversation earlier in the season.

It would be ideal to separate out the power curves like Felix’s that are similar to vertical sliders and seem to get more swinging strikes, as opposed to the loopy curves thrown by Bedard, Mussina, and Beckett.

I’d be curious to see called strike % correlated to curveball horizontal and vertical movement as well as velocity.

For velocity, r = 0.0525
Believe it or not

The correlation sucks.

Yay for science!
Now if only I had more trust in Fangraphs' reported pitch type percentages
I would guess that Wandy Rodriguez is on the plus side too, given his curve.
Out of curiosity...

what is the correlation between K% and ERA, FIP, or tRA?

The reason I ask is that I ran this once and got a negative number. Didn’t seem to make sense.

The more Ks, the lower the ERA/FIP/tRA

ergo, negative correlation.

Oops...let me rephrase...

I was getting a positive correlation between the two. Which seemed counterintuitive. It was a very small constant, but positive nonetheless.

I'm getting r = -0.6360
Yeah, there's no way it would be positive

unless some weird shit was going on with pitchers who’d thrown a couple of innings. Did you use an IP cutoff PLU Tim?

I can't recall...

I know that I did use some point of reference. I didn’t allow any schmuck into the sample. I know that it was less than Jeff’s and included relief pitchers. Which makes me wonder if relief pitchers just threw the entire sample off because they are statistically volatile by nature.

Adding relievers shouldn't matter. I'd just check it again, because there are counterintuitive results

that are interesting and point out something new and exciting, and then there are counterintuitive results that don’t make sense and point out errors.
Especially the FIP thing… I mean, you see the equation for FIP; HOW can that be positively correlated to K%?
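
For reference, the standard FIP formula makes the point directly: strikeouts enter with a negative coefficient. A quick sketch follows. The stat line is hypothetical, and the additive constant of 3.10 is an assumption, since the real constant is set per league and season to put FIP on the ERA scale.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """FIP = (13*HR + 3*(BB + HBP) - 2*K) / IP + constant.

    The constant here is an assumed round value; the published one
    varies by league and year.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant

# Hypothetical stat lines that differ only in strikeouts.
low_k = fip(hr=20, bb=60, hbp=5, k=120, ip=200)
high_k = fip(hr=20, bb=60, hbp=5, k=180, ip=200)

print(f"120 K: FIP {low_k:.3f}")
print(f"180 K: FIP {high_k:.3f}")
```

Holding everything else fixed, each extra strikeout lowers FIP by 2/IP, so a positive correlation with K% would need something else in the sample pulling hard the other way.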

Well..when I did this..

was like 2-5-3 years ago so it was based on ERA. FIP wasn’t terribly “mainstream” as far as advanced metrics go at the time.

If I used FIP the result would likely make sense. ERA has enough noise in it to screw everything up anyways.

Considering the lowest three year average is -2.1 and the highest is 2.0

Couldn’t we just do a general +/- 2% to the expectancy and call it good?

But that makes for a fairly large swing

The average number of batters faced in a season (for qualified starters over 2007-2008) is 833 (per FanGraphs TBF numbers).

So +/- 2% for 833 batters faced works out to +/- 16.66 K/Season.

Your swing there is 33.32 K/Season, which is certainly not an insignificant amount, and one you probably don’t want to apply to every pitcher when predicting K rate.
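
Spelled out, the arithmetic is just the band times batters faced. The variable names below are only for illustration.

```python
batters_faced = 833  # average TBF for qualified starters, as quoted above
band = 0.02          # the proposed +/- 2% adjustment to expected K%

# Strikeouts represented by one side of the band over a full season.
one_sided = band * batters_faced

# Distance from the bottom of the band to the top.
full_swing = 2 * one_sided

print(f"+/- {one_sided:.2f} K per season, full swing about {full_swing:.1f}")
```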

Actually I miscalculated that
There, that's fixed

I accidentally reported average K-expK/StDev, as opposed to average K-expK.

any way you can post p-value's on your graphs?

i always wonder about level of significance.


Please capitalize properly.
Wow.

Any way you can post p-value’s on your graphs? I always wonder about level of significance.

From one interested fan to another.

Somebody please remind me how to do this in Excel
Significance F for Contact% and K% is 1.4E-172

Significant!
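
For anyone without Excel handy, a rough significance check needs only r and the sample size. This sketch uses the usual t statistic for a correlation coefficient, and a normal approximation for the p-value, which is only reasonable when n is in the hundreds. The n of 567 is an assumption borrowed from the post's sample. The r values are the ones quoted elsewhere in these comments, and their actual sample sizes weren't given.

```python
from math import erfc, sqrt

def correlation_t_stat(r, n):
    """t statistic for testing whether a Pearson r differs from zero."""
    return r * sqrt((n - 2) / (1 - r * r))

def approx_two_sided_p(t):
    """Two-sided p-value from a normal approximation to the t distribution.

    This is a shortcut that is only sensible for large samples; an exact
    answer would use the t distribution with n - 2 degrees of freedom.
    """
    return erfc(abs(t) / sqrt(2))

n = 567  # assumption: the post's sample size, not a stated n for these r values

p_weak = approx_two_sided_p(correlation_t_stat(0.0525, n))     # the velocity r
p_strong = approx_two_sided_p(correlation_t_stat(-0.6360, n))  # the K% vs. FIP r

print(f"r = 0.0525: p is roughly {p_weak:.2f}")
print(f"r = -0.6360: p is roughly {p_strong:.1e}")
```

Under those assumptions the tiny velocity correlation is nowhere near significant, while the K%/FIP one is significant by any standard.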

The other chart is from a sheet I have at home.

I think I just figured out multiple regression
This is so much fun
Congrats...

Now tell me what you get….

Run tRA against Swinging Strike%, Ground Ball%, and the average Tuesday temperature in Dublin, Ireland.

I have never understood wanting P values on charts that are obviously ludicrously significant.
Not so much for the first chart.

The second one though…

Yeah, that wasn't a commentary on your request

It just tends to be a knee-jerk reaction no matter what data are presented.

I agree. I've presented at many a conference where

I showed numerous charts that look like the K%/Contact% one above. It never fails…some old codger in the back is concerned whether the p-value is .01 or .001.

Easy...

Tools → Add Ins → Analysis Toolpak

Once that is done..

Tools → Data Analysis → ANOVA: Single Factor

Plug in the cells and go.

I was just looking at Pujols' numbers to see

what a perfect hitter’s contact% looks like and indeed, he is awesome. One thing surprised me though which was he has a lot of infield fly balls. I don’t know why this is surprising but when I think of the type of guys who hit fly balls, usually I think of guys like Jose Lopez and not amazing hitters.

Pujols gets a lot of backspin on his swings

When he misses them, he often pops it up; when he doesn’t….

What is the correlation between tRA and K-expK?
I'll use FIP because it's easier

The r value for FIP and K-expK is -0.3948. However, we’d expect a correlation like that, because pitchers with a higher K-expK will generally have a higher K%, which improves their FIP.

Yeah that's what I was expecting
What's the standard deviation of the K-Rate predicting error?
This is fantastic. I'd love to see more graphs like this.

They really help the statistical layman get a sense of the math behind a lot of the conclusions about sustainability and sample size that the LL authors come to.

By the way

As one might expect, strikeout rate in year x is a slightly better predictor of strikeout rate in year x+1 than swinging strikes in year x.

It'll be multivariate

I’d bet that you could figure out y+1 K% with some permutation of y contact% and zone% more accurately than with y K%
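
A two-predictor version of that idea might look like the sketch below. It fits next-year K% on contact% and zone% by least squares, using the closed-form solution for two centered predictors. The data are synthetic, generated from a known line purely to check the arithmetic, so the coefficients say nothing about real pitchers. `fit_two_predictors` is a hypothetical helper.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via centered sums."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    # Solve the 2x2 normal equations for the two slopes.
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2

# Synthetic year-y inputs (percent); placeholders, not real pitcher seasons.
contact = [75.0, 78.0, 80.0, 82.0, 85.0, 88.0]
zone = [50.0, 48.0, 53.0, 47.0, 52.0, 49.0]

# Synthetic year-y+1 K%, built from a made-up line so the fit can be verified.
next_k = [50.0 - 0.4 * c + 0.1 * z for c, z in zip(contact, zone)]

b0, b1, b2 = fit_two_predictors(contact, zone, next_k)
print(f"next K% = {b0:.2f} + ({b1:.2f})*contact% + ({b2:.2f})*zone%")
```

Whether this beats plain year-y K% as a predictor is exactly the open question; the sketch only shows the mechanics.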

Probably but that's over my head
