We're into the stretch run now. I'm not going to go into individual statistics - the idea was never to walk through absolutely everything but rather to provide a solid foundation that facilitates good, logical thinking about sabermetrics. So instead of talking about strikeouts, wins, tRA, xFIP, whatever over the next few days, I'll describe how I think pitching/batting/defence should be evaluated - but in general. We'll start with pitching.
Prerequisites for Understanding: The Isolation Problem, Linear Weights, Base Runs, Replacement Level, Expected Wins/Losses, The Run-Win Conversion, Value, Regression, Correlation, Park Effects, Environment, WPA and LI, Data.

Evaluating Pitching
What makes a good pitcher? What makes a bad one? How do we evaluate them? Pitching is a deceptive area of study - our first generation numbers told us that we knew how many games pitchers were responsible for winning, and how many runs they gave up. For a very long time, we were content with this.
And then, quite suddenly, we weren't. What are wins, we ask? And what exactly does ERA tell you? Well, wins tell you how often your position players score more runs over the course of a game than the pitcher and the position players allow. ERA is similarly bizarre, thinking about it: how many runs do a pitcher and his defence concede per nine innings, discounting runs that the scorers think ought not to have counted?
This would, of course, be all well and good if the impact of defence were negligible, or if pitchers had any real control over whether batted balls find gloves or not. But defence matters. It can make average pitchers look like world beaters, and replacement level pitchers look alarmingly valuable. Whenever the ball enters the field of play, the defence is involved, and understanding that and seeking to adjust for it is absolutely critical.
So, what do we need in order to measure pitchers? Ideally, we'd be able to judge them without the defence clouding the issue. We certainly shouldn't involve bats, so wins and losses are out. We must build upon the things that the pitcher is solely responsible for: strikeouts (mostly), walks, HBP, and home runs. After these are taken into account, we can start looking at batted balls - third generation data gives us some insight into how difficult a ball is to field, although it's not as accurate as we'd like. Anyway, my belief is that if a pitcher gives up a drive that's an out 90% of the time with an average defence, we should give him 0.9 of an out. Credit for the quality of the defence should probably go to the actual defenders rather than the pitcher. Actually doing this is non-trivial, but it's the direction in which we should be steering our statistics.
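To make the 0.9-of-an-out idea concrete, here is a minimal sketch. It assumes we already have, for each batted ball, an estimate of how often an average defence turns it into an out; the `BattedBall` type, its field names and the sample probabilities are all hypothetical and purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class BattedBall:
    # Hypothetical field: chance an average defence turns this ball into an out.
    out_probability: float
    # What actually happened behind this particular pitcher.
    was_out: bool


def expected_outs(balls):
    """Outs credited to the pitcher, independent of his actual defence."""
    return sum(ball.out_probability for ball in balls)


def defence_outs_above_average(balls):
    """Outs the fielders added (or lost) relative to an average defence."""
    actual = sum(1 for ball in balls if ball.was_out)
    return actual - expected_outs(balls)


# Made-up sample data, not real play-by-play.
balls = [
    BattedBall(out_probability=0.9, was_out=False),  # routine drive that dropped in
    BattedBall(out_probability=0.2, was_out=True),   # tough ball, great catch
    BattedBall(out_probability=0.5, was_out=True),
]
print(expected_outs(balls))               # the pitcher's share
print(defence_outs_above_average(balls))  # the fielders' share
```

The pitcher gets the same credit whether or not the routine drive was caught; the difference between actual and expected outs lands on the defenders.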
We then need to turn this information into runs and outs. Outs are essentially trivial with good enough defensive data, but the run conversion can come from linear weights, Base Runs (this is, of course, more accurate than linear weights), or any other run expectancy tool you can think of. With expected runs and outs, you can figure out how many runs you'd expect a pitcher to give up per nine innings... to a point: we've neglected 'situational pitching'. Personally, I think this is an acceptable oversight, but it may well be that it can make a significant difference to our evaluation of pitchers. Certainly, it's something that will be fairly important to look into down the line.
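As a rough sketch of the linear weights route, the snippet below turns event counts and expected outs into expected runs per nine. The weights are illustrative placeholders in the general range of commonly quoted values, not figures from this series, and the function name and parameters are hypothetical. It also assumes the weights are expressed relative to league average, so the result is the league rate plus the pitcher's runs above average scaled to 27 outs.

```python
# Placeholder linear weights (runs relative to an average plate appearance).
# These are illustrative assumptions; real weights depend on the run environment.
ILLUSTRATIVE_WEIGHTS = {
    "BB": 0.30, "HBP": 0.33, "1B": 0.47, "2B": 0.77,
    "3B": 1.04, "HR": 1.40, "OUT": -0.27,
}


def expected_runs_per_nine(event_counts, expected_outs, league_ra9,
                           weights=ILLUSTRATIVE_WEIGHTS):
    """Expected runs allowed per nine innings from (possibly fractional) event counts."""
    if expected_outs <= 0:
        raise ValueError("expected_outs must be positive")
    # Runs above average: each event's count times its run value.
    runs_above_average = sum(weights[event] * count
                             for event, count in event_counts.items())
    # Scale to 27 outs (nine innings) and add back the league baseline.
    return league_ra9 + runs_above_average * 27.0 / expected_outs


# Made-up line: counts can be fractional because batted balls are credited
# by their expected outcome rather than their actual one.
line = {"BB": 50, "HBP": 5, "1B": 120.5, "2B": 35.2, "3B": 3.1, "HR": 18, "OUT": 600.0}
print(expected_runs_per_nine(line, expected_outs=600.0, league_ra9=4.6))
```

Base Runs or another run estimator could be swapped in for the sum without changing the shape of the calculation.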
So. Expected runs per nine. Combining this with some function of batters faced gives us a certain number of runs above average, which eventually leads us to wins (and don't forget to park/league adjust!). But many prefer a quite elegant shortcut: if we know the expected runs allowed per nine and the league average figure, we can use Pythagorean theory to derive expected winning percentage. This is the number that WAR for pitchers is based on, and ultimately what we want to know. Getting to that point is just a matter of refining the method and using better data: our general theory is laid out pretty cleanly.
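The Pythagorean shortcut can be sketched in a few lines. This treats the league average as the runs scored behind the pitcher and his expected runs per nine as the runs allowed. The exponent of 2 is the classic choice and is an assumption here; other exponents are in common use, and the function name is hypothetical.

```python
def pythagorean_win_pct(league_runs_per_nine, pitcher_runs_per_nine, exponent=2.0):
    """Expected winning percentage for a pitcher backed by a league-average offence."""
    scored = league_runs_per_nine ** exponent
    allowed = pitcher_runs_per_nine ** exponent
    return scored / (scored + allowed)


# A league-average pitcher comes out at exactly .500.
print(pythagorean_win_pct(4.5, 4.5))
# A pitcher expected to allow 3.0 runs per nine in a 4.5-run league.
print(round(pythagorean_win_pct(4.5, 3.0), 3))
```

Park and league adjustments would be applied to the inputs before this step.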
We should be careful to regress our numbers pretty severely when dealing with pitchers, as some of their outcomes are highly luck-dependent (notably home runs per fly ball, and to a lesser extent some ball in play classifications). As with everything we look at, remember that the data at hand never tells the whole story. Regression is the name of the game, and we want to apply it mercilessly when non-correlative statistics come into play. But we should also remember that there is real value in measuring what a pitcher actually has done, as well.
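One simple way to regress a rate, sketched below, is to blend the observed figure with the league average, weighting the league by some number of 'phantom' opportunities. The function name, the sample rates and the stabilisation constant are all assumptions for illustration; picking a sensible constant for each statistic is the hard part.

```python
def regress_to_mean(observed_rate, opportunities, league_rate, stabilisation):
    """Shrink an observed rate toward the league rate.

    `stabilisation` is the (assumed) number of opportunities at which the
    observed rate and the league rate are weighted equally.
    """
    total = opportunities + stabilisation
    if total <= 0:
        raise ValueError("need a positive number of total opportunities")
    return (observed_rate * opportunities + league_rate * stabilisation) / total


# Made-up example: 15% home runs per fly ball over 100 fly balls,
# a 10% league rate, and an assumed stabilisation point of 100 fly balls.
print(regress_to_mean(0.15, 100, 0.10, 100))
```

The noisier the statistic, the larger the stabilisation constant and the harder the estimate is pulled toward the league; the unregressed figure is still the record of what the pitcher actually did.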
Things to Remember
Would you really say pitchers are responsible for how many home runs they give up?
I think it might be clearer to say they are responsible for how many fly balls they give up, right?
Edgar for Pres - March 3, 2010
No it definitely would not
Just because something is not particularly correlative doesn’t mean it’s not their responsibility. Pitchers give up home runs and pretending like they’re all fly balls is crazy.
Graham MacAree - March 3, 2010
Yeah I agree now that I think about it again.
Edgar for Pres - March 3, 2010
I have never fully understood why FIP is considered accurate/relevant
I don’t know if such a player actually exists, but consider a hypothetical pitcher who doesn’t give up many home runs, has decent strikeout and walk rates, but for whatever reason gives up a ton of balls in play (and therefore hits). Questions I have:
1. This hypothetical player will have much better FIP than he should, since at the end of the day he’s still giving up hits and runs. Do people realize and/or care about this?
2. Is this even a realistic scenario? I see how if a pitcher has good K and BB rates then it’s unlikely that he’s giving up a ton of hits, but it seems like there are bound to be some exceptions. I’m thinking maybe sinkerballer type guys who rely a lot on defense and balls in play to get outs.
3. By only focusing on 3 PA outcomes (K, BB, HR), FIP really ignores a very large chunk of what goes on during a baseball game. Why then do people rely on FIP as a relevant measure of a pitcher’s ability?
shuswapslugger - March 4, 2010
Couple of things, if I may
First, as you mentioned, if a pitcher isn’t giving up home runs and is striking out a lot of players, he’s not going to have many balls put in play, which is good. Second, the point of FIP is in its name: Fielding Independent Pitching. So you could have a guy who gives up a ton of fly balls that don’t go for home runs, but has the Mariners’ outfield defense, so he doesn’t give up a lot of hits on those fly balls. Then he’s traded and has Adam Dunn in the outfield. He pitches at the same level that he used to, but now he’s giving up huge amounts of hits that sail over the head of the concrete statue positioned in left.
All FIP tries to do is isolate the outcomes that a pitcher has control over, and not punish him for having a lousy defense or reward him for an elite one. It has more predictive power than ERA.
controlled_slide - March 4, 2010
Good explanation
I’ll add that FIP actually has the leaguewide run value of the average batted ball implicit in its formula. Essentially, what FIP is doing is taking the average ERA and tweaking it upwards or downwards in response to certain events – balls in play are already embedded into that average ERA.
Graham MacAree - March 4, 2010