Lookout Landing

Sabermetrics 101: Linear Weights

Now we're really getting into the good stuff.

Prerequisites for understanding: Game state, environment.

Prerequisites for derivation: Game state; database.

The Not-So-Missing Link

Let's go all the way back to the beginning. We started this series by looking at the game state, along with run expectancy and win expectancy. We did so for a reason: it's impossible to understand any of the more modern statistics without a good understanding of the game state (and run expectancy). Here, we have the intermediate step between game state and a useful metric, whether we wish to look at batting, pitching, or defence: Linear Weights. Just as runs require some translation in order to be presented in a measure that has some inherent value, the events on the field must also be converted into runs. How? By going back to our game state and looking at run expectancy.

We know the average number of runs scored over the remainder of an inning in any baserunner/out state. Bases loaded, no out? You're looking at a lot of runs. Empty, with two down? Rather less. With play-by-play data, we can look at any class of event and find the average change that event causes in run expectancy. Add in the average number of runs that scored on the play itself and you're left with the value, in runs - i.e. the linear weight - of any given event. This is a pretty big deal, as without it we'd have no way to measure the relative importance of, say, walks and singles. Combined with the run/win conversion, linear weights (in run form) bridge the gap between the old baseball stats and value.
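As a rough illustration of that calculation, here is a minimal Python sketch. The run expectancy table, the list of plays, and the names RUN_EXPECTANCY, PLAYS and linear_weights are all made up for the example; real values would come from a full play-by-play database, not from four plays.

```python
from collections import defaultdict

# Hypothetical run expectancy by (bases occupied, outs).
# Illustrative values only, not real league data.
RUN_EXPECTANCY = {
    ("empty", 1): 0.3,
    ("first", 1): 0.6,
    ("second", 1): 0.8,
    ("first_second", 1): 1.0,
}

# Hypothetical plays: (event, state before, state after, runs scored on the play).
PLAYS = [
    ("single", ("second", 1), ("first", 1), 1),  # runner scores from second
    ("single", ("empty", 1), ("first", 1), 0),
    ("walk", ("empty", 1), ("first", 1), 0),
    ("walk", ("first", 1), ("first_second", 1), 0),
]


def linear_weights(plays, run_expectancy):
    """Average, per event type, the change in run expectancy plus runs scored."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for event, before, after, runs in plays:
        # Run value of this one play: what the state change is worth,
        # plus whatever actually crossed the plate.
        value = run_expectancy[after] - run_expectancy[before] + runs
        totals[event] += value
        counts[event] += 1
    return {event: totals[event] / counts[event] for event in totals}


weights = linear_weights(PLAYS, RUN_EXPECTANCY)
for event, weight in sorted(weights.items()):
    print(f"{event}: {weight:.2f} runs")
```

With a real database the loop is the same; only the size of the tables changes. Averaging per-play values like this is equivalent to adding the average change in run expectancy to the average runs scored.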

Nothing Is Arbitrary

Consider the previous paragraph again. It's critically important to have a good grasp of what it means: that all of our top-line stats are related to runs above or below average by empirical means. There is nothing arbitrary in the exact weighting we have of a home run relative to a triple, or a ground ball to a line drive. Years upon years of data allow us to convert back and forth, or up and down, with ease. A common complaint with modern sabermetrics is the bewildering array of fractional coefficients that dot the scene, but if you look at a formula that's based on linear weights, don't see them as confusing numbers. Instead, look at them as relative values, derived through years of baseball being played.

A Livable Zone

Linear weights is a fantastic tool, but we should be aware of the limitations as we sing its praises. Because we build our run (and out) values on league average data, there's no guarantee that they work in extreme environments. And, in fact, they do not. At all. If a pitcher struck out 100% of the batters he faced and we attempted to estimate his ERA through linear weights, we would end up with our pitcher allowing something like negative three runs per nine innings, a clearly impossible result. Situations don't have to be as extreme as that either: the best pitchers in baseball affect their run environment to the point that linear weights may not accurately reflect the true conversion between their pitching and the runs we'd expect. The takeaway? Linear weights are optimised for the average baseball game, and start to fall apart when you drift too far away from that. They're still usable when a long way from the mean, but as with anything, understanding what's wrong with what we use is just as important as knowing what's right.
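To see how the arithmetic breaks down, here is a small Python sketch of the all-strikeout pitcher. Every number in it is a hypothetical round figure picked for illustration (the league runs per plate appearance, the strikeout weight), and linear_runs_per_nine is a made-up helper, not anyone's published formula. The point is only the sign of the answer.

```python
# All values are hypothetical round figures chosen for illustration.
LEAGUE_RUNS_PER_PA = 0.12  # assumed league-average runs per plate appearance
STRIKEOUT_WEIGHT = -0.27   # assumed run value of a strikeout relative to an average plate appearance
BATTERS_PER_NINE = 27      # striking out everyone means exactly 27 batters in nine innings


def linear_runs_per_nine(event_counts, weights, league_runs_per_pa):
    """Estimate runs allowed as league-average runs plus the weighted events."""
    plate_appearances = sum(event_counts.values())
    baseline = plate_appearances * league_runs_per_pa
    adjustment = sum(count * weights[event] for event, count in event_counts.items())
    return baseline + adjustment


estimate = linear_runs_per_nine(
    {"strikeout": BATTERS_PER_NINE},
    {"strikeout": STRIKEOUT_WEIGHT},
    LEAGUE_RUNS_PER_PA,
)
print(f"Estimated runs per nine innings: {estimate:.2f}")
```

Under these assumed inputs the estimate comes out below zero, which no real pitcher can achieve. The weights describe what a strikeout is worth in an average run environment, and a pitcher who strikes out everyone has left that environment entirely.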

What Follows

Baseruns, wOBA, FIP, tRA, UZR.

Comments

I am keeping up so far

and it has been 45 years since my last college math class.

I really appreciate what you are doing here, Graham. I have been reading this and other sabermetrics-friendly sites and blogs for a little over a year now. I have bought into the results because they make sense, but it is great to have a way to start at the beginning and go through the concepts that the current stats are built on.

What *are* linear weights?

This wasn’t clear to me from the article.

Sorry, I re-worded it a little to address this complaint

A linear weight is the average run value of an event, given by the average change in run expectancy plus the average runs scored on that event. Linear weights is what we call the collection of these averages.

Clearer now, thanks

So, if I understand correctly, for each single (say) we have two pieces of information:
1) the game state before and after the single (and hence the change in run expectancy)
2) the number of runs scored due to the single

Linear weights are computed by taking the average of 1) and 2) over all singles in the dataset.

The sum of the averages
Runner on second, one out, hitter singles.

The run scores, put it in the bank. Having a runner on second with one out might have a run expectancy of .8 runs. Runner on first is .6 runs. So you’ve gone from .8 runs to 1.6 runs (1 + .6), meaning the single was worth .8 runs. Then average the .8 with every other single that occurs.