Now seemed as good a time as any.
Prerequisites for understanding: Regression, correlation.
Prerequisites for derivation: Data, regression, correlation.

Sample Sizes
We're familiar with regression and correlation, so let's go a little more in depth on the nature of sample sizes. If we have a certain set of data, how can we assess its reliability? What is it actually telling us? We unpacked a little of this in a previous post, but didn't touch on how it applies to the data we wish to analyse. There are many, many resources that describe this problem and its solutions in minute, tortuous (to some) detail. We don't need to rehash them here - these posts are intended to be more overview than encyclopedia. So - an overview:
The idea is that different skills and stats have different thresholds for sample size tolerance. We know that we must regress our measurements towards a mean, and we've thought a little bit about which means we should be using. What we haven't really discussed is how far we should be regressing our given values. This is governed by our sample size and the stability of the statistic - larger samples and higher stability mean less regression. The important thing to point out is that the amount of regression we apply should be a continuum, rather than a step - meaning that every sample size carries a certain amount of information. The smallest sample (i.e. zero) tells you nothing, and we slowly work our way up the ladder until we reach the largest samples, which still don't tell you everything.
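One common way to put the "continuum" idea into numbers is to weight the observed value by n / (n + k) and the population mean by the remainder. The sketch below assumes that form; the constant `k`, the league OBP of .330 and the observed .400 are illustrative placeholders, not figures from this post.

```python
def regress_to_mean(observed, n, population_mean, k):
    """Shrink an observed rate toward a population mean.

    k is a hypothetical stabilization constant: the sample size at which
    the observation and the population mean receive equal weight.
    """
    if n < 0 or k <= 0:
        raise ValueError("n must be non-negative and k must be positive")
    weight = n / (n + k)  # 0 with no data, approaches (never reaches) 1
    return weight * observed + (1 - weight) * population_mean


LEAGUE_OBP = 0.330  # illustrative population mean (assumption)
K = 300             # illustrative stabilization constant (assumption)

for pa in (0, 50, 300, 1200, 6000):
    estimate = regress_to_mean(0.400, pa, LEAGUE_OBP, K)
    print(f"{pa:>5} PA: observed .400 regresses to {estimate:.3f}")
```

With zero plate appearances the estimate is just the league mean, and even at 6000 PA it still sits slightly below the observed .400, which matches the point that the largest samples still don't tell you everything.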
Some Rules of Thumb
The way we determine the information associated with a given sample size for a given statistic is to look at the stability across the MLB population while taking into account the relative persistence of the statistic year by year in individuals. Needless to say, this can be a daunting task. In lieu of pursuing some intensive mathematics, here are some rules of thumb:
What Follows
Projection systems, understanding splits.
17 comments
Finding out that Batter vs. Pitcher stats are massively subject to small sample sizes was one of the most upsetting things to realize.
CapSea - February 28, 2010
Doesn't ERA tend to stabilize over the course of an entire career?
Poochie - February 28, 2010
Over 6-7 years I would say an adjusted ERA is better than DIPS
vivaelpujols - March 1, 2010
Than our current implementations of DIPS, sure
Eventually we’ll get the more interesting information encoded in ERA into our defence-independent statistics
Graham MacAree - March 1, 2010
Yes
SIERA is a start to that, although it’s going about it the wrong way IMO.
vivaelpujols - March 1, 2010
If you play for a bunch of different teams, probably
But if you pitch in the same park, in front of similar defenses, then it may not.
cyberwulf - March 1, 2010
Warning - Math content - If you don't want to read, please ignore
Here is a good example of a case where the concept of variance (or more importantly its square root, the standard deviation) could help explain what is going on.
When we calculate a rate, there is a “true rate” which our sample is approximating. Every rate exhibits variability. Think about flipping a coin 10 times: on any set of 10 flips you will see lots of different results. Across all those sets, the mean number of heads would be 5, and the standard deviation measures (not exactly, but close) the average difference between a sample of size 10 and that mean. The standard deviation of the rate that you calculate from your sample represents the variability you can expect to see in that rate even though the true rate is .5. With larger samples, the standard deviation will be smaller: it shrinks in inverse proportion to the square root of the sample size.
What does this mean? An OBP calculated based on 400 PA exhibits half the variability of an OBP based on 100 PA. To get half the variability of an OBP based on 400 PA, you’d have to go to 1600 PA. Sorry for the interruption :)
New England Fan - March 1, 2010
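The square-root relationship in the comment above can be checked with the usual binomial formula for the standard deviation of a rate, sqrt(p(1 - p) / n). This is only a sketch: it assumes every plate appearance is an independent event with the same true rate, and the .330 OBP is an illustrative number.

```python
import math


def rate_standard_deviation(true_rate, n):
    """Standard deviation of a rate observed over n independent trials."""
    return math.sqrt(true_rate * (1 - true_rate) / n)


TRUE_OBP = 0.330  # illustrative true rate (assumption)

sd_100 = rate_standard_deviation(TRUE_OBP, 100)
sd_400 = rate_standard_deviation(TRUE_OBP, 400)
sd_1600 = rate_standard_deviation(TRUE_OBP, 1600)

print(f"100 PA:  {sd_100:.4f}")
print(f"400 PA:  {sd_400:.4f}  (ratio to 100 PA: {sd_400 / sd_100:.2f})")
print(f"1600 PA: {sd_1600:.4f}  (ratio to 400 PA: {sd_1600 / sd_400:.2f})")
```

Quadrupling the sample halves the spread each time, whatever true rate is plugged in.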
I think there are at least two separate issues here
which it may be enlightening to distinguish. The primary question is: What prevents us from estimating something precisely? In terms of baseball statistics, there may be two different challenges:
1) The quantity you are trying to measure is inherently variable. For example, due to “randomness” (really a bunch of small unobservable factors) a player’s defensive performance may actually fluctuate quite a lot from game to game, week to week, or even year to year. So we need large sample sizes to estimate the true mean (i.e. talent level) precisely.
2) Even if the quantity of interest doesn’t necessarily have high variability, the nature of baseball restricts the sample sizes available to estimate it. For example, if a batter is platooned, then he may have only a small number of at-bats against same-handed pitchers, and it will obviously be hard to estimate his performance against them.
A subtler point related to 2) is that the “events” (swings, at-bats, pitches, etc.) we look at are dependent (i.e. correlated) to varying degrees. Swings, for example, are grouped into at-bats, and four or five consecutive at-bats often take place against the same pitcher, inducing dependence. Generally speaking, the larger the dependence of the events, the larger sample size you need to estimate your quantity accurately.
cyberwulf - March 1, 2010
Here's what confuses me about sample size and sports:
the idea that we learn “nothing” from small sample sizes.
Perhaps this is just a language issue, and when someone says that we learn nothing from that stat, because of sample size, they really mean that we learn very very little.
I think of it like this. Say I see a batter come to the plate and I know nothing about him (except that he is on a major league baseball team), and then he hits a home run in that single at-bat. That has low value. But is it “zero”? I mean, isn’t it more likely than it was before that he is a power hitter? The odds are higher than they were when he came up to bat (and I could only expect league averages from him). Even if that increase in odds is just 1% or .1%, it is something, right?
This rolls around in my brain most often when it comes to batter matchups vs. pitchers. If Batter A is 9 for 12 lifetime against Pitcher B, then isn’t there an increased chance that he is better against that pitcher than if he were 1 for 12? It isn’t a large enough sample to draw a statistically significant conclusion, but aren’t the odds somewhat higher?
Sorry to make this so long, but I use analogy to articulate what I mean. Let’s say I have a random quarter I found on the ground and I haven’t looked at it yet. The odds that this quarter is a two-headed coin are, let’s just say, .001% (if 1 out of every 100,000 quarters on the ground is a two-headed trick coin).
Now I start flipping that coin. I get heads three times in a row. Three is a tiny sample size and for all effective purposes it is worthless. But aren’t the odds that my coin is two-sided now higher? Maybe they are .002%. Is there something I’m missing here, or is this kind of differentiation between ‘zero’ and ‘really tiny’ only valuable as a curious mental exercise?
Snuffleupagus - March 1, 2010
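The coin question in the comment above can be worked through with Bayes' rule. The sketch below assumes the only alternative to a two-headed coin is a perfectly fair one and takes the commenter's 1-in-100,000 figure as the prior. Under those assumptions, three heads in a row move the odds from .001% to roughly .008%: more than the guessed .002%, but still tiny.

```python
from fractions import Fraction


def posterior_two_headed(prior, heads_in_a_row):
    """Probability the coin is two-headed after seeing only heads.

    Assumes the coin is either two-headed (always heads) or fair.
    """
    likelihood_trick = 1                              # trick coin always lands heads
    likelihood_fair = Fraction(1, 2) ** heads_in_a_row
    numerator = prior * likelihood_trick
    return numerator / (numerator + (1 - prior) * likelihood_fair)


prior = Fraction(1, 100_000)  # the commenter's hypothetical .001%

for flips in (0, 3, 10, 20):
    post = posterior_two_headed(prior, flips)
    print(f"{flips:>2} heads in a row: {float(post) * 100:.4f}%")
```

Each extra head doubles the odds, so the sample is never worth exactly zero, but it takes a long streak before the trick-coin explanation becomes the likely one.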
Jordan Schafer homered in his first big league at bat then proceeded to be very very bad offensively for the rest of his time with the Braves.
One at-bat tells you that, yes, Jordan Schafer has the physical ability to hit a home run. You knew this because he had homered at least once in high school and the minors. Even if something has a tiny tiny chance of happening, the single occurrence in one observation of that something doesn’t tell you anything about the likelihood of it happening again.
abender20 - March 1, 2010
Also, Kenji Johjima hit a home run to right field in his first game as a Mariner.
Yes that actually happened, but it did not mean he was a hitter that would routinely show power the opposite way.
Sec 108 - March 1, 2010
That is to say that it tells you that the event isn't impossible, but it's not much more instructive than that.
abender20 - March 1, 2010
And most of the time, we already knew that.
Llewdor - March 1, 2010
Wellllll
A sample size of one has to tell you something, because otherwise a sample size of ten thousand couldn’t tell you anything. It just tells you not very much at all, and our brains are really bad at handling Bayesian inference. All in all, you’re better off thinking of it as worth nothing rather than being worth some small figure. We’re just too prone to vastly overestimating what the very small number is to bother with it.
Graham MacAree - March 1, 2010
I think everyone should play a few dozen freeroll poker tournaments...
Once you get over the anger and frustration, they really are amazing for realizing how bad your tendency to extrapolate from insufficient sample sizes is. That’s assuming you actually pay attention to the odds, what actually happens, and your natural response to what actually happens.
Sidi - March 1, 2010
We generally treat all events as equally informative
So the first at-bat is no more or less informative than the 17th (taken in isolation). There is a statistical quantity called “information” which increases in relation to the inverse of the variance.
For the record, I don’t like the notion that there is some point at which sample sizes become “reliable”. Everything’s on a continuum; if you tell me how “reliable” you want your estimate to be, I’ll come back with a sample size which will accomplish that. People often seem to use R=0.5 as a benchmark for reliability, but that’s a somewhat arbitrary choice.
cyberwulf - March 1, 2010
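The "tell me how reliable you want it, I'll give you a sample size" idea in the comment above can be sketched under one common simplifying model, in which reliability grows as R = n / (n + k). Solving for n gives n = k * R / (1 - R). Both the model and the constant `k` here are assumptions for illustration, not values taken from the thread.

```python
def required_sample_size(target_reliability, k):
    """Sample size needed to reach a target reliability R.

    Assumes R = n / (n + k), where k is a hypothetical constant equal to
    the sample size at which R reaches 0.5.
    """
    if not 0 <= target_reliability < 1:
        raise ValueError("target reliability must be in [0, 1)")
    return k * target_reliability / (1 - target_reliability)


K = 300  # illustrative constant (assumption)

for r in (0.25, 0.5, 0.8, 0.9):
    print(f"R = {r:.2f} needs about {required_sample_size(r, K):.0f} PA")
```

Nothing special happens at R = 0.5 in this model; it is simply the point where n equals k, which fits the comment's view that the benchmark is a somewhat arbitrary choice.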
Also, thanks for these awesome articles
This is a ton of work for you. I took statistics in college and I’ve never been turned off by the sabermetrics world, finding it approachable enough to add to my understanding of baseball. But these articles are a fantastic introduction and explanation.
Snuffleupagus - March 1, 2010