This is going to be a fun one.
Prerequisites for understanding: None.
Prerequisites for derivation: N/A; conceptual.

In Depth
The concept of regression towards the (I really should say a) mean is important in fields far beyond baseball analysis, so I suppose we should start with an easy non-baseball example. To Wikipedia!
A class of students takes two editions of the same test on two successive days. It has frequently been observed that the worst performers on the first day will tend to improve their scores on the second day, and the best performers on the first day will tend to do worse on the second day. The phenomenon occurs because student scores are determined in part by underlying ability and in part by chance.
The last sentence is the critical one to understand. Most measurements of human ability are partly achieved by skill and partly achieved by luck. This means that data cannot always be taken at face value. Since we cannot always be completely confident that we've measured what we want to measure, we can apply an expected regression to the mean to get a true idea of talent. We all do this, whether we mean to or not. The rookie that comes up in September and gets a hit in his first at-bat? The numbers say he's on pace for a career batting average of 1.000. Does anyone expect said rookie to never make an out in his life? Of course not.
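A quick simulation makes the test example concrete. The sketch below is only an illustration: the score model (a fixed ability plus independent day-to-day luck, both normally distributed), the class size, and the spreads are all assumptions chosen for the example, not measured values.

```python
import random
import statistics


def simulate_two_tests(n_students=1000, ability_sd=10.0, luck_sd=10.0, seed=0):
    """Return (day1, day2) scores where score = ability + luck.

    All parameters are hypothetical; ability is fixed per student,
    while luck is redrawn independently on each day.
    """
    rng = random.Random(seed)
    abilities = [rng.gauss(70.0, ability_sd) for _ in range(n_students)]
    day1 = [a + rng.gauss(0.0, luck_sd) for a in abilities]
    day2 = [a + rng.gauss(0.0, luck_sd) for a in abilities]
    return day1, day2


def mean_change_by_group(day1, day2, fraction=0.1):
    """Average day-two minus day-one change for the worst and best
    day-one performers (bottom and top `fraction` of the class)."""
    order = sorted(range(len(day1)), key=lambda i: day1[i])
    k = max(1, int(len(day1) * fraction))
    bottom, top = order[:k], order[-k:]
    bottom_change = statistics.mean([day2[i] - day1[i] for i in bottom])
    top_change = statistics.mean([day2[i] - day1[i] for i in top])
    return bottom_change, top_change


day1, day2 = simulate_two_tests()
bottom_change, top_change = mean_change_by_group(day1, day2)
print(f"worst 10% on day one changed by {bottom_change:+.1f} points on average")
print(f"best 10% on day one changed by {top_change:+.1f} points on average")
```

Nobody's ability changes between the two days in this model, yet the worst day-one group improves on average and the best group declines, purely because extreme day-one scores tend to include extreme luck.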
The interesting question is which mean to apply our expected regression towards. What if our rookie is reckoned by scouts to be an excellent pure hitter? What if he's a guy who swings from the heels and misses half the time he offers at a pitch? Clearly, we expect different batting averages from the two, and one at-bat isn't going to influence our expectations either way. We'd regress the first player towards the 'good hitter' population mean, and the second towards the 'bad hitter' population mean. Eventually (given enough at-bats), we simply use their career numbers as the population mean for the player. This is a shortcut rather than being analytically rigorous, as some element of randomness always influences career numbers, meaning that, barring other information, players should always be expected to be slightly more average than they have been historically. It's not a big effect, however; I merely highlight it to demonstrate the difficulty in choosing the mean towards which we expect a player to regress.
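One simple way to picture regressing a player towards a chosen mean is to blend his observed results with a block of "phantom" at-bats at the population rate. This is a minimal sketch of that idea, not necessarily how any particular projection system does it; the function name, the 200 at-bat weight, and the .300 and .240 population means are all hypothetical values picked for illustration.

```python
def regressed_average(hits, at_bats, population_mean, regression_at_bats=200):
    """Blend an observed batting line with a population mean.

    `regression_at_bats` is a hypothetical weight: the number of
    at-bats of population-average performance mixed into the estimate.
    """
    blended_hits = hits + population_mean * regression_at_bats
    return blended_hits / (at_bats + regression_at_bats)


GOOD_HITTER_MEAN = 0.300  # hypothetical 'good hitter' population mean
BAD_HITTER_MEAN = 0.240   # hypothetical 'bad hitter' population mean

# Two rookies who are both 1-for-1, regressed towards different means.
pure_hitter = regressed_average(1, 1, GOOD_HITTER_MEAN)
free_swinger = regressed_average(1, 1, BAD_HITTER_MEAN)

# A veteran hitting .300 over 5000 at-bats: the population mean barely matters.
veteran = regressed_average(1500, 5000, BAD_HITTER_MEAN)

print(f"{pure_hitter:.3f} {free_swinger:.3f} {veteran:.3f}")
# prints: 0.303 0.244 0.298
```

With one at-bat the estimate is almost entirely the population mean, so the choice of mean is everything. With thousands of at-bats the estimate sits close to the career number, though still pulled very slightly towards the mean.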
Clearly the idea of regression towards a mean is the force behind the need for large samples of data in order to have strong conclusions about talent level. However, while we intuitively know that one at-bat doesn't mean a whole lot, we don't have a good grasp on how strong our conclusions are for any apparently reasonable sample size. This is dangerous, and leads to poor conclusions and arguments, as well as the occasional misinformed request for larger sample sizes. The requisite sample size depends on the proportion of skill to luck inherent in a measurement: the more a measurement reflects skill, the smaller the sample required for a given level of confidence in one's results, and vice versa.
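The link between the skill/luck mix and sample size can be sketched with a standard signal-versus-noise calculation: the share of the observed spread between players that reflects talent is talent variance divided by talent variance plus per-trial noise variance over the number of trials. This is one common way to formalize the idea, offered as an assumption rather than as the method used here; the two example stats and their talent spreads are invented for illustration.

```python
import math


def reliability(n_trials, talent_var, noise_var_per_trial):
    """Share of observed variance attributable to talent after n_trials."""
    return talent_var / (talent_var + noise_var_per_trial / n_trials)


def trials_needed(target, talent_var, noise_var_per_trial):
    """Smallest whole number of trials at which reliability reaches `target`
    (up to floating-point rounding)."""
    return math.ceil(target / (1.0 - target) * noise_var_per_trial / talent_var)


def binomial_noise_var(rate):
    """Per-trial luck variance for a yes/no outcome occurring at `rate`."""
    return rate * (1.0 - rate)


# Hypothetical skill-heavy stat: players differ a lot (talent SD 0.04).
skill_heavy = trials_needed(0.5, 0.04 ** 2, binomial_noise_var(0.20))

# Hypothetical luck-heavy stat: players barely differ (talent SD 0.01).
luck_heavy = trials_needed(0.5, 0.01 ** 2, binomial_noise_var(0.30))

print(f"skill-heavy stat: about {skill_heavy} trials to be half signal")
print(f"luck-heavy stat: about {luck_heavy} trials to be half signal")
```

Under these made-up inputs the skill-heavy stat needs on the order of a hundred trials and the luck-heavy one a couple of thousand, which is the sense in which asking for "a bigger sample" means very different things for different metrics.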
Lies, Damned Lies, and Statistics
We all know that numbers are manipulable, and that it's possible to draw completely ludicrous conclusions from them that simply don't bear up to even basic common sense. With a good grasp on the theory behind regression towards the mean, one can avoid the pitfalls of putting too much faith in a poor sample size. However, we remain unsure of what sample size is actually required for a given metric until regression's close cousin correlation comes into play. Regression also does not protect us from statistical arguments based on irrational theories of value (i.e. over/undervaluing a specific skill or statistic).
Quick Notes
What Follows
Correlation; hitting, pitching, and fielding metrics.
This is probably the most common mistake people make
You have to make sure you’re regressing players towards a standard that’s actually meaningful. Some stats, such as hitter BABIP, should be regressed towards the player’s own career average rather than towards the league average.
OlSalty - February 15, 2010
Regression can get pretty complicated
I don’t understand it completely all the time either. It would be kind of interesting if somebody did a “how to” for regression sometime. I’m more number driven than most but I think I need something like that to really get it.
Also, hitter BABIP should probably be regressed to both the career average and the league average. Not sure exactly how you figure out how to do that, but I think that’s the right way to go about it.
Edgar for Pres - February 15, 2010
I thought Devil Fingers did a pretty good job in
THIS. You get a chance to see regression applied to career platoon splits. Plus he name drops your favorite SS.
Andy Hellicksonstine - February 16, 2010
That's true, for smaller samples you do need to regress it towards league average as well
OlSalty - February 16, 2010
Also important:
Regression is not a term that means “Getting worse.” A player that ages and gets worse is not regressing, they are aging. A player that was good and starts to suck – same thing. Regression only refers to cases where their numbers were above or below a mean and are moving back towards that mean.
CapSea - February 15, 2010
Also
http://www.hardballtimes.com/main/article/but-i-regress/
vivaelpujols - February 15, 2010
From Shyster's facebook (I hope I didn't make it too big)
baetown415 - February 15, 2010
It should also be noted that some stats are more influenced by skill than others
Strikeouts and walks are very skill dependent for pitchers (although they obviously contain a lot of luck in them through the batter and umpire), so you would regress a player’s strikeout rate less than his BABIP, which is much more luck dependent.
That’s pretty obvious, but it’s sometimes overlooked when projecting players going forward.
vivaelpujols - February 15, 2010
I'm going to address this in the correlation post
But I did mention it:
Graham MacAree - February 15, 2010
Aah
I must have glossed over that, sorry.
vivaelpujols - February 15, 2010
The line "This is a shortcut rather than being analytically rigourous", that's pretty good.
It approaches a topic that might be worth expanding on, precisely the difference between what you guys do and baseball fans like myself. Often times it appears to me that there is a disconnect in how some people view your work. Just my opinion.
Also P values, recently I read an article that discussed misconceptions in exactly what P value means (within the scientific community, it was a very interesting read).
Kermit. - February 15, 2010
Do you have a link to the article?
vivaelpujols - February 15, 2010
Finally found it, interesting interview.
Link.
Kermit. - February 15, 2010
Thanks, good read
vivaelpujols - February 16, 2010
I found that article a little bit of a mess, honestly
How to interpret a p-value:
Suppose we are interested in looking at the difference in means between two groups. Say that the observed difference is 2 units, with a p-value of 0.05. This means that, if there were truly no difference between the two groups and we took repeated samples of the same sizes as in our initial experiment, we would expect to see a difference as large as or larger than 2 units in 5% of these samples.
Informally, the p-value expresses how “surprised” we would be to see a discrepancy as extreme (or more extreme) than the one we observed if, in truth, there were no difference between the groups being compared.
cyberwulf - February 16, 2010
From a non-math view point, when P value is being discussed in a thread for example.
It’s not uncommon at all to misunderstand the concept if a person tries to define it from the context of the conversation. The usual take on it is that P value means it’s right or wrong, or more or less accurate. The repeatability of the results doesn’t really come across.
Excellent definition by the way, very clear.
Kermit. - February 16, 2010
Graham, why don't you add these to StatCorner as well so they are all in the same easy to find location?
I can see this being a go-to guide by its completion, but finding it all on the blog may be more difficult than having them on the same page at StatCorner.
Top stuff so far.
EnglishMariner - February 16, 2010
Don't worry, I'll figure out a good place to store these to use for easy reference.
Graham MacAree - February 16, 2010
I would really like to hear feedback from people who "don't get" maths, by the way
Graham MacAree - February 16, 2010
i am one of those people.
when I took the SAT years ago my composite score was a 1020. I scored around a 700 on the verbal portion. I’ll let you do the math because I sure as hell can’t.
That being said, I find statistics fascinating and do my best to learn as much about them as I can. The last edition on game state was pretty straight forward but this one will take a few more read throughs before I think I’ll get it. You are doing as good a job as you probably can at making it accessible to us math dummies.
thewyrm - February 16, 2010 via mobile
So far so good, I like that you are keeping the articles short and also I enjoy the bullet point conclusion for easy reference.
EnglishMariner - February 16, 2010
I agree, nice and concise.
Eyeball Kid - February 16, 2010
For someone who never took a statistics course? This is priceless.
Thanks for taking it back to basics. It’s not just a benefit to all of us, but eventually to you guys, too, since it should both attract more readers and improve the overall quality of the comments (although that would require determining the mean comment, wouldn’t it?)
diderot - February 16, 2010
Misuse of "regression to the mean"
As a statistician, I find this concept to be grossly misused, because people tend to use it to grind an axe, rather than make a substantive point, or they don’t understand what it really means.
Here’s the fact. Given a single observation of a variable, the next observation is more likely to be closer to the mean of the variable than the given observation than it is to be farther from the mean.
It does not mean that it is guaranteed to happen. Repeat: it isn’t guaranteed. Secondly, we never know what the mean actually is. The mean involved when looking at a player’s performance is the measure of his true ability (whatever that is). When a player’s stats improve from his first year to his second year, there are two possibilities. The first is that the second year was one of those cases when the less probable event happened; the second is that the second year did represent regression to the mean, and his first year was an outlier. You really can’t analyze this until a player’s career is over.
I
New England Fan - February 16, 2010
Why would the player's career ending allow us any more confidence in our estimate of his talent level?
Graham MacAree - February 16, 2010
Yeah, I don't agree with that
Even Hank Aaron’s career is only a sample of his true ability.
vivaelpujols - February 16, 2010
Well, we certainly have more data points.
It seems like we could have more confidence than just looking at his first 2 years. If a player plays regularly for a number of years, seeing his career track can give you a better idea of which seasons were more likely to be closer to his true talent level and which ones were more likely to be skewed by random occurrence. There are still problems galore with that, but it should give you a better perspective than just 2 seasons.
nathaniel dawson - February 16, 2010
Well, yes, of course
What, though, is the significance of the fact that there will be no forthcoming datapoints? There shouldn’t be one, and that’s what ‘You really can’t analyze this until a player’s career is over’ implies.
Graham MacAree - February 16, 2010
Yeah, I didn't understand that either.
I think what New England Fan was saying (and please correct me if I’m wrong) is you can’t judge a player’s true talent level until after he has retired, because if you do, you’re ignoring forthcoming data points. But if he retires, you’re in the exact same spot, so it’s pretty much a moot point.
Also, we can do a pretty good job of estimating a player’s true talent level after a certain number of years. Frank Thomas could have retired two years ago — or two years from now — and we’d still have a damn good idea of just how talented he was for his career.
Teej - February 16, 2010
To pick a nit
Strictly speaking, this is not true. Here’s a counterexample:
P(X=1) = 0.49
P(X=0) = 0.49
P(X=0.5) = 0.01
Clearly the mean of X is 0.5. But if I observe X1 = 0.5, and we assume independent draws from the distribution, then P(X2 is further away from the mean than X1) = 0.98.
Sure, this is a contrived example. My point is that we need to be careful about blanket statements, since they can be confusing to people who may have trouble differentiating between hard mathematical truths and softer “rules of thumb” which usually hold in practice but have exceptions.
cyberwulf - February 16, 2010
Oops
Should read P(X=0.5) = 0.02
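With that correction the probabilities sum to one, and the counterexample can be checked exactly. A minimal sketch using exact fractions (the helper name prob_farther is made up for illustration):

```python
from fractions import Fraction

# The three-point distribution from the counterexample, with P(X=0.5) = 0.02.
dist = {
    Fraction(0): Fraction(49, 100),
    Fraction(1, 2): Fraction(2, 100),
    Fraction(1): Fraction(49, 100),
}

total = sum(dist.values())
mean = sum(x * p for x, p in dist.items())


def prob_farther(x1):
    """P(next independent draw is strictly farther from the mean than x1)."""
    return sum(p for x, p in dist.items() if abs(x - mean) > abs(x1 - mean))


print(total, mean, prob_farther(Fraction(1, 2)))
# prints: 1 1/2 49/50
```

An observation sitting exactly on the mean cannot be followed by anything closer, so in this distribution the next draw is farther away 98% of the time.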
cyberwulf - February 16, 2010
Nit successfully picked
Which is another reason why the whole concept can be totally misused. In fact, an individual item could even hit the mean exactly which means that the probability that the next observation is closer to the mean is 0.
New England Fan - February 16, 2010
I remember reading this article a while back and it was very helpful in my understanding of regression
Staturday: Small sample size
Dewey N - February 16, 2010
Agreed
Jeff Sullivan - February 16, 2010
Yep
I’ll be cribbing heavily from that article when I go into more depth about sample sizes
Graham MacAree - February 16, 2010
Thanks Graham
This is a great resource.
Attractive Nuisance - February 16, 2010
Love it
Question… Is there a guide (or are you planning one) on general places to “expect” improvement or decline vs. the mean?
Not all of the examples below that I understand to be true may even be accurate, but that just goes to show how much I need help here…
-Older players are eventually going to decline, younger ones with upside have the opportunity to improve
-Hitters seem to cement their career ceilings sooner than pitchers
-Power and basic old player skills decline faster/sooner
-26-27 year old position players with MLB experience are prime candidates for breakout years (read this somewhere once, pretty sure it’s a James thing)
-Major injuries often result in decline (what are some of the most and least recoverable?)
-Ichiro will play until he’s 100 and fuck trying to project him as anything less than Ichiro
seattlecougar - February 16, 2010
Also
Are certain stats better candidates for regression than others? I see it floated around a lot that certain stats tend to stabilize much more rapidly than others, but I’m not sure I’ve ever really seen it compiled which these are (on both ends of the spectrum)
seattlecougar - February 16, 2010
I will have a piece on projections and one on sample size stability.
Those pieces can kind of stand alone but the real value will be in integrating them all with one another.
Graham MacAree - February 16, 2010
Is the sample size stability going to come from Pizza Cutter's work?
I don’t know math enough to get a good feel for his study — but I think there was some question about the thresholds he used to determine stability. Is this something that you feel pretty comfortable with?
nathaniel dawson - February 16, 2010
I was planning on using it, but I haven't taken a look at it in a while
Graham MacAree - February 16, 2010
I'm curious, what were the problems you heard about it?
vivaelpujols - February 16, 2010
There's a problem with that, see?
I don’t know the math to be able to tell you. There was a discussion of it over at The Book right after he did the work, and some of the discussion was about, uh, I think it was about statistical significance. R levels, I believe. I believe the confidence level he used was .50? Does that sound right?
As you can tell by reading this, you really better look at it for yourself. And I don’t know whether it was really considered a problem — there were just some questions that came up about it.
nathaniel dawson - February 17, 2010