Coming up with a good model for park factors is a rather difficult proposition. You want to include as much data as possible, but all of it is fraught with bits of bias. It helps to think of it like attempting to quantify pitching. We have a simple end product, runs allowed, but there are so many things that go into producing that result and we have to wade pretty deep in order to extract the parts of it that are due to pitching and strip away the parts contributed by the defense, the park, random luck and the quality of the opposing offense.
Park factors are a lot like that. We have a simple result, runs scored in one park versus runs scored in another, but there's a lot that factors into that result and most of it is not a direct result of the influence of the particular park. Last season, New Yankee Stadium garnered a reputation for being a hitter's haven because of all the high-scoring games played there. However, the 2009 Yankees had one of the best offenses in baseball; so how much of the run scoring was because the park was friendly to hitters and how much of the run scoring was because the Yankee hitters weren't so friendly to opposing pitchers?
There's no easy answer to these problems. We can arrive at what we consider some decent approximations by using multiple years of data and trying to control for the variety of influences provided by the hometown nine. What began as just one number for each park, its overall run factor, has since blossomed into finding park factors for nearly every stat out there down to even the strike zone.
This might have been done already for all I know, but one part I have long been interested in is how parks affect batters based on handedness. We have deep personal knowledge of how Safeco Field plays differently for left-handed and right-handed hitters. It can be such an extreme split that using just one overall home run factor for Safeco significantly shortchanges right-handed hitters and overcompensates lefties. Investigating a way to rectify that has been on my to-do list for a while now and I have finally gotten through a first pass at constructing a way to tackle it.
I won't go into the nitty-gritty details but the high-level concept goes like this. To figure out the strikeout factor for left-handed hitters (LH K) in Safeco I take four pieces of data:
A: The number of plate appearances made by a hitter from the left side at Safeco (regardless of team)
B: The number of strikeouts recorded during those (A) plate appearances.
C: The number of plate appearances made by a hitter from the left side during Mariner away games (regardless of team)
D: The number of strikeouts recorded during those (C) plate appearances.
B/A gives you the ratio of Mariner-related* plate appearances in Safeco that yielded a LH K.
D/C gives you the ratio of Mariner-related plate appearances not in Safeco that yielded a LH K.
*Mariner-related refers to any plate appearance that the Mariners are a participant in, whether as the offensive or defensive team.
Taking the first ratio (B/A) and dividing it by the second ratio (D/C) gives you a ratio of the ratios, which is your de facto Safeco Field park factor for LH Ks.
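For anyone who wants to play along at home, here is a minimal sketch of that ratio of ratios in Python. The function name and the sample counts are made up for illustration (they are not the real Safeco totals), and the result is multiplied by 100 so that 100 means no park effect.

```python
def park_factor(home_pa, home_events, road_pa, road_events):
    """Ratio of ratios (B/A) / (D/C), scaled so that 100 is neutral.

    home_pa     -> A: plate appearances at the home park
    home_events -> B: events (e.g. strikeouts) in those plate appearances
    road_pa     -> C: plate appearances in the team's away games
    road_events -> D: events in those plate appearances
    """
    if home_pa <= 0 or road_pa <= 0 or road_events <= 0:
        raise ValueError("A, C and D must all be positive")
    home_rate = home_events / home_pa   # B/A
    road_rate = road_events / road_pa   # D/C
    return 100.0 * home_rate / road_rate


# Hypothetical counts, chosen only to show the arithmetic.
A, B, C, D = 9000, 1620, 9200, 1533
example_factor = park_factor(A, B, C, D)
print(round(example_factor))
```

The same function works for any of the stats, as long as A and C are swapped for whatever denominator that stat uses.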
My reason for defining the samples this way is to cancel out the home team bias as best as possible and to match up the samples. If, for example, you replaced D/C with the ratio of LH Ks on all plate appearances in the American League then the makeup of the Mariner hitters and pitchers would dramatically affect the resulting park factor. Mariner hitters and pitchers would make up half of the B/A sample but only 1/14th of the D/C sample. By using only games the Mariners play in for calculating D/C, they make up half the sample in both. In theory, if the Mariner hitters/pitchers struck out a lot at home, they would do the same on the road as well, canceling their impact out and leaving only the park's influence behind.
In theory. There are still refinements left to be made which is why this is a first stab. It would help if teams played a balanced schedule, but with our skewed schedules there are still going to be some unduly large influences on D/C by each team's division mates. I welcome any constructive suggestions on how to deal with this and any other statistical issues. Still, I think this is a good start and a better picture than other park factors give us. With no further boring exposition, here are the Safeco Field results covering 2007-present. A factor greater than 100 indicates that Safeco helps to increase that stat:
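A toy example of why the matched sample matters, again in Python. Every number here is invented: it assumes a perfectly neutral home park, road parks that average out to neutral, and a Mariner-related strikeout rate that happens to run higher than the league's.

```python
def factor(home_rate, baseline_rate):
    """Home rate over baseline rate, scaled so that 100 is neutral."""
    return 100.0 * home_rate / baseline_rate


# All values below are hypothetical.
park_effect = 1.00            # the home park truly changes nothing
mariner_game_k_rate = 0.20    # K rate in Mariner games, independent of park
league_k_rate = 0.17          # K rate across every game in the league

home_rate = mariner_game_k_rate * park_effect

# Baseline built from Mariner road games: team makeup cancels out.
matched = factor(home_rate, mariner_game_k_rate)

# Baseline built from the whole league: team makeup leaks into the factor.
league_based = factor(home_rate, league_k_rate)

print(f"matched baseline: {matched:.1f}")
print(f"league baseline:  {league_based:.1f}")
```

With the league baseline the neutral park gets blamed for strikeouts that really belong to the roster, which is exactly the bias the matched sample is meant to remove.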
| Factor | LH | RH |
|---|---|---|
| K | 108 | 110 |
| BB | 108 | 102 |
| HBP | 84 | 108 |
| GB | 97 | 97 |
| FB | 90 | 92 |
| LD | 105 | 106 |
| IF | 105 | 104 |
| 1B | 105 | 99 |
| 2B | 84 | 103 |
| 3B | 70 | 90 |
| HR/BIA | 97 | 91 |
| wOBA | 97 | 97 |
K, BB, HBP, GB, FB, LD and IF are all factored on a per PA basis since they are all discrete possible results of a PA. 1B, 2B and 3B are factored on a per batted ball basis. HR is factored by balls in the air (i.e. non-ground-ball batted balls). wOBA is based on what the league average line would have looked like given the above factors.
This is really cool.
I can understand the big differences for RH and LH with doubles and triples, but do you (or anyone else for that matter) have an idea as to what causes RH to get HBPs so much more often?
Since it’s a ratio the amount of righties vs lefties hitting shouldn’t cause the swing. Do pitchers tend to hit same-handed batters more than opposite handed batters? Maybe that’s the answer.
brayden04 - April 22, 2010
I'd bet on that being a small sample size anomaly
Jeff Sullivan - April 22, 2010
Yeah, that sounds about right
3 years sounds like a lot of data points for something like this, but really, HBPs happen seldom enough that there could be a big dose of random occurrence with that stat.
nathaniel dawson - April 22, 2010
Likely the same reason walks and strikeouts go up
In a tough offensive environment for batted balls like Safeco, hitters try to work counts more. The more pitches they see, the more likely they are to get hit by one.
Matthew - April 22, 2010
Pitching inside deliberately?
I might well be barking up the wrong tree here, but is there a case for pitchers deliberately throwing away from LH hitters (to mitigate field advantage – if they hit a ball way outside the zone it’s more likely to go to LF, no?), and likewise inside to RH hitters?
If you’re pitching to Mr Generic Left and want to pitch outside the zone to him to elicit swinging strikes, given Safeco’s dimensions aren’t you more likely to do so outside than inside, so that on the off chance he makes a connection it’s less likely to leave the park?
The same would then also apply to BB being balanced the other way.
MarkE - April 23, 2010
My brain wants to read that as A First Stab at Hand
ed Park Factors
lemonverbena - April 22, 2010
For getting the opponents leveled...
couldn’t you divide by # of games played against each team? then you’d get a single number for each team.
Lucas Cervi - April 22, 2010
Discrepancy in doubles and triples
Would that be skewed at all by the fact that Ichiro played in RF for a lot of those at bats and Raul Ibanez played in lF?
Schaefer - April 22, 2010
er, LF
Schaefer - April 22, 2010
It's already adjusted for that
because we are comparing results in Safeco to results in other Mariners games. BIP by RH batters (which are more likely to end up in LF) during M’s away games are the standard by which BIP by RH batters at Safeco are judged. The influence of the defense is eliminated by using the same defenders in both the control and test groups.
Sukafish - April 22, 2010
Ah, I see
For some reason I read it as only account for Mariner at bats on the road, but it makes more sense now.
Schaefer - April 22, 2010
Probably because
Safeco has a huge LF/CF gap that righties hit into. The RF/CF gap isn’t nearly as big.
brayden04 - April 22, 2010
I wonder...
Are triples usually pulled? Seems like they might be one hit that is more likely to come by going the opposite way and catching the defense out of position.
nadzor - April 22, 2010
Yeah it would seem like the
LHB would have an advantage on triples. Also they’re closer to first by a step. But I dunno, the gap in left-center is pretty large. This is just my guess, Matthew probably knows something I don’t.
brayden04 - April 22, 2010
But it's harder to hit a triple to left field than right field, since the throw to third is much closer
seattlebruin - April 22, 2010
These factors aren't set up for which field the ball is hit to.
They merely reflect the handedness of the batter. Nothing about the data Matthew has presented us suggests that RHB have significantly more triples because of the LCF gap. It is entirely plausible that the RHB push the ball into RF instead of pulling it to LF on their triples. What we know from this data is only that RHB are more likely than LHB to get triples in Safeco.
harkening - April 22, 2010
You don't even know that.
What we know from this data is that RHB are less harmed by Safeco than LHB as far as getting triples.
To look at it another way, most triples come from the RF/RCF area. Safeco has a shorter distance there so triples are cut down. Since that’s the pull gap for LHB, they are more adversely affected by the shorter gap and so, percentage-wise, they hit fewer triples than they would in a neutral ballpark.
Matthew - April 22, 2010
Nice work Matthew
I was wondering how a park could influence Ks and BBs so much. And then I thought “shadows”.
Sukafish - April 22, 2010
They've constantly fiddled with Safeco's batter's eye, also.
Interconnected? Probably.
thehemogoblin - April 22, 2010
Have they messed around with it much since '07?
I feel like those different backdrops were before then.
yuniform - April 22, 2010
That black wall in center field has been tinkered with several times over the years.
At one point (’02?) there were 20 or so trees planted in front of it to reduce glare. I believe it currently has a matte black honey comb covering. This is from memory, it used to be a frequent story in the sports page.
Kermit. - April 23, 2010
It's like there was a new backdrop every month
Jeff Sullivan - April 23, 2010
Between Olerud and Boone falling off a cliff at the plate, and all the bitching about Cammy's strikeouts...
…it sure felt like it. Google isn’t helping me much, but apparently even Ichiro and Edgar went to management about the wall, and even tried to get the roof closed during certain hours of the day. And I also didn’t know Cameron had eye surgery on one eye, guess that didn’t help his strike outs either.
Kermit. - April 23, 2010
That is so strange that the park factor for wOBA for right and left handed hitters is the same
And that HR/BIA has such a slight difference
Dewey N - April 22, 2010
Why do righties get HBP so much more?
killer_ewok18 - April 22, 2010
Maybe more right handed pitchers?
Seems like it would be easier to go inside and miss if you were same handed
superJAYdude7 - April 22, 2010
So you don't overanalyze the HBP and 3B factors
Please note that, since 2007, there have been:
160 HBP at Safeco
156 HBP on the road
59 triples at Safeco
77 on the road
Introduce handedness and you’re talking about really small sample sizes.
Jeff Sullivan - April 22, 2010
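To put a rough size on that noise, here is a sketch of one common approximation: a normal interval on the log of a ratio of two rates. It is not something the post itself used. It assumes the events behave like independent random draws, and because the thread gives no plate appearance totals, the default path assumes equal opportunities at home and on the road and drops the tiny 1/opportunities terms. The function name and arguments are illustrative.

```python
import math


def factor_interval(home_events, road_events,
                    home_opps=None, road_opps=None, z=1.96):
    """Return (factor, low, high) on the 100 scale.

    Uses a normal approximation on log(home rate / road rate).
    If opportunity counts are missing, equal opportunities are ASSUMED
    and the 1/opps terms are dropped, which is reasonable for rare events.
    """
    if home_opps is None or road_opps is None:
        ratio = home_events / road_events
        variance = 1 / home_events + 1 / road_events
    else:
        ratio = (home_events / home_opps) / (road_events / road_opps)
        variance = (1 / home_events - 1 / home_opps
                    + 1 / road_events - 1 / road_opps)
    half_width = z * math.sqrt(variance)
    return (100 * ratio,
            100 * ratio * math.exp(-half_width),
            100 * ratio * math.exp(half_width))


# Counts quoted in the comment above, all batters combined.
hbp = factor_interval(160, 156)
triples = factor_interval(59, 77)
print("HBP:     %.0f (%.0f to %.0f)" % hbp)
print("Triples: %.0f (%.0f to %.0f)" % triples)
```

Under those assumptions both intervals straddle 100 even before splitting by handedness, which is the point of the warning.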
I kinda liked it better when I thought Safeco was harmful to righties.
Literally.
brayden04 - April 22, 2010
I'm as surprised as you are
Jeff Sullivan - April 22, 2010
I'm a little too surprised
The HR/BIA makes sense to me but the final wOBA surprises me a lot.
Edgar for Pres - April 22, 2010
I will talk to Matthew when he returns from lunch
Alternatively, you may talk to Matthew, in this thread, when he returns from lunch.
Jeff Sullivan - April 22, 2010
Yeah trying to process some useful thoughts
Edgar for Pres - April 22, 2010
Can we ask you questions to ask him?
Dewey N - April 22, 2010
Matthew could you put up the sample size for all these variables.
Edgar for Pres - April 22, 2010
Along those lines, is it too much work to get the range of the true park factors for each of these doing a 95% confidence interval?
Dewey N - April 22, 2010
Yes
Matthew - April 22, 2010
Figured so
Dewey N - April 22, 2010
I was wondering about this myself
It looks like your method reduces bias by limiting the sample size. I get that it’s lots of work to include confidence intervals, but it’d be a useful future addition.
Nadingo - April 23, 2010
Do FB include IF?
Edgar for Pres - April 22, 2010
No, there's a separate IF factor
Matthew - April 22, 2010
So these numbers are showing me two things.
First is the obvious one that we already knew which was that left handers hit more home runs because the RF wall is shorter. For RHB, the hits that would normally go for HR in most other parks might go for 2B instead so they see a boost in their 2B total which is related to their HR decline.
The other thing isn’t really related to park factors for LH/RH splits but it appears that pitchers are definitely changing their approach. Instead of pitching to contact they are probably going for more strikeouts however I don’t really understand how they are doing this. The number of flyballs drops pretty dramatically and the number of groundballs also drops a little with a rise in line drives and infield fly balls. Since this is on a per PA basis the drop in ground balls might be caused by the increase in strikeouts and walks. This means that we have a decrease in flyballs and an increase in line drives and IFFB.
Maybe pitchers feel safe in Safeco and throw more stuff high in the zone (increase in strikeouts and infield fly balls) and when they miss with these pitches high they end up either walking guys or throwing a fastball down the middle of the plate (increase in line drives).
Edgar for Pres - April 22, 2010
There's also classification bias measured here.
Different stringers have been shown to be more liberal/conservative on what’s a line drive, what’s a fly ball, etc
Matthew - April 22, 2010
Yeah fucking people ruining all our data
Edgar for Pres - April 22, 2010
Do batters take more pitches on average at Safeco?
It seems like with the increased walks and strikeouts you’d see a higher average pitch count.
sigalert - April 22, 2010
The wOBA drop is weird
What’s the blanket run factor again? Something like 94? Wouldn’t we then expect a wOBA split to match?
Graham MacAree - April 22, 2010
Yeah I am not sure what to think about that. I would think they'd match.
Small sample? He used enough years where it should have been ok.
What do you get for Runs/PA using this method?
Edgar for Pres - April 22, 2010
I have a thought on what might drive that wOBA figure higher.
I’ll look into it later today.
Matthew - April 22, 2010
Fixed.
I neglected to adjust the # of batted balls available for 1Bs/2Bs/etc based on the K, BB and HBP factors. Safeco drives more Ks and BBs and thus fewer batted balls, which means fewer opportunities for singles, doubles, triples, home runs and reaching via error.
wOBA impact to LHB is 97.2, RHB is 96.6
Matthew - April 22, 2010
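A sketch of the adjustment described in that fix, in Python. The K, BB and HBP factors are the LH values from the table, but the baseline per-PA rates are invented placeholders, and this is only one plausible reading of the correction, not the actual spreadsheet.

```python
def batted_ball_share(rates):
    """Share of plate appearances that end with a batted ball."""
    return 1.0 - (rates["K"] + rates["BB"] + rates["HBP"])


def apply_factors(base_rates, factors):
    """Scale each per-PA rate by its park factor (100 = neutral)."""
    return {name: base_rates[name] * factors[name] / 100.0
            for name in base_rates}


# Hypothetical league-average per-PA rates (placeholders, not real data).
base_rates = {"K": 0.180, "BB": 0.085, "HBP": 0.009}

# LH factors taken from the table in the post.
lh_factors = {"K": 108, "BB": 108, "HBP": 84}

neutral_share = batted_ball_share(base_rates)
safeco_share = batted_ball_share(apply_factors(base_rates, lh_factors))

# Relative number of batted balls available for 1B/2B/3B/HR at Safeco.
opportunity_ratio = safeco_share / neutral_share
print(f"batted-ball opportunities vs neutral: {opportunity_ratio:.3f}")
```

Any per-batted-ball factor then gets applied to this smaller pool of batted balls rather than to the neutral pool.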
Safeco hurts clutch hitting?
I’m not being entirely facetious.
Bearskin Rugburn - April 23, 2010
wOBA is 99 for both handed hitters!?
I would have suspected it to be lower, especially for righties. Very interesting!
ARock - April 22, 2010
So I guess there is a question I have with this stuff
Do pitchers or hitters (or both) change their approach in Safeco? If they don’t change their approach then all these effects can be attributed to field shape and environment. If they do change their approach then we are seeing some really complex behavior going on I think.
It might help to try to separate these effects by seeing how hitters/pitchers are changing their approach. This gets to be a lot more of a pain in the ass. The stuff you’ve done in this post is useful and simple and the best part is it uses data that is reasonably easy to get.
It would be really interesting to look at a spray chart for LHB vs RHB at Safeco and not at Safeco and then subtract them to see the differences. Maybe RHB stop trying to hit home runs and are trying to pull the ball down the line for doubles more.
Also pitch type or pitch f/x stuff could be very useful to look at how pitchers are attacking batters.
Both of these would take significantly more time and are kind of data fishing hoping something interesting pops out at the end.
Edgar for Pres - April 22, 2010
If they change their approach because of the park, I'm calling that a park factor.
Matthew - April 22, 2010
Yeah of course. It's very hard to separate that sort of stuff
Now we have the problem though when we look at this data and ask why this happens. Either explanation of environment or approach is valid but with just these numbers it's difficult to say why something is occurring.
It's just an interesting question I think of how much a hitter or pitcher adapts their approach based on the park they play in, and I think there are a lot of small things to look at. Do only home teams adjust their approach?, etc. It's tough to study though because of all the reasons park factors are tough to study.
Edgar for Pres - April 22, 2010
That would be my initial reaction.
If hitters or pitchers are changing their approach, and that affects the results, then in my mind, that’s part of the overall effect of the park they’re playing in.
nathaniel dawson - April 22, 2010
Is BABIP similar for LHB and RHB?
Edgar for Pres - April 22, 2010
Awesome!
Love to see this sort of analysis. Blanket park factors based on only a few offensive events tell us only part of the story, we’ve been missing a lot about the different ways Safeco affects what happens on the field.
Umm…..any chance you’ll expand this to pitcher handedness? Park factors in general have focused on hitters, I’ve seen nothing that breaks down how pitchers are affected. For instance, it’s always been assumed that left-handed pitchers are helped more by Safeco than right-handers, but we only assume this because of how hitters are affected, and sort of reversing that for the pitching side. It’s probably a pretty fair assumption to make, but we really don’t know that for sure, or how much and in what ways it affects pitchers by handedness. It’s kind of frustrating to have some pretty good information about one side of the equation, but having almost nothing for the other. I’d love to see someone smarter than me come up with a method of evaluating this.
(Like offense doesn’t provide you with enough wading already)
nathaniel dawson - April 22, 2010
They're blind park factors. They aren't focused on hitters or pitchers.
LHPs benefit from Safeco by forcing opposing managers to use more RHBs or else concede a platoon advantage.
Matthew - April 22, 2010
Blind park factors?
I’m not sure what that phrase means. When you listed the different events and how Safeco plays differently than a neutral park, it was in regards to how hitters are affected, is that right? So you looked at the plate appearances by a left-handed hitter, and found out that they struck out at a rate that’s 8% greater than in a neutral park, and right-handed hitters struck out at a rate that’s 10% greater than in a neutral park, etc.
So that looks like it’s from a hitter’s perspective. What I haven’t seen is how pitchers are affected. Like do left-handed pitchers allow more doubles in Safeco than on the road? Do they allow fewer homers, or more? How does that compare with how right-handers are affected? How does Safeco affect strikeouts for LH and RH pitchers?
That’s the stuff I haven’t seen before. All the park factors I’ve seen that have involved splits for handedness have always been about the hitting side, with nothing about the pitching side. What do we know about how Safeco Field affects pitchers by handedness?
That’s what was on my mind when I posted that question. Is it possible to take the same approach and apply it to pitchers? Is that something you would consider exploring?
nathaniel dawson - April 22, 2010
I understood what you meant.
I’m saying the hitters face pitchers. The factors aren’t from a hitter’s perspective, they’re from a plate appearance perspective. It’s connected. The factors aren’t different for pitchers, how could they be?
Matthew - April 22, 2010
When determining park factors.
Ichiro has been out in right field practically since they opened the stadium. Does that cause any sort of problems with determining park factors?
Kermit. - April 22, 2010
No
Matthew - April 22, 2010
He played CF for a season.
Eyebrows - April 22, 2010
Yes he did! But his longevity in right field in a new stadium is a bit unique.
I was curious if that caused any noteworthy discussion during the process of planning this project. How these statistical studies are put together is interesting.
Kermit. - April 23, 2010
The marked favorability of IF hits is the one that puzzles me the most.
Since all infields are inherently alike. Does Ichiro help skew that LH stat? Are IF’ers playing deeper at Safeco to help the outfielders?
sigalert - April 22, 2010
Infield flies, not hits
Matthew - April 22, 2010
Oops- of course, thanks.
sigalert - April 22, 2010
Also infields are not all inherently alike
Grass length and composition can greatly affect how fast the ball rolls.
Matthew - April 22, 2010
And of course there's a giant difference between a grass infield and a turf one too
Graham MacAree - April 22, 2010
I thought of that.
But with only two stadiums still using turf I didn’t think they alone could sway the numbers too far either way.
sigalert - April 22, 2010
As well, how the dirt portion is configured can introduce bias to batted ball types as reported by the stringers.
Press box height and positioning, too. Hit F/X could probably clear some of that up for us, but different stadiums likely yield different bias with batted ball types just based on perception, rather than actual differences in how the ball is struck.
nathaniel dawson - April 22, 2010
Just wondering if this is easy
Can you do these calculations for Mariners hitters and non-Mariners hitters and then use the 1/14th factor to do a weighted average?
I think your thinking is pretty sound but there are different types of hitters besides just LH and RH. If Mariners hitters are affected by Safeco differently than the average hitter then this will skew your results. If it's hard then don’t worry about it. No reason to waste time for what might just be a marginal improvement.
Edgar for Pres - April 22, 2010
I don't foresee any value in doing that.
Matthew - April 22, 2010
That's fine, just thought I'd bring it up.
There are a thousand things that could be done that might or might not make things more accurate and all take time.
Edgar for Pres - April 22, 2010
Initial thoughts
Assuming looking at LH v RH only, if you sampled the stadiums you may find a breakdown something like 30% favor LH, 40% neutral, 30% favor RH. This would break down to something like 9 Ball parks, 12 and 9. But if the D/C ratio you proposed is skewed by 1 to 2 parks that “extremely” favor a certain side, then this can skew the results just on a purely empirical value given here.
Unfortunately you would nearly have to cross reference every park compared to another (like a 30×29 matrix). At the moment I assume it flattens your results. Safeco could be LH friendly, but if a couple of other parks are more so, and to a more extreme extent than other parks are RH friendly, then they would drop the ratios quite considerably. The limiting factor is basically 30 ball parks, not the sample PA’s, as 30 is pretty small in statistical evaluation, and the deviations between each park would need to be examined first.
Anyway could be completely wrong – 1.09am, had some bourbons, have only just started to post here and looking forward to RRS tomorrow.
aussie_m's_fan - April 23, 2010
I thought about that cross-referencing as a possible solution to the unbalanced schedule issue
but was unsure how really to go about doing it in a worthwhile manner.
Matthew - April 23, 2010
would be a tough one
as you would also probably need to break it down at a per outcome versus each park, i.e. a LHB getting to 3B may be 30% more likely per PA at the Coliseum than the average, and 10% more likely at Safeco. But as the denominator would have more samples of the Coliseum than say Yankee Park (that could be average), then you get an outcome from your calcs as the Safe maybe showing 107, or playing 7% more likely for that outcome.
On the reverse, due to a park design issue, maybe a larger outfield so it makes 3B hits more likely than a smaller park where it would result in double, the Coliseum could play at -10% the average for a LHB to get a double. Then using this method it would cause the Safe value to go higher.
End of the day – will be a bitch to work out – and would have to expect a certain +/- factor to be inherent and unavoidable. Just have to work out how to minimise the +/-
aussie_m's_fan - April 23, 2010
Sample Size and other statistical thoughts
Your methodology is pretty reasonable, except that I think you are overweighting the Mariners offense. The Mariners will account for 50% of the offensive statistics. I think you should either reduce the weight of the M’s games to 1/14, or simply compile the statistics for only the teams that aren’t the Ms.
I do wish that you would consider incorporating a margin of error into the results. I suspect that some of the apparent differences are not statistically significant. Reporting a figure as 108 +/- 10 or whatever the margin of error might be would make it a lot easier to evaluate the meaning of the numbers. The methodology for all the “new” stats is great, but without a margin of error, their predictive validity is suspect. When stats like WAR are reported, they are used as though they were exact, but the reality is that it’s not that exact. Depending on various factors, there may or may not be any significant difference between a WAR of 3 and one of 2. Incorporating a MOE would improve the ability to interpret these data.
New England Fan - April 23, 2010
They account for 50% on each side.
I’m not going to drop the Ms weight to 1/14 because that overweights the opposition, which as I said isn’t normally distributed thanks to the unbalanced schedules. And you cannot toss out Ms data because then you have no data. Every play in Safeco involves either the Mariners offense or the Mariners defense. Tossing out the offense but leaving the defense makes no sense and you just halved your sample.
Stats aren’t used as though they’re exact. Reporting MOE on every stat we use would be cumbersome and pointless. This is baseball. This isn’t economic forecasting or drug testing. It’s okay to be informal about the terminology. People understand these aren’t exact and that sample sizes are important. If they don’t, well, that’s their problem and they’re not going to know because we included +/- on top of everything. They won’t even read it.
As for the MOEs here, give me the equation you want used and I’ll give them to you. I’m unsure whether these constitute random sampling or not. It’s been a few years since I was a practicing statistician. I have other things to remember now.
Matthew - April 23, 2010
I'm flabbergasted
Bearskin Rugburn - April 23, 2010