Lookout Landing

A First Stab at Handed Park Factors

Cliff Lee!

Otto Greule Jr - Getty Images

Coming up with a good model for park factors is a rather difficult proposition. You want to include as much data as possible, but all of it is fraught with bits of bias. It helps to think of it like attempting to quantify pitching. We have a simple end product, runs allowed, but there are so many things that go into producing that result and we have to wade pretty deep in order to extract the parts of it that are due to pitching and strip away the parts contributed by the defense, the park, random luck and the quality of the opposing offense. 

Park factors are a lot like that. We have a simple result, runs scored in one park versus runs scored in another, but there's a lot that factors into that result and most of it is not a direct result of the influence of the particular park. Last season, New Yankee Stadium garnered a reputation for being a hitter's haven because of all the high-scoring games played there. However, the 2009 Yankees had one of the best offenses in baseball; so how much of the run scoring was because the park was friendly to hitters and how much of the run scoring was because the Yankee hitters weren't so friendly to opposing pitchers?

There's no easy answer to these problems. We can arrive at what we consider some decent approximations by using multiple years of data and trying to control for the variety of influences provided by the hometown nine. What began as just one number for each park, its overall run factor, has since blossomed into finding park factors for nearly every stat out there down to even the strike zone.

This might have been done already for all I know, but one part I have long been interested in is how parks affect batters based on handedness. We have deep personal knowledge of how Safeco Field plays differently for left-handed and right-handed hitters. It can be such an extreme split that using just one overall home run factor for Safeco significantly shortchanges right-handed hitters and overcompensates lefties. Investigating a way to rectify that has been on my to-do list for a while now and I have finally gotten through a first pass at constructing a way to tackle it.

I won't go into the nitty-gritty details but the high-level concept goes like this. To figure out the strikeout factor for left-handed hitters (LH K) in Safeco I take four pieces of data:

A: The number of plate appearances made by a hitter from the left side at Safeco (regardless of team)
B: The number of strikeouts recorded during those (A) plate appearances.
C: The number of plate appearances made by a hitter from the left side during Mariner away games (regardless of team)
D: The number of strikeouts recorded during those (C) plate appearances.

B/A gives you the ratio of Mariner-related* at bats in Safeco that yielded a LH K.
D/C gives you the ratio of Mariner-related at bats not in Safeco that yielded a LH K.

*Mariner-related refers to any at bat that the Mariners are a participant in whether as the offensive or defensive team.

Taking the first ratio (B/A) and dividing it by the second ratio (D/C) gives you a ratio of the ratios, which is your de facto Safeco Field park factor for LH Ks.
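
As a rough sketch of that ratio-of-ratios arithmetic in Python: the function name and the counts below are made up for illustration and are not the actual Safeco data. The result is scaled so that 100 means no park effect, matching the results further down.

```python
def park_factor(home_events, home_pa, road_events, road_pa):
    """Return (B/A) / (D/C), scaled so that 100 means no park effect."""
    home_rate = home_events / home_pa    # B / A: rate in the home park
    road_rate = road_events / road_pa    # D / C: rate in the same team's road games
    return 100.0 * home_rate / road_rate


# Hypothetical counts, for illustration only (not real Safeco numbers).
lh_k_factor = park_factor(home_events=540, home_pa=3000,
                          road_events=500, road_pa=3000)
print(round(lh_k_factor, 1))
```

With these invented counts the home strikeout rate is 18% against a road rate of about 16.7%, so the factor comes out at roughly 108.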

My reason for defining the samples this way is to cancel out the home team bias as best as possible and to match up the samples. If, for example, you replaced D/C with the ratio of LH Ks on all at bats in the American League then the makeup of the Mariner hitters and pitchers would dramatically affect the resulting park factor. Mariner hitters and pitchers would make up half of the B/A sample but only 1/14th of the D/C sample. By using only games the Mariners play in for calculating D/C, they make up half the sample in both. In theory, if the Mariner hitters/pitchers struck out a lot at home, they would do the same on the road as well, canceling their impact out and leaving only the park's influence behind.
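
A toy calculation can make the cancellation concrete. Everything here is an assumption for illustration: the league strikeout rate, the Mariner hitter and pitcher multipliers, the "true" park effect, and the simplification that strikeout rates are simply the product of those pieces.

```python
BASE_K = 0.17       # hypothetical league strikeout rate per plate appearance
PARK = 1.08         # hypothetical true Safeco effect on strikeouts
M_HITTERS = 1.15    # hypothetical: Mariner hitters strike out 15% more than average
M_PITCHERS = 1.05   # hypothetical: Mariner pitchers strike out 5% more batters

# In any Mariner game, half the PAs involve Mariner hitters and half Mariner pitchers.
team_mix = (M_HITTERS + M_PITCHERS) / 2

safeco_rate = BASE_K * team_mix * PARK   # B/A
road_rate = BASE_K * team_mix            # D/C using Mariner road games only

# League-wide, Mariner hitters bat in 1/14 of PAs, Mariner pitchers pitch in
# another 1/14, and the remaining 12/14 involve neither.
league_mix = (M_HITTERS + M_PITCHERS + 12) / 14
league_rate = BASE_K * league_mix        # D/C using the whole league instead

matched = 100 * safeco_rate / road_rate      # team mix cancels, park effect remains
unmatched = 100 * safeco_rate / league_rate  # team mix leaks into the "park" factor

print(round(matched, 1), round(unmatched, 1))
```

In this toy setup the matched comparison recovers the assumed park effect, while the league-wide comparison inflates it because the high-strikeout Mariner players are half of one sample and only a sliver of the other.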

In theory. There are still refinements left to be made which is why this is a first stab. It would help if teams played a balanced schedule, but with our skewed schedules there are still going to be some unduly large influences on D/C by each team's division mates. I welcome any constructive suggestions on how to deal with this and any other statistical issues. Still, I think this is a good start and a better picture than other park factors give us. With no further boring exposition, here are the Safeco Field results covering 2007-present. A factor greater than 100 indicates that Safeco helps to increase that stat:

Factor    LH    RH
K        108   110
BB       108   102
HBP       84   108
GB        97    97
FB        90    92
LD       105   106
IF       105   104
1B       105    99
2B        84   103
3B        70    90
HR/BIA    97    91
wOBA      97    97


K, BB, HBP, GB, FB, LD and IF are all factored on a per PA basis since they are all discrete possible results of a PA. 1B, 2B and 3B are factored on a per batted ball basis. HR is factored by balls in the air (i.e. non-ground-ball batted balls). wOBA is based on what the league average line would have looked like given the above factors.
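
The choice of denominator per stat can be sketched as follows. The counts are invented, and treating batted balls as GB + FB + LD + IF (with balls in the air being everything except GB) is my reading of the description above rather than a confirmed detail of the original calculation.

```python
PER_PA = ("K", "BB", "HBP", "GB", "FB", "LD", "IF")
PER_BATTED_BALL = ("1B", "2B", "3B")


def rate(counts, stat):
    """Rate of `stat` over the denominator appropriate to it."""
    batted_balls = counts["GB"] + counts["FB"] + counts["LD"] + counts["IF"]
    if stat in PER_PA:
        denom = counts["PA"]
    elif stat in PER_BATTED_BALL:
        denom = batted_balls
    elif stat == "HR":
        denom = batted_balls - counts["GB"]   # balls in the air
    else:
        raise KeyError(stat)
    return counts[stat] / denom


def factor(home, road, stat):
    """Home rate over road rate, scaled so 100 means no park effect."""
    return 100.0 * rate(home, stat) / rate(road, stat)


# Hypothetical counts for one batter hand, for illustration only.
home = {"PA": 3000, "K": 540, "BB": 270, "HBP": 30, "GB": 960, "FB": 660,
        "LD": 440, "IF": 100, "1B": 490, "2B": 125, "3B": 12, "HR": 60}
road = {"PA": 3000, "K": 500, "BB": 250, "HBP": 30, "GB": 1000, "FB": 720,
        "LD": 400, "IF": 100, "1B": 500, "2B": 150, "3B": 18, "HR": 66}

for stat in PER_PA + PER_BATTED_BALL + ("HR",):
    print(stat, round(factor(home, road, stat)))
```

The wOBA line is left out of this sketch because it depends on league-average rates and run weights that are not given here.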

Comments

This is really cool.

I can understand the big differences for RH and LH with doubles and triples, but do you (or anyone else for that matter) have an idea as to what causes RH to get HBPs so much more often?

Since it’s a ratio the amount of righties vs lefties hitting shouldn’t cause the swing. Do pitchers tend to hit same-handed batters more than opposite handed batters? Maybe that’s the answer.

I'd bet on that being a small sample size anomaly
Yeah, that sounds about right

3 years sounds like a lot of data points for something like this, but really, HBPs happen seldom enough that there could be a big dose of random occurrence with that stat.

Likely the same reason walks and strikeouts go up

In a tough offensive environment for batted balls like Safeco, hitters try to work counts more. The more pitches they see, the more likely they are to get hit by one.

Pitching inside deliberately?

I might well be barking up the wrong tree here, but is there a case for pitchers deliberately throwing away from LH hitters (to mitigate field advantage – if they hit a ball way outside the zone it’s more likely to go to LF, no?), and likewise inside to RH hitters?

If you’re pitching to Mr Generic Left and want to pitch outside the zone to him to elicit swinging strikes, given Safeco’s dimensions aren’t you more likely to do so outside than inside, so that on the off chance he makes a connection it’s less likely to leave the park?

The same would then also apply to BB being balanced the other way.

My brain wants to read that as A First Stab at Hand

ed Park Factors

For getting the opponents leveled...

couldn’t you divide by # of games played against each team? then you’d get a single number for each team.

Discrepancy in doubles and triples

Would that be skewed at all by the fact that Ichiro played in RF for a lot of those at bats and Raul Ibanez played in lF?

er, LF
It's already adjusted for that

because we are comparing results in Safeco to results in other Mariners games. BIP by RH batters (which are more likely to end up in LF) during M’s away games are the standard by which BIP by RH batters at Safeco are judged. The influence of the defense is eliminated by using the same defenders in both the control and test groups.

Ah, I see

For some reason I read it as only account for Mariner at bats on the road, but it makes more sense now.

Probably because

Safeco has a huge LF/CF gap that righties hit into. The RF/CF gap isn’t nearly as big.

I wonder...

Are triples usually pulled? Seems like they might be one hit that is more likely to come by going the opposite way and catching the defense out of position.

Yeah it would seem like the

LHB would have an advantage on triples. Also they’re closer to first by a step. But I dunno, the gap in left-center is pretty large. This is just my guess, Matthew probably knows something I don’t.

But it's harder to hit a triple to left field than right field, since the throw to third is much closer
These factors aren't set up for which field the ball is hit to.

They merely reflect the handedness of the batter. Nothing about the data Matthew has presented us suggests that RHB have significantly more triples because of the LCF gap. It is entirely plausible that the RHB push the ball into RF instead of pulling it to LF on their triples. What we know from this data is only that RHB are more likely than LHB to get triples in Safeco.

You don't even know that.

What we know from this data is that RHB are less harmed by Safeco than LHB as far as getting triples.

To look at it another way, most triples come from the RF/RCF area. Safeco has a shorter distance there so triples are cut down. Since that’s the pull gap for LHB, they are more adversely affected by the shorter gap and so, percentage-wise, they hit fewer triples than they would in a neutral ballpark.

Nice work Matthew

I was wondering how a park could influence Ks and BBs so much. And then I thought “shadows”.

They've constantly fiddled with Safeco's batter's eye, also.

Interconnected? Probably.

Have they messed around with it much since '07?

I feel like those different backdrops were before then.

That black wall in center field has been tinkered with several times over the years.

At one point (’02?) there were 20 or so trees planted in front of it to reduce glare. I believe it currently has a matte black honeycomb covering. This is from memory; it used to be a frequent story in the sports page.

It's like there was a new backdrop every month
Between Olerud and Boone falling off a cliff at the plate, and all the bitching about Cammy's strikeouts...

…it sure felt like it. Google isn’t helping me much, but apparently even Ichiro and Edgar went to management about the wall, and even tried to get the roof closed during certain hours of the day. And I also didn’t know Cameron had eye surgery on one eye; guess that didn’t help his strikeouts either.

That is so strange that the park factor for wOBA for right and left handed hitters is the same

And that HR/BIA has such a slight difference

Why do righties get HBP so much more?
Maybe more right handed pitchers?

Seems like it would be easier to go inside and miss if you were same handed

So you don't overanalyze the HBP and 3B factors

Please note that, since 2007, there have been:

160 HBP at Safeco
156 HBP on the road

59 triples at Safeco
77 on the road

Introduce handedness and you’re talking about really small sample sizes.
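
To put a rough size on that noise, here is a sketch that treats the two totals above as Poisson counts with roughly equal opportunities at home and on the road. Both of those are simplifying assumptions, and the function name is only illustrative.

```python
import math


def count_ratio_interval(home_count, road_count, z=1.96):
    """Approximate 95% interval for home/road, treating counts as Poisson."""
    ratio = home_count / road_count
    se_log = math.sqrt(1 / home_count + 1 / road_count)  # SE of log(ratio)
    low = ratio * math.exp(-z * se_log)
    high = ratio * math.exp(z * se_log)
    return ratio, low, high


# Totals quoted above, both batter hands combined, since 2007.
for label, home, road in (("HBP", 160, 156), ("3B", 59, 77)):
    ratio, low, high = count_ratio_interval(home, road)
    print(f"{label}: {100 * ratio:.0f} ({100 * low:.0f} to {100 * high:.0f})")
```

Even before splitting by handedness, the interval for triples under these assumptions is wide enough to include 100, which is the point about small samples.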

I kinda liked it better when I thought Safeco was harmful to righties.

Literally.

I'm as surprised as you are
I'm a little too surprised

The HR/BIA makes sense to me but the final wOBA surprises me a lot.

I will talk to Matthew when he returns from lunch

Alternatively, you may talk to Matthew, in this thread, when he returns from lunch.

Yeah trying to process some useful thoughts
Can we ask you questions to ask him?
Matthew could you put up the sample size for all these variables.
Along those lines, is it too much work to get the range of the true park factors for each of these doing a 95% confidence interval?
I was wondering about this myself

It looks like your method reduces bias by limiting the sample size. I get that it’s lots of work to include confidence intervals, but it’d be a useful future addition.

Do FB include IF?
No, there's a separate IF factor
So these numbers are showing me two things.

First is the obvious one that we already knew which was that left handers hit more home runs because the RF wall is shorter. For RHB, the hits that would normally go for HR in most other parks might go for 2B instead so they see a boost in their 2B total which is related to their HR decline.

The other thing isn’t really related to park factors for LH/RH splits but it appears that pitchers are definitely changing their approach. Instead of pitching to contact they are probably going for more strikeouts; however, I don’t really understand how they are doing this. The number of flyballs drops pretty dramatically and the number of groundballs also drops a little with a rise in line drives and infield fly balls. Since this is on a per PA basis the drop in ground balls might be caused by the increase in strikeouts and walks. This means that we have a decrease in flyballs and an increase in line drives and IFFB.

Maybe pitchers feel safe in Safeco and throw more stuff high in the zone (increase in strikeouts and infield fly balls) and when they miss with these pitches high they end up either walking guys or throwing a fastball down the middle of the plate (increase in line drives).

There's also classification bias measured here.

Different stringers have been shown to be more liberal/conservative on what’s a line drive, what’s a fly ball, etc

Yeah fucking people ruining all our data
Do batters take more pitches on average at Safeco?

It seems like with the increased walks and strikeouts you’d see a higher average pitch count.

The wOBA drop is weird

What’s the blanket run factor again? Something like 94? Wouldn’t we then expect a wOBA split to match?

Yeah I am not sure what to think about that. I would think they'd match.

Small sample? He used enough years where it should have been ok.

What do you get for Runs/PA using this method?

I have a thought on what might drive that wOBA figure higher.

I’ll look into it later today.

Fixed.

I neglected to adjust the # of batted balls available for 1Bs/2Bs/etc based on the K, BB and HBP factors. Safeco drives more Ks and BBs and thus fewer batted balls, which means fewer opportunities for singles, doubles, triples, home runs and reaching via error.
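
A sketch of that adjustment, using the LH factors from the table and invented league rates. The league rates are placeholders, and treating every PA that is not a K, BB or HBP as a batted ball is a simplification.

```python
# Hypothetical league-average rates per plate appearance (placeholders).
LEAGUE_RATES = {"K": 0.17, "BB": 0.085, "HBP": 0.009}

# LH factors for K, BB and HBP taken from the table in the post.
LH_FACTORS = {"K": 108, "BB": 108, "HBP": 84}


def batted_ball_share(rates, factors=None):
    """Share of PAs ending in a batted ball, after optional park factors."""
    non_batted = 0.0
    for stat, league_rate in rates.items():
        scale = factors[stat] / 100 if factors else 1.0
        non_batted += league_rate * scale
    return 1.0 - non_batted


neutral = batted_ball_share(LEAGUE_RATES)
safeco_lh = batted_ball_share(LEAGUE_RATES, LH_FACTORS)

# Multiplier on the number of chances for singles, doubles, triples and homers.
opportunity_scale = safeco_lh / neutral
print(round(opportunity_scale, 3))
```

With these placeholder rates, left-handed hitters put a few percent fewer balls in play at Safeco, so the per-batted-ball factors act on a smaller pool of chances.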

wOBA impact to LHB is 97.2, RHB is 96.6

Safeco hurts clutch hitting?

I’m not being entirely facetious.

wOBA is 99 for both handed hitters!?

I would have suspected it to be lower, especially for righties. Very interesting!

So I guess there is a question I have with this stuff

Do pitcher or hitters (or both) change their approach in Safeco? If they don’t change their approach then all these effects can be attributed to field shape and environment. If they do change their approach then we are seeing some really complex behavior going on I think.

It might help to try to separate these effects by seeing how hitters/pitchers are changing their approach. This gets to be a lot more of a pain in the ass. The stuff you’ve done in this post is useful and simple and the best part is it uses data that is reasonably easy to get.

It would be really interesting to look at a spray chart for LHB vs RHB at Safeco and not at Safeco and then subtract them to see the differences. Maybe RHB stop trying to hit home runs and are trying to pull the ball down the line for doubles more.

Also pitch type or pitch f/x stuff could be very useful to look at how pitchers are attacking batters.

Both of these would take significantly more time and are kind of data fishing hoping something interesting pops out at the end.

If they change their approach because of the park, I'm calling that a park factor.
Yeah of course. It's very hard to separate that sort of stuff

Now we have the problem though when we look at this data and ask why this happens. Either explanation of environment or approach is valid but with just these numbers it’s difficult to say why something is occurring.

It’s just an interesting question I think of how much a hitter or pitcher adapts their approach based on the park they play in, and I think there are a lot of small things to look at. Do only home teams adjust their approach, etc.? It’s tough to study though because of all the reasons park factors are tough to study.

That would be my initial reaction.

If hitters or pitchers are changing their approach, and that affects the results, then in my mind, that’s part of the overall effect of the park they’re playing in.

Is BABIP similar for LHB and RHB?
Awesome!

Love to see this sort of analysis. Blanket park factors based on only a few offensive events tell us only part of the story, we’ve been missing a lot about the different ways Safeco affects what happens on the field.

Umm…..any chance you’ll expand this to pitcher handedness? Park factors in general have focused on hitters, I’ve seen nothing that breaks down how pitchers are affected. For instance, it’s always been assumed that left-handed pitchers are helped more by Safeco than right-handers, but we only assume this because of how hitters are affected, and sort of reversing that for the pitching side. It’s probably a pretty fair assumption to make, but we really don’t know that for sure, or how much and in what ways it affects pitchers by handedness. It’s kind of frustrating to have some pretty good information about one side of the equation, but having almost nothing for the other. I’d love to see someone smarter than me come up with a method of evaluating this.

(Like offense doesn’t provide you with enough wading already)

They're blind park factors. They aren't focused on hitters or pitchers.

LHPs benefit from Safeco by forcing opposing managers to use more RHBs or else concede a platoon advantage.

Blind park factors?

I’m not sure what that phrase means. When you listed the different events and how Safeco plays differently than a neutral park, it was in regards to how hitters are affected, is that right? So you looked at the plate appearances by a left-handed hitter, and found out that they struck out at a rate that’s 8% greater than in a neutral park, and right-handed hitters struck out at a rate that’s 10% greater than in a neutral park, etc.

So that looks like it’s from a hitter’s perspective. What I haven’t seen is how pitchers are affected. Like do left-handed pitchers allow more doubles in Safeco than on the road? Do they allow fewer homers, or more? How does that compare with how right-handers are affected? How does Safeco affect strikeouts for LH and RH pitchers?

That’s the stuff I haven’t seen before. All the park factors I’ve seen that have involved splits for handedness have always been about the hitting side, with nothing about the pitching side. What do we know about how Safeco Field affects pitchers by handedness?

That’s what was on my mind when I posted that question. Is it possible to take the same approach and apply it to pitchers? Is that something you would consider exploring?

I understood what you meant.

I’m saying the hitters face pitchers. The factors aren’t from a hitter’s perspective, they’re from a plate appearance perspective. It’s connected. The factors aren’t different for pitchers, how could they be?

When determining park factors.

Ichiro has been out in right field practically since they opened the stadium. Does that cause any sort of problems with determining park factors?

He played CF for a season.
Yes he did! But his longevity in right field in a new stadium is a bit unique.

I was curious if that caused any noteworthy discussion during the process of planning this project. How these statistical studies are put together is interesting.

The marked favorability of IF hits is the one that puzzles me the most.

Since all infields are inherently alike. Does Ichiro help skew that LH stat? Are IF’ers playing deeper at Safeco to help the outfielders?

Infield flies, not hits
Oops- of course, thanks.
Also infields are not all inherently alike

Grass length and composition can greatly affect how fast the ball rolls.

And of course there's a giant difference between a grass infield and a turf one too
I thought of that.

But with only two stadiums still using turf I didn’t think they alone could sway the numbers too far either way.

As well, how the dirt portion is configured can introduce bias to batted ball types as reported by the stringers.

Press box height and positioning, too. Hit F/X could probably clear some of that up for us, but different stadiums likely yield different bias with batted ball types just based on perception, rather than actual differences in how the ball is struck.

Just wondering if this is easy

Can you do these calculations for Mariners hitters and non-Mariners hitters and then use the 1/14th factor to do a weighted average?

I think your thinking is pretty sound but there are different types of hitters besides just LH and RH. If Mariners hitters are affected by Safeco differently than the average hitter then this will skew your results. If it’s hard then don’t worry about it. No reason to waste time for what might just be a marginal improvement.

I don't foresee any value in doing that.
That's fine, just thought I'd bring it up.

There are a thousand things that could be done that might or might not make things more accurate and all take time.

Initial thoughts

Assuming looking at LH v RH only, if you sampled the stadiums you may find a breakdown something like 30% favor LH, 40% neutral, 30% favor RH. This would break down to something like 9 Ball parks, 12 and 9. But if the D/C ratio you proposed is skewed by 1 to 2 parks that “extremely” favor a certain side, then this can skew the results just on a purely empirical value given here.

Unfortunately you would nearly have to cross reference every park compared to another (like a 30×29 matrix). At the moment I assume it flattens your results. Safeco could be LH friendly, but if a couple of other parks are more so, and to a more extreme extent than other parks are RH friendly, then they would drop the ratios quite considerably. The limiting factor is basically 30 ball parks, not the sample PAs, as 30 is pretty small in statistical evaluation, and the deviations between each park would need to be examined first.

Anyway could be completely wrong – 1.09am, had some bourbons, have only just started to post here and looking forward to RRS tomorrow.

I thought about that cross-referencing as a possible solution to the unbalanced schedule issue

but was unsure how really to go about doing it in a worthwhile manner.

would be a tough one

as you would also probably need to break it down per outcome versus each park, i.e. a LHB getting to 3B may be 30% more likely per PA at the Coliseum than the average, and 10% more likely at Safeco. But as the denominator would have more samples of the Coliseum than say Yankee Park (that could be average), then you get an outcome from your calcs as the Safe maybe showing 107, or playing 7% more likely for that outcome.

On the reverse, due to a park design issue, maybe a larger outfield so it makes 3B hits more likely than a smaller park where it would result in a double, the Coliseum could play at -10% the average for a LHB to get a double. Then using this method it would cause the Safe value to go higher.

End of the day – will be a bitch to work out – and would have to expect a certain +/- factor to be inherent and unavoidable. Just have to work out how to minimise the +/-

Sample Size and other statistical thoughts

Your methodology is pretty reasonable, except that I think you are overweighting the Mariners offense. The Mariners will account for 50% of the offensive statistics. I think you should either reduce the weight of the M’s games to 1/14, or simply compile the statistics for only the teams that aren’t the Ms.

I do wish that you would consider incorporating a margin of error into the results. I suspect that some of the apparent differences are not statistically significant. Reporting a figure as 108 +/- 10 or whatever the margin of error might be would make it a lot easier to evaluate the meaning of the numbers. The methodology for all the “new” stats is great, but without a margin of error, their predictive validity is suspect. When stats like WAR are reported, they are used as though they were exact, but the reality is that it’s not that exact. Depending on various factors, there may or may not be any significant difference between a WAR of 3 and one of 2. Incorporating a MOE would improve the ability to interpret these data.

They account for 50% on each side.

I’m not going to drop the Ms weight to 1/14 because that overweights the opposition, which as I said isn’t normally distributed thanks to the unbalanced schedules. And you cannot toss out Ms data because then you have no data. Every play in Safeco involves either the Mariners offense or the Mariners defense. Tossing out the offense but leaving the defense makes no sense and you just halved your sample.

Stats aren’t used as though they’re exact. Reporting MOE on every stat we use would be cumbersome and pointless. This is baseball. This isn’t economic forecasting or drug testing. It’s okay to be informal about the terminology. People understand these aren’t exact and that sample sizes are important. If they don’t, well, that’s their problem and they’re not going to know because we included +/- on top of everything. They won’t even read it.

As for the MOEs here, give me the equation you want used and I’ll give them to you. I’m unsure whether these constitute random sampling or not. It’s been a few years since I was a practicing statistician. I have other things to remember now.
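
One candidate equation, offered as a sketch rather than a settled answer: the usual log-scale interval for a ratio of two proportions. It assumes plate appearances behave like independent trials, which is exactly the random-sampling question raised above, and the counts below are invented.

```python
import math


def factor_with_interval(a_pa, b_events, c_pa, d_events, z=1.96):
    """Park factor (B/A)/(D/C) on a 100 scale with an approximate 95% interval."""
    ratio = (b_events / a_pa) / (d_events / c_pa)
    # Standard error of log(ratio) for two independent binomial proportions.
    se_log = math.sqrt(1 / b_events - 1 / a_pa + 1 / d_events - 1 / c_pa)
    spread = math.exp(z * se_log)
    return 100 * ratio, 100 * ratio / spread, 100 * ratio * spread


# Hypothetical counts, for illustration only.
point, low, high = factor_with_interval(a_pa=3000, b_events=540,
                                        c_pa=3000, d_events=500)
print(f"{point:.0f} ({low:.0f} to {high:.0f})")
```

With samples of this invented size, a factor of 108 would carry an interval that reaches below 100, so rarer events such as triples or HBP would be noisier still.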
