Beyond Baseball

Sunday, January 21, 2007

Park Factors Solved!

Park factors are one of many sabermetric tools. They allow sabermetricians to adjust a player's raw statistics to what they would be in a league-average environment. Simple park factors are easy to calculate. For example, to find the run park factor (or just run factor) of Team A, we'd use the following formula, where RS is runs scored, RA is runs allowed, and G is games played:

((RShome+RAhome)/Ghome)
----------------------------
((RSroad+RAroad)/Groad)
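
As a quick illustration, the ratio above can be sketched in a few lines of Python. The function name and the season totals are made up for the example; they are not real team data.

```python
def simple_run_factor(rs_home, ra_home, g_home, rs_road, ra_road, g_road):
    """Runs per game (both teams combined) at home, divided by the same rate on the road."""
    home_rate = (rs_home + ra_home) / g_home
    road_rate = (rs_road + ra_road) / g_road
    return home_rate / road_rate

# Hypothetical season totals for Team A (illustrative only).
factor = simple_run_factor(rs_home=420, ra_home=390, g_home=81,
                           rs_road=380, ra_road=370, g_road=81)
print(round(factor, 3))  # 1.08
```

With these invented totals, 10 runs per game at home against roughly 9.26 on the road gives a run factor of 1.08.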

The bigger the number, the "easier" it is to score runs in Team A's home park. Using this basic formula, park factors for other events (such as singles, doubles, triples, and home runs) can easily be determined. Unfortunately, this simple formula has become inadequate. John Shea, an economics professor at the University of Maryland, wrote a paper illustrating the problems this formula is ill-equipped to address. In the paper, Shea outlines the distortions introduced by the unbalanced schedule, interleague play, and differences in the overall quality of opponents. Under a balanced schedule, teams play the same proportion of teams at home and on the road, which allows the formula to factor out those confounding effects. In today's baseball that is impossible, because the denominator isn't the same for each team: every club's road games come against a different mix of opponents and parks.
I believe that I have found a solution to this problem. My calculations are a bit more intensive, but I think they factor out the confounding effects that Shea outlined. To begin, I start with a basic formula:

E(Event)=LA*OF*DF*PF

Here the expected rate of an event (Runs Scored, H/BIP, etc.) depends on four factors. LA is the league average, the baseline rate of that event. OF and DF are multipliers accounting for the offensive and defensive prowess of the specific teams involved. PF is the park factor.

Looking at this formula we can see that if the teams are average and playing in an average park (OF, DF, and PF are all equal to 1) then we see the League Average rate. Factors greater (lesser) than one will increase (decrease) the observed rate. Within this framework we can isolate the PF variable to determine the appropriate park factor.
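
To make the framework concrete, here is a minimal Python sketch of the formula and of what "isolating PF" means algebraically. The function names, argument names, and numbers are hypothetical, chosen only for illustration.

```python
def expected_rate(la, of_mult=1.0, df_mult=1.0, pf=1.0):
    """E(Event) = LA * OF * DF * PF."""
    return la * of_mult * df_mult * pf

def implied_park_factor(observed_rate, la, of_mult, df_mult):
    """Solve the same formula for PF, given an observed rate."""
    return observed_rate / (la * of_mult * df_mult)

# Average teams in an average park reproduce the league average.
baseline = expected_rate(0.030)

# A strong offense (1.10) against a good defense (0.95) in a hitter-friendly park (1.20).
rate = expected_rate(0.030, of_mult=1.10, df_mult=0.95, pf=1.20)

# Dividing out LA, OF, and DF recovers the park factor.
recovered = implied_park_factor(rate, 0.030, 1.10, 0.95)
```

In this made-up case `recovered` comes back as 1.20 (up to floating-point rounding), which is the whole idea: once LA, OF, and DF are known, whatever is left over is attributed to the park.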

First, I determined the OF and DF multipliers for each team. Using game data from those great people at Retrosheet, I was able to calculate estimates by isolating one variable at a time. If I look only at Twins home games (all played in the Metrodome), the park is held constant, so I can find the OF and DF multipliers for all Twins opponents by taking the average rate of an event for each team (e.g., the White Sox or Tigers) and dividing it by the average rate of all Twins opponents (thus controlling for the different number of games per opponent). I repeat this process for all 30 teams, which gives me an array of OF and DF multipliers for each team. Average those numbers and voila! The OF and DF factors are estimated.
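
The procedure just described might look roughly like the Python sketch below. It is one reading of the text, not the exact code behind the numbers in this post. The record layout, the function name, and the choice to use an unweighted mean across opponents within a park are all assumptions, and the sample records are invented. Feed it what each visiting team did at the plate to get OF, or what each visiting team allowed in the field to get DF.

```python
from collections import defaultdict

def estimate_multipliers(records):
    """records: iterable of (park, visiting_team, events, opportunities).

    Returns {team: multiplier}, averaged over every park the team visited.
    Assumes every (park, team) pair has at least one opportunity.
    """
    # Pool events and opportunities for each visiting team in each park.
    totals = defaultdict(lambda: [0, 0])
    for park, team, events, opps in records:
        totals[(park, team)][0] += events
        totals[(park, team)][1] += opps

    # Event rate for each visiting team within each park.
    rates_by_park = defaultdict(dict)
    for (park, team), (events, opps) in totals.items():
        rates_by_park[park][team] = events / opps

    # Within a park, compare each visitor to the unweighted mean of all visitors
    # (assumption: this is what controls for unequal games per opponent).
    ratios = defaultdict(list)
    for park, rates in rates_by_park.items():
        park_mean = sum(rates.values()) / len(rates)
        for team, rate in rates.items():
            ratios[team].append(rate / park_mean)

    # Average each team's ratios across parks.
    return {team: sum(vals) / len(vals) for team, vals in ratios.items()}

# Invented sample: two visiting teams seen in two parks.
sample = [
    ("MIN", "CHW", 12, 100), ("MIN", "DET", 8, 100),
    ("CLE", "CHW", 11, 100), ("CLE", "DET", 9, 100),
]
multipliers = estimate_multipliers(sample)
```

In the invented sample, CHW rates 1.2 in one park and 1.1 in the other, so its multiplier averages to about 1.15, while DET comes out near 0.85.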

Using the appropriate OF and DF for each game, I adjust the raw rates by dividing by OF*DF. The adjusted rates now depend only on LA and PF. Since LA is constant across all games, I simply divide the average adjusted rate for games in a specific park by the average adjusted rate for all Major League games. The result is a park factor!
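
Here is a hedged Python sketch of that last step. The record layout, names, and numbers are hypothetical, and each record is weighted equally when averaging, which is an assumption rather than something stated above.

```python
from collections import defaultdict

def park_factors(games, of_mult, df_mult):
    """games: iterable of (park, batting_team, fielding_team, events, opportunities).

    of_mult / df_mult: {team: multiplier} dictionaries estimated beforehand.
    Returns {park: park factor}.
    """
    adjusted_by_park = defaultdict(list)
    all_adjusted = []
    for park, bat, fld, events, opps in games:
        raw_rate = events / opps
        # Strip out team quality, leaving (in the model) only LA * PF.
        adjusted = raw_rate / (of_mult[bat] * df_mult[fld])
        adjusted_by_park[park].append(adjusted)
        all_adjusted.append(adjusted)

    # LA is common to every game, so it cancels in this ratio.
    league_mean = sum(all_adjusted) / len(all_adjusted)
    return {park: (sum(vals) / len(vals)) / league_mean
            for park, vals in adjusted_by_park.items()}

# Invented example: team A hits 25% above average, team B 20% below.
of_mult = {"A": 1.25, "B": 0.80}
df_mult = {"A": 1.00, "B": 1.00}
games = [
    ("ParkA", "A", "B", 15, 100), ("ParkA", "B", "A", 96, 1000),
    ("ParkB", "A", "B", 10, 100), ("ParkB", "B", "A", 64, 1000),
]
factors = park_factors(games, of_mult, df_mult)
```

After the adjustment, both teams show a rate of 0.12 in ParkA and 0.08 in ParkB, so the sketch returns factors of about 1.2 and 0.8 even though the raw rates of the two teams look very different.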

I calculated park factors for singles, doubles, triples, and home runs on balls in play.

Team  1B/BIP  2B/BIP  3B/BIP  HR/BIP
ANA   0.94    1.02    0.98    0.95
ARZ   0.98    0.99    1.40    1.07
ATL   1.06    1.10    1.26    0.88
BAL   1.02    0.83    0.68    0.98
BOS   0.90    1.25    1.00    0.97
CHW   0.97    0.94    0.77    1.28
CHC   0.95    1.07    0.95    1.11
CIN   0.97    1.18    0.54    1.20
CLE   0.97    1.01    0.48    0.89
COL   1.15    0.97    1.10    1.07
DET   1.03    0.87    1.64    0.98
FLA   1.02    0.94    1.43    0.84
HOU   1.06    0.89    1.00    1.22
KC    1.01    1.09    1.03    0.75
LAD   0.92    1.00    0.54    1.09
MIL   0.90    1.00    0.90    1.16
MIN   1.03    0.95    0.86    0.96
NYY   1.08    0.89    0.90    1.03
NYM   1.04    0.94    0.76    0.90
OAK   0.98    1.08    0.68    0.84
PHI   1.02    1.06    1.31    1.17
PIT   1.04    1.17    1.09    0.92
SD    1.00    0.90    1.47    0.78
SF    1.00    0.88    1.21    0.91
SEA   1.06    0.98    0.71    0.81
STL   1.01    1.05    0.64    1.15
TB    1.01    0.91    1.31    0.95
TEX   0.96    1.02    1.54    1.17
TOR   1.01    1.06    0.84    1.16
WSH   0.91    1.00    0.98    0.81
