Beyond Baseball

Saturday, December 09, 2006

Independence of Runs Scored and Runs Allowed.

In the February 2006 issue of By the Numbers, the SABR statistical publication, Ray Ciccolella published an article titled "Are Runs Scored and Runs Allowed Independent?" In it he referenced a separate article in the same issue by Steven Miller. Miller's article provided a theoretical framework for the Pythagorean Theorem (the baseball one) and concluded that Runs Scored (RS) and Runs Allowed (RA) are independent. Ciccolella found that conclusion counter-intuitive because he reasoned that "environmental factors" such as the "ballpark, the weather conditions, and the home plate umpire" are the same for both teams in a given game. He then ran his own experiments to test for independence, using three methods.
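For reference, the baseball "Pythagorean Theorem" estimates a team's winning percentage from its runs scored and runs allowed:

$$\mathrm{W\%} \approx \frac{RS^{\gamma}}{RS^{\gamma} + RA^{\gamma}}$$

Bill James's original version uses γ = 2, and fitted exponents somewhat below 2 are common. As I understand Miller's derivation, the formula falls out when RS and RA are modeled as independent Weibull-distributed random variables, which is why the independence question matters for it.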

Method 1 compared each team's actual margin of victory with a randomly generated margin of victory, in which the team's runs scored were paired with runs allowed drawn at random from the team's actual RA distribution. He used five seasons' worth of data for each team. He found that the random margin of victory was larger than the actual one and that random pairing also produced fewer 1-run games (which makes sense if the margin of victory is larger). Although this suggests that RS and RA are not independent, Ciccolella was surprised that the difference between the two wasn't bigger.
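Method 1 can be sketched roughly as follows. This is my own reconstruction from the description above, not Ciccolella's code: the function names are hypothetical, the per-game scores are made up, and I assume RA values are drawn with replacement from the team's own RA list. Note that random pairs can produce ties, which real games cannot.

```python
import random
from statistics import mean


def margin_stats(rs, ra):
    """Mean absolute run margin and share of 1-run games for paired (RS, RA) games."""
    margins = [abs(r - a) for r, a in zip(rs, ra)]
    one_run_share = sum(1 for m in margins if m == 1) / len(margins)
    return mean(margins), one_run_share


def random_pairing_stats(rs, ra, trials=1000, seed=0):
    """Keep each game's RS but pair it with an RA drawn at random (with
    replacement) from the same team's RA list; average over many trials.

    Assumption: random pairs may tie (RS == RA); real games cannot.
    """
    rng = random.Random(seed)
    margin_means, one_run_shares = [], []
    for _ in range(trials):
        fake_ra = rng.choices(ra, k=len(rs))
        m, o = margin_stats(rs, fake_ra)
        margin_means.append(m)
        one_run_shares.append(o)
    return mean(margin_means), mean(one_run_shares)


if __name__ == "__main__":
    # Hypothetical per-game scores for one team, for illustration only.
    rs = [5, 3, 4, 7, 2, 6, 1, 8]
    ra = [4, 2, 6, 1, 3, 5, 2, 7]
    print("actual:", margin_stats(rs, ra))
    print("random pairing:", random_pairing_stats(rs, ra))
```

If RS and RA were truly independent, the actual and random-pairing numbers should come out about the same; Ciccolella's finding was that the random margins were larger.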

Method 2 computed the expected number of wins given that a team scored X runs. He found that when teams scored 0-2 or 6+ runs in a game, they won less often than expected. Again, this suggests that RS and RA are not independent.
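If RA were independent of RS, the chance of winning when you score x runs would just be the chance that your opponent scores fewer than x, plus some share of the games where the opponent would have matched you (extra innings decide those): P(win | RS = x) ≈ P(RA < x) + t · P(RA = x). The sketch below computes that from a team's RA list and compares it with the observed win rate for each run total. The function names are hypothetical, and the tie share t = 0.5 is my own assumption, not something taken from either article.

```python
def expected_win_prob(x, ra, tie_share=0.5):
    """Win probability when scoring x runs, *if* RA were independent of RS.

    Opponent totals below x count as wins. Totals equal to x would be ties,
    which baseball resolves in extra innings; `tie_share` is an assumed
    fraction of those credited as wins.
    """
    below = sum(1 for a in ra if a < x)
    equal = sum(1 for a in ra if a == x)
    return (below + tie_share * equal) / len(ra)


def actual_win_prob(x, rs, ra):
    """Observed win rate in games where the team scored exactly x runs."""
    results = [r > a for r, a in zip(rs, ra) if r == x]
    if not results:
        return None
    return sum(results) / len(results)


def compare_by_runs(rs, ra, tie_share=0.5):
    """Map each run total x to (actual win rate, expected under independence)."""
    return {
        x: (actual_win_prob(x, rs, ra), expected_win_prob(x, ra, tie_share))
        for x in sorted(set(rs))
    }
```

Run over real game logs, Ciccolella's result corresponds to the actual column falling below the expected column at the low (0-2) and high (6+) run totals.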

Method 3 resembled earlier work on this question and produced similar results.

Ciccolella concluded that RS and RA can't be independent, but that their correlation was weaker than he expected. Not by coincidence, Miller reached similar conclusions, just by a different path. Miller acknowledged that RS and RA can't be strictly independent because there are no tie games in baseball: if one team scores 5 runs, the other team can't also finish with 5. Nevertheless, Miller concluded that RS and RA behave as if they were independent once you correct for ties.

Personally, I think Miller is closer to the truth on this issue than Ciccolella. By drawing RA randomly from a team's distribution and pairing it with the team's RS, you bring in the "environmental factors" Ciccolella raised at the beginning of his article. Individual ballparks alone have been shown to increase or decrease scoring (which is why sabermetricians have developed park factors). When you randomly combine these two distributions, you allow an RS from Coors Field (lots of runs) to be paired with an RA from PETCO Park (not many runs), or vice versa. Ciccolella would argue that this shows RS and RA are not independent of each other, because both depend somewhat on the environment in which they were scored or allowed. However, I see it as Ciccolella showing that RS and RA are not independent of their environment, not necessarily that they depend on each other. Nothing in the nature of baseball prevents you from scoring runs based on whether the other team is or isn't scoring. Perhaps this study should be redone with the "environmental factors" controlled for; it may yield different results.
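The point about environment can be illustrated with a small simulation. This is a minimal sketch under loudly stated assumptions: two hypothetical parks with made-up scoring rates (not real Coors Field or PETCO Park figures), and each team's runs per game drawn from a Poisson distribution, which is a simplification since real run distributions are more spread out. It also ignores the no-ties rule. Within each park, RS and RA are generated completely independently, yet pooling the games from both parks produces a positive RS/RA correlation, purely because both teams in a game share the same park.

```python
import math
import random


def poisson(rng, lam):
    """One Poisson(lam) draw via Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate(park_means, games_per_park=5000, seed=0):
    """Draw RS and RA independently given each park's scoring rate.

    Returns the RS/RA correlation within each park and across all games pooled.
    """
    rng = random.Random(seed)
    per_park, pooled_rs, pooled_ra = {}, [], []
    for park, lam in park_means.items():
        rs = [poisson(rng, lam) for _ in range(games_per_park)]
        ra = [poisson(rng, lam) for _ in range(games_per_park)]  # independent of rs
        per_park[park] = pearson(rs, ra)
        pooled_rs.extend(rs)
        pooled_ra.extend(ra)
    return per_park, pearson(pooled_rs, pooled_ra)


if __name__ == "__main__":
    # Hypothetical runs-per-team-per-game rates, for illustration only.
    parks = {"hitters_park": 6.0, "pitchers_park": 3.5}
    per_park, pooled = simulate(parks)
    print("within each park:", per_park)
    print("pooled:", pooled)
```

With these made-up rates, the within-park correlations should hover near zero while the pooled correlation comes out clearly positive (around 0.25 in expectation). That correlation comes from the shared environment, not from one team's runs depending on the other's.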
