Saturday, October 06, 2007

The Curious Incident of the Snakes in the Night-time

How does Arizona do it? If you know about Bill James' so-called Pythagorean Theorem, you know that if you square a team's runs scored and divide that by the sum of the square of their runs scored and the square of their runs allowed, you get an estimate of their winning percentage. The formula is usually within 2 or 3 wins of the actual number, and more often than not is dead-on or 1 win off. The formula improves if you use an exponent of 1.83 instead of 2.
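Here is a minimal Python sketch of that formula, just to make the arithmetic concrete. The function name `pythagorean_pct` and the 800/700 run totals are made up for illustration; only the exponents 2 and 1.83 come from the paragraph above.

```python
def pythagorean_pct(runs_scored, runs_allowed, exponent=2.0):
    """Estimated winning percentage: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical team (not a real season): 800 runs scored, 700 allowed.
classic = pythagorean_pct(800, 700)        # exponent 2
tweaked = pythagorean_pct(800, 700, 1.83)  # exponent 1.83

print(f"exponent 2:    {classic:.3f} -> {classic * 162:.1f} wins")
print(f"exponent 1.83: {tweaked:.3f} -> {tweaked * 162:.1f} wins")
```

The smaller exponent pulls the estimate a little closer to .500, which is the whole effect of the tweak.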

There's also Pythagenport, which is the same thing except it uses a custom exponent based on the league run scoring during that particular year. The custom exponent in the 2007 NL is 1.895.

The Diamondbacks scored only 712 runs this year, and gave up 732. You don't need the exponents to figure out this should equate to a record below .500. To be exact, .4869, or 79 wins. The D-backs actually won 90 games.
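Plugging the Diamondbacks' totals into the formula reproduces those figures. This is a rough sketch that takes the 1.895 exponent as given rather than deriving it; the function and variable names are my own.

```python
def pythagorean_pct(runs_scored, runs_allowed, exponent):
    """Estimated winning percentage: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


NL_2007_EXPONENT = 1.895  # quoted in the post, not derived here
GAMES = 162

pct = pythagorean_pct(712, 732, NL_2007_EXPONENT)
expected_wins = round(pct * GAMES)
actual_wins = 90

print(f"{pct:.4f}")                 # 0.4869
print(expected_wins)                # 79
print(actual_wins - expected_wins)  # 11
```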

You could just say that Pythagenport doesn't work. But it does. The standard error is about three wins, not 11. The formula only breaks down in seasons where total runs per game is less than 4 or greater than 30. This year it was 9.42 in the NL.

So what explains this? It could be a weird distribution of runs. For instance, if the D-backs got blown out in several games, their runs allowed would be disproportionately high.

I searched the D-backs' schedule for games they lost by more than 5 runs. There were 21 such games. Without looking at every other team, it's hard to know if that's a lot. But we also have to see how many games the D-backs won by more than 5 runs, as an offset. There were 15 such games. Their runs scored and runs allowed in those 36 games were 183-247, which works out to a .362 Pythagenport winning percentage. That's a pretty bad record in extreme games.
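The search itself is easy to sketch in Python if you have a list of game scores. I'm not reproducing the real game log here: `toy_games` below is made-up data that only exercises the function, and `split_blowouts` is a name I invented. The 183-247 totals and the 1.895 exponent are the ones quoted in this post.

```python
def split_blowouts(games, margin=5):
    """Tally games decided by more than `margin` runs.

    `games` is a list of (runs_scored, runs_allowed) pairs, one per game.
    """
    tally = {"wins": 0, "losses": 0, "rs": 0, "ra": 0}
    for rs, ra in games:
        if abs(rs - ra) > margin:
            tally["wins" if rs > ra else "losses"] += 1
            tally["rs"] += rs
            tally["ra"] += ra
    return tally


def pythagorean_pct(runs_scored, runs_allowed, exponent=1.895):
    """Estimated winning percentage: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Made-up scores purely to exercise the function; NOT real D-backs results.
toy_games = [(10, 2), (1, 9), (4, 3), (0, 7), (3, 4)]
toy = split_blowouts(toy_games)
print(toy)  # {'wins': 1, 'losses': 2, 'rs': 11, 'ra': 18}

# The actual totals quoted above for the 36 extreme games.
extreme_pct = pythagorean_pct(183, 247)
print(f"{extreme_pct:.3f}")  # 0.362
```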

Computing Pythagenport without those runs, you get a winning percentage of .541, which over a 162-game season is 88 wins. That's within the Pythagenport standard error of the D-backs' actual wins (90).
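For anyone checking the subtraction, here is a small sketch of that step in Python. The constant and function names are mine; the run totals and the 1.895 exponent are the ones used in this post.

```python
def pythagorean_pct(runs_scored, runs_allowed, exponent=1.895):
    """Estimated winning percentage: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


TOTAL_RS, TOTAL_RA = 712, 732      # D-backs season totals
EXTREME_RS, EXTREME_RA = 183, 247  # runs in the 36 extreme games

rest_rs = TOTAL_RS - EXTREME_RS    # 529
rest_ra = TOTAL_RA - EXTREME_RA    # 485

rest_pct = pythagorean_pct(rest_rs, rest_ra)
projected_wins = round(rest_pct * 162)

print(f"{rest_pct:.3f}")  # 0.541
print(projected_wins)     # 88
```

Note that this takes the rate from the remaining 126 games and scales it to a full 162-game schedule, which is how the 88 above was reached.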

So the 36 extreme games, particularly the losses, seem to explain the discrepancy. We need to compare it to a couple of other teams, though, to find out whether every team has the same number of extreme games.

I'm going to use the Cardinals, who outperformed their Pythagenport by 7 games. They had 31 extreme games on the loss side, and 15 on the win side. Their RS and RA for those games: 218-364. No wonder Walt Jocketty left the Cards. That's a Pythagenport of .275.

Computing Pythagenport without those runs, you get a winning percentage of .541 (same as the D-backs), which over a 162-game season is 88 wins. The Cards actually won 78, so the error gets worse, not better, if you subtract the extreme games.
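The Cardinals numbers can be checked the same way. In this sketch the .541 is taken as reported above rather than recomputed, since I haven't listed the Cards' full-season run totals in this post; the names are my own.

```python
def pythagorean_pct(runs_scored, runs_allowed, exponent=1.895):
    """Estimated winning percentage: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Cardinals' runs in their 46 extreme games (31 losses, 15 wins).
cards_extreme_pct = pythagorean_pct(218, 364)
print(f"{cards_extreme_pct:.3f}")  # 0.275

# Reported figure for the remaining games, used as-is (an input, not derived).
stripped_pct = 0.541
stripped_wins = round(stripped_pct * 162)
actual_wins = 78

error_before = 7                           # stated gap vs. full-season Pythagenport
error_after = stripped_wins - actual_wins  # gap after removing extreme games

print(stripped_wins)              # 88
print(error_before, error_after)  # 7 10
```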

That tells me that the extreme game distribution doesn't explain everything for the D-backs. It's a mystery. However, the large number of extreme games for the Cards and D-backs may simply throw Pythagenport into disarray.

I want to run the same numbers with a team whose record is nailed by Pythagenport. I'll use the Dodgers, who scored 735 and allowed 727, for a predicted and actual record of 82-80.

They had only 14 extreme games on the loss side, and 17 on the win side. That's pretty balanced. Their RS and RA for those games: 195-177. That's a Pythagenport of .546.

Computing Pythagenport for the Dodgers without those runs, you get a winning percentage of .491, or 80 wins, which is only two off the predicted and actual wins. That's within the Pythagenport standard error.
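Here is the Dodgers check as a short Python sketch, using only the totals quoted in this post and the same 1.895 exponent. The function and variable names are mine.

```python
def pythagorean_pct(runs_scored, runs_allowed, exponent=1.895):
    """Estimated winning percentage: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


TOTAL_RS, TOTAL_RA = 735, 727      # Dodgers season totals
EXTREME_RS, EXTREME_RA = 195, 177  # runs in their 31 extreme games

full_wins = round(pythagorean_pct(TOTAL_RS, TOTAL_RA) * 162)
extreme_pct = pythagorean_pct(EXTREME_RS, EXTREME_RA)
rest_pct = pythagorean_pct(TOTAL_RS - EXTREME_RS, TOTAL_RA - EXTREME_RA)
rest_wins = round(rest_pct * 162)

print(full_wins)             # 82
print(f"{extreme_pct:.3f}")  # 0.546
print(f"{rest_pct:.3f}")     # 0.491
print(rest_wins)             # 80
```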

Without running all the teams, I can't say anything definitively, but it is pretty clear that lots of extreme games, especially if there is a disparity between the wins and losses in those games, throw off Pythagenport significantly.

(Oh, and if you are wondering about the title of the post, this is a good book).