Saturday, July 05, 2008

Predicting Pitchers

To give an idea how difficult it is to nail a pitcher's performance on any given night with a game simulator, let's look at Carlos Zambrano. His line last night:

Actual:    6.0 IP, 4 H, 0 HR, 2 BB, 5 K, 0 ER
Predicted: 6.7 IP, 6 H, 0 HR, 2 BB, 4 K, 2 ER

Not too bad from the predictor, but obviously not perfect. I looked at the sim and calculated the odds of landing within one unit of the actual performance in each category: between 5.7 and 6.3 IP, 3-5 hits, 1-3 walks, 4-6 strikeouts and 0-1 earned runs. For homers, I calculated the odds of hitting the exact number. Here are the odds of coming that close with a 500-game sim, for each of the categories:

IP: 29%
H: 33%
HR: 52%
BB: 64%
SO: 49%
ER: 33%

Home runs I have a decent shot at, since there are rarely more than 2 HR in a game, so predicting that a good pitcher like Zambrano will allow 0 is about an even proposition. Walks and strikeouts are a little easier too. But I have only about a 1-in-3 chance of getting within 1/3 of an inning, within 1 hit, or within 1 earned run of the actual performance.
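For anyone who wants to run the same kind of check, here is a minimal sketch of the counting involved. It is not the simulator's actual code: the function name, the dictionary layout and the four toy simulated lines are all made up for illustration. Innings are stored as outs (6.0 IP = 18 outs), so "within 1/3 of an inning" becomes "within one out".

```python
# Actual line from the post: 6.0 IP, 4 H, 0 HR, 2 BB, 5 K, 0 ER.
# Innings are stored as outs (18 outs = 6.0 IP) to avoid decimal thirds.
ACTUAL = {"outs": 18, "h": 4, "hr": 0, "bb": 2, "k": 5, "er": 0}

# Allowed miss per category: one out, one hit/walk/strikeout/earned run,
# and an exact match for home runs, mirroring the windows described above.
TOLERANCE = {"outs": 1, "h": 1, "hr": 0, "bb": 1, "k": 1, "er": 1}


def hit_rates(sim_lines, actual=ACTUAL, tolerance=TOLERANCE):
    """Return, per category, the share of simulated games within tolerance."""
    rates = {}
    for stat, target in actual.items():
        close = sum(
            1 for line in sim_lines if abs(line[stat] - target) <= tolerance[stat]
        )
        rates[stat] = close / len(sim_lines)
    return rates


# Four made-up simulated lines, for illustration only (not real sim output).
toy_sims = [
    {"outs": 18, "h": 4, "hr": 0, "bb": 4, "k": 6, "er": 0},
    {"outs": 20, "h": 6, "hr": 0, "bb": 2, "k": 4, "er": 2},
    {"outs": 15, "h": 8, "hr": 1, "bb": 3, "k": 3, "er": 4},
    {"outs": 21, "h": 5, "hr": 0, "bb": 1, "k": 7, "er": 1},
]

rates = hit_rates(toy_sims)
for stat, rate in rates.items():
    print(f"{stat}: {rate:.0%}")
```

With a real 500-game sim, the same function would be fed 500 lines instead of four.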

Those are the stats in isolation. What about combining them? The simulator had Zambrano with 5.7-6.3 IP and 3-5 hits only 8% of the time. It had him with 5.7-6.3 IP and 0-1 earned runs only 10% of the time.

And the big one: the simulator had Zambrano with 5.7-6.3 IP, 3-5 hits and 0-1 earned runs only 5% of the time. It found the exact combination of IP (6), hits (4) and earned runs (0) only once in 500 simulations. In that instance, Zambrano gave up 0 homers (accurate), 4 walks (off by 2) and 6 strikeouts (off by 1).
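One rough sanity check on those combined numbers, which is an added comparison and not part of the original analysis: if the categories were independent of each other, the combined odds would simply be the product of the single-category odds. The sketch below uses only the percentages quoted in this post; the variable names are arbitrary.

```python
# Single-category hit rates quoted in the post.
p_ip, p_h, p_er = 0.29, 0.33, 0.33

# Combined hit rates the simulator actually produced, also from the post.
observed = {"IP+H": 0.08, "IP+ER": 0.10, "IP+H+ER": 0.05}

# What the combined rates would be if the categories were independent.
independent = {
    "IP+H": p_ip * p_h,
    "IP+ER": p_ip * p_er,
    "IP+H+ER": p_ip * p_h * p_er,
}

for combo in observed:
    print(
        f"{combo}: observed {observed[combo]:.0%}, "
        f"if independent {independent[combo]:.1%}"
    )
```

The two-way products (about 9.6%) land close to the observed 8% and 10%. The three-way product (about 3.2%) is a bit below the observed 5%, which would fit hits and earned runs tending to move together, though with rounded percentages from 500 sims that is only suggestive.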

That's how hard it is to predict the performance.

Which got me thinking about whether I should bother. Perhaps I should simply predict the odds of being excellent, good, average, below average and poor. Or, simpler still, the odds of being above average and below average. The question is how to measure those. I could do so with game scores.

Zambrano's game score for last night was 67. The simulator predicted only 9 out of 500 games to have that exact game score. But only 20% of the performances in the majors have game scores of 65 or higher, and the sim found 25% of Zambrano's games to be 65 or higher, 21% to be 56-64, 20% to be 47-55, 19% to be 36-46 and 15% to be 35 or lower.
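For reference, here is a short sketch of the game score calculation using the standard Bill James formula, which reproduces the 67 for last night's line. Two assumptions: the line above doesn't list unearned runs, so they are taken to be zero, and the predicted 6.7 IP is read as 6 2/3 innings (20 outs).

```python
def game_score(outs, hits, earned_runs, walks, strikeouts, unearned_runs=0):
    """Bill James game score for a starting pitcher."""
    innings_completed = outs // 3
    score = 50                                  # every start begins at 50
    score += outs                               # +1 per out recorded
    score += 2 * max(0, innings_completed - 4)  # +2 per inning completed after the 4th
    score += strikeouts                         # +1 per strikeout
    score -= 2 * hits                           # -2 per hit
    score -= 4 * earned_runs                    # -4 per earned run
    score -= 2 * unearned_runs                  # -2 per unearned run
    score -= walks                              # -1 per walk
    return score


# Actual line: 6.0 IP (18 outs), 4 H, 0 ER, 2 BB, 5 K.
actual = game_score(outs=18, hits=4, earned_runs=0, walks=2, strikeouts=5)

# Predicted line: 6.7 IP (assumed 20 outs), 6 H, 2 ER, 2 BB, 4 K.
predicted = game_score(outs=20, hits=6, earned_runs=2, walks=2, strikeouts=4)

print(actual, predicted)  # 67 56
```

The 56 is the game score of the averaged predicted line, which is not necessarily the same as the average game score across the 500 simulated games.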

Perhaps that's more useful than the other stats. Or use the above average/below average split: his odds of a top 40% performance (game score of 56 or higher) were 46%, and his odds of a bottom 40% performance (46 or lower) were 34%.
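Here is a small sketch of how the banding could be coded. The game score cutoffs and the sim percentages are the ones quoted in this post; attaching the labels excellent/good/average/below average/poor to those five bands is an assumption made for illustration, as are the function and variable names.

```python
# Lower bound of each band, from the cutoffs in the post.
# The labels attached to the bands are illustrative assumptions.
TIERS = [
    (65, "excellent"),
    (56, "good"),
    (47, "average"),
    (36, "below average"),
]


def tier(score):
    """Map a game score to its band; anything under 36 is 'poor'."""
    for floor, label in TIERS:
        if score >= floor:
            return label
    return "poor"


# Share of Zambrano's 500 simulated games in each band, in percent (from the post).
sim_share = {
    "excellent": 25,
    "good": 21,
    "average": 20,
    "below average": 19,
    "poor": 15,
}

above_average = sim_share["excellent"] + sim_share["good"]       # game score 56+
below_average = sim_share["below average"] + sim_share["poor"]   # game score 46 or lower

print(tier(67), above_average, below_average)  # excellent 46 34
```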

Game score is imperfect, though. A pitcher can mitigate giving up runs by striking out a lot of hitters.