Friday, February 16, 2007

Alternative Hall of Fame -- Methods (Sim Score)

The first item on my grade sheet is Similarity Score. This is a Bill James creation from his book "Whatever Happened to the Hall of Fame?" It compares two players' career stats in a set of categories, measures the differences, and assigns a point value to those differences.

The stat categories for hitters are: games, at bats, runs, hits, doubles, triples, home runs, RBI, walks, strikeouts, stolen bases, batting average and slugging percentage. You start with 1,000 points and then subtract points based on the differences between the two players. For example, you subtract 1 point for each .001 difference in batting average, and 1 point for every 4 triples of difference, and so on. There's also a positional adjustment, on the theory that shortstops (for example) are very different from DHs, but not that different from second basemen.
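The subtraction rule can be sketched in a few lines of Python. This is a minimal illustration, not James's or Baseball-Reference's actual code: only the batting-average (.001) and triples (4) divisors come from this post; the other divisors are assumptions taken from the commonly published version of the method and should be checked before use. The names `HITTER_DIVISORS` and `sim_score` and the stat keys are hypothetical, and the positional adjustment is left out.

```python
# Points-per-difference divisors: a gap of this size in a stat costs one point.
# ASSUMPTION: only AVG (.001) and 3B (4) come from the post; the other values
# follow the commonly published version of James's method and should be checked.
HITTER_DIVISORS = {
    "G": 20, "AB": 75, "R": 10, "H": 15, "2B": 5, "3B": 4, "HR": 2,
    "RBI": 10, "BB": 25, "SO": 150, "SB": 20, "AVG": 0.001, "SLG": 0.002,
}


def sim_score(a: dict, b: dict, divisors: dict = HITTER_DIVISORS) -> float:
    """Start at 1,000 and subtract one point per divisor-sized difference.

    The positional adjustment is deliberately left out of this sketch.
    """
    score = 1000.0
    for stat, per_point in divisors.items():
        score -= abs(a[stat] - b[stat]) / per_point
    return score


# Made-up career lines that differ only in triples (8) and batting average (.010).
player_a = {"G": 2000, "AB": 7500, "R": 1100, "H": 2250, "2B": 400, "3B": 60,
            "HR": 250, "RBI": 1100, "BB": 800, "SO": 1000, "SB": 100,
            "AVG": 0.300, "SLG": 0.480}
player_b = {**player_a, "3B": 68, "AVG": 0.290}

# 8 / 4 = 2 points for triples, .010 / .001 = 10 points for average.
print(round(sim_score(player_a, player_b)))  # 988
```

Floating-point division leaves tiny fractions, so the score is rounded for display.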

The method is the same for pitchers, and the categories are: wins, losses, win percentage, ERA, games, starts, complete games, innings pitched, hits allowed, strikeouts, walks, shutouts and saves. There's an adjustment based on which hand they throw with, and whether they are starters or relievers.

The idea is that if the player being evaluated has a high similarity score with other players who are worthy of induction, then the player being evaluated may also be worthy of induction. You can find Sim Scores for every player on Baseball-Reference.com. James intended it only as a fun little test.

From an evaluation standpoint, it has three principal flaws, in my opinion. First, the career stats are not park-adjusted or era-adjusted. I have a partial remedy for that, described below. Second, it uses only career stats, so very good players with long careers may look similar to awesome players with shorter careers. Third, it ignores defense, although it can take position into account through the positional adjustment. My remedy for the second and third flaws is to give the Sim Score grade only half the weight of a normal measure.

My remedy for the first flaw is more complex. I take the career numbers after the "normalization" process described in another post on this blog, and enter them into the spreadsheet that does the Sim Score calculations. Accordingly, a player's Sim Scores are based on park-adjusted normalized career stats.

How does this produce a "grade"? I find the 10 most similar players to the one being evaluated. For each one who is HoF-quality, I assign 4 points if the score is 950 or better (James called this "unusually similar"); 3 points if the score is 900 or better ("truly similar"); 2 points if the score is 850 or better ("essentially similar"); and 1 point if the score is 800 or better ("somewhat similar"). I also give 3 points if there are no unusually similar players. For instance, no one is truly similar to Babe Ruth, but he should not get a lower grade for being better than everyone else.

If a similar player is not HoF-quality, no points are given. How do I determine HoF-quality players? It is subjective. I count everyone in the HoF, even if I think their election was a mistake. I count everyone in the Hall of Merit, even if I think their election was a mistake. And I count everyone I've put in my PHoM, in which I make no mistakes. :)
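The point rules above can be sketched in Python as follows. This is an illustration under stated assumptions: each of the 10 comparables is passed as a hypothetical (sim score, is HoF-quality) pair, and the 3-point rule is read literally as a flat bonus whenever nobody in the top 10 reaches 950. The post does not say whether that bonus is flat or per slot, so it is exposed as a parameter. The names `comp_points` and `sim_points` are hypothetical.

```python
from typing import List, Tuple

# (minimum score, points, James's label), checked from highest to lowest.
TIERS = [
    (950, 4, "unusually similar"),
    (900, 3, "truly similar"),
    (850, 2, "essentially similar"),
    (800, 1, "somewhat similar"),
]


def comp_points(score: float) -> int:
    """Points for one comparable, assuming he is HoF-quality."""
    for minimum, points, _label in TIERS:
        if score >= minimum:
            return points
    return 0


def sim_points(top_ten: List[Tuple[float, bool]],
               bonus: int = 3, bonus_threshold: float = 950) -> int:
    """Total points from the 10 most similar players.

    Each entry is (sim score, is_hof_quality); non-HoF comparables earn nothing.
    ASSUMPTION: the bonus is a flat amount added when no comparable at all
    (HoF-quality or not) reaches bonus_threshold.
    """
    points = sum(comp_points(score) for score, is_hof in top_ten if is_hof)
    if all(score < bonus_threshold for score, _ in top_ten):
        points += bonus
    return points


# Made-up top-10 list: 955 (HoF), 920 (HoF), 870 (not HoF), 810 (HoF), six below 800.
example = [(955, True), (920, True), (870, False), (810, True)] + [(790, True)] * 6
print(sim_points(example))  # 4 + 3 + 0 + 1 = 8
```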

The most points anyone could get would be 40. That's 10 unusually similar HoF-quality players. No one has 40 points. I give an A for 16 points or better, a B for 12 points or better, a C for 8 points or better and a D for 4 points or better. Everything else is an F.
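The cutoffs translate directly into a lookup. A minimal sketch; the function name `sim_grade` is hypothetical.

```python
def sim_grade(points: int) -> str:
    """Map Sim Score points (0 to 40) to a letter grade using the cutoffs above."""
    for minimum, letter in ((16, "A"), (12, "B"), (8, "C"), (4, "D")):
        if points >= minimum:
            return letter
    return "F"


print(sim_grade(13))  # B
```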

One more thing: I do not use the positional adjustment, except for players at the more demanding end of the defensive spectrum: catchers, shortstops, second basemen and third basemen.

As stated earlier, the Sim Score grade gets 1/2 the normal weight in the GPA.
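Half weight simply means the Sim Score grade counts half as much as the other measures when the GPA is averaged. The sketch below assumes a standard 4-point scale (A = 4 through F = 0) and a weight of 1.0 for every other measure; neither detail is specified here, and the names `GRADE_POINTS` and `weighted_gpa` are hypothetical.

```python
from typing import List, Tuple

# ASSUMPTION: a standard 4-point scale; the actual scale is not described here.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


def weighted_gpa(grades: List[Tuple[str, float]]) -> float:
    """Weighted average of (letter grade, weight) pairs."""
    total_weight = sum(weight for _, weight in grades)
    total_points = sum(GRADE_POINTS[letter] * weight for letter, weight in grades)
    return total_points / total_weight


# Two hypothetical full-weight measures plus a half-weight Sim Score grade of C.
print(weighted_gpa([("A", 1.0), ("B", 1.0), ("C", 0.5)]))  # (4 + 3 + 1) / 2.5 = 3.2
```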