Thursday, April 8, 2010

Net Runs, Part 1: An improvement on evaluating MLB individual contributions

Similar to the RE24 stat created by David Appelman and now used on Baseball Reference and Fangraphs, Net Runs uses park-adjusted base-out run expectancy (based on a composite of 2005-2009 MLB data) before and after each play to weigh a player's individual impact on his team.
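As a rough illustration of the before-and-after idea, here is a minimal Python sketch. The table values, the state encoding and the function name `net_run_value` are all assumptions made up for this example. They are not the 2005-2009 composite, and no park adjustment is applied.

```python
# Hypothetical run-expectancy values keyed by (bases occupied, outs).
# Placeholder numbers for illustration only, NOT the 2005-2009 composite.
RUN_EXPECTANCY = {
    ("---", 0): 0.50,
    ("1--", 0): 0.90,
    ("---", 1): 0.27,
    ("1--", 1): 0.52,
}

def net_run_value(before, after, runs_scored=0):
    """Change in run expectancy across one play, plus any runs that scored."""
    return RUN_EXPECTANCY[after] - RUN_EXPECTANCY[before] + runs_scored

# Leadoff single: bases empty, 0 out -> runner on first, 0 out.
single = net_run_value(("---", 0), ("1--", 0))
# Leadoff out: bases empty, 0 out -> bases empty, 1 out.
out = net_run_value(("---", 0), ("---", 1))
```

With these placeholder numbers the single is worth a positive fraction of a run and the out a negative one. The real figures depend on the actual run expectancy table.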

There is one big difference. Like RE24, Net Runs typically credits the hitter in full for the outcome of a plate appearance and credits the baserunner for stolen bases. But Net Runs takes the extra step of crediting defenders, based on an evaluation of the game logs and MLB Gameday field data, and of contextually crediting baserunners who take an extra base on a base hit. Errors are also not (erroneously) credited to the hitter in Net Runs: Those are chalked up under a "Luck" category that will be explained later. Pitchers are credited with an equivalent run value by batted ball type (line drives, groundballs and fly balls), while the defense receives credit for the difference.
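The pitcher/defense split can be sketched along these lines. This is only one reading of the description above, not the actual Net Runs bookkeeping: the average values per batted ball type are invented placeholders, the name `split_credit` is hypothetical, and the sign convention (positive means runs saved) is an assumption.

```python
# Hypothetical average run values per batted-ball type, from the batting
# team's perspective. Placeholder numbers, not measured data.
EXPECTED_BY_TYPE = {"groundball": 0.02, "fly ball": 0.04, "line drive": 0.35}

def split_credit(batted_ball_type, play_value):
    """Split a play's run value between pitcher and defense.

    play_value is the run value of the actual outcome for the batting team.
    Returned credits are in runs saved, so positive is good for the defense.
    """
    expected = EXPECTED_BY_TYPE[batted_ball_type]
    pitcher = -expected                 # pitcher owns the batted-ball type
    defense = -(play_value - expected)  # defense owns the difference
    return pitcher, defense

# A groundball converted into an out worth -0.23 runs to the hitter.
pitcher_credit, defense_credit = split_credit("groundball", -0.23)
```

Under this convention the two credits always add up to the runs saved on the play, so nothing is counted twice.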

Which defender receives credit depends on where the ball was hit and what type of ball was hit. For example, a fly ball out to LF is easily credited to the LF, but a groundball single to LF is not, since the players most likely to convert such a ball into an out are of course the SS and 3B. Which one gets credit depends on the proximity of each player to the ball: Closer to the middle of the infield, and the SS gets credit. Closer to the LF line, and the 3B gets credit. It obviously gets fuzzy in the middle of the two, but we'll evaluate how to objectively approach that in time.
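The proximity rule for a groundball through the left side could be sketched like this. The coordinate system, the fielder positions and the function name `responsible_infielder` are assumptions for illustration only, and the hard nearest-fielder cutoff ignores the fuzzy middle ground mentioned above.

```python
import math

# Hypothetical fielder starting positions in feet, with home plate at (0, 0),
# x increasing toward the 1B side and y increasing toward center field.
POSITIONS = {"3B": (-60.0, 75.0), "SS": (-30.0, 125.0)}

def responsible_infielder(ball_xy, candidates=("SS", "3B")):
    """Return the candidate whose assumed position is closest to the ball."""
    return min(candidates, key=lambda pos: math.dist(ball_xy, POSITIONS[pos]))

# A grounder hugging the LF line vs. one closer to the middle of the infield.
down_the_line = responsible_infielder((-70.0, 80.0))
up_the_middle = responsible_infielder((-25.0, 120.0))
```

A ball roughly equidistant from both fielders would still be handed to one of them here, which is exactly the case that needs a more careful rule.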

This is an admittedly subjective exercise, and one reason why Appelman's method, like the win probability added (WPA) calculations on both aforementioned stat sites, sticks to simply crediting the hitter and pitcher for all outcomes. Net Runs requires observational analysis of some plays to determine who gets credit for what, and many sabermetricians shy away from inserting subjectivity into what they prefer to be a completely automated (and thus cleanly objective) process. There are as many as 15 games a day in MLB, dozens more in the minors, and manually sifting through each one could take a while. In the case of the minors, play by play data lacks MLB Gameday level detail, and it can be difficult to determine, for example, which infielder let that groundball single to LF get by him.

The obvious disadvantage of the RE24 and WPA methods is that the data for individual players is largely inaccurate, since a significant part of most outcomes is out of the credited player's control. But those methods are much simpler and, for their purposes, objective.

At the expense of added effort, Net Runs is designed to bridge the gap and offer a better picture of how much a defense contributes, how much a baserunner contributes, how much luck and the park contribute, etc. There are some simplifications that make it an incomplete yardstick: Hitters typically receive full credit for the outcome of a batted ball regardless, the catcher receives credit on all stolen base attempts while the pitcher's impact is ignored (some pitchers, like San Diego's Chris Young, are easier to steal on, which means Padres catchers would be somewhat unfairly penalized), and blown umpire calls are ignored. But for the most part, it takes a closer look than any of the other methods at how much a player contributes to his team in a game, at the plate, in the field, on the mound and on the basepaths. It also accounts for luck, managerial play calls and the park.

The method itself looks like a convoluted exercise, but it proves fairly simple once you learn the mechanics and the rules... though it's a bit time consuming. This will be discussed starting in Part 2.
