## Sunday, February 14, 2010

### Run values by ball and strike count, and outcome

While continuing work on my projection methods, I stumbled upon Baseball Reference's stats by ball and strike count. Linked are the 2009 breakdowns for the AL and NL.

I suddenly had an idea: try to estimate the run value of every pitch. Sabermetricians are exploring at length the idea of using PITCHf/x to extrapolate the value of curveballs, fastballs, etc. But I've elected to take a simpler approach: Who cares what pitch is used? What should ultimately matter is that the pitch thrown in a particular count results in a new count or an outcome. Whether that new count resulted from the quality of the pitch or from fooling (or failing to fool) the batter is open to all the debate you wish. But ultimately, the pitch either worked or it didn't: the pitcher got farther ahead or fell farther behind, and he either beat the hitter or got beat.

I decided to try to estimate the run expectancy of a given count. One way to do this is to go through every known instance of each count in the game logs, trace the outcome and calculate an average run expectancy. But since I'd rather not spend the rest of my life locked in a room crunching game logs, I chose to estimate instead. Methods already exist to approximate the run value of different situations from historical outcomes.

As Dan Fox outlined in 2006, Pete Palmer and Dick Cramer created the Batter's Run Average stat (BRA) in 1974, the precursor to what eventually became OPS. By multiplying a batter's OBP by his SLG, they could get a rough estimate of how many runs the batter produced per plate appearance. For example, a .250/.300/.400 hitter (AVG/OBP/SLG) produces .300 x .400 = 0.12 runs per plate appearance. Over 600 plate appearances in a season, that player would be expected to produce 72 runs as a hitter.
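A minimal Python sketch of the BRA arithmetic above, assuming nothing beyond the formula itself (OBP times SLG, read as runs per plate appearance). The function name `bra` is just an illustrative label, not part of any library.

```python
def bra(obp: float, slg: float) -> float:
    """Batter's Run Average: OBP multiplied by SLG, read as runs per PA."""
    return obp * slg


# The .250/.300/.400 hitter: batting average does not enter the formula.
runs_per_pa = bra(obp=0.300, slg=0.400)
season_runs = runs_per_pa * 600  # 600 plate appearances in a season

print(f"{runs_per_pa:.3f} runs per PA, {season_runs:.0f} runs per 600 PA")
# prints: 0.120 runs per PA, 72 runs per 600 PA
```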

Of course, as Fox notes in his piece, BRA ignores defense and lacks some of the advantages of OPS, such as being relatively easy to calculate and park adjust. I wouldn't use BRA to blindly estimate a player's production unless I had park adjusted his raw stats first, was working from stats approximated from batted ball rates, and lacked any better method to project him.

However, BRA can be useful for estimating the run value of ball and strike counts for an entire league. Since I'd be using raw stats for the entire league, there's no need to park adjust the raw numbers (though you might want to park adjust the final numbers). And by using raw stats over multiple years, I'd have a sample large enough to all but erase the effects of variance.

So I took each league's 2009 OBP and SLG for each individual count and used them to approximate the expected run value for an average hitter in that count. These aren't the stats for balls put into play on a given count, but the stats for every hitter who reached that count, regardless of when the plate appearance eventually ended. Below are the BRAs for each count by league in 2009:

2009 AL run expectancy by pitch per BRA

| Count (balls-strikes) | Runs per PA |
|---|---|
| Entering PA | .144 |
| 1-0 | .184 |
| 2-0 | .265 |
| 3-0 | .384 |
| 0-1 | .101 |
| 1-1 | .128 |
| 2-1 | .174 |
| 3-1 | .296 |
| 0-2 | .054 |
| 1-2 | .068 |
| 2-2 | .096 |
| 3-2 | .178 |

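2009 NL run expectancy by pitch per BRA

| Count (balls-strikes) | Runs per PA |
|---|---|
| Entering PA, position players | .142 |
| Entering PA, pitchers | .032 |
| 1-0 | .176 |
| 2-0 | .258 |
| 3-0 | .365 |
| 0-1 | .094 |
| 1-1 | .119 |
| 2-1 | .172 |
| 3-1 | .292 |
| 0-2 | .051 |
| 1-2 | .065 |
| 2-2 | .095 |
| 3-2 | .181 |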

Notice that for the NL I separated position players from pitchers, since pitchers are generally very poor hitters and their low hitting numbers skew the NL hitters' data. However, the pitchers' data is still included in the ball/strike data because I could not separate pitchers out of it, and as a result the NL's BRA trends slightly lower in most counts, though not by a large amount.

Nevertheless, from all this you could follow each of a pitcher's pitches and estimate the marginal run value of each one: the run expectancy of the count after the pitch minus the run expectancy of the count before it.
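A minimal Python sketch of that per-pitch bookkeeping, assuming the 2009 AL count values from the table above and a hypothetical (balls, strikes) tuple encoding of each count. Values are from the batter's side, so a negative number means the pitch helped the pitcher; a two-strike foul that leaves the count unchanged simply scores zero.

```python
from typing import List, Sequence, Tuple

Count = Tuple[int, int]  # hypothetical encoding: (balls, strikes)

# 2009 AL run expectancy by count, per BRA, copied from the AL table above.
AL_2009_COUNT_RE = {
    (0, 0): 0.144, (1, 0): 0.184, (2, 0): 0.265, (3, 0): 0.384,
    (0, 1): 0.101, (1, 1): 0.128, (2, 1): 0.174, (3, 1): 0.296,
    (0, 2): 0.054, (1, 2): 0.068, (2, 2): 0.096, (3, 2): 0.178,
}


def pitch_run_values(counts: Sequence[Count]) -> List[float]:
    """Marginal run value of each pitch: RE of the count after the pitch
    minus RE of the count before it, rounded to three decimals."""
    return [round(AL_2009_COUNT_RE[after] - AL_2009_COUNT_RE[before], 3)
            for before, after in zip(counts, counts[1:])]


# Strike, ball, strike, ball: 0-0 -> 0-1 -> 1-1 -> 1-2 -> 2-2.
values = pitch_run_values([(0, 0), (0, 1), (1, 1), (1, 2), (2, 2)])
print(values)
```

Because each pitch's value is a difference, the values telescope: their sum is just .096 - .144 = -.048, the change from entering the plate appearance to reaching a 2-2 count.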

What happens once the ball is put in play or the plate appearance otherwise concludes? Using the outcomes of groundballs, flyballs and line drives over the last five seasons (2005-2009), I also used BRA to estimate the run values of balls put into play by type. However, Baseball Reference's flyball data includes all flyballs, whether pop flies, home runs or flyballs into the outfield, even though the three have very different contexts. A pop fly should have almost no run value, since pop flies are caught for outs 98-99% of the time.

A home run, of course, is an automatic run or runs. However, BRA's estimated value of 4.0 runs (the 1.000 OBP times the 4.000 SLG for a home run) isn't accurate, since a home run won't score 4 runs every time unless, in the unlikely event, you hit nothing but grand slams. Likewise, the estimated BRA of 0.0 for a walk or HBP (1.000 OBP x .000 SLG) isn't accurate, since many free passes eventually result in runs.

Using older data (1974-1990), Tom Tango estimated the average run value of a home run at 1.402. More current data has likely updated that number, but a simple, obvious and more accurate way to assess the value of a given home run in context is to count the runs that homer actually scores. If a home run is hit with two men on, simply count it as 3.0 runs, since that's how many it scored. Run values for all other events are estimated because we can't be certain of them at the time of the outcome, but a home run's outcome is absolutely certain when it happens.

Run values for walks, hit by pitches, strikeouts, balks and other events that move runners can be derived using a reliable run expectancy chart or matrix (Tom Tango has a nice example of one using data from 1999-2002). You simply take the run expectancy going into a situation, find the run expectancy of the new situation and take the difference, adding any runs that scored on the play.
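A minimal Python sketch of the matrix method, assuming a hypothetical base-out state encoding (runner on first, second, third, outs). It follows the standard convention of adding runs that score on the play, so a bases-loaded walk is worth a full run even though the base-out state doesn't change. The run expectancies below are PLACEHOLDER round numbers for illustration only, not Tango's 1999-2002 figures; substitute a real matrix before using this.

```python
from typing import Dict, Tuple

# Hypothetical base-out state: (runner on 1st, runner on 2nd, runner on 3rd, outs).
State = Tuple[bool, bool, bool, int]


def event_run_value(re_matrix: Dict[State, float], before: State,
                    after: State, runs_scored: int = 0) -> float:
    """RE of the new state minus RE of the old state, plus runs that scored.
    A play that makes the third out leaves a run expectancy of zero."""
    re_after = 0.0 if after[3] >= 3 else re_matrix[after]
    return re_after - re_matrix[before] + runs_scored


# PLACEHOLDER values (made-up round numbers, NOT real run expectancies).
PLACEHOLDER_RE: Dict[State, float] = {
    (False, False, False, 0): 0.50,
    (False, False, False, 1): 0.25,
    (False, False, False, 2): 0.10,
    (True, False, False, 0): 0.90,
    (True, True, True, 1): 1.60,
}

EMPTY_0 = (False, False, False, 0)
walk = event_run_value(PLACEHOLDER_RE, EMPTY_0, (True, False, False, 0))
strikeout = event_run_value(PLACEHOLDER_RE, EMPTY_0, (False, False, False, 1))
# Bases-loaded walk with one out: same base-out state, but a run scores.
LOADED_1 = (True, True, True, 1)
forced_in = event_run_value(PLACEHOLDER_RE, LOADED_1, LOADED_1, runs_scored=1)
# Strikeout for the third out with the bases empty.
third_out = event_run_value(PLACEHOLDER_RE, (False, False, False, 2),
                            (False, False, False, 3))
print(walk, strikeout, forced_in, third_out)
```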

With all this in mind, here is how you can devise the approximate run values for given outcomes.

| Outcome | Approximate run value |
|---|---|
| Line drive | .735 |
| Groundball | .063 |
| In-play flyball | .053 |
| Pop fly | .0005 |
| Home run | Number of runs actually scored |
| Walk/HBP/other outcomes | Net change in run expectancy per a run expectancy matrix |
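A minimal Python sketch of those outcome values as a lookup, assuming hypothetical outcome labels and reading the pop fly's ".000(5)" as .0005. Home runs are valued as runners on base plus the batter, and any other outcome requires the caller to pass in the run expectancy change computed from a run expectancy matrix; `outcome_run_value` is an illustrative name, not an existing function.

```python
from typing import Optional

# Approximate BRA-based run values for balls in play, from the outcome values
# above. The pop fly's ".000(5)" is read here as 0.0005 (an assumption).
IN_PLAY_RUN_VALUES = {
    "line_drive": 0.735,
    "groundball": 0.063,
    "in_play_flyball": 0.053,  # flyballs that are neither pop flies nor homers
    "pop_fly": 0.0005,
}


def outcome_run_value(outcome: str, runners_on: int = 0,
                      re_change: Optional[float] = None) -> float:
    """Run value of how a plate appearance ended (hypothetical helper)."""
    if outcome == "home_run":
        # Count the runs the homer actually scored: each runner plus the batter.
        return float(runners_on + 1)
    if outcome in IN_PLAY_RUN_VALUES:
        return IN_PLAY_RUN_VALUES[outcome]
    # Walks, HBPs, strikeouts and other events: the caller supplies the net
    # change in run expectancy taken from a run expectancy matrix.
    if re_change is None:
        raise ValueError(f"{outcome!r} needs a run expectancy change (re_change)")
    return re_change


print(outcome_run_value("home_run", runners_on=2))  # three-run homer
print(outcome_run_value("groundball"))
```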

Using all this, you could graph a pitcher's run expectancy over the course of a game and gauge the net run value he added or prevented over a game, or even a season. You can do the same for hitters. The ball/strike count expectancies don't change a player's net run value once a plate appearance concludes, but they help if you want to graph a player's plate appearances, logging how each pitch raised or lowered his run expectancy.
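A minimal Python sketch of tracking a pitcher across a game, assuming the 2009 AL count values from the table above, a hypothetical (balls, strikes) encoding, and that each plate appearance's outcome has already been given a run value (for example .063 for a groundball, .735 for a line drive). Chaining the last count into the outcome by subtraction, and counting runs prevented as positive for the pitcher, are choices made for this illustration.

```python
from typing import List, Sequence, Tuple

Count = Tuple[int, int]  # hypothetical encoding: (balls, strikes)

# 2009 AL run expectancy by count, per BRA (from the AL table above).
AL_2009_COUNT_RE = {
    (0, 0): 0.144, (1, 0): 0.184, (2, 0): 0.265, (3, 0): 0.384,
    (0, 1): 0.101, (1, 1): 0.128, (2, 1): 0.174, (3, 1): 0.296,
    (0, 2): 0.054, (1, 2): 0.068, (2, 2): 0.096, (3, 2): 0.178,
}


def pa_pitch_values(counts: Sequence[Count], outcome_value: float) -> List[float]:
    """Batter-side run value of every pitch in one plate appearance.

    counts: each count the PA passed through, starting at (0, 0).
    outcome_value: run value of how the PA ended.
    """
    values = [AL_2009_COUNT_RE[after] - AL_2009_COUNT_RE[before]
              for before, after in zip(counts, counts[1:])]
    # The final pitch ends the PA: its value is the outcome's value minus
    # the run expectancy of the count it was thrown in.
    values.append(outcome_value - AL_2009_COUNT_RE[counts[-1]])
    return values


def pitcher_running_total(pas: Sequence[Tuple[Sequence[Count], float]]) -> List[float]:
    """Cumulative runs prevented by the pitcher after each pitch, for graphing."""
    total, series = 0.0, []
    for counts, outcome_value in pas:
        for value in pa_pitch_values(counts, outcome_value):
            total -= value  # a pitch that helps the batter costs the pitcher
            series.append(total)
    return series


game = [
    ([(0, 0), (0, 1), (1, 1)], 0.063),  # groundball on a 1-1 pitch
    ([(0, 0), (1, 0)], 0.735),          # line drive on a 1-0 pitch
]
series = pitcher_running_total(game)
print([round(x, 3) for x in series])
```

Within a single plate appearance the pitch values always sum to the outcome's value minus the entering-PA value (.144 here), which is why the count expectancies only redistribute credit among pitches rather than changing the plate appearance's total.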

And obviously, getting all this data would mean tracking and perusing the game logs and maybe even... heaven forbid... watching the games, instead of just glancing at the cumulative stats and crunching estimates. But as baseball research develops, we ought to get used to taking a closer look at the game, rather than finding excuses to make guesstimates from a distance.