Monday, March 1, 2010

On the idea behind my past and present research on batted ball rates

Late last year I started tracking batted ball rates for Mariners minor leaguers, then applying MLB league average AVG/OBP/SLG rates for each batted ball type to get a composite line: how each player would have hit if his batted balls had produced in line with league averages. I kept walks, strikeouts and other non-in-play outcomes constant. I devised the same composite AVG/OBP/SLG numbers for hitters and pitchers, since I was measuring the same rate stats. By MLE-adjusting the numbers, I could get a general sense of the progress a player made as he advanced through the system.

The idea was to get away from counting stats (singles, doubles, triples) in analyzing and projecting players, and to look more at stats that trend towards certain outcomes. Groundballs go for hits at a better rate than flyballs (.242 composite AVG over 2005-2009) but come with almost no power (.262 SLG). Flyballs go for hits the least (.223) but lead to the most isolated power (.594) and of course home runs... though that data includes home runs, which can't be fielded, and pop flies, which are almost always outs (.020 AVG). Line drives go for hits an amazing 73-74% of the time (.736), though they come with less isolated power than flyballs (1.004 SLG).
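To make the composite line concrete, here is a rough Python sketch of the arithmetic as I've described it, not the exact spreadsheet I used. The function and variable names are made up for illustration. It simplifies by ignoring hit-by-pitches, sacrifices and sac flies, and it does not split out pop flies or apply any MLE adjustment. The groundball and line drive rates are the composite figures quoted above. The flyball SLG is an assumption: it treats the .594 figure as the slugging rate, so swap in .817 (AVG plus ISO) if you read that number as true isolated power.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    avg: float  # league batting average on this batted ball type
    slg: float  # league slugging on this batted ball type


# Composite 2005-2009 rates quoted in the text.
# ASSUMPTION: flyball SLG uses .594 as slugging, not isolated power.
LEAGUE_RATES = {
    "GB": Rate(avg=0.242, slg=0.262),
    "LD": Rate(avg=0.736, slg=1.004),
    "FB": Rate(avg=0.223, slg=0.594),
}


def composite_line(batted, strikeouts, walks, rates=LEAGUE_RATES):
    """Return (AVG, OBP, SLG) if each batted ball produced at league rates.

    batted: dict mapping "GB"/"LD"/"FB" to a count of batted balls.
    Strikeouts and walks are held constant, as in the text.
    Simplification: no HBP, sacrifices or sac flies.
    """
    at_bats = sum(batted.values()) + strikeouts
    plate_appearances = at_bats + walks
    if at_bats == 0:
        raise ValueError("need at least one at-bat")

    expected_hits = sum(n * rates[kind].avg for kind, n in batted.items())
    expected_bases = sum(n * rates[kind].slg for kind, n in batted.items())

    avg = expected_hits / at_bats
    obp = (expected_hits + walks) / plate_appearances
    slg = expected_bases / at_bats
    return avg, obp, slg
```

Feed it a player's groundball, line drive and flyball counts along with his strikeouts and walks, and it returns the slash line he'd have posted with league average results on contact. The same call works for a pitcher's batted balls allowed.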

Dan Fox had some old batted ball data in 2007 that I used at the time, as it was the most reliable data I could get my hands on. I got as far as napkin adjusted numbers for several Mariners farm teams, but never much beyond that. Ultimately, I realized that, since Fox's data was outdated by several years, it wouldn't do me much good to continue until I had some present-day data.

Because the saber-community focuses on spitting out a single raw number centered around runs, and many of its formulas are built around counting stats, not a whole lot of data was available on batted ball averages and slugging. According to Baseball Prospectus's annual net run expectancy data (which I pulled from their past three annual publications), MLB run averages per batted ball are typically 0.19 runs per flyball, 0.39 runs per line drive and 0.04 runs per groundball. But there's no breakdown by average and slugging, so extrapolating an expected slash line from that data is a fool's errand until you can find averages for each.

That is, until Baseball Reference began including GB/FB/LD splits in their year-to-year MLB league split data (here is the AL's 2009 breakdown). This allowed me to finally take a recent five-year span of data and create a composite AVG/SLG for each batted ball type based on actual MLB data... which is how I got the numbers parenthetically referenced above. As with the composite run expectancy matrix, each season is weighted like so:

2009: 50%
2008: 30%
2007: 10%
2006: 6%
2005: 4%
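In code, that weighting is just a weighted average. Here's a minimal Python sketch with hypothetical names. It assumes the weights are applied directly to each season's league rate rather than to raw hit and at-bat totals. The yearly values in the usage line are made-up placeholders, not real league data.

```python
# Season weights from the list above; they sum to 100%.
WEIGHTS = {2009: 0.50, 2008: 0.30, 2007: 0.10, 2006: 0.06, 2005: 0.04}


def weighted_composite(rate_by_year, weights=WEIGHTS):
    """Blend one league rate (say, groundball AVG) across seasons.

    rate_by_year: dict mapping season -> that season's league rate.
    ASSUMPTION: weights apply to the yearly rates themselves,
    not to the underlying hit and at-bat counts.
    """
    missing = set(weights) - set(rate_by_year)
    if missing:
        raise ValueError(f"missing seasons: {sorted(missing)}")
    return sum(weights[year] * rate_by_year[year] for year in weights)


# Placeholder values for illustration only (not real league data).
example = weighted_composite(
    {2009: 0.240, 2008: 0.245, 2007: 0.250, 2006: 0.238, 2005: 0.242}
)
```

Because 2009 and 2008 carry 80% of the weight between them, an odd season from 2005 or 2006 barely moves the composite.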

I used the past five years to ensure a sufficiently large sample and minimize variance, while giving greater weight to recent data and lesser weight to data from the more distant past. This was especially important in light of the recent rollout of some new ballparks (Citi Field, Nationals Park) plus a slight but noticeable decline in offensive numbers over the last 2-3 years. That said, the numbers correlated closely across seasons, so I'm confident the league average batted ball data is reliable.

