Over the years, I've tried my hand at creating a number of stats in an attempt to help us better understand the players that drive our fantasy baseball
games. Some have been quite innovative, others are merely practical, and others have fallen by the wayside, but I thought it would be useful to
present them all here, in one place, for easy reference (with links to the original article introducing the stat).
Pitching Stats
CAPS (Context Adjusted Pitching Statistics):
Context is everything. This is perhaps one of the most underrated, misunderstood, and ignored
concepts in player evaluation. CAPS seeks to rectify this. Over the past couple of years, nearly all fantasy websites have begun to realize that stats
like strikeout rate and walk rate are better indicators of a pitcher's true skill than ERA or WHIP, so CAPS is an attempt to take things a step
further.
The context in which a player posts his stats is key. Are they accumulated in the American League or the National League? A "pitcher's" ballpark or
a "hitter's" ballpark? Against good batters or poor batters? If a pitcher faces a disproportionate number of Adam Dunn-type hitters, he is going to
strike out and walk more batters and allow more home runs than we'd expect. Because a pitcher has no ability to control the batters he faces, though,
we can't consider this a repeatable skill and must, therefore, neutralize a pitcher's stat line based on the opposition he faces. The same thing goes
for any other number of contextual factors.
The last published record of CAPS accounted for:
- Past home ballpark
- Current home ballpark
- Past road ballparks
- Current road ballparks
- Past quality of opponents (neutralized)
- League switch adjustments
- Ground balls adjusted for league average line drive rate (xGB)
Since then, I've added adjustments for quality of catcher, quality of umpire, and pitcher role change (starter/reliever).
—
THT 1/5/09 | THT 1/28/09
xGB% (Expected Groundball Percentage): Ground balls are a better result for a pitcher than fly balls because fly balls become extra base hits
ten times more often than ground balls do, and ground balls are almost never home runs. Actual ground ball percentage can be misleading, though, if a
pitcher allows or prevents line drives at an abnormal rate. Because line drive percentage is so unstable, xGB% first assumes a league average line
drive rate and then recalculates ground ball percentage based on that.
—
THT 7/17/07
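The xGB% recalculation described above can be sketched in a few lines. This is only an illustration: the article does not say how the other batted ball types are rebalanced once line drives are reset, so the proportional rescaling here is an assumption, and the function name and sample numbers are hypothetical.

```python
def expected_gb_pct(gb_pct: float, ld_pct: float, league_ld_pct: float) -> float:
    """Rescale GB% as if the pitcher allowed a league-average line drive rate.

    Assumption (not stated in the article): ground balls, fly balls, and
    pop-ups keep their relative proportions when the line drive share is
    reset to the league average.
    """
    if not 0 <= ld_pct < 1:
        raise ValueError("ld_pct must be in [0, 1)")
    return gb_pct * (1 - league_ld_pct) / (1 - ld_pct)


# Hypothetical pitcher: 45% ground balls, 24% line drives, 19% league LD rate.
print(round(expected_gb_pct(0.45, 0.24, 0.19), 4))
```

A pitcher who has allowed more line drives than average gets a higher xGB% than his raw GB%, and one who has allowed fewer gets a lower one.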
DIPS WHIP: A take-off of LIPS (Luck Independent) ERA, DIPS WHIP tries to neutralize some of the variance inherent in WHIP. DIPS WHIP
assumes a league average line drive rate and adjusts a pitcher's other batted ball percentages accordingly. It then assumes a league average rate of
hits on each batted ball type, combines these hits with a pitcher's actual walks, and divides by innings pitched to arrive at DIPS WHIP.
—
THT 7/15/07
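As a rough sketch of the DIPS WHIP steps above: reset line drives to the league rate, apply a league hit rate to each batted ball type, add actual walks, and divide by innings. The hit rates below are placeholder values rather than figures from the article, and the proportional rebalancing of the non-line-drive types is an assumption.

```python
# Hypothetical league-average hits per batted ball, by type.
# These are placeholders for illustration, not values from the article.
LEAGUE_HIT_RATE = {"gb": 0.24, "fb": 0.22, "ld": 0.72, "pu": 0.02}


def dips_whip(batted: dict, walks: int, innings: float, league_ld_pct: float) -> float:
    """Estimate WHIP from batted ball counts keyed by 'gb', 'fb', 'ld', 'pu'."""
    total = sum(batted.values())
    non_ld = total - batted["ld"]
    # Reset line drives to the league share and scale the other types so the
    # total number of batted balls is unchanged (assumed rebalancing rule).
    scale = total * (1 - league_ld_pct) / non_ld
    adjusted = {k: v * scale for k, v in batted.items() if k != "ld"}
    adjusted["ld"] = total * league_ld_pct
    expected_hits = sum(adjusted[k] * LEAGUE_HIT_RATE[k] for k in adjusted)
    return (expected_hits + walks) / innings


# Hypothetical season line.
sample = {"gb": 200, "fb": 150, "ld": 80, "pu": 20}
print(round(dips_whip(sample, walks=50, innings=180.0, league_ld_pct=0.19), 3))
```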
xBABIP (Expected Batting Average on Balls in Play): We often say that a pitcher has little control over his BABIP, and this is true, but he does not
relinquish all control. Most importantly, we know that a pitcher has a lot of control over his groundballs and flyballs, a good
amount of control over his pop-ups, and little control over his line drives. To calculate xBABIP, we first neutralize line-drive rate and adjust the
other three rates accordingly (like we do to calculate xGB%). Then we assume a league average rate of hits on all types of batted balls. Add up those
hits, and we can calculate an expected BABIP.
What we'll see is that extreme GB pitchers have higher xBABIPs and extreme FB pitchers have lower xBABIPs (while also realizing that guys who induce a
lot of pop-ups will have low xBABIPs too). In 2009, for example, GB'er Aaron Cook had a .314 xBABIP while FB'er Jered Weaver had a .291 xBABIP.
—
THT 1/25/10
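The xBABIP calculation described above might look something like the following. The per-type hit rates are made-up placeholders and the proportional rebalancing is an assumption, so the outputs will not reproduce the Cook and Weaver figures; the point is only that a ground-ball-heavy profile comes out higher than a fly-ball-heavy one.

```python
# Hypothetical league BABIP by batted ball type (placeholders, not the
# article's values).
LEAGUE_BABIP_BY_TYPE = {"gb": 0.235, "fb": 0.140, "pu": 0.020, "ld": 0.720}


def expected_babip(rates: dict, league_ld_rate: float) -> float:
    """rates maps 'gb', 'fb', 'pu', 'ld' to shares of balls in play (sum to 1)."""
    # Neutralize the line drive rate and rescale the other three types
    # proportionally (assumed adjustment rule).
    scale = (1 - league_ld_rate) / (1 - rates["ld"])
    adjusted = {k: r * scale for k, r in rates.items() if k != "ld"}
    adjusted["ld"] = league_ld_rate
    # Weight each adjusted share by the league hit rate for that type.
    return sum(adjusted[k] * LEAGUE_BABIP_BY_TYPE[k] for k in adjusted)


ground_baller = {"gb": 0.55, "fb": 0.22, "pu": 0.04, "ld": 0.19}
fly_baller = {"gb": 0.33, "fb": 0.40, "pu": 0.08, "ld": 0.19}
print(expected_babip(ground_baller, 0.19) > expected_babip(fly_baller, 0.19))
```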
xWins (xW): While many fantasy analysts call Wins a fickle stat (and they're right), wins aren't wholly unpredictable.
Axioms like "don't chase wins" or "draft skills" are thrown around often, and while one can be successful by simply following this advice, I feel as
though we can do a little bit better. And if we can do better, why shouldn't we?
Essentially, xW uses Bill James's Pythagorean Theorem to estimate the expected number of games a pitcher should have won. Using this formula, I plug
in the pitcher's LIPS RA (weighted by his IP per game), his expected Bullpen Support (weighted by the IP the starter doesn't pitch per game),
and his expected Run Support.
This gives us the number of games the SP's team will win on days he pitches, and from there we calculate the percentage of those games for which he
should get credited with the Win based upon how deep into games he goes. Pitchers who last into the eighth inning are far more likely to receive a win
than those who only last four or five innings because there's more time for their offense to score runs. The small problem here is that unlucky
pitchers won't go as deep into games as they should, and vice versa for lucky pitchers, but I haven't accounted for this yet.
—
THT 1/25/10
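Here is a minimal sketch of the xW pipeline, assuming the classic Pythagorean form with an exponent of 2. The article does not give the formula that turns innings per start into a share of team wins credited to the starter, so `win_credit_share` is a hypothetical input, as are all of the parameter names.

```python
def expected_wins(lips_ra: float, bullpen_ra: float, run_support: float,
                  ip_per_start: float, starts: int,
                  win_credit_share: float, exponent: float = 2.0) -> float:
    """Estimate pitcher wins from per-nine-inning run rates.

    win_credit_share is a hypothetical stand-in for the article's
    "how deep into games he goes" adjustment, which is not specified.
    """
    # Blend the starter's and bullpen's runs allowed by innings covered.
    runs_allowed = (lips_ra * ip_per_start + bullpen_ra * (9 - ip_per_start)) / 9
    # Pythagorean expectation for the team on days this pitcher starts.
    rs, ra = run_support ** exponent, runs_allowed ** exponent
    team_wins = rs / (rs + ra) * starts
    # Only some of those team wins are credited to the starter.
    return team_wins * win_credit_share


# Hypothetical starter: 3.80 LIPS RA, 4.40 bullpen RA, 4.70 runs of support.
print(round(expected_wins(3.80, 4.40, 4.70, 6.3, 32, 0.72), 1))
```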
K/BB RI (Strikeout and Walk Run Impact): K/BB ratio (sometimes called "command") is often used as a measure of a pitcher's success
with strikeouts and walks. This is incredibly flawed, however, as a strikeout and a walk do not have the same impact on a pitcher's success at
preventing runs. As my studies have shown, you can have two pitchers with equal K/BB ratios but wildly differing ERAs. A pitcher with lots of strikeouts and lots of walks
will do better than a pitcher with few strikeouts and few walks even if they have the same K/BB ratio.
To calculate K/BB RI, you first determine how many strikeouts, walks, and batted balls occurred per major league game. Next, multiply each event by
its corresponding relative run value, giving you runs per game. For batted balls, you need to back-calculate runs per game using total league average
runs per game and runs per game on strikeouts and walks. You also back-calculate batted balls per game by subtracting strikeouts and walks per game
from total batters faced per game. Divide runs per game (on batted balls) by batted balls per game to get a relative run value on batted balls.
For each individual pitcher, you then multiply his strikeout, walk, and batted ball per game figures by each event's relative run value. After doing
this, you get a runs per game figure for each event. Add them up. Subtract this number from league average runs per game, and you arrive at the impact
strikeouts and walks have on a pitcher's runs allowed.
—
THT 2/14/08
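The K/BB RI steps above translate fairly directly into code. In this sketch the run values for strikeouts and walks are hypothetical inputs (the article does not list them), and the class and function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class PerGame:
    """Per-game rates for a league or for one pitcher."""
    strikeouts: float
    walks: float
    batters_faced: float


def batted_ball_run_value(league: PerGame, league_runs: float,
                          rv_k: float, rv_bb: float) -> float:
    """Back-calculate the run value of a batted ball from league totals."""
    runs_on_k_bb = league.strikeouts * rv_k + league.walks * rv_bb
    batted_balls = league.batters_faced - league.strikeouts - league.walks
    return (league_runs - runs_on_k_bb) / batted_balls


def k_bb_run_impact(pitcher: PerGame, league: PerGame, league_runs: float,
                    rv_k: float, rv_bb: float) -> float:
    """Runs per game saved (positive) or lost relative to league average."""
    rv_bip = batted_ball_run_value(league, league_runs, rv_k, rv_bb)
    pitcher_bip = pitcher.batters_faced - pitcher.strikeouts - pitcher.walks
    pitcher_runs = (pitcher.strikeouts * rv_k
                    + pitcher.walks * rv_bb
                    + pitcher_bip * rv_bip)
    return league_runs - pitcher_runs


# Hypothetical league rates and run values, for illustration only.
league = PerGame(strikeouts=7.0, walks=3.3, batters_faced=38.5)
pitcher = PerGame(strikeouts=10.0, walks=4.5, batters_faced=38.5)
print(round(k_bb_run_impact(pitcher, league, 4.6, rv_k=0.01, rv_bb=0.33), 3))
```

By construction, a pitcher with exactly league-average strikeout, walk, and batters-faced rates comes out at zero.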
xHR/FB (Expected Home Run per Flyball rate): This is calculated very simply by using park factors. We assume a 50/50 home/road split for the
pitcher, a neutral road schedule (HR/FB park factor of 1.00), and account for the pitcher's home ballpark's HR tendencies. It is very important to note
that even if a pitcher calls an extreme HR park home, his expected HR/FB will still remain pretty close to neutral. The xHR/FB for Rockies pitchers,
for example, was just 12.39 percent in 2009 (with a league average of 11.18 percent).
Analysts often like to credit deviation further from the mean than this to a pitcher's home park, but that simply is not the case (unless the pitcher
has thrown a disproportionate number of games at home, and even if he has, that shouldn't be expected to continue going forward). Simply put, HR park
factors are not quite as extreme as most seem to believe.
—
THT 1/25/10
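The park adjustment behind xHR/FB can be illustrated with a short function. It assumes the blended park factor is applied multiplicatively to the league HR/FB rate, which the article implies but does not spell out, and the 1.20 home park factor in the example is hypothetical.

```python
def expected_hr_per_fb(league_hr_fb: float, home_park_factor: float,
                       home_share: float = 0.5,
                       road_park_factor: float = 1.0) -> float:
    """Blend home and road HR/FB park factors and apply them to the league rate.

    Defaults follow the article: a 50/50 home/road split and a neutral
    road schedule (park factor of 1.00).
    """
    blended = home_share * home_park_factor + (1 - home_share) * road_park_factor
    return league_hr_fb * blended


# With a hypothetical 1.20 home factor, only half of the park effect shows up:
# the blended factor is 1.10, not 1.20.
print(round(expected_hr_per_fb(0.1118, 1.20), 4))
```

Halving the home park's effect is what keeps even an extreme park's xHR/FB close to the league rate.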
xLOB% (Expected Left on Base Percentage): Of the three main 'luck indicators' (BABIP, HR/FB, LOB%), LOB% has the most room for skill-based
variation. This is because LOB% is actually an exponential function. To put it simply, if Pitcher A allows hits at a 24 percent rate and Pitcher B
allows hits at a 30 percent rate, once men reach base, more of them will score on Pitcher B because he is more likely to give up hits to begin with.
His hits will be clumped closer together. As such, LOB% has a fairly strong relationship with the rate at which batters reach base.
xLOB% is calculated using a regression formula derived from BAA and BB%. Now, of course, BAA is subject to extreme variation since it is
composed largely of BABIP. So instead of using actual BAA, we use xBAA, which accounts for the pitcher's actual K rate (the more Ks, the fewer opportunities
for hits) and his xBABIP. What we end up seeing is that good pitchers end up leaving more runners on base (2009 Tim Lincecum: 75.6 percent) while bad
pitchers let more score (2009 Jeremy Sowers: 68.1 percent) than league average (2009: 71.9 percent).
—
THT 1/25/10
R/HR (Runs per Home Run) and xR/HR (Expected Runs per Home Run): HR/FB has become a common stat for measuring a pitcher's luck with home
runs, but it doesn't tell us everything. For example, a pitcher can have a seemingly lucky 4 percent HR/FB but could actually have experienced bad luck
with HRs if he was unfortunate enough to have given up all of his HRs while the bases were loaded. On average, about 1.4 runs score per HR, but not all
pitchers allow them at this rate (some justifiably, some as a result of luck). R/HR tells us how many runs actually scored per home run allowed while
xR/HR tells us how many runs should have scored (the process for this is a little complicated, but I'd be happy to explain for anyone interested).
—
THT 1/25/10
Home Run Runs per Fly Ball (HRR/FB) and Expected Home Run Runs per Fly Ball (xHRR/FB): A mixture of HR/FB and R/HR, HRR/FB tells us how
many runs scored on home runs per outfield fly. xHRR/FB, naturally, tells us how many should have scored. You can consider this a super-powered HR/FB
since it not only accounts for how many HRs are allowed but also the total damage done by the HRs, which is what truly matters. Ten solo home runs
do just as much damage as five two-run homers, which is something HR/FB doesn't capture on its own.
—
THT 1/25/10
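To make the relationship between HR/FB, R/HR, and HRR/FB concrete, here is a small sketch of the actual (not expected) versions. The function names and the two sample pitchers are hypothetical; the expected versions are not covered because their calculation isn't described here.

```python
def hr_per_fb(home_runs: int, outfield_flies: int) -> float:
    return home_runs / outfield_flies


def runs_per_hr(hr_runs: int, home_runs: int) -> float:
    return hr_runs / home_runs


def hr_runs_per_fb(hr_runs: int, outfield_flies: int) -> float:
    # Equivalent to HR/FB multiplied by R/HR.
    return hr_runs / outfield_flies


# Pitcher A: ten solo homers. Pitcher B: five two-run homers.
# Both allowed 100 outfield flies and 10 runs on home runs.
print(hr_per_fb(10, 100), hr_per_fb(5, 100))            # HR/FB differs
print(hr_runs_per_fb(10, 100), hr_runs_per_fb(10, 100))  # HRR/FB matches
```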
True Quality Starts (TQS): Writers often talk about how inconsistent a pitcher is, how he can look fantastic one start and be a train wreck the
next. They talk about how he has great potential and could be a great pitcher if he just puts it all together. TQS was conceived as a way to measure
this, to evaluate a pitcher on a start-by-start basis and see if an otherwise unnoteworthy pitcher might have some hidden potential.
To calculate TQS, I first normalize the batted-ball figures to account for disproportionate line drives. After that, I apply relative run values to
strikeouts, walks, hit-by-pitches, adjusted ground balls, adjusted fly balls, adjusted pop-ups, and adjusted line drives. I take this figure,
multiply by nine, and divide by innings pitched to get a sort of makeshift, one-game expected ERA. I then apply a simple above-replacement-level
measure, calculated as (6 - expected ERA)*IP/9. I call this the TQS score. From here, I take every TQS score for the entire year and find the
standard deviation among them. These standard deviations are then used to classify
each start as one of the following: great, good, above average, below average, bad, or awful.
—
THT 2/20/08
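The TQS score arithmetic above can be sketched as follows. The expected runs for a start are taken as an input because the relative run values aren't listed here, the cutoffs for the six labels aren't given so no classification is attempted, and using the sample standard deviation is an assumption.

```python
from statistics import stdev


def tqs_score(expected_runs: float, innings: float,
              replacement_era: float = 6.0) -> float:
    """One-start score: (6 - expected ERA) * IP / 9."""
    expected_era = expected_runs * 9 / innings
    return (replacement_era - expected_era) * innings / 9


# Hypothetical (expected runs, innings) pairs for five starts.
starts = [(2.0, 7.0), (4.5, 5.0), (1.0, 8.0), (5.0, 4.0), (3.0, 6.0)]
scores = [tqs_score(runs, ip) for runs, ip in starts]

# Spread of the scores; the article classifies starts from this but does
# not give the thresholds, so none are applied here.
print([round(s, 2) for s in scores], round(stdev(scores), 2))
```

Algebraically the score reduces to 6*IP/9 minus expected runs, so it rewards both run prevention and going deep into the game.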
Hitting Stats
CABS (Context Adjusted Batting Statistics): I've never published anything about CABS, but I have developed it behind the scenes based on the
same principles as CAPS.
True Home Runs (tHR): I hypothesized that when trying to predict home runs, "hitting home runs" isn't the skill that we should be testing for.
"Hitting the ball far" and "hitting the ball in a particular direction" should be the skills we test for. Players who hit the ball a long way should
also be able to hit the ball a short way. Players who hit a lot of long home runs but don't hit a lot that are just clearing the fence are getting
unlucky, while players who don't hit many long ones but who hit a lot that barely clear the fence are getting lucky.
To calculate tHR, every home run is run through Greg Rybarczyk's HitTracker system in 30
environments: every park in the league with average weather for that park. The homers that are given a No Doubt label are counted up and then entered
into an exponential regression equation to arrive at the total number of home runs that the hitter should be expected to hit.
—
THT 7/9/08
Plate Discipline Stats
Based on work by Russell Carleton (aka Pizza Cutter) at MVN's now defunct Statistically Speaking blog, I used signal detection theory to come up
with a set of four stats to help measure a hitter's plate discipline.
Judgment: A measure of a hitter's pitch recognition judgment or of "how good a batter is at judging between the pitches at which he should and
shouldn't swing." Calculated the same as "sensitivity" in signal detection theory with a correct decision including all in-zone swings. A correct
rejection includes all balls. Type I errors include out-of-zone swinging strikes and Type II errors include called strikes. Presented as an index with
100 being average.
—
THT 9/16/08
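For readers unfamiliar with signal detection theory, sensitivity (usually written d') is the difference between the z-scores of the hit rate and the false alarm rate. The sketch below maps the four pitch outcomes named in the Judgment entry onto that formula; the function and parameter names are hypothetical, and the rescaling to an index where 100 is average is not reproduced because it isn't described here.

```python
from statistics import NormalDist


def judgment_sensitivity(in_zone_swings: int, called_strikes: int,
                         out_zone_swinging_strikes: int, balls: int) -> float:
    """Signal detection sensitivity (d') from pitch-level counts.

    Correct decisions = in-zone swings, Type II errors = called strikes,
    Type I errors = out-of-zone swinging strikes, correct rejections = balls.
    """
    hit_rate = in_zone_swings / (in_zone_swings + called_strikes)
    false_alarm_rate = out_zone_swinging_strikes / (out_zone_swinging_strikes + balls)
    # inv_cdf raises StatisticsError for rates of exactly 0 or 1, so very
    # small samples would need a correction that is not shown here.
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical season counts.
print(round(judgment_sensitivity(700, 300, 150, 850), 3))
```

A batter who swings equally often at strikes and balls scores zero; the better he separates the two, the higher the value.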
Aggressiveness/Passivity Bias (A/P Bias): A/P Bias shows the batter's tendencies when he makes a mistake in Judgment. Is he taking too many pitches or
swinging at too many? If a batter is going to make mistakes, these stats show that swinging more will limit strikeouts better than taking too many
pitches.
—
THT 9/16/08
Bat Control: If a batter has a perfect eye but isn't able to take advantage of it by swinging the bat well, what's the point? Bat control is
the percentage of balls within the strike zone that the hitter makes contact with (given that he swings). The formula is
(in-zone contacted balls)/(in-zone swings).
Since a hitter is swinging, we can assume it's because he believes he can hit the ball (yes, this isn't always true, but it's good enough for our
purposes), and a ball within the strike zone is definitely capable of being hit. By focusing on in-zone pitches, we ignore the times when a batter
swings and misses on pitches out of the zone, because these are more likely to be caused by poor judgment than by poor bat control:
the batter shouldn't be swinging at a ball outside the strike zone if he isn't able to hit it.
So the percentage of times he does what he intends to do (make contact) when he should be expected to (when it's in the strike zone), I contend,
gives us a good measure of bat control.
—
THT 9/16/08
Bad Ball Hitting: While bat control measures a batter's ability to hit balls that the rules of baseball say he should be able to
(pitches within the strike zone), some hitters are able to hit balls that they really aren't expected to (pitches out of the strike zone).
So our second new stat we'll call bad ball hitting (name lifted from Dan Fox's article on plate discipline stats).
It is calculated as (out-of-zone contacted balls)/(out-of-zone swings).
—
THT 9/16/08
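Bat control and bad ball hitting are the same ratio applied to two different sets of pitches, which a short sketch makes plain. The function names and sample counts are hypothetical.

```python
def contact_rate(contacted: int, swings: int) -> float:
    """Share of swings that made contact."""
    if swings <= 0:
        raise ValueError("swings must be positive")
    return contacted / swings


def bat_control(in_zone_contacted: int, in_zone_swings: int) -> float:
    # (in-zone contacted balls) / (in-zone swings)
    return contact_rate(in_zone_contacted, in_zone_swings)


def bad_ball_hitting(out_zone_contacted: int, out_zone_swings: int) -> float:
    # (out-of-zone contacted balls) / (out-of-zone swings)
    return contact_rate(out_zone_contacted, out_zone_swings)


# Hypothetical hitter: 180 contacts on 200 in-zone swings,
# 60 contacts on 100 out-of-zone swings.
print(bat_control(180, 200), bad_ball_hitting(60, 100))
```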