Original Stats I've Created

Over the years, I've tried my hand at creating a number of stats in an attempt to help us better understand the players that drive our fantasy baseball games. Some have been quite innovative, others are merely practical, and others have fallen by the wayside, but I thought it would be useful to present them all here, in one place, for easy reference (with links to the original article introducing the stat).

Pitching Stats | Hitting Stats

Pitching Stats

CAPS (Context Adjusted Pitching Statistics): Context is everything. This is perhaps one of the most underrated, misunderstood, and ignored concepts in player evaluation. CAPS seeks to rectify this. Over the past couple of years, nearly all fantasy websites have begun to realize that stats like strikeout rate and walk rate are better indicators of a pitcher's true skill than ERA or WHIP, so CAPS is an attempt to take things a step further.

The context in which a player posts his stats is key. Are they accumulated in the American League or the National League? A "pitcher's" ballpark or a "hitter's" ballpark? Against good batters or poor batters? If a pitcher faces a disproportionate number of Adam Dunn-type hitters, he is going to strike out and walk more batters and allow more home runs than we'd expect. Because a pitcher has no ability to control the batters he faces, though, we can't consider this a repeatable skill and must, therefore, neutralize a pitcher's stat line based on the opposition he faces. The same thing goes for any number of other contextual factors. The last published record of CAPS accounted for:
  1. Past home ballpark
  2. Current home ballpark
  3. Past road ballparks
  4. Current road ballparks
  5. Past quality of opponents (neutralized)
  6. League switch adjustments
  7. Ground balls adjusted for league average line drive rate (xGB)
Since then, I've added adjustments for quality of catcher, quality of umpire, and pitcher role change (starter/reliever). — THT 1/5/09 | THT 1/28/09

xGB% (Expected Groundball Percentage): A ground ball is a better result for a pitcher than a fly ball because fly balls become extra base hits ten times more often than ground balls do, and ground balls are almost never home runs. Actual ground ball percentage can be misleading, though, if a pitcher allows or prevents line drives at an abnormal rate. Because line drive percentage is so unstable, xGB% first assumes a league average line drive rate and then recalculates ground ball percentage based on that. — THT 7/17/07
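As a rough illustration of that recalculation, here is a minimal Python sketch. It assumes the non-line-drive balls keep their observed proportions once the line drive rate is reset to league average; the original article may rescale differently, and the function name and sample counts are made up for the example.

```python
def expected_gb_pct(gb, fb, ld, league_ld_pct):
    """Rescale GB% as if the pitcher had allowed a league-average LD rate.

    gb, fb, ld are batted-ball counts (fb is assumed to include pop-ups).
    The proportional split of non-line-drives is an assumption of this sketch.
    """
    non_ld = gb + fb
    if non_ld == 0:
        raise ValueError("need at least one ground ball or fly ball")
    # Share of the non-line-drive balls that were grounders.
    gb_share = gb / non_ld
    # Hand the pitcher the league LD rate, then split what is left
    # between grounders and flies in the observed proportion.
    return gb_share * (1.0 - league_ld_pct)


# Hypothetical pitcher: 200 GB, 150 FB, 50 LD (a 12.5% LD rate).
raw_gb_pct = 200 / (200 + 150 + 50)
x_gb_pct = expected_gb_pct(200, 150, 50, league_ld_pct=0.20)
print(f"GB% {raw_gb_pct:.3f} -> xGB% {x_gb_pct:.3f}")
```

Because this hypothetical pitcher suppressed line drives well below the assumed league rate, his xGB% comes out lower than his actual GB%.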

DIPS WHIP: A take-off of LIPS (Luck Independent) ERA, DIPS WHIP tries to neutralize some of the variance inherent in WHIP. DIPS WHIP assumes a league average line drive rate and adjusts a pitcher's other batted ball percentages accordingly. It then assumes a league average rate of hits on each batted ball type, combines these hits with a pitcher's actual walks, and divides by innings pitched to arrive at DIPS WHIP. — THT 7/15/07
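The three steps in that description can be sketched as follows. This is only an outline under stated assumptions: the proportional rescaling of ground balls and fly balls is a guess at the adjustment, and the league hit rates in the example are placeholders rather than real league figures.

```python
def dips_whip(gb, fb, ld, walks, innings, league_ld_pct, hit_rates):
    """Sketch of DIPS WHIP from batted-ball counts.

    hit_rates maps "gb", "fb", "ld" to league-average hits per ball of
    that type. All names here are illustrative, not from the article.
    """
    total = gb + fb + ld
    non_ld = gb + fb
    # Step 1: assume a league-average line drive rate and rescale the rest.
    x_ld = total * league_ld_pct
    remaining = total - x_ld
    x_gb = remaining * gb / non_ld
    x_fb = remaining * fb / non_ld
    # Step 2: assume league-average hits on each batted-ball type.
    x_hits = (x_gb * hit_rates["gb"]
              + x_fb * hit_rates["fb"]
              + x_ld * hit_rates["ld"])
    # Step 3: combine expected hits with actual walks, per inning pitched.
    return (x_hits + walks) / innings


# Placeholder hit rates, for illustration only.
placeholder_rates = {"gb": 0.24, "fb": 0.14, "ld": 0.72}
print(round(dips_whip(200, 150, 50, walks=50, innings=150,
                      league_ld_pct=0.20, hit_rates=placeholder_rates), 3))
```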

xBABIP (Expected Batting Average on Balls in Play): While we often say that a pitcher has little control over his BABIP (and this is true), he does not relinquish all control. Most importantly, we know that a pitcher has a lot of control over his ground balls and fly balls, a good amount of control over his pop-ups, and little control over his line drives. To calculate xBABIP, we first neutralize line-drive rate and adjust the other three rates accordingly (like we do to calculate xGB%). Then we assume a league average rate of hits on all types of batted balls. Add up those hits, and we can calculate an expected BABIP.

What we'll see is that extreme GB pitchers have higher xBABIPs and extreme FB pitchers have lower xBABIPs (while also realizing that guys who induce a lot of pop-ups will have low xBABIPs too). In 2009, for example, GB'er Aaron Cook had a .314 xBABIP while FB'er Jered Weaver had a .291 xBABIP. — THT 1/25/10
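To see why the ground ball and fly ball extremes separate, here is a small sketch of the calculation. Every number in it is a placeholder (the league line drive rate, the hit rate per batted-ball type, and both pitcher profiles are hypothetical), and it simplifies by treating every batted ball as a ball in play, so home runs are not removed.

```python
# Placeholder league inputs, not real figures.
LEAGUE_LD = 0.19
LEAGUE_HIT_RATE = {"gb": 0.24, "fb": 0.14, "pu": 0.02, "ld": 0.72}


def expected_babip(rates, league_ld=LEAGUE_LD, hit_rate=LEAGUE_HIT_RATE):
    """rates maps gb/fb/pu/ld to the pitcher's share of batted balls."""
    non_ld = rates["gb"] + rates["fb"] + rates["pu"]
    # Neutralize line drives, scaling the other three types proportionally.
    scale = (1.0 - league_ld) / non_ld
    adjusted = {k: rates[k] * scale for k in ("gb", "fb", "pu")}
    adjusted["ld"] = league_ld
    # League-average hits on each type, summed per batted ball.
    return sum(adjusted[k] * hit_rate[k] for k in adjusted)


ground_baller = {"gb": 0.56, "fb": 0.20, "pu": 0.04, "ld": 0.20}
fly_baller = {"gb": 0.32, "fb": 0.38, "pu": 0.12, "ld": 0.18}
print(round(expected_babip(ground_baller), 3),
      round(expected_babip(fly_baller), 3))
```

With these placeholder inputs the ground ball profile lands above the fly ball profile, which is the pattern described above.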

xWins (xW): While many fantasy analysts call Wins a fickle stat (and they're right), Wins aren't wholly unpredictable. Axioms like "don't chase wins" or "draft skills" are thrown around often, and while one can be successful by simply following this advice, I feel as though we can do a little bit better. And if we can do better, why shouldn't we? Essentially, xW uses Bill James's Pythagorean Theorem to estimate the expected number of games a pitcher should have won. Using this formula, I plug in the pitcher's LIPS RA (weighted by his IP per game), his expected Bullpen Support (weighted by the IP the starter doesn't pitch per game), and his expected Run Support. This gives us the number of games the SP's team will win on days he pitches, and from there we calculate the percentage of those games for which he should get credited with the Win based upon how deep into games he goes. Pitchers who last into the eighth inning are far more likely to receive a win than those who only last four or five innings because there's more time for their offense to score runs. The small problem here is that unlucky pitchers won't go as deep into games as they should, and vice versa for lucky pitchers, but I haven't accounted for this yet. — THT 1/25/10
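The structure of that calculation can be sketched in a few lines. This assumes the classic Pythagorean exponent of 2 and a nine-inning game, and it takes the share of team wins credited to the starter as an input (`credit_rate`, a hypothetical name) because the mapping from innings per start to win credit is not spelled out here.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Bill James's Pythagorean estimate of winning percentage."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def team_win_pct_in_starts(lips_ra, bullpen_ra, run_support, ip_per_start):
    """Blend starter and bullpen runs allowed by their share of nine innings."""
    starter_share = ip_per_start / 9.0
    runs_allowed = lips_ra * starter_share + bullpen_ra * (1.0 - starter_share)
    return pythagorean_win_pct(run_support, runs_allowed)


def expected_wins(starts, team_win_pct, credit_rate):
    """credit_rate: assumed fraction of team wins credited to the starter."""
    return starts * team_win_pct * credit_rate


# Hypothetical starter: 3.60 LIPS RA, 6.5 IP per start, 4.8 runs of support.
win_pct = team_win_pct_in_starts(3.60, 4.40, 4.80, ip_per_start=6.5)
print(round(expected_wins(32, win_pct, credit_rate=0.75), 1))
```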

K/BB RI (Strikeout and Walk Run Impact): K/BB ratio (sometimes called "command") is often used as a measure of a pitcher's success with strikeouts and walks. This is badly flawed, however, as a strikeout and a walk do not have the same impact on a pitcher's success at preventing runs. As my studies have shown, you can have two pitchers with equal K/BB ratios but wildly differing ERAs. A pitcher with lots of strikeouts and lots of walks will do better than a pitcher with few strikeouts and few walks even if they have the same K/BB ratio.

To calculate K/BB RI, you first determine how many strikeouts, walks, and batted balls occurred per major league game. Next, multiply each event by its corresponding relative run value, giving you runs per game. For batted balls, you need to back-calculate runs per game using total league average runs per game and runs per game on strikeouts and walks. You also back-calculate batted balls per game by subtracting strikeouts and walks per game from total batters faced per game. Divide runs per game (on batted balls) by batted balls per game to get a relative run value on batted balls.

For each individual pitcher, you then multiply his strikeout, walk, and batted ball per game figures by each event's relative run value. After doing this, you get a runs per game figure for each event. Add them up. Subtract this number from league average runs per game, and you arrive at the impact strikeouts and walks have on a pitcher's runs allowed. — THT 2/14/08
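The back-calculation described above can be sketched like this. The league rates and the run values for a strikeout and a walk are placeholders chosen for illustration, and both example pitchers are assumed to face the same number of batters per game.

```python
def batted_ball_run_value(lg_k, lg_bb, lg_bf, lg_runs, rv_k, rv_bb):
    """Back-calculate the league run value of one batted ball (per game)."""
    runs_on_k_bb = lg_k * rv_k + lg_bb * rv_bb
    runs_on_batted = lg_runs - runs_on_k_bb
    batted_per_game = lg_bf - lg_k - lg_bb
    return runs_on_batted / batted_per_game


def k_bb_run_impact(k, bb, batted, lg_runs, rv_k, rv_bb, rv_batted):
    """League runs per game minus the pitcher's event-based runs per game."""
    pitcher_runs = k * rv_k + bb * rv_bb + batted * rv_batted
    return lg_runs - pitcher_runs


# Placeholder league figures and run values, not real data.
LG = dict(lg_k=6.8, lg_bb=3.3, lg_bf=38.5, lg_runs=4.6, rv_k=-0.11, rv_bb=0.33)
rv_batted = batted_ball_run_value(**LG)

# Two hypothetical pitchers with the same 2.0 K/BB ratio.
high = k_bb_run_impact(9.0, 4.5, 38.5 - 13.5, 4.6, -0.11, 0.33, rv_batted)
low = k_bb_run_impact(4.0, 2.0, 38.5 - 6.0, 4.6, -0.11, 0.33, rv_batted)
print(round(high, 2), round(low, 2))
```

With these placeholder values, the high-strikeout, high-walk pitcher grades out better than the low-strikeout, low-walk pitcher despite the identical ratio, mainly because he allows fewer batted balls.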

xHR/FB (Expected Home Run per Flyball rate): This is calculated very simply by using park factors. We assume a 50/50 home/road split for the pitcher, a neutral road schedule (HR/FB park factor of 1.00), and account for the pitcher's home ballpark's HR tendencies. It is very important to note that even if a pitcher calls an extreme HR park home, his expected HR/FB will still remain pretty close to neutral. The xHR/FB for Rockies pitchers, for example, was just 12.39 percent in 2009 (with a league average of 11.18 percent).

Analysts often like to credit deviation further from the mean than this to a pitcher's home park, but that simply is not the case (unless the pitcher has thrown a disproportionate number of games at home, and even if he has, that shouldn't be expected to continue going forward). Simply put, HR park factors are not quite as extreme as most seem to believe. — THT 1/25/10
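A quick sketch shows why the 50/50 split keeps the expectation close to neutral. It assumes the park factor is a simple multiplier on league HR/FB, which may not match the exact method in the original article; the function names and the 1.20 factor are illustrative only.

```python
def expected_hr_fb(league_hr_fb, home_park_factor,
                   home_share=0.5, road_park_factor=1.0):
    """Blend home and road HR/FB park factors, then scale the league rate."""
    blended = (home_share * home_park_factor
               + (1.0 - home_share) * road_park_factor)
    return league_hr_fb * blended


def implied_home_factor(x_hr_fb, league_hr_fb, home_share=0.5):
    """Invert the blend to see what home factor a given xHR/FB implies."""
    blended = x_hr_fb / league_hr_fb
    return (blended - (1.0 - home_share)) / home_share


# A hypothetical park that inflates HR/FB by 20% adds only 10% overall.
print(round(expected_hr_fb(0.1118, 1.20), 4))
# Under this sketch's assumptions, 12.39% against an 11.18% league average
# would correspond to a home factor of roughly 1.22.
print(round(implied_home_factor(0.1239, 0.1118), 2))
```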

xLOB% (Expected Left on Base Percentage): Of the three main 'luck indicators' (BABIP, HR/FB, LOB%), LOB% has the most room for skill-based variation. This is because LOB% is actually an exponential function. To put it simply, if Pitcher A allows hits at a 24 percent rate and Pitcher B allows hits at a 30 percent rate, once men reach base, more of them will score on Pitcher B because he is more likely to give up hits to begin with. His hits will be clumped closer together. As such, LOB% has a fairly strong relationship with the rate at which batters reach base.

xLOB% is calculated using a regression formula derived from BAA and BB%. Now, of course, BAA is subject to extreme variation since it is largely made up of BABIP. So instead of using actual BAA, we use xBAA, which accounts for the pitcher's actual K rate (the more Ks, the fewer opportunities for hits) and his xBABIP. What we end up seeing is that, relative to league average (2009: 71.9 percent), good pitchers leave more runners on base (2009 Tim Lincecum: 75.6 percent) while bad pitchers let more score (2009 Jeremy Sowers: 68.1 percent). — THT 1/25/10

R/HR (Runs per Home Run) and xR/HR (Expected Runs per Home Run): HR/FB has become a common stat for measuring a pitcher's luck with home runs, but it doesn't tell us everything. For example, a pitcher can have a seemingly lucky 4 percent HR/FB but could actually have experienced bad luck with HRs if he was unfortunate enough to have given up all of his HRs while the bases are loaded. On average, about 1.4 runs score per HR, but not all pitchers allow them at this rate (some justifiably, some as a result of luck). R/HR tells us how many runs actually scored per home run allowed while xR/HR tells us how many runs should have scored (the process for this is a little complicated, but I'd be happy to explain for anyone interested). — THT 1/25/10

Home Run Runs per Fly Ball (HRR/FB) and Expected Home Run Runs per Fly Ball (xHRR/FB): A mixture of HR/FB and R/HR, HRR/FB tells us how many runs scored on home runs per outfield fly. xHRR/FB, naturally, tells us how many should have scored. You can consider this a super-powered HR/FB since it not only accounts for how many HRs are allowed but also the total damage done by the HRs, which is what truly matters. Ten solo home runs do just as much damage as five two-run homers, which is something HR/FB doesn't capture on its own. — THT 1/25/10
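The solo-versus-two-run comparison is easy to check with the actual (not expected) versions of these stats. The sketch below uses made-up totals of 100 outfield flies for each pitcher; the expected versions are not reproduced because their method isn't given here.

```python
def hr_per_fb(home_runs, outfield_flies):
    return home_runs / outfield_flies


def runs_per_hr(hr_runs, home_runs):
    """R/HR: runs that scored on home runs, per home run allowed."""
    return hr_runs / home_runs


def hr_runs_per_fb(hr_runs, outfield_flies):
    """HRR/FB: runs that scored on home runs, per outfield fly."""
    return hr_runs / outfield_flies


# Pitcher A: ten solo homers. Pitcher B: five two-run homers.
for name, hrs, hr_runs in (("A", 10, 10), ("B", 5, 10)):
    print(name,
          hr_per_fb(hrs, 100),
          runs_per_hr(hr_runs, hrs),
          hr_runs_per_fb(hr_runs, 100))
```

HR/FB makes Pitcher B look twice as good, while HRR/FB rates the two identically. Note that HRR/FB is simply HR/FB multiplied by R/HR.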

True Quality Starts (TQS): Writers often talk about how inconsistent a pitcher is, how he can look fantastic one start and be a train wreck the next. They talk about how he has great potential and could be a great pitcher if he just puts it all together. TQS was conceived as a way to measure this, to evaluate a pitcher on a start-by-start basis and see if an otherwise unnoteworthy pitcher might have some hidden potential.

To calculate TQS, I first normalize the batted-ball figures to account for disproportionate line drives. After that, I apply relative run values to strikeouts, walks, hit-by-pitches, adjusted ground balls, adjusted fly balls, adjusted pop-ups, and adjusted line drives. I take this figure, multiply by nine, and divide by innings pitched to get a sort of makeshift, one-game expected ERA. I then apply a simple above replacement level measure, calculated as (6 - expected ERA)*IP/9. I call this the TQS score. From here, I take every TQS score for the entire year and find the standard deviation among them. These standard deviations are then used to classify each start as one of the following: great, good, above average, below average, bad, or awful. — THT 2/20/08
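The scoring step can be sketched as below. The expected runs for a start are taken as an input, since the run values behind them aren't listed here, and the sample starts are invented. The sketch stops at expressing each score in standard deviations from the mean: the cutoffs that map those to the six labels aren't given, and using the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def tqs_score(expected_runs, innings):
    """(6 - expected ERA) * IP / 9 for a single start."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    expected_era = expected_runs * 9.0 / innings
    return (6.0 - expected_era) * innings / 9.0


def deviations_from_mean(scores):
    """Each score expressed in standard deviations from the mean."""
    mu, sigma = mean(scores), pstdev(scores)
    return [(s - mu) / sigma for s in scores]


# Hypothetical (expected runs, innings) pairs for five starts.
starts = [(2.0, 7.0), (5.5, 4.0), (1.0, 8.0), (3.5, 6.0), (4.0, 5.0)]
scores = [tqs_score(runs, ip) for runs, ip in starts]
print([round(s, 2) for s in scores])
print([round(z, 2) for z in deviations_from_mean(scores)])
```

Algebraically the score reduces to 6*IP/9 minus expected runs, so it rewards both run prevention and pitching deeper into the game.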

Hitting Stats

CABS (Context Adjusted Batting Statistics): I've never published anything about CABS, but I have developed it behind the scenes based on the same principles as CAPS.

True Home Runs (tHR): I hypothesized that when trying to predict home runs, "hitting home runs" isn't the skill that we should be testing for. "Hitting the ball far" and "hitting the ball in a particular direction" should be the skills we test for. Players who hit the ball a long way should also be able to hit the ball a short way. Players who hit a lot of long home runs but don't hit a lot that are just clearing the fence are getting unlucky, while players who don't hit many long ones but who hit a lot that barely clear the fence are getting lucky.

To calculate tHR, every home run is run through Greg Rybarczyk's HitTracker system in 30 environments: every park in the league with average weather for that park. The homers that are given a No Doubt label are counted up and then entered into an exponential regression equation to arrive at the total number of home runs that the hitter should be expected to hit based upon that count. — THT 7/9/08

Plate Discipline Stats

Based on work by Russell Carleton (aka Pizza Cutter) at MVN's now defunct Statistically Speaking blog, I used signal detection theory to come up with a set of four stats to help measure a hitter's plate discipline.

Judgment: A measure of a hitter's pitch recognition, or of "how good a batter is at judging between the pitches at which he should and shouldn't swing." It is calculated the same way as "sensitivity" in signal detection theory: correct decisions are in-zone swings, correct rejections are balls, Type I errors are out-of-zone swinging strikes, and Type II errors are called strikes. Presented as an index with 100 being average. — THT 9/16/08

Aggressiveness/Passivity Bias (A/P Bias): A/P Bias shows the batter's tendencies when he makes a mistake in Judgment. Is he taking too many pitches or swinging at too many? If a batter is going to make mistakes, these stats show that swinging more will limit strikeouts better than taking too many pitches. — THT 9/16/08
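For readers unfamiliar with signal detection theory, here is a sketch of the textbook quantities behind these two stats. Sensitivity (d') follows the four categories listed under Judgment. Using the standard response criterion c as a stand-in for A/P Bias is an assumption, and the rescaling to a 100-average index is not reproduced. The pitch counts are invented.

```python
from statistics import NormalDist


def _z(p):
    """Inverse of the standard normal CDF; p must be strictly between 0 and 1."""
    return NormalDist().inv_cdf(p)


def sensitivity_and_bias(in_zone_swings, called_strikes,
                         out_zone_swinging_strikes, balls):
    """Return (d', c) from the four signal detection counts."""
    # Hit rate: swung when he should have.
    hit_rate = in_zone_swings / (in_zone_swings + called_strikes)
    # False alarm rate: swung (and missed) when he should have taken.
    fa_rate = out_zone_swinging_strikes / (out_zone_swinging_strikes + balls)
    z_hit, z_fa = _z(hit_rate), _z(fa_rate)
    d_prime = z_hit - z_fa
    # Criterion c: negative leans toward swinging, positive toward taking.
    criterion = -0.5 * (z_hit + z_fa)
    return d_prime, criterion


d_prime, c = sensitivity_and_bias(650, 350, 120, 880)
print(round(d_prime, 3), round(c, 3))
```

In this hypothetical line the criterion is positive, meaning the hitter's mistakes lean toward taking too many pitches rather than chasing.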

Bat Control: If a batter has a perfect eye but isn't able to take advantage of it by swinging the bat well, what's the point? Bat control is the percentage of balls within the strike zone that the hitter makes contact with (given that he swings). The formula is (in-zone contacted balls)/(in-zone swings).

Since a hitter is swinging, we can assume it's because he believes he can hit the ball (yes, this isn't always true, but it's good enough for our purposes), and a ball within the strike zone is definitely capable of being hit. By focusing on in-zone pitches, we ignore the times when a batter swings and misses on pitches out of the zone, because these are more likely to be caused by poor judgment than by poor bat control: the batter shouldn't be swinging at a ball outside the strike zone if he isn't able to hit it.

So the percentage of times he does what he intends to do (make contact) when he should be expected to (when it's in the strike zone), I contend, gives us a good measure of bat control. — THT 9/16/08

Bad Ball Hitting: While bat control measures a batter's ability to hit balls that the rules of baseball say he should be able to (pitches within the strike zone), some hitters are able to hit balls that they really aren't expected to (pitches out of the strike zone). So our second new stat we'll call bad ball hitting (name lifted from Dan Fox's article on plate discipline stats). It is calculated as (out-of-zone contacted balls)/(out-of-zone swings). — THT 9/16/08
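Both contact stats are plain ratios, so a sketch is short. The function and parameter names and the swing counts are illustrative.

```python
def bat_control(in_zone_contacted, in_zone_swings):
    """(in-zone contacted balls) / (in-zone swings)."""
    return in_zone_contacted / in_zone_swings


def bad_ball_hitting(out_zone_contacted, out_zone_swings):
    """(out-of-zone contacted balls) / (out-of-zone swings)."""
    return out_zone_contacted / out_zone_swings


# Hypothetical hitter: 600 in-zone swings, 250 out-of-zone swings.
print(round(bat_control(528, 600), 3))       # contact on pitches he should hit
print(round(bad_ball_hitting(150, 250), 3))  # contact on pitches he chased
```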