Tangotiger Blog


Wednesday, March 11, 2015

cFIP

By Tangotiger 01:28 PM

Jonathan Judge was kind enough to send me his original article several weeks back.  He was also kind enough to include a couple of suggestions, notably kwERA.  The research as a whole is very interesting.  My takeaway, though, is a bit different from his: given 170 batters faced, just doing K minus BB per PA is about as good as anything you can create.
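
The K-minus-BB rate Tango describes is trivial to compute. A minimal sketch is below; the kwERA scaling constants (5.40 and 12) are a commonly cited form of the stat and are assumptions here, not taken from this post:

```python
def k_minus_bb_rate(k, bb, pa):
    """Strikeouts minus walks, per plate appearance."""
    return (k - bb) / pa

def kwera(k, bb, pa):
    """kwERA-style scaling of the K-BB rate onto an ERA-like scale.
    The constants 5.40 and 12 are assumed, not from this post."""
    return 5.40 - 12 * k_minus_bb_rate(k, bb, pa)

# e.g. a pitcher with 180 K and 45 BB over 700 PA
rate = k_minus_bb_rate(180, 45, 700)  # ≈ 0.193
```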

(I have to question some of the correlation results he got with the "all" pitchers insofar as kwERA is concerned.  It just doesn't seem possible to get some of those results.  However, I will concede that something like cFIP would be better for those pitchers who had fewer than 40 innings.)

I had a back and forth with Jonathan on this:

Thus, if we are looking for an accurate estimator of pitcher ability, what we should be considering is not how the estimator predicts future run expectancy, but how the estimator correlates with itself in consecutive seasons.

I told him he was wrong.  He felt fairly strongly he was right.  However, I do agree with him that a stat that does both well in terms of "descriptive" and "predictive" would make the overall point moot.  Anyway, the point I was making is that you have to correlate to what you care about, which in this case is runs.  It's (mostly) irrelevant that runs is a pitcher+fielder outcome. 
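
The distinction can be illustrated with a toy simulation (all distributions are assumptions for illustration, not from the article): both self-correlation and correlation-with-runs run through the same shared talent component, but only the latter tests the metric against the thing we care about, which here includes the extra pitcher+fielder noise in runs:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
talent = rng.normal(4.00, 0.40, n)          # true run-prevention talent

# Two seasons of a metric: talent plus independent season-level noise
metric_y1 = talent + rng.normal(0, 0.60, n)
metric_y2 = talent + rng.normal(0, 0.60, n)

# Next-season runs: talent plus extra noise (fielders, sequencing, etc.)
runs_y2 = talent + rng.normal(0, 0.80, n)

r_self = np.corrcoef(metric_y1, metric_y2)[0, 1]  # year-to-year self-correlation
r_runs = np.corrcoef(metric_y1, runs_y2)[0, 1]    # correlation with runs
```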

Anyway, it's terrific research, and I'm glad that Jonathan spent as much time as he did in doing the research and presenting the work.  He was very enjoyable to communicate with, even if my style could have easily turned him off.  So, I thank him as well for continuing the dialogue.


#1    Tangotiger 2015/03/11 (Wed) @ 13:38

This was the “pre-thread” I had on the topic of self-correlation:

http://tangotiger.com/index.php/site/comments/do-we-care-about-a-metric-correlating-with-itself-or-with-the-thing-were-ac


#2    skyjo 2015/03/11 (Wed) @ 14:22

Great minds! Let the record show I asked about this in the comments of the THT article (as jojo).


#3    Cyril Morong 2015/03/11 (Wed) @ 15:31

K minus BB per PA

are IBBs taken out of BBs and PA? What about SH?

Have you ranked pitchers in how well they do in this stat?


#4    Peter Jensen 2015/03/11 (Wed) @ 15:37

I liked this paragraph in the article:

However, none of these metrics is able to consider the context of each underlying event. They don’t account for each batter the pitcher faced, the number of times the pitcher faced that batter over a season, the catcher to whom the pitcher threw, or the umpire behind the plate. They also don’t consider how each event was affected by the stadium in which it occurred, the handedness of the pitcher and the batter, or the effect of home-team advantage. Nor do they account for a pitcher throwing in a loaded division, as opposed to a pitcher running up his stats against lesser competition. This both limits their overall effectiveness and, in particular, their usefulness with smaller sample sizes.

I have thought for a long time that 35 starts a year is not enough to randomize the factors specific to each start that affect a pitcher’s performance, such as a single plate umpire, a specific park, a specific opponent, specific environmental conditions, and being either at home or away.  So that was a good start, and choosing a mixed-model metric also seemed appropriate.  But I was a little underwhelmed by the results.

First, I totally agree with jojo (skyjo) and Tango that the whole correlation of a metric to itself is pretty much meaningless as a measure of the metric’s ability to capture true talent, or any other value of the metric for that matter.  The only chart that held any meaning for me was the one that compared the different metrics’ ability to predict RE24 in year plus 1.  cFIP was better than the other metrics at doing so, but not by a very substantial amount over SIERA or kwFIP.  Even this small advantage over kwFIP may be an artifact of cFIP and RE24 both being park adjusted and kwFIP not.  Of course, the year plus 1 RE24 results are going to have a different set of game-level factors that will cause the pitcher’s performance to vary from true talent as much as the same factors did in year 0.  This is the tantalizing reason for wanting to compare a metric to itself as a test.  But it is wrong to do so.  The correct method is to feed the specific factor values for year plus 1 back into your metric and then see how well the metric predicts RE24 in year plus 1.

So what I am left with is either Jonathan and I were both wrong and there is little advantage gained in adjusting for game specific factors, or the mixed model approach is not the best method for adjusting for those factors, or Jonathan’s particular application of his mixed model was flawed.


#5    Tangotiger 2015/03/11 (Wed) @ 16:31

I just want to highlight Peter’s excellent description:

The only chart that held any meaning for me was the one that compared the different metrics’ ability to predict RE24 in year plus 1. ... Even this small advantage over kwFIP may be an artifact of cFIP and RE24 both being park adjusted and kwFIP not. ... This is the tantalizing reason for wanting to compare a metric to itself as a test.  But it is wrong to do so.  The correct method is to feed the specific factor values for year plus 1 back into your metric and then see how well the metric predicts RE24 in year plus 1.

Btw, while I do call it kwERA, kwFIP is actually more appropriate.

It’s similar to the way I have bbFIP (batted ball FIP).  They are all part of the FIP family.


#6    MGL 2015/03/12 (Thu) @ 00:53

So they are adjusting pitcher performance for opponent, park and umpire? Yawn. Of course that isolates pitcher talent more than not doing so. It will also predict future performance a little better, more so for players who switch teams/parks, for obvious reasons. It is also much better for smaller samples, also for obvious reasons (context tends to “even out” in larger samples).

I agree with Jared that if the model already incorporates a regression (shrinking of outliers), then, again, of course it will do MUCH better than metrics that don’t.

So cFIP is FIP turned into a projection? I think forecasters like Jared, Brian, and myself (among others) have been doing that for many years.

And correlating a metric with itself from year to year? That is ONE measure of a good metric (although the limit on that is going to be a function of the random nature of what is being measured - the noise or random variance), but it is certainly not the whole picture. It also needs to measure what you want it to measure. I can come up with a defensive metric which has a great y-t-y correlation (it is reliable) but is lousy at measuring fielders’ impact on preventing runs (it is not very accurate). I mean, that is basic STATS 101, right: reliability versus accuracy?
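
The reliability-versus-accuracy point is easy to demonstrate with a toy defensive metric (all numbers here are assumptions for illustration): a metric driven mostly by a stable trait unrelated to run prevention is highly reliable year to year, yet nearly useless at measuring runs saved:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
true_runs_saved = rng.normal(0, 8, n)       # what we actually want to measure

# A metric driven by a stable trait (say, reputation) that has nothing
# to do with preventing runs, plus a little measurement noise.
reputation = rng.normal(0, 10, n)
metric_y1 = reputation + rng.normal(0, 2, n)
metric_y2 = reputation + rng.normal(0, 2, n)

reliability = np.corrcoef(metric_y1, metric_y2)[0, 1]       # high
accuracy = np.corrcoef(metric_y1, true_runs_saved)[0, 1]    # near zero
```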

Also, I am skeptical of some of the huge differences between FIP and xFIP, like with Sabathia, Frieri, and Anderson. I mean, can context make that much of a difference in 50 innings? Perhaps. It is certainly not surprising that each of those pitchers has only 40 innings or so. The fewer the innings, the more context matters. Then again, no one is going to put much weight on a simple stat like FIP or ERA in only 40 innings anyway.

FWIW, my own context-neutral stat which I call SSRATE (scaled to 4.00 for a league average pitcher), has Anderson at 2.65, Frieri at 7.20 and Sabathia at 5.91. Those would be around 66, 180, and 148 in ERA-, which is not even remotely close to the cFIP numbers.

I did not follow their model closely. Does it adjust for defense and catcher (framing, game calling and pitch blocking)? Those are critical if you want to neutralize a pitcher. The three things most important in neutralization are park, catcher, and defense. Then come batter (including batter handedness) and umpire.

One problematic thing is adjusting for batter handedness. Do they do that across the board for pitchers? If they do, that would (unfairly?) hurt LOOGYs and ROOGYs. The only reason most of these guys are pitching in MLB is that they can get out same-side hitters (they have large platoon splits). Thus, they usually face only around 50% opposite-side batters. If you adjust these pitchers for batter handedness, of course they will look a lot worse. But in evaluating a LOOGY or ROOGY, you want to assume the same or a similar distribution of batters faced in the future as in the past, hand-wise.


#7    mkt 2015/03/12 (Thu) @ 04:15

6/: “I mean that is basic STATS 101, right, reliability versus accuracy?”

Yup.  Although IME it’s typically not covered in intro stats courses.  But it typically is covered in psychology methods courses (psychometrics), measurement-theory courses, perhaps sociometrics courses, etc., and it’s indirectly covered in econometrics courses.

But a decent number of electrons could be spared if people realized that statisticians, psychologists, etc. have been studying these issues for decades, and learned what they have learned.  So Peter Jensen goes too far in 4/ when he says:

“the whole correlation of a metric to itself is pretty much meaningless as a measure of the metric’s ability to capture true talent, or any other value of the metric for that matter.”

It is true that reliability, by itself, tells you nothing about how good a metric is.  A stopped clock will tell you the time with very high correlation to its other time measures, but is useless for actually finding out what time it is (with the exception of two minutes each day).

But reliability is nonetheless a desirable quality for a metric to have.  If we had a measure which really does on average accurately measure what it purports to measure, but which has low reliability, then that measure is still lacking an important quality.  It means that we need a lot more observations to get reliable measurements.  So reliability does matter.
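
The "need a lot more observations" point can be made quantitative with the Spearman-Brown formula from classical test theory (a standard result, not from this thread): averaging k parallel measurements, each with single-measurement reliability r, yields reliability kr / (1 + (k-1)r).

```python
def spearman_brown(r1, k):
    """Reliability of the mean of k parallel measurements,
    given single-measurement reliability r1 (Spearman-Brown)."""
    return k * r1 / (1 + (k - 1) * r1)

# A metric with r = 0.5 per season hits 0.5 reliability in one season;
# a metric with r = 0.2 needs four seasons to reach the same 0.5.
spearman_brown(0.2, 4)  # = 0.5
```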


#8    MGL 2015/03/12 (Thu) @ 15:27

#7, exactly. You need both for a metric to be useful.

