Issues with MLEs

Why I hate how they are used

The minors-to-majors factors are based on players who played in both the minors and the majors. From that standpoint, what issues do we have? Let's go through my 5-point amateur list of issues to address in a scientific study.

Selective sampling

Only very specific minor league players get to play in the majors. Which ones? It could be any combination of

great results in the minors (i.e., somewhat lucky)
scouts see something great from him in the minors
needs in the majors

Which players get to stay in the majors? It could be any combination of

great results in the majors and/or minors (i.e., somewhat lucky)
scouts see something great from him in the majors and/or minors
needs in the majors

Sample size

While you have a large enough pool of players, how are they weighted? Well, the weights are based on the lesser number of PAs between the minors performance and the majors performance. What can cause one number to be much lower than the other? It could be any combination of

was called up in-season
performed very poorly in the majors (by luck or design)

You can also decide to weight all the players equally. But, then you'd have to draw a cutoff point, like at least 150 PAs in both leagues. Otherwise, someone's 25 PAs in one league will be weighted the same as a player with 400 PAs in both leagues.
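
To make the two weighting schemes concrete, here is a minimal sketch. The three players are made up, and I am assuming the factor is a simple ratio of majors OBA to minors OBA, which is not necessarily how any published MLE is built. The function names and the 150-PA default are just for illustration.

```python
# Hypothetical sample: (minors PA, minors OBA, majors PA, majors OBA) per player.
PLAYERS = [
    (400, 0.360, 400, 0.330),  # full seasons in both leagues
    (450, 0.380, 25, 0.200),   # brief, poor call-up
    (120, 0.340, 500, 0.350),  # early call-up who stuck
]


def factor_weighted_by_lesser_pa(players):
    """Average majors/minors OBA ratio, weighting each player by min(minors PA, majors PA)."""
    total_weight = 0.0
    weighted_sum = 0.0
    for minors_pa, minors_oba, majors_pa, majors_oba in players:
        weight = min(minors_pa, majors_pa)
        total_weight += weight
        weighted_sum += weight * (majors_oba / minors_oba)
    return weighted_sum / total_weight


def factor_equal_weight(players, min_pa=150):
    """Plain average of the ratio over players with at least min_pa PAs in both leagues."""
    ratios = [
        majors_oba / minors_oba
        for minors_pa, minors_oba, majors_pa, majors_oba in players
        if minors_pa >= min_pa and majors_pa >= min_pa
    ]
    if not ratios:
        raise ValueError("no player meets the PA cutoff")
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    print(round(factor_weighted_by_lesser_pa(PLAYERS), 3))
    print(round(factor_equal_weight(PLAYERS), 3))            # cutoff of 150 PAs
    print(round(factor_equal_weight(PLAYERS, min_pa=0), 3))  # no cutoff
```

With no cutoff, the 25-PA call-up counts as much as the 400-PA regular and drags the equal-weight factor down. With the cutoff, he disappears from the sample entirely, which is its own form of selective sampling.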

In-season call-ups and season-to-season call-ups also have their problems. In the season-to-season case, there is a 12-month gap between performances, while the in-season case would have, say, a 3-month gap. When it comes to players who are on the steep upswing (i.e., players under 25), the MLEs will also capture aging/conditioning effects on top of the equivalencies they are trying to establish.

Regression towards the mean

All performances, whether in the majors or minors, are nothing but samples, and therefore need to be regressed to some mean. Which mean? The whole minor league population (i.e., almost always lower)? The whole minor league population at the same age? The whole minor league population at the same age who were drafted within n rounds of each other? There is a lot that we do know about the minor leaguers who are called up, and that information should be used.

How do you regress the rookie's MLB performance? Against all MLB players (i.e., almost always higher)? All MLB players at the same age? All MLB players at the same age and experience (tough to do because of the selective sampling issue)?
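
To show how much the choice of mean matters, here is a minimal sketch using the common shrinkage form: add k PAs of population-average performance to the observed sample. Everything here is an assumption for illustration: the .380 OBA over 300 PAs, the candidate means, and especially k = 200, which is not an estimated value.

```python
def regress_to_mean(observed_rate, pa, population_mean, k):
    """Shrink an observed rate toward population_mean by adding k PAs at that mean."""
    if pa < 0 or k < 0 or pa + k == 0:
        raise ValueError("pa and k must be non-negative and not both zero")
    return (observed_rate * pa + population_mean * k) / (pa + k)


# Hypothetical call-up: .380 OBA over 300 minor league PAs.
OBSERVED_OBA = 0.380
PA = 300
K = 200  # illustrative constant only, not an estimated value

# Hypothetical candidate means; which one is right is the open question.
CANDIDATE_MEANS = {
    "all minor leaguers": 0.330,
    "same age": 0.340,
    "same age, similar draft round": 0.350,
}

if __name__ == "__main__":
    for label, mean in CANDIDATE_MEANS.items():
        print(label, round(regress_to_mean(OBSERVED_OBA, PA, mean, K), 3))
```

The same observed line produces a different "true talent" estimate for each candidate mean, and that difference flows straight into any equivalency built on top of it.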

Is there greater variability between a player's 1st/2nd year than say his 3rd/4th years that would not be caused by random chance? Should the major league mean of the player's rookie season actually be regressed towards his 3rd/4th season in establishing MLEs (for past players)?

Lack of control group

How can we control for any of the unknowns? What group of players can we compare them against? None?

Biased context

Do some hitters, or quality of hitters, or profile of hitters have a certain advantage/disadvantage in the minors/majors that will not be as prominent in the other league, compared to the rest of the population?

Is it possible that the MLEs can only be applied to the guys who are expected to be called up, and that for those players who would never have been called up, the deficiency in their game would become readily apparent against higher competition? (That is, maybe an experienced minor leaguer is feasting on a certain portion of pitchers or profiles of pitchers or quality of pitchers, masking his problems in the total numbers.) Can't we break down these players by experience? Can't we put more weight on the minor league performance numbers against high-draft pitchers and less weight on the numbers against pitchers in the minors who are older than, say, 26?

Is it possible that the park factors are biased against certain hitters or types of hitters, whether in the minors or majors? How reliable are the park factors, especially in the minors? (I would not look at the NYY 1920-1930 HR park factors for lefties if I know that half the HR in that park were hit by Babe Ruth. I would not look at the Busch Stadium HR park factor and think it applies equally to Vince Coleman and Jack Clark.)

I would suspect that rookies and sophs are platooned a lot more than others at the MLB level. So, if you've got an OBA in the minors translated as an MLE of 350, that might be 390/330 (against LP/RP). In the majors, you might actually perform at 390/330, but if the majority of your PAs are against righties, your OBA will be closer to 330 than to 350.
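
The arithmetic behind that example can be sketched as follows, assuming the overall OBA is just a PA-weighted blend of the two platoon lines (rates written as 0.350 rather than 350; the function names are mine). A 350 overall with a 390/330 split implies about two-thirds of the PAs came against righties, and under this blend the overall figure falls below 340 only once more than five-sixths of the PAs are against righties.

```python
def blended_oba(oba_vs_lhp, oba_vs_rhp, share_vs_rhp):
    """Overall OBA given the platoon split and the share of PAs against right-handers."""
    if not 0.0 <= share_vs_rhp <= 1.0:
        raise ValueError("share_vs_rhp must be between 0 and 1")
    return (1.0 - share_vs_rhp) * oba_vs_lhp + share_vs_rhp * oba_vs_rhp


def implied_share_vs_rhp(overall_oba, oba_vs_lhp, oba_vs_rhp):
    """Share of PAs against right-handers that makes the split add up to overall_oba."""
    if oba_vs_lhp == oba_vs_rhp:
        raise ValueError("no platoon split, so the share cannot be recovered")
    return (oba_vs_lhp - overall_oba) / (oba_vs_lhp - oba_vs_rhp)


if __name__ == "__main__":
    # Share of PAs vs. righties implied by a 350 MLE with a 390/330 split.
    print(round(implied_share_vs_rhp(0.350, 0.390, 0.330), 3))
    # Same hitter, same true split, different usage patterns in the majors.
    for share in (0.5, 2 / 3, 0.8, 0.9):
        print(round(share, 2), round(blended_oba(0.390, 0.330, share), 3))
```

The hitter has not changed at all between the rows; only his usage has. That is the bias: the MLE looks wrong when it is really the mix of opposing pitchers that moved.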

Conclusion

I hate the way some people use MLEs because we have many issues to resolve. It is possible by the time all these issues are resolved (or at least the degree of impact of these issues is established) that they will all cancel out. I dunno.

Until that happens, there is a lot of room for error here. MLEs, as currently published, are a first step. We have many steps to go through before we can reduce the error range. We should not treat the currently published MLEs as a final product.