BaseballBoards.com - Why Runs Produced (R+RBI-HR) is still a great stat
tangotigre@aol.com
Tango Tiger
Why does R+RBI-HR work? Let's break down the formula.
A run can be broken down as follows: .27 1B + .44 2B + .62 3B + 1.00 HR + .27 BB + constant
An RBI (as opposed to runners moved along in an earlier post of mine) can be broken down as follows: .20 1B + .40 2B + .60 3B + .60 HR + .03 BB + constant
Add up the individual components of R+RBI-HR and you get: .47 1B + .84 2B + 1.22 3B + 1.60 HR + .30 BB.
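The term-by-term addition is easy to check. Below is a minimal Python sketch (the names `RUN_WEIGHTS`, `RBI_WEIGHTS`, and `combined_weights` are illustrative, not from the thread) that adds the two sets of weights exactly as posted, including the .60 RBI weight for the homer, and optionally subtracts one run per HR.

```python
# Per-event weights as posted above (the constants are left out).
RUN_WEIGHTS = {"1B": 0.27, "2B": 0.44, "3B": 0.62, "HR": 1.00, "BB": 0.27}
RBI_WEIGHTS = {"1B": 0.20, "2B": 0.40, "3B": 0.60, "HR": 0.60, "BB": 0.03}


def combined_weights(run_w, rbi_w, subtract_hr=False):
    """Add run and RBI weights event by event; optionally remove one run per HR."""
    combined = {event: round(run_w[event] + rbi_w[event], 2) for event in run_w}
    if subtract_hr:
        # R+RBI-HR charges back one full run for every home run.
        combined["HR"] = round(combined["HR"] - 1.0, 2)
    return combined


print(combined_weights(RUN_WEIGHTS, RBI_WEIGHTS))                    # R+RBI
print(combined_weights(RUN_WEIGHTS, RBI_WEIGHTS, subtract_hr=True))  # R+RBI-HR
```

As posted, plain R+RBI already gives the homer 1.60, and subtracting the HR leaves 0.60, which is the discrepancy David raises in the next post. With the homer's RBI weight set to the 1.60 that Tango later says he intended, the same function gives 2.60 for R+RBI and 1.60 for R+RBI-HR.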
Now, these numbers are close enough to real-world values that R+RBI-HR is a very good quick proxy if you don't have access to anything else.
I know, I know: team helps, batting position helps, etc. As well, if you want to convert Runs Produced to Runs Created, take AB/10 and subtract it from Runs Produced. Voila, an excellent proxy.
David Smyth
Where do those formulas come from? A little more explanation would help others evaluate what you are saying.
Also, unless I'm reading it wrong, if you add the two HR components together
you get 1.60, but when you subtract out the HR (R + RBI - HR) you're left with
only 0.60 for a homer.
This agrees with the old Bill James criticism of runs produced--that HR shouldn't
be subtracted out.
CRS
I'm thinking the second linear expression is not right, because there is no way the coefficients for 3B and HR are the same. Intuitively, they would differ by exactly 1.0 (you drive in the same number of runners, plus yourself), which makes me think that the second expression is actually for (RBI - HR).
Still, I'm with David Smyth. (R + RBI)/2 should be better than (R + RBI - HR).
Tangotiger, where did you get those linear formulae for RS and RBI?
Tango Tiger
Oops. I meant to say 1.60 for the homer in the RBI formula.
As for where they came from, it gets complicated, but let's see if I can articulate
it.
The "runs" portion is strictly derived from the linear weights grid of base-out
situations.
The "rbis" portion is derived as follows: I have calculated that for an AVERAGE
plate appearance, the batter will have at least one runner on base 45% of the
time, and 55% of the time the bases are empty. Of those times that a runner
is on base, 70% will have a guy on 1B, 45% will have a guy on 2B, and 25% will
have a guy on 3B. (It adds up to 140% because you can have more than one runner
on base.) Ok, then you need to know the percentages for each type of hit that
causes the runners to move an extra base. So, a 1B will cause a runner from
1B to get to 3B about 35% of the time, and a runner from 2B to score about 65%
of the time. A 2B will cause a runner from 1B to score about 45% of the time.
Then it's just a matter of plugging all this into an Excel spreadsheet, and you get the RBI values I specified in my first post (with the obvious correction to the HR).
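Here is a minimal Python sketch of that spreadsheet logic. It assumes runners on third always score on any hit, runners on second always score on a double or better, and runners on first score only on a triple or homer; the 35% first-to-third figure moves a runner but produces no RBI on the play, so it drops out. Outs and walks are ignored, and the names are illustrative. Under these assumptions the expected RBI per hit come out near .24/.46/.63/1.63, somewhat above the .20/.40/.60 posted here and in line with the figures CRS reports further down the thread.

```python
# Shares quoted in the post for a league-average plate appearance.
P_RUNNER_ON = 0.45                        # at least one runner on base
BASE_SHARE = {1: 0.70, 2: 0.45, 3: 0.25}  # of those PAs: runner on 1B, 2B, 3B

# Chance that a runner on each base scores on each hit type. Only the 0.65
# (2B on a single) and 0.45 (1B on a double) figures come from the post; the
# 0s and 1s are simplifying assumptions. Outs and walks are ignored.
SCORE_PROB = {
    "1B": {1: 0.00, 2: 0.65, 3: 1.00},
    "2B": {1: 0.45, 2: 1.00, 3: 1.00},
    "3B": {1: 1.00, 2: 1.00, 3: 1.00},
    "HR": {1: 1.00, 2: 1.00, 3: 1.00},
}


def expected_rbi(hit):
    """Expected RBI for one hit: sum over bases of P(runner there) * P(he scores)."""
    rbi = sum(
        P_RUNNER_ON * share * SCORE_PROB[hit][base]
        for base, share in BASE_SHARE.items()
    )
    if hit == "HR":
        rbi += 1.0  # the batter also drives himself in
    return rbi


for hit in SCORE_PROB:
    print(hit, round(expected_rbi(hit), 2))
```

Summing base by base is valid even with several runners on, because expected RBI is just the sum of each runner's chance of scoring.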
Therefore, Bill James is completely wrong on this issue (I read that Abstract
12 years ago, too.)
As for (R+RBI)/2, it simply makes no sense. On that basis, a 1B will be worth
about 0.25 runs, a 2B will be worth 0.35 runs, a 3B will be worth 0.60 runs,
and a HR worth 1.3 runs.
If anyone wants it, I will do an Excel spreadsheet, and pass it on.
Now, there are limitations. First off, I keep saying "AVERAGE". This is important,
since a team that NEVER homers will have widely different constants. The '80s Cardinals come to mind. The reason is that without the homer, it's not so easy
to score from 1B. But at the same time, each hit now has MORE run-driving ability.
So, maybe the 1B has only 0.24 run scoring ability, but maybe the 1B now has
0.22 run-driving ability.
I did look at this issue once, and, as you would guess, only at the extremes does the additive power of linear weights lose its strength, for exactly this reason.
David Smyth
I see two serious problems with this whole thing.
First, the values Tango generates do look like linear weights, but in order
to produce a run estimate, something is missing.
It's the outs. For every 1000 runs, the weighted total for the positive outcomes is around 1500 runs. To get back to 1000, around 500 runs' worth is subtracted by the negative adjustment for outs.
Simply using only the positive linear values doesn't work to estimate runs.
And if R+RBI-HR is a proxy for the positive linear values, it won't work, either.
The second problem relates to the statement that Bill James (and I) are wrong that it's better not to subtract out the homers from runs produced. One way to see who is correct would be to see which version, R+RBI or R+RBI-HR, correlates better with actual runs. Logic tells me that it has to be R+RBI. If someone does this study and I'm wrong, I'll eat my hat.
CRS
(R+RBI)/2 is guaranteed to correlate with runs better than (R + RBI - HR) on a team basis, by definition! Add up the values for all the players and you get the runs scored for the team.
I suppose you could account for RBI-less runs by adding a coefficient to balance it out. Say, (R+C*RBI)/2, where C is simply LgR/LgRBI, which looks to be about 1.05 or so, but that removes the simplicity of the formula.
Subtracting off the HR had no theoretical basis. It was just done to count how many of a team's runs a player "was a part of", and it makes as much sense as counting half-sacks and sacks the same in American football.
Tango's formulas just don't look right, and if you have the data and time, you'll use XR, RC, or even OPS anyway. They all work better than the team- and lineup-dependent Runs Produced.
Tango Tiger
David, there is no question that R+RBI will correlate more closely with team runs than what I have come up with. But that is inherent in the fact that RBI is usually equal to about 94% of runs scored, regardless of HR, and so you will get a 99% correlation coefficient. I.e., you are comparing runs to runs!
But that is not the point. I am talking on an INDIVIDUAL basis. On an individual
basis, the positive runs correlate strongly to the positive linear weights WITH
THE HOME RUN ADJUSTMENT. The last thing to do is to proxy the negative runs
of linear weights. One way would be to use the outs, and work out the constant
so that the league totals match. The other way (and the one I prefer for its
simplicity) is to take At Bats and divide by 10.
Again, my point isn't to say Runs Produced is BETTER. My point is that subtracting
the Home Runs has a basis in fact. And the Runs Produced formula (with or without
my adjustment) has a simple elegance to it.
P.S. The rationale for the At Bats / 10 is this: the average hitter with 600 at bats will drive in 60 RUNNERS (RBI - HR). So, you can say that a batter is presented with 600 at bats and drives in 60 runners. If a batter drives in 70 runners, he is a plus 10. Overall, the league total will be zero, which leaves the aggregate run totals matching exactly. Again, I'm looking for simplicity, with some basis in fact.
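A minimal Python sketch of the AB/10 bookkeeping; the function names and the sample batter are hypothetical. The first function is the "plus 10" calculation, and the second is the adjusted Runs Produced (R+RBI-HR minus AB/10) from the first post.

```python
def runners_above_average(rbi, hr, ab):
    """Runners driven in (RBI - HR) minus the baseline of one per 10 at bats."""
    return (rbi - hr) - ab / 10


def adjusted_runs_produced(r, rbi, hr, ab):
    """Runs Produced (R + RBI - HR) less AB/10, used as a Runs Created proxy."""
    return r + rbi - hr - ab / 10


# Hypothetical batter: 600 AB, 90 RBI, 20 HR -> 70 runners driven in.
print(runners_above_average(rbi=90, hr=20, ab=600))  # 10.0, i.e. "a plus 10"
print(adjusted_runs_produced(r=100, rbi=90, hr=20, ab=600))
```

If, league-wide, batters drive in exactly one runner per 10 at bats, `runners_above_average` sums to zero across the league, which is the zero-sum property described above.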
David Smyth
OK. For some reason I overlooked the AB/10 subtraction at the end of Tango's
original post.
The best way to analyze this is to work backwards from Tango's linear formula to get to runs produced.
That formula is 1B*.47, 2B*.84, 3B*1.22, HR*1.60, and BB*.30.
To incorporate the AB/10 adjustment, note that AB = H + (AB-H). So we subtract
.1 run for each hit and out.
The result is 1B*.37, 2B*.74, 3B*1.12, HR*1.50, BB*.30, and (AB-H)*-.10
At first glance this looks decent. When this formula is applied to an actual
league, it yields an estimate which is about 20% too low. This wouldn't be insurmountable
if all the elements were in balance. But the .37 value for a hit is around 20-25%
lower than the *correct* value of .47-.50. And the value for an extra base of
.37-.38 is around 20-25% higher than the correct value of .30-.32.
This degree of imbalance is unacceptable in modern sabermetrics, even for a
so-called simple quick approximator.
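David's backward step can be sketched in Python: subtract the AB/10 charge from every hit (each hit is an at bat), leave walks alone, and add a -.10 weight for each out; the gaps between successive hit weights then give the value of an extra base. The function names are illustrative, not from the thread.

```python
# Tango's summed R+RBI-HR weights, as quoted above.
POSITIVE = {"1B": 0.47, "2B": 0.84, "3B": 1.22, "HR": 1.60, "BB": 0.30}
AB_CHARGE = 0.10  # AB/10: one tenth of a run for every at bat


def with_ab_charge(weights, charge=AB_CHARGE):
    """Fold the AB/10 charge into the event weights, using AB = H + (AB - H)."""
    adjusted = {
        # Walks are not at bats, so they keep their weight.
        event: round(w - charge, 2) if event != "BB" else w
        for event, w in weights.items()
    }
    adjusted["OUT"] = -charge  # each out (AB - H) carries the whole charge
    return adjusted


def extra_base_values(weights):
    """Marginal value of each extra base: 2B-1B, 3B-2B, HR-3B."""
    order = ["1B", "2B", "3B", "HR"]
    return [round(weights[b] - weights[a], 2) for a, b in zip(order, order[1:])]


adj = with_ab_charge(POSITIVE)
print(adj)
print(extra_base_values(adj))  # [0.37, 0.38, 0.38]
```

The .37 single and the .37-.38 extra base are the two values David compares against .47-.50 and .30-.32.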
The next step is to convert to the run/RBI-based version, which is R+RBI-HR-0.1*AB.
As we all know, the substitution of a batter's run and RBI totals for his hits
and walks is a fairly substantial step down in accuracy, due to the powerful
influence of situational differences.
And the final step, to wind up with runs produced, is to remove one of the four
elements--0.1*AB--from the above formula.
So what we have here is an unacceptable linear formula to start with, to which
another layer of inaccuracy is subsequently added, followed by the arbitrary
lopping off of 25% of the calculation.
The funny thing is, if Tango had simply reported on his values for the run and
RBI components of runs scored and stopped there, that would have been fine.
Those values are worth knowing.
Tango Tiger
First off, let me clarify that I am not trying to supersede,
replace, or in any way make a claim that runs produced is anywhere near as good
as Runs Created or Linear Weights. I would put it somewhere below OPS, and maybe
above OBA or SLG.
Secondly, my claim is also that the Home Run has to be subtracted from R+RBI,
based strictly on the Runs/RBI run components as I described.
Finally, there is another component to the RBI formula, and that is "outs". If you work it out, I agree that RBIs will fall 15-20% below actuals. To make the component RBI more accurate, something like .03 * (AB - H - K) would be needed. Again, you work backwards using league stats and runs scored to come up with all the constants you require. (I didn't want to get into all that, or into SB/CS for the component runs.)
I'm also aware that I underweight the Singles, and overweight the extra base
hits, but that is a product of the Runs/RBI stats themselves. The missing component
would be "Base runner assists" or something to that effect. If MLB would count
the number of times a runner was moved along, and eventually scored, this "Assist"
would add value as well.
According to TotalBaseball.com (Runs Created / Runs Produced / Adjusted Runs Produced, i.e., Runs Produced minus AB/10):
Babe Ruth: 2844 / 3673 / 2833
Ted Williams: 2538 / 3116 / 2345
Mike Schmidt: 1757 / 2553 / 1718
Tim Raines: 1592 / 2311 / 1455
Craig Biggio: 1041 / 1494 / 919
All I am saying is that when you look at your daily newspaper, a quick look at R+RBI-HR has a lot of value.
David Smyth
The only real question remaining
is whether the best version of runs produced for individuals is R+RBI-HR or
just R+RBI. Using the 1999 sample of 57 NL batters with at least 500 AB, I checked the correlation of their runs created (new version) with 3 versions of runs produced: R+RBI, R+RBI-HR, and R+RBI+HR. For R+RBI, it was .90. For R+RBI-HR, it was .87. For R+RBI+HR, it was .84. Are these differences meaningful? Yes. Are
the results definitive? Probably not. One would need to use a larger sample
of hitters from different seasons, etc. But I'll go out on a limb and say that
I'm pretty sure the result would be the same. Runs produced has been around
for 20 years, and is still used by sportswriters and others to make their points.
They all seem to follow like sheep, subtracting out the homers without any apparent
consideration as to whether it makes sense to do so. Does the run scored on
a home run count any less than other runs? Does the RBI on a home run reflect
lesser effort or output than other RBI? Am I the only one who is bothered by
this?
Tango Tiger
As I mentioned, you have to remove the HR. Breaking R/RBI into their hit components, NOT removing the HR gives the HR a value of 2.6 runs. Removing the HR gives a value of 1.6 runs. As I also mentioned, on a TEAM level, there is NO QUESTION that R+RBI correlates better with Runs than R+RBI-HR. The reason for this is that RBI is usually equal to 94% of Runs Scored. And this is REGARDLESS of whether it is a high-homer or low-homer team. But the question to ask is, on an INDIVIDUAL level, what makes more sense? And it makes more sense for a HR to have a 1.6-run value than a 2.6-run value.
I think a better way to think about it is in basketball/hockey terms. Players
score goals or score baskets. The total of the individual goals/baskets equals
the team totals. Sometimes they score it on a breakaway, and sometimes they
get assists. In hockey, there are 1.6 assists per goal, meaning every goal has 2.6 points attached to it. Basketball must have something like 0.5 assists per basket. I would say that an assist is equivalent to an RBI (ask Wayne Gretzky if you don't think an assist is as valuable as a goal). The point is that when you score UNASSISTED
(a home run basically), only one point is credited for the goal. But if you
score a goal with 2 assists, that's 3 points. The fallacy is that baseball has
decided to give the batter an assist for his own run. I prefer RDI (RUNNERS
driven in). This would be akin to assists, and would support my results of the
R/RBI component runs being similar to Linear Weights.
Tango Tiger
David, I just re-read your post, and I'm sorry for replying so fast. I did not realize that you did your study on individual players. I apologize again.
It is very interesting what you bring up, then. What is also interesting is that
not only does your study show that HR should be kept inside the Runs Produced
formula, we also both agreed that by removing HR we are STILL overweighting
the extra-base portion of the component parts of R/RBI. Therefore, by keeping
HR, we are SEVERELY overweighting extra base hits. AND STILL, incredibly, there
is higher correlation with a straight R+RBI.
Very good post, and I'll need to think about it. The only thing off the top
of my head is that RC itself is invalid at the extreme level (which James kind
of admitted). The other part is that the R/RBI of individual players are produced within a team context, while Runs Created assumes that it's basically a team made up entirely of the same hitter. Personally, I prefer James' other adjustment of calculating runs scored on a team level with and without the player, with the difference attributed to the player.
Great post again.
David Smyth
There are a few ways to analyze why R+RBI is best for individuals. One way
which doesn't require a single calculation is based on logic alone. The best
version for teams is obviously R+RBI. In order for R+RBI-HR to be better for
individuals, it would have to follow that HR have more significance for teams
than for individuals. For any outcome other than homers, that question might
require some sort of study. But homers are a unique occurrence, because the answers
are all 100%. On a HR, the team scores a run and records an RBI 100% of the
time. On a HR, the player scores a run and records an RBI 100% of the time.
The significance for the team and the player is exactly the same.
Tango Tiger
Hey David, I agree I can't argue with your logic as it
is stated. The question still remains that the runs component for a HR by using
R+RBI is still 2.60 and that is completely wrong. I'll maintain that if baseball
originally had an RDI (runners on base driven in) 100 years ago instead of RBI,
then R+RDI would be the formula. Anyway, when I have time (next week hopefully)
I hope to answer this question from the flip side. I have proved, component by component, that HR should be removed. Now, I will prove it in practice. What I will do will be pretty straightforward: I will look for two groups of hitters (say 30 or so) who have similar batting averages, on-base averages, and slugging averages, but one group will have far more home runs than the other. (To compensate, the second group will need lots more doubles and a few fewer singles.) Then
we will simply compare their Runs, RBIs, and RDIs, and see which ones match
up. The hypothesis is that similar valued hitters should have similar Runs Produced.
Anyway, hope to get to it next weekend.
CRS
Originally posted by tangotiger "The question still remains that the runs component
for a HR by using R+RBI is still 2.60 and that is completely wrong."
That's because R+RBI double counts runs. You need to use (R+RBI)/2 to compare
directly to linear formulae. This puts the HR coefficient at 1.30 which is not
so bad. Then the question shifts to why 2B's and 3B's are underweighted. I'm
rather curious to see how this turns out, as your R & RBI formulae appear to
be interesting if they are correct.
I think it would be helpful to include players from all parts of the batting
order in your study, not just the stars who bat in the middle of the lineup
and tend to have high RP/BR ratios.
David Smyth
I think I now realize what the problem is. Tango's summed runs scored/RBI values--.47
for a single, etc.--look like linear weights. The only problem is...they're
not. Well, maybe the runs scored portion is, but the RBI portion isn't. The
.40 value for a double doesn't mean .40 runs, it means that there are .40 expected
RBI for each double. But the actual run value for each RBI is different---driving
in a runner on third with no outs has a different weight than driving in a runner
from first with two outs. So, even though these values happen to resemble linear
weights, they're not. And because they're not, there's no reason to alter them
to make them resemble linear weights even more. There's no reason to subtract
out the homers to change it from 2.60 to 1.60.
CRS
Funny David, I was just going to say that the RBI numbers aren't bad, but the
RS numbers are. First, none of this has to do with value at all. It has to do
with accounting. Whether or not a single with a man on third is more valuable
than a triple with a man on first is not at issue. Both result in 1 RBI and
at the end of the day (RS + RBI)/2 will correlate very well with total runs
scored as will linear formulae. The two methods just get there completely differently.
The (RS+RBI)/2 method will get there very circuitously, lineup-dependently, etc. It will place large coefficients on things like sacrifice flies, fielder's choices, and other groundouts that may not have much "value" but add to the accounting of who scores and who drives runs in. RBI from base situations are straightforward. I used Tangotiger's percentages (I never used the 1st-to-3rd one) and got numbers that were a bit higher than his: coefficients of .24/.46/.63/1.63/.035. I think they may be even a bit higher, though, as the sum probability of the base-out situations was less than one (~.94). That leads me to believe that the coefficients used in Total Baseball for expected RBI in the Clutch Hitting Index, namely .25/.50/.75/1.75, may actually be close to correct. This would put the HR coefficient (the most trivial to consider) at 1.375, which looks even better.
The RS formula though. I see what tango may have done. If you average over all
the outs, you see that you can expect .32 runs with the bases empty and .58
runs with a man on first. That gives you .26 for a single which is about what
he had. The trouble is that the value added tells you the increased likelihood that
SOME runner will score, not if THAT runner will score. Once you put that runner
on first, chances are that if a run scores it will be THAT runner, so maybe
the 1B coefficient should be up over .5 (a bit less perhaps due to FC's). Anyhow,
I don't have the retrosheet-type data to look at what I want to look at. Plus,
though this is an interesting puzzle (to me at least), runs produced numbers
really have no basis in determining "value" and I don't know if this is all
worth the trouble. This was originally posted as a way to "save time" after
all. I guess what I am curious about in all this is simple accounting-type numbers.
When a runner scores, how did they get on base in the first place? What percent
due to singles, doubles, fielder's choices, etc.? Same for RBIs. How many RBI
from homers, doubles, outs, etc. From this, one could construct percentages
of RS and RBI for a typical event and get some linear-like formulae. It might
not mean anything though.
Tango Tiger
CRS: you are absolutely correct, that this is all about
accounting and not about value. It just so happens that R+RBI-HR, when broken
down, corresponds closely to value. But it is primarily about accounting. R+RBI
/ 2: the one thing that always bothers me about stats where you divide them by 2 is that they are no longer straightforwardly additive. Going back to hockey, they have a good stat called plus/minus. Each of the 5 skaters on the ice gets a plus one when their team scores, and a minus one when the opposing team scores. The aggregate total yields a value that is 5 times larger (by definition) than the team goal differential. You may be TEMPTED to just divide plus/minus by 5, but it doesn't work that way. The reason is that not all players participate in the plus equally. I think the (R+RBI)/2 argument can work out the same way. That we can break down the R and RBI components and show how closely R+RBI-HR matches Linear Weights reinforces this notion (to me, anyway).
CRS: Interesting point about the .26 meaning SOME runner but not THAT runner.
You are right, and I simply used my figure as a proxy. That the 1B coefficient
should be 0.52 or 0.46 doesn't really change much for my purposes. My point
is simply that R+RBI-HR has some basis in fact. But your point is very well
taken.
David: absolutely correct that 0.40 does not mean 0.40 runs but simply that
the average double results in 0.40 RBIs. And there is no question that the average
double is NOT worth 0.40 runs in "run-driving ability". It is closer to 0.30
runs. And T/HR are closer to 0.40 runs in "run-driving ability", and not the
0.60 / 1.60 that RBIs give them. I do agree with your point that an RBI in certain
situations should be worth more than others. That "AB/10" thing that I do is supposed to address this in a general sense. If we consider that you get 600 at bats, and that the average hitter drives in 60 RUNNERS, then we can say that every 10 at bats yields 1 runner. However, you can try to be fancier about it, and break down his at bats into the situations you describe to get a truer picture of his run-driving ability. Before someone out there thinks this is now getting away from the simplicity of R+RBI-HR, please note that this last exercise will yield clutch performance (if not ability). By actually counting the number of runners driven in from different base-out situations, and comparing it to the average, you are getting a true picture of a batter's ability to drive in a run. But that is another thread altogether, and I invite someone to start that one.
I just got back from my vacation, and I promise to look at the R+RBI of homer hitters vs. non-homer hitters this week!
Tango Tiger
Ok, so I couldn't sleep, so I decided to run my study now. Here it is. The process. First off, I used Lahman's database (all thanks to him for making this easy). I created a database with seasons of at least 300 plate appearances (AB+BB, to be more accurate).
From this list, I took the 20 seasons with the biggest skew towards home runs. Consider these guys as those who contribute most with their home runs (a player like Barry Bonds, who contributes with everything, would not appear on such a list): 5 seasons of Dave Kingman, 3 seasons of Sammy Sosa, 2 seasons of Mark McGwire, 2 seasons of Matt Williams, and the rest were one-season players (1987 Andre Dawson, for example). The aggregate totals of these 20 seasons (let's call them King Kongs) were: .327 OBA, .566 SLG, .262 BA. Those are basically the kind of numbers you'd expect from one-dimensional power hitters.
Then I looked at the other side. I looked for hitters who hit within 8% of the OBA and SLG averages above, and since all the above hitters came from 1950 and later, I decided to limit my study to those years. I ended up with an eclectic list: 2 seasons of Cecil Cooper, and single seasons of such players as Dave Parker, Nomar Garciaparra, Felipe Alou, and Andre Dawson (again! this time 1983, not 1987). The aggregate totals of these 20 seasons (let's call them Little Cecils) were: .347 OBA, .534 SLG, .307 BA. Those are the kind of numbers you think of when you think of Cecil Cooper.
So, what were the differences in Runs and RBIs between the Kongs and the Cecils? Well, first off, let's look at the difference in each of the hitting components. The Kongs had 21 more home runs and 18 more walks. The Cecils had 33 more singles, 14 more doubles, and 5 more triples. (All numbers are scaled to a 600-plate-appearance season for convenience.) The OPS was .893 for the Kongs and .881 for the Cecils. The positive values of Linear Weights show that the pluses of the Kongs are slightly better (by 3 runs) than the pluses of the Cecils.
So, if R+RBI-HR is accurate, then we should see the Kongs' and Cecils' R+RBI-HR totals come out similar. If R+RBI is more accurate, then we should see their R+RBI totals come out similar. The results: for the Runs part, the Little Cecils scored 91.1 runs versus the 90.4 runs of the Kongs. A virtual wash. For the RBIs, the Kongs had 116 versus the 95 RBIs of the Cecils. That difference is 21 RBIs. If you remember from above, the Kongs also had 21 more home runs. If you look at RBI-HR, BOTH groups had 68 runners driven in. So, what we have are two groups of players of roughly the same value, one of which derives most of its value from home runs, and the other of which does not. Yet their Runs Produced (R+RBI-HR) came in at 159 for the Little Cecils and 159 for the King Kongs.
If someone wants me to run something different, my database is all set and ready to go; just give me the parameters you want me to run.
David Smyth
Tango, hope you were able to sleep afterwards. Your study
suffers from the same main problem as mine--small sample size. If you have all
of the batter seasons since 1950, and a computer to do all the dirty work, why
not do a study involving a thousand batters instead of just a few dozen? This
way, you could include bad and average hitters instead of only good ones. You
could reduce or eliminate the dependence on atypical hitters with extreme HR
dependence. You could eliminate the possibility of batting order contamination,
which may be present in your design. Another variation would be to switch from
controlling for batting performance and HR and checking for R and RBI, to controlling
for R, RBI, HR, and checking for batting performance. Might be better, I'm not
sure.
Tango Tiger
I expanded to 100 player-seasons, and changed the premise slightly. First off, I kept all the stats in a context of 600 PA (AB+BB) for all those players with over 300 PA.
I then looked for those players who contributed most of their offense with their home runs. This gave me 7 Dave Kingman seasons, 7 McGwires, 5 Juan Gonzalez, and a slew of other players. Their AB/1B/2B/3B/HR/BB are as follows: 534.95 / 75.38 / 21.87 / 1.84 / 43.89 / 65.05. Then I ran a similarity-type score, looking for players who were as high as possible in the non-HR categories and close to 0 in HR. I ended up with 7 Wade Boggs seasons, 6 Luke Appling seasons, and a slew of others. Their totals read: 521.54 / 128.65 / 31.65 / 4.13 / 2.24 / 78.46.
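The per-600-PA scaling and the component gaps quoted next can be sketched in Python. This is a minimal illustration, not Tango's actual database code; `per_600_pa` and `differences` are hypothetical helpers, and the two group lines are the ones posted above (each already sums to 600 AB+BB).

```python
ORDER = ("AB", "1B", "2B", "3B", "HR", "BB")

# Per-600-PA group averages as posted (AB/1B/2B/3B/HR/BB).
KINGMANS = dict(zip(ORDER, (534.95, 75.38, 21.87, 1.84, 43.89, 65.05)))
BOGGS = dict(zip(ORDER, (521.54, 128.65, 31.65, 4.13, 2.24, 78.46)))


def per_600_pa(line):
    """Scale a season line so that AB + BB = 600, as in the study."""
    scale = 600 / (line["AB"] + line["BB"])
    return {stat: value * scale for stat, value in line.items()}


def differences(a, b):
    """Component-by-component difference a - b, rounded to one decimal."""
    return {stat: round(a[stat] - b[stat], 1) for stat in ORDER}


# Positive values favor the Boggs group, negative values the Kingmans.
print(differences(BOGGS, KINGMANS))
```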
So, looking at the individual differences, we see the Boggs group end up with about 53 more singles, 10 more doubles, 2 more triples, and 13 more walks. The Kingmans end up with 40 more home runs. The positive values of Linear Weights tell us that the Kingmans are worth about 15-20 more runs. The results: the Boggs players ended up with 82 runs and 62 RBIs. Their runs produced were 142. Their R+RBI were 144.
If Runs Produced (with the home run subtracted) is more accurate, then we should see the Kingmans' RP at about 160. If R+RBI (keeping the home runs intact) is more accurate, the Kingmans' R+RBI should come in at about 165. The Kingmans ended up with 91 runs and 113 RBIs. That total is 204, and is a whopping 60 runs above the Boggs numbers. The Kingmans' RP (with HR removed) is 160, which is 18 runs above the Boggs RP, and is PRECISELY what we expected.
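The comparison above can be redone from the posted group figures with a minimal Python sketch. R and RBI are the rounded values from the post and HR comes from the per-600-PA lines, so RP lands within a fraction of a run of the posted 160 and 142; the names are illustrative.

```python
# Per-600-PA group figures from the post (R and RBI as posted, HR from the
# component lines above).
GROUPS = {
    "Kingmans": {"R": 91, "RBI": 113, "HR": 43.89},
    "Boggs": {"R": 82, "RBI": 62, "HR": 2.24},
}


def r_plus_rbi(g):
    """Plain R + RBI."""
    return g["R"] + g["RBI"]


def runs_produced(g):
    """R + RBI - HR."""
    return g["R"] + g["RBI"] - g["HR"]


for name, g in GROUPS.items():
    print(f"{name}: R+RBI={r_plus_rbi(g)}, RP={runs_produced(g):.1f}")

kong, boggs = GROUPS["Kingmans"], GROUPS["Boggs"]
print("R+RBI gap:", r_plus_rbi(kong) - r_plus_rbi(boggs))           # 60
print("RP gap:", round(runs_produced(kong) - runs_produced(boggs)))  # 18
```

The 60-run R+RBI gap against an 18-run RP gap is the contrast the post draws with the 15-20 runs Linear Weights expects.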
I am sure if I re-run this study with 500 players or 1000 players, I will end
up with the same conclusion: the Runs Produced figure with the HR removed is
a more accurate measure of a player. This has been demonstrated by looking at
the individual logical components, and by looking at the players' actual numbers.
Thanks for the feedback, guys, as this was a lot of fun for me. But I've got to get back to some boring work now!
P.S. I am re-running the study now, this time controlling the home runs at 10 (rather than at zero). The Boggs numbers are: 525.49 / 130.22 / 36.74 / 4.82 / 8.43 / 74.51. This gives Boggs 55 more singles, 15 more doubles, 3 more triples, and 9 more walks, but 36 fewer home runs. Linear Weights tells us that the Kingmans are slightly better (by 6 runs). In effect, they are pretty much equal-valued players. The RP of Boggs comes in at 160.7, while that of the Kingmans comes in at 159.65. Pretty much a wash as well. Therefore, removing HR from RP is more accurate than leaving it in. Thanks...
Tango Tiger
Ok, one last study, and this one is really exhaustive. For each year from 1920 to 1999 (80 years in all), I took the 10 players that contributed the most with their home runs. That gives us 800 player seasons. This also removes any era-biases. These are the King Kong players. For each year, I then took the 10 best hitters who were not home run hitters. These are the Prince Boggs players. So, we will be comparing 800 player seasons to 800 player seasons, with the era-bias removed. The results.
Note: information has been lost by the website. I'll try to reproduce it.
As you can see, the King Kongs, based on their Linear Weights, are worth about 16-17 runs more than the Prince Boggs. If Runs Produced is accurate, we should see the King Kongs' RP ahead by a similar margin. If R+RBI is more accurate, then we should see the King Kongs' R+RBI ahead by 15-20.
As it turns out, the R+RBI of the King Kongs is ahead by a whopping 51 runs. Their RP is ahead by 21 runs, which is pretty close to what we expected.
Tangotiger (added in some other thread)
The problem with such a weird profile is that for the R/RBI to come out like that, this player must have performed unusually well or poorly with runners in scoring position.
For example, runs scored is roughly equal to .27*1B + .44*2B + .61*3B + 1.00*HR + .27*BB.
RBI is roughly equal to .2*1B + .4*2B + .6*3B + 1.6*HR + .025*(AB-H). That last value is "forced" in to make sure that the league averages balance out, like the out constant in LW.
So, for example, a player with the following profile in 660 PA:
110 30 4 16 60 440 (1B, 2B, 3B, HR, BB, outs) will have 77.5 runs scored and 72.9 RBIs.
Now, to generate a 100/100 guy with 10 HRs, you need, UNDER NORMAL CONDITIONS (660 PA):
141 70 30 10 10 399. If you go back to, say, Tommy Herr, he did not have such a profile. I would guess that he hit a lot with RISP, AND he was very good at that as well.
To generate a 125/125 with 65 HRs, you need, UNDER NORMAL CONDITIONS (660 PA):
17 20 0 65 173 385. Again, another "impossible" situation. But I think McGwire might have performed like this (the 125/125, 65) a couple of years ago. I will guess, then, that he had few RISP opportunities and performed poorly in those situations.
So, which of these 2 guys is better? Well, LWTS says: the first guy has 129 RC, and the second guy (the HR guy) has 125 RC. Their +/- (with 0 as average) is +49 for the 1st guy and +57 for the second guy.
If you incorporate my formula of R+RBI-HR-AB/10, you get: the first guy is 125 runs, and the 2nd guy is 136 runs.
So, however you slice it, these guys are within 10 runs of each other, and not 50 runs apart.
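To see how those profiles fall out of the weights, here is a minimal Python sketch that applies the runs-scored and RBI formulas above to a (1B, 2B, 3B, HR, BB, outs) line and then the R+RBI-HR-AB/10 adjustment. It assumes PA = AB + BB, as in the post, so AB is everything but walks; the function names are illustrative. The first profile comes out at 77.5 runs and 73.0 RBI, within a tenth of the posted figures.

```python
# Weights from the post: runs scored and RBI per event ("OUT" = AB - H).
RUN_W = {"1B": 0.27, "2B": 0.44, "3B": 0.61, "HR": 1.00, "BB": 0.27, "OUT": 0.0}
RBI_W = {"1B": 0.20, "2B": 0.40, "3B": 0.60, "HR": 1.60, "BB": 0.0, "OUT": 0.025}
EVENTS = ("1B", "2B", "3B", "HR", "BB", "OUT")


def expected_r_rbi(profile):
    """Expected (runs, RBI) for a (1B, 2B, 3B, HR, BB, outs) line."""
    counts = dict(zip(EVENTS, profile))
    runs = sum(RUN_W[e] * n for e, n in counts.items())
    rbi = sum(RBI_W[e] * n for e, n in counts.items())
    return runs, rbi


def adj_runs_produced(r, rbi, hr, ab):
    """R + RBI - HR - AB/10."""
    return r + rbi - hr - ab / 10


profiles = {
    "100/100, 10 HR": (141, 70, 30, 10, 10, 399),
    "125/125, 65 HR": (17, 20, 0, 65, 173, 385),
}
for label, (s, d, t, hr, bb, outs) in profiles.items():
    r, rbi = expected_r_rbi((s, d, t, hr, bb, outs))
    ab = s + d + t + hr + outs  # PA = AB + BB, so AB is everything but walks
    print(f"{label}: R={r:.1f} RBI={rbi:.1f} adjRP={adj_runs_produced(r, rbi, hr, ab):.1f}")
```

Fed the round 100/100/10 and 125/125/65 totals directly (AB of 650 and 487), `adj_runs_produced` returns 125.0 and 136.3, the two figures quoted above.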
Tangotiger (added in some other thread)
I looked at all players since 1975 with over 300 PA
(AB+BB). I then grouped them as one of 6 types of hitters (singles hitter, doubles,
triples, homer, walk, steals). I then broke down these hitters into 7 values
of hitters (RC over 100 runs, 90, 80, 70, 60, 50 and under 50). What we end
up is 42 "aggregate players" each in a very clear category. Any difference can
be easily accounted for. All of this can be found at http://www.geocities.com/tmasc/RCType.xls.
The results. First thing I did was a regression analysis of the 6 hitting
categories versus R+RBI. This would establish what the Linear Weights coefficients
are for R+RBI.
1B = 0.58
2B = 1.09
3B = 1.31
HR = 2.88
BB = 0.27
SB = 0.20
As you can see, R+RBI overweights singles by about 0.10 runs, doubles by 0.30 runs, triples by 0.30 runs, and home runs by 1.4 runs, while underweighting walks by under .10 runs. If you use R+RBI-HR, the coefficient for HR becomes 1.88.
The r-squared of R+RBI vs. RC is 93.7%, which is great. For adjRP, it is 98.5%.
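Why does subtracting HR move the fitted HR weight from 2.88 to exactly 1.88? Least squares is linear in the target, and the HR count is itself one of the regressors, so fitting R+RBI-HR instead of R+RBI lowers the HR weight by exactly one run and leaves the other weights untouched. A minimal numpy sketch on synthetic data (not the 42 aggregate players from the spreadsheet) illustrates this; the seed, ranges, and noise level are arbitrary, and the generating weights are simply the posted coefficients.

```python
import numpy as np

EVENTS = ["1B", "2B", "3B", "HR", "BB", "SB"]
HR = EVENTS.index("HR")

# Synthetic data only: 42 made-up "aggregate players" whose R+RBI is built
# from the posted coefficients plus noise. These are not Tango's 42 groups.
rng = np.random.default_rng(0)
X = rng.uniform(0, 150, size=(42, len(EVENTS)))
r_plus_rbi = X @ np.array([0.58, 1.09, 1.31, 2.88, 0.27, 0.20]) + rng.normal(0, 3, 42)


def fit_weights(X, y):
    """Least-squares event weights with no intercept."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


w_full = fit_weights(X, r_plus_rbi)
w_minus_hr = fit_weights(X, r_plus_rbi - X[:, HR])  # target is now R+RBI-HR

for event, a, b in zip(EVENTS, w_full, w_minus_hr):
    print(f"{event}: R+RBI {a:.2f}  R+RBI-HR {b:.2f}")
```

Whatever the data, the two fits differ only in the HR weight, and by exactly 1.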
Next, since I slotted each of the 5,800 players into one of 42 categories, we
can see what differences pop up. First, let's look at the very best hitters
(RC > 100 runs). For each of the 6 types of hitters (singles, homers, etc),
they all have a RC between 108 and 112. We can say, therefore, that these different
types of hitters all have the same value, though they got there in different
ways. When we look at the adjRP, they range from 110 to 118. An acceptable deviation,
with an 8-run range that is a bit off from RC. But looking at R+RBI, they range
from 182 to 193 for the non-HR player (11 run range), and the HR player comes
in at 208! Now, remember, these 6 types of hitters are all worth about the same
(RC between 108 and 112). Yet, the homerun hitter's R+RBI is 15 to 25 runs above
all the other great hitters.
Let's look at the near-great hitter (RC > 90). Their RC range from 94.0 to 94.6,
for a puny range of 0.6 runs. These 6 widely different types of hitters are
all worth the same, and any overall stat should show them to be the same. adjRP
shows them worth 95 to 100 runs (a 5-run range which is a bit off). R+RBI? Well,
the non-HR hitter comes in at a range of 163 to 175 (which is a wide range to begin with). The HR hitter comes in at 186 runs! This is 11 to 23 more runs
than he should have.
How about the good hitter (RC > 80)? RC comes in at 84.4 to 85.0 runs. adjRP
comes in at 83 to 90 runs (a range of 7 runs, which is a bit high). R+RBI? Non-HR hitters range from 150 to 160 runs (a 10-run range). But the HR hitter comes in at
173 runs! That is 10 to 23 runs too high.
The mediocre hitter (RC > 70) looks the same: RC between 74.5 and 75.4 runs.
adjRP between 72 and 79 (7-run range). R+RBI for non-HR hitter comes in at 137
to 146 runs (9 run range). HR hitter? 159 R+RBI, which is 13 to 22 more runs
than he should get. How about the fair hitter with RC > 60? RC comes in between
65 and 66 runs. adjRP between 60 and 68 (8-run range). non-HR R+RBI is 125 to
134 (9-run range). HR-hitter is 146! That makes him worth 12 to 21 more runs
than he should get.
The bad (RC > 50) hitter? RC in at 55.2 to 56.2 runs. adjRP is 49 to 57 (8-run
range). non-HR R+RBI=111 to 118 (a 7-run range, and the first time R+RBI does
better than adjRP). But the HR hitter's R+RBI? Try 128, which is 10 to 16 runs more than it should be.
How about the worst hitters in the last 25 years (RC < 50)? How do they do?
RC between 42 and 45 runs. adjRP = 37 to 47 runs (10-run range). non-HR R+RBI
is 96 to 108 (12-run range). the HR hitters in this group? 119 R+RBI, which
is 11 to 23 more runs than he should get.
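The tier-by-tier gaps can be tabulated with a minimal Python sketch; `TIERS` and `hr_excess` are illustrative names, and the numbers are the rounded R+RBI totals quoted in this post. Because the inputs are rounded, a computed endpoint can differ from the range quoted in the text by a few runs.

```python
# (non-HR hitters' lowest, highest R+RBI, HR hitter's R+RBI) per RC tier, as posted.
TIERS = {
    "RC > 100": (182, 193, 208),
    "RC > 90": (163, 175, 186),
    "RC > 80": (150, 160, 173),
    "RC > 70": (137, 146, 159),
    "RC > 60": (125, 134, 146),
    "RC > 50": (111, 118, 128),
    "RC < 50": (96, 108, 119),
}


def hr_excess(low, high, hr_hitter):
    """How far the HR hitter's R+RBI sits above the non-HR range (min, max)."""
    return hr_hitter - high, hr_hitter - low


for tier, (low, high, hr) in TIERS.items():
    lo_gap, hi_gap = hr_excess(low, high, hr)
    print(f"{tier}: {lo_gap} to {hi_gap} runs above the non-HR hitters")
```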
Conclusion
1 - The R+RBI of the home run hitter is consistently 20 runs higher than that of a similarly valued non-home-run hitter.
2 - The adjRP of all types of hitters shows no such tendency.
3 - The regression analysis shows that if the home run is to remain part of R+RBI, then the RC formulas as we know them are invalid (which they are not).