|
author |
topic |
 |
 |
Warren |
posted August
13th, 2001 01:51 PM |
|
 |
Senior
Member Member Since: Dec 1999 Location:
|
You can
change the Davenport exponent formula (it was a creation
of Clay Davenport, and not Keith Woolner, I believe) to
match Tango's numbers a bit better. Instead of:
exponent = 1.5*(log total R/G) + .45
you
can use:
exponent = 0.93*(log total R/G) + .85
This matches up a lot better with Tango's
results. Here's an example:
4 runs scored, 3 runs allowed:
Exponent of 2: .640
Exponent of 1.83: .629
Davenport Exponent (1.72 in this case): .621
Revised Exponent (1.64 in this case): .616
Tango's: .607
Now, I believe he
generated the constants in the original formula by
fitting the equation to team winning percentages (based
on runs scored and runs allowed). So in some sense the
original is more accurate, in that we *know* it fits
actual data.
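For anyone who wants to check those numbers, here is a minimal Python sketch. It assumes the log is base 10 and that "total R/G" means runs scored plus runs allowed per game (7 here); the function names are made up for illustration. Tango's .607 comes from his chart and is not reproduced.

```python
import math

def pythag_wpct(rs, ra, exponent):
    """Pythagorean winning percentage: RS^x / (RS^x + RA^x)."""
    return rs ** exponent / (rs ** exponent + ra ** exponent)

def davenport_exponent(rs, ra):
    """Original formula from the post: 1.5 * log10(total R/G) + 0.45 (base 10 assumed)."""
    return 1.5 * math.log10(rs + ra) + 0.45

def revised_exponent(rs, ra):
    """Revised formula from the post: 0.93 * log10(total R/G) + 0.85 (base 10 assumed)."""
    return 0.93 * math.log10(rs + ra) + 0.85

rs, ra = 4.0, 3.0
cases = [
    ("fixed 2", 2.0),
    ("fixed 1.83", 1.83),
    ("Davenport", davenport_exponent(rs, ra)),
    ("revised", revised_exponent(rs, ra)),
]
for label, x in cases:
    print(f"{label:>10}: exponent {x:.2f}, W% {pythag_wpct(rs, ra, x):.3f}")
```

Under those assumptions the four winning percentages come out at .640, .629, .621 and .616, with exponents of about 1.72 and 1.64 for the two log-based formulas.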
|
 |
tangotiger |
posted August
13th, 2001 05:43 PM |
|
 |
All Star Member
Since: May 2000 Location:
|
Interesting.
However, the empirical data
at this point now suffers from sample size. No team
scores exactly 4 runs and gives up 3 runs. You can find
a class of teams that scores 3.8-4.2 etc. And of course,
there are some combinations that you won't find anywhere
near enough. In any case, you have to take some
liberties with the empirical data.
The great
thing with Woolner's real-life data of runs/inn
distribution is that he used 20 years, plus each game
has 18 half-innings, and therefore you have tons of
data.
|
 |
David Smyth |
posted August
13th, 2001 07:43 PM |
|
 |
All Star Member
Since: Dec 1999 Location: Lake Vostok
|
Tango
and Patriot, thanx for the kind words.
IMO, it's
Tango whose "stuff" deserves more attention. Hey Tango,
how about an article for the Baseball Primer?
One small disagreement. A few posts ago Tango
said that you could use his run per win formula to
produce marginal wins from lwts (marginal runs). The
only problem is that lwts isn't really marginal runs.
It's a sort of "compressed" marginal runs. Your formula
should work with a full BsR (for pitchers, say) or a
lineup-added BsR procedure (for batters). But not with
lwts.
|
 |
tangotiger |
posted August
13th, 2001 10:18 PM |
|
 |
All Star Member
Since: May 2000 Location:
|
I find
it interesting you say that.
"Marginal" refers
to "what would the effect be if you add 1 more item".
This term would be used in economics and supply and
demand. For example, a bottle of beer may cost $1. But
if you buy 10 bottles, you may only want to pay 40 cents
for that 11th bottle. If you buy 24 bottles, you may
only want to pay 20 cents for that 25th bottle. That is
how I am using marginal.
So, for baseball, what
is the marginal effect to a team for adding one more
single? Well, if the team scores 1000 runs before the
single, and scores 1000.47 runs after the single, the
marginal effect of that single is .47 runs.
Linear Weights IS marginal runs.
David?
|
 |
Patriot |
posted August
14th, 2001 10:59 AM |
|
 |
All Star Member
Since: Jul 2000 Location: Ohio
|
Actually, Tango, I wrote it before all of the sim
stuff on here, so no, it is not included. Really, we
should have MORE serious sabermetricians on this board.
I remember getting on the net 3-4 years ago, and there
was always some new interesting material on Stathead or
Baseball Stuff or rsbb, and now the only place I go
where there is any new research and studies is here.
Most of the other sites are just writing about
transactions or whatever (BPx2), which is fine because I
care about that too, but it's not advancing
sabermetrics.
|
 |
Riverfront76 |
posted August
14th, 2001 01:39 PM |
|
 |
Member Member
Since: Apr 2001 Location: CT
|
Tango:
After looking at your Win% chart based on runs
scored/gm, would the next step be to create a
3-dimensional chart?
i.e. Team A scores 4.5 r/g
and gives up 4.3 r/g. Facing team B who scores 5.0 r/g
and gives up 4.8 r/g, what is each team's win
expectation?
|
 |
tangotiger |
posted August
14th, 2001 05:09 PM |
|
 |
All Star Member
Since: May 2000 Location:
|
Actually, no, that would be another chart. Pat
talked about this with his spreadsheet, so you would do
that as step 1. Then you would take those results and use my
chart.
I have not looked at Pat's chart, but I
know "how" you can determine the r/g. The first thing you
have to realize is that runs, unlike all other events,
are not success/non-success. By that I mean you can
figure out EXACTLY what a .340 hitter would hit against
a .270 pitcher, if the league was .250. I've described
this method in the past, and the mathematicians tell me
it's called log5. In any case you CAN'T use this with
runs.
What you have to do is DECONSTRUCT the r/g
into the h/hr/bb etc. components, and THEN apply the log5
method. Then you can reconstruct the r/g based on the
new components. From there, you can determine the new rs
and ra for the 2 teams.
From there you can plug into
my win % chart.
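The batter-versus-pitcher calculation described above is usually written in odds-ratio form. A minimal sketch, assuming the standard log5 formula and treating .270 as the average the opposing pitcher allows; the function name is hypothetical.

```python
def log5_matchup(batter, pitcher, league):
    """Expected success rate when a batter meets a pitcher (log5, odds-ratio form).

    All three arguments are success probabilities, e.g. batting averages.
    """
    hit = batter * pitcher / league
    out = (1 - batter) * (1 - pitcher) / (1 - league)
    return hit / (hit + out)

# .340 hitter against a pitcher who allows .270, in a .250 league
print(round(log5_matchup(0.340, 0.270, 0.250), 3))
```

With these inputs the expected average is about .364. The formula only makes sense for true success/failure rates, which is why r/g has to be broken into components first.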
|
 |
David Smyth |
posted August
14th, 2001 08:37 PM |
|
 |
All Star Member
Since: Dec 1999 Location: Lake Vostok
|
I was
thinking of "marginal" as meaning "above the baseline",
as in runs above average (which is what lwts is supposed
to be). But as we know, lwts falls short in that regard.
Custom lwts is what is correct in that framework. The
difference between standard lwts and custom lwts is
zilch for an average hitter, but for a Babe Ruth it's
about 20%. Because of that discrepancy, if you want to
use standard lwts you have to be careful about which R/W
construct you choose.
I think part of the
problem in what I've been saying on this topic is the
use of the term "wins". The adjustment to lwts that I
endorse (dividing by the total R/G) does not actually
compute wins. It merely adjusts lwts runs for the
frequency of runs in the league (to permit out-of-league
comparisons). It just so happens that there is a
constant relationship between the number of runs in the
league and the number of wins in the league (R/W = total
R/G). Still, the term "wins" means different things in
different contexts and might be the stumbling block
here.
Just remember this: using standard lwts
results in the most "contracted" (towards the average)
run estimate possible. To most properly counteract this,
you must use the most "expanded" win factor (away from
the average) possible. And that is always the total R/G.
Tango's quick formula for R/W produces 10.75 for
a 4.5 R/G context. But if you look at his W% chart, a
team which scores 4.6 and allows 4.4 has a W% of .521.
.521 means 84.4 wins, which is 3.4 above average. The
run difference between these 2 teams in 162 G is 32.4.
So the (net) runs per win is 32.4/3.4, which is 9.52. As
the run difference between the teams increases (while
averaging 4.5 R), the R/W also increases. But it doesn't
reach Tango's 10.75 value until somewhere around 5.4 vs
3.6, which is a W% in the upper .600s. So I don't
understand what his formula is supposed to be showing.
The 4.6-4.4 example is the closest to .500 that Tango's
breakdown shows, but as you get even closer to .500 the
R/W would go down from 9.52--until it converged to 9.00
(the total R/G in a 4.5 context) at exactly .500.
Why is it important to know the R/W at exactly
.500? Because the standard lwts values are only correct
at .500 (an average team). Using standard lwts with a
non-.500 R/W factor (such as Tango's) is mixing apples
and oranges.
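The arithmetic in this post can be laid out in a few lines. A sketch, assuming a 162-game season, taking the .521 figure for a 4.6 vs 4.4 team from Tango's chart as given, and writing Tango's quick formula with the constants that appear elsewhere in the thread (0.75 and 4); the names are illustrative.

```python
GAMES = 162

def net_runs_per_win(rs_per_game, ra_per_game, wpct, games=GAMES):
    """Season run differential divided by wins above .500."""
    run_diff = (rs_per_game - ra_per_game) * games
    wins_above_avg = (wpct - 0.5) * games
    return run_diff / wins_above_avg

def tango_quick_rpw(total_rpg):
    """Quick runs-per-win estimate: 0.75 * total R/G + 4 (constants assumed from the thread)."""
    return 0.75 * total_rpg + 4

print(round(net_runs_per_win(4.6, 4.4, 0.521), 2))  # net R/W implied by the chart
print(tango_quick_rpw(9.0))                         # quick formula in a 4.5 R/G context
```

The first value is about 9.52 and the second is 10.75, which is the gap being asked about; the total-R/G rule would give 9.0 for the same context.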
|
 |
Patriot |
posted August
15th, 2001 09:03 AM |
|
 |
All Star Member
Since: Jul 2000 Location: Ohio
|
quote:
originally posted by tangotiger
Actually, no, that would be another chart.
Pat talked about this with his spreadsheet, so you
would do that as step 1. Then you would take those results and
use my chart.
I have not looked at Pat's
chart, but I know "how" you can determine the r/g.
The first thing you have to realize is that runs, unlike
all other events, are not success/non-success. By that
I mean you can figure out EXACTLY what a .340 hitter
would hit against a .270 pitcher, if the league was
.250. I've described this method in the past, and the
mathematicians tell me it's called log5. In any case
you CAN'T use this with runs.
What you have to
do is DECONSTRUCT the r/g into the h/hr/bb etc.
components, and THEN apply the log5 method. Then you
can reconstruct the r/g based on the new components.
From there, you can determine the new rs and ra for the 2
teams.
From there you can plug into my win %
chart.
You are correct, Tango, in that
breaking it down into components would be the way to go.
All teams do not match up equally, and so the offensive
shape is an important consideration. But you can answer
this as a general question (i.e. 5 r/g, no specific
team), like Riverfront did, by using Log5 in combination
with your method.
What you do is figure each
team's prob. of scoring 0 runs in an inning and the league prob. of
scoring 0, and put these in Log5. Do this at every run
possibility (0-20 has been the standard all thread),
multiply the probability by the number of runs it
represents, sum and multiply by 9. That is the expected
r/g.
In Riverfront's example, assuming a league
average of 4.5, Team A would score 4.5 runs per game
and allow 4.77. Looking at Tango's chart, Team A would
have a W% of .474.
The reason this method works
for the general case is because we are dealing with
probabilities of run scoring. The Log5 method will work
with probabilities of any two forces meeting, if I
remember correctly, and so it works with f0, f6, etc.
[edited
by Patriot on August 15th, 2001 at 09:09
AM]
|
 |
tangotiger |
posted August
15th, 2001 09:37 AM |
|
 |
All Star Member
Since: May 2000 Location:
|
Pat,
excellent point!
David, your statement about
converging to the 4.5 r/g implying the r/w of 9.0 is
intriguing!
However, that being said, I am
talking about marginal runs/marginal wins, and therefore
there is no linear relationship.
I do not
believe that custom LWTS + my chart would yield the same
value as standard LWTS + your r/w.
|
 |
David Smyth |
posted August
15th, 2001 10:00 AM |
|
 |
All Star Member
Since: Dec 1999 Location: Lake Vostok
|
Tango,
I'm not claiming that custom lwts + your chart exactly
equals standard lwts + my R/W. They should be close,
though. As to which is "better", that's a complicated
question.
I still would like an answer to my
question: what does your 10.75 "quick" R/W represent
(for a 4.5 R/G context), or how should it be used? An
example would be helpful.
|
 |
tangotiger |
posted August
15th, 2001 01:27 PM |
|
 |
All Star Member
Since: May 2000 Location:
|
That
should give you the win% chart of mine, similar to that
exponent formula proposed earlier.
For example, a
team of 4rpg v 3rpg gives 7 total runs, or 7*.75+4=9.18
runs/win.
So, the 1 run difference yields 0.109
wins over .500, or .609 overall.
(Coincidentally, I think this is what my chart
shows. This breaks down at the far extremes of course,
which is probably where that custom-exponential formula
comes into play.)
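Spelled out as code, the per-game calculation looks like this. A sketch that assumes the quick formula is runs per win = 0.75 * (total R/G) + 4, the constants used elsewhere in the thread; the function names are illustrative. Taken at face value, those constants give 9.25 runs per win for 7 total runs rather than the 9.18 quoted, which puts the estimate at about .608 instead of .609.

```python
def runs_per_win(rs_per_game, ra_per_game):
    """Quick runs-per-win estimate from total runs per game (constants assumed)."""
    return 0.75 * (rs_per_game + ra_per_game) + 4

def quick_wpct(rs_per_game, ra_per_game):
    """Per-game winning percentage: .500 plus run differential over runs per win."""
    diff = rs_per_game - ra_per_game
    return 0.5 + diff / runs_per_win(rs_per_game, ra_per_game)

print(runs_per_win(4, 3))          # 9.25
print(round(quick_wpct(4, 3), 3))  # 0.608
```

Everything here is per game, so the 1-run difference is one marginal run per game, not one run per season.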
|
 |
David Smyth |
posted August
15th, 2001 09:35 PM |
|
 |
All Star Member
Since: Dec 1999 Location: Lake Vostok
|
"So,
the 1 run difference yields .109 wins, or .609 overall."
Tango, I frankly have no idea what you're
talking about. A yield of .109 wins does not mean that
you add .109 to .500 to get .609. Adding .109 wins to
.500 means 81.109 wins, which is .5007, not .609.
You really must explain this. Either you're
making a silly mistake, or I don't understand what's
going on. (Probably the latter.)
For
your example of a 7 R/G context, your formula says 9.18
R/W. Mine says 7 R/W. There is really no discrepancy
there since we mean different things with our stats. But
Palmer's and James' R/W would be around 7.8 R/W, and I
believe that their R/W and yours are supposed to be the
same thing. Why is yours so different?
|
 |
David Smyth |
posted August
16th, 2001 07:07 AM |
|
 |
All Star Member
Since: Dec 1999 Location: Lake Vostok
|
OK, I
see it now. It's just a matter of orientation. You are
using marginal runs per game. I'm used to looking at
these things on a per season basis, so it's 162 runs per
season you're talking about.
Gotta go. I'll be
back later.
|
 |
tangotiger |
posted August
16th, 2001 09:27 AM |
|
 |
All Star Member
Since: May 2000 Location:
|
Yes
everything is on a per-game basis. Win% of .609 is of
course .609 wins per game, etc, etc.
|
 |
Patriot |
posted August
16th, 2001 10:13 AM |
|
 |
All Star Member
Since: Jul 2000 Location: Ohio
|
If you
use Tango's RPW formula and apply it to 1961-2000 teams,
the RMSE of predicted wins is 4.15. Using a constant RPW
of 9.35 gives 4.08. This is puzzling: the formula that
uses a custom RPW should perform better than the fixed
method. I can get 4.05 from an RPW of 1/(.158 - .06*log(RPG)).
So I don't think Tango's formula for RPW is very
reliable.
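To see how the three runs-per-win estimates differ, they can be evaluated at a few run environments. A sketch only: it assumes RPG means total runs per game by both teams, that the log is base 10, and that Tango's formula is 0.75*RPG + 4. The RMSE figures above need the 1961-2000 team data and are not reproduced here; all names are illustrative.

```python
import math

def rpw_tango(rpg):
    """Tango's quick formula (constants assumed from the thread)."""
    return 0.75 * rpg + 4

def rpw_constant(rpg):
    """Fixed runs per win, ignoring the run environment."""
    return 9.35

def rpw_log(rpg):
    """1 / (.158 - .06 * log10(RPG)); base-10 log is an assumption."""
    return 1 / (0.158 - 0.06 * math.log10(rpg))

def predicted_wins(rs, ra, games, rpw):
    """Half the games plus season run differential divided by runs per win."""
    rpg = (rs + ra) / games
    return games / 2 + (rs - ra) / rpw(rpg)

for rpg in (8.0, 9.0, 10.0):
    print(rpg, rpw_tango(rpg), rpw_constant(rpg), round(rpw_log(rpg), 2))
```

Feeding real team seasons through `predicted_wins` with each function and taking the root mean square of the errors would be one way to repeat the comparison.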
|
 |
tangotiger |
posted August
16th, 2001 12:37 PM |
|
 |
All Star Member
Since: May 2000 Location:
|
If you
look at the 67 teams that had a run differential of at
least +150 runs, you get the following numbers: 5.04
rs/g, 3.83ra/g, .625 winning record.
Against my
chart though, a team scoring 5.0 and allowing 3.8 has a
record of .611.
Using my r/w formula, you get
0.75*(5.04+3.83)+4=10.65
Therefore, you get a
record of (5.04-3.83)/10.65 + .500=.614
So, my
formula matches pretty well with the "expected" winning
record.
The problem therefore lies in why does a
team win more than their scoring distribution says it
should? Now, either I have a problem with my program and
maybe I should increase my sample size in the sim
(though I doubt that's the problem), or that I cannot
extrapolate the r/i directly into r/g.
================ Note: I have re-run the r/i
to r/g program, but this time to create 100,000 games as
opposed to 10,000 for each event. The win% for a 5.0/3.8
competition is .617. Can someone tell me how probable it
is that a .617 team will play .625 over 10,582 games? If
it is improbable, then it is possible you have in-game
strategies that causes these .617 teams to play at .625
[edited
by tangotiger on August 16th, 2001 at 12:56
PM]
|
 |
Vinay |
posted August
16th, 2001 04:15 PM |
|
 |
Member Member
Since: Nov 2000 Location:
|
quote:
originally posted by tangotiger Can someone
tell me how probable it is that a .617 team will play
.625 over 10,582 games? If it is improbable, then it
is possible you have in-game strategies that cause
these .617 teams to play at .625.
It's fairly probable. I get the
std. dev. of a .617 team over 10,582 games to be 50
games. (.625-.617)*10,582 is about 85 games. So that's within
two standard deviations, which is the commonly used
interval for "statistically insignificant" (there's
roughly a 68% chance of finishing within one SD, and 95%
chance of finishing within two SDs).
(Excel
can't handle binomial distributions with that many
trials, unfortunately, so I can't give you a precise
probability).
|
 |
Vinay |
posted August
16th, 2001 04:19 PM |
|
 |
Member Member
Since: Nov 2000 Location:
|
quote:
originally posted by Vinay (Excel can't
handle binomial distributions with that many trials,
unfortunately, so I can't give you a precise
probability).
Excel can use normal distributions,
though. There's a 4.5% chance that a .617 team could
play .625 ball or better over 10,582 games.
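The same estimate can be checked outside Excel. A sketch using only the Python standard library, assuming every game is an independent trial with a fixed .617 win probability; the helper name is made up.

```python
import math

def normal_upper_tail(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

n, p, observed = 10582, 0.617, 0.625
sd_games = math.sqrt(n * p * (1 - p))   # binomial standard deviation, in games
excess_games = (observed - p) * n       # wins above expectation
z = excess_games / sd_games             # excess measured in standard deviations

print(round(sd_games, 1), round(excess_games, 1), round(z, 2))
print(round(normal_upper_tail(z), 3))
```

This gives a standard deviation of about 50 games, an excess of about 85 games (roughly 1.7 SDs), and a one-sided tail of about 4.5%.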
|
 |
tangotiger |
posted August
16th, 2001 06:06 PM |
|
 |
All Star Member
Since: May 2000 Location:
|
I
always had this question about probabilities. It's
always ".625 OR BETTER". That really means an AVERAGE of
.632 or something.
Since we have 67 teams
AVERAGING .625 (say they all finished between .575 and
.675), then why do we look for .625 or better?
If you look at the distribution probability of
.600-.605, .605-.610, etc, etc, wouldn't we find then
that working backwards you have 1 slice at .700-.705, 3
slices at .695-.700, etc down to 1000 slices at
.615-.620, etc, etc, so that the WEIGHTED AVERAGE of all
these slices is .625? So, really, we want ".615 or
better", knowing full well that the weighted average of
this class will be .625.
Always bugged
me...
|
 |
David Smyth |
posted August
16th, 2001 09:10 PM |
|
 |
All Star Member
Since: Dec 1999 Location: Lake Vostok
|
Tango,
are you gonna redo your W% chart using a larger sample
of games? I always thought that 10,000 G might be too
low. 50,000 should be fine.
|
 |
tangotiger |
posted August
17th, 2001 09:47 AM |
|
 |
All Star Member
Since: May 2000 Location:
|
Actually, I posted the new chart with 100,000
games.
I think when I get the chance that I'll
run 1 million games.
|
 |
Patriot |
posted August
18th, 2001 12:47 PM |
|
 |
All Star Member
Since: Jul 2000 Location: Ohio
|
Good pitching
beats good hitting?
Using
my spreadsheet, I tried putting in an offense that was
50% above league average and a defense that was 50% above
league average. In a .5 RI league, the RIs were .75 and
1/3 respectively. And the resulting runs scored was just
4.36. I would have bet the farm that the result would be
4.5.
Anyway, it is close enough that it could
just be because I'm using the wrong formula for Tango
Distribution (the original rather than the revised or
something), or did something else wrong, but I am very
surprised by the result.
|
 |
tangotiger |
posted August
21st, 2001 12:07 PM |
|
 |
All Star Member
Since: May 2000 Location:
|
win%
chart has been updated with 1,000,000 games run per pair
matchups.
http://www.geocities.com/tmasc/winsrpg.txt
|
 |
voros |
posted
September 3rd, 2001 01:03 AM |
|
 |
Senior
Member Member Since: Jan 2001 Location: Chicago,
IL
|
quote:
originally posted by Vinay
quote:
originally posted by Vinay (Excel can't
handle binomial distributions with that many trials,
unfortunately, so I can't give you a precise
probability).
Excel can use normal
distributions, though. There's a 4.5% chance that a
.617 team could play .625 ball or better over 10,582
games.
There's a freeware Excel Add-In (or
file, I forget which), which indeed can handle Binomials
up to very large numbers. I use it often and it is
invaluable.
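For readers without the add-in, the exact binomial tail can also be summed directly. A sketch in Python using only the standard library, not the add-in mentioned above; the function name is made up, and it keeps the same simplification of independent games with a fixed .617 win probability.

```python
import math

def binom_upper_tail(n, k, p):
    """Exact P(X >= k) for X ~ Binomial(n, p), with each term built in log space."""
    log_p, log_q = math.log(p), math.log(1 - p)
    total = 0.0
    for i in range(k, n + 1):
        # log of C(n, i) * p^i * (1-p)^(n-i), using lgamma for the factorials
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return total

n, p = 10582, 0.617
k = math.ceil(0.625 * n)   # fewest wins that gives a .625 record or better
print(k, binom_upper_tail(n, k, p))
```

Working with logs avoids the overflow that very large binomial coefficients would otherwise cause, and the result lands close to the 4.5% normal-approximation figure.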
|