How To Calculate Runs/Inning distribution


Warren posted August 13th, 2001 01:51 PM
Senior Member
Member Since: Dec 1999
Location:

You can change the Davenport exponent formula (it was a creation of Clay Davenport, and not Keith Woolner, I believe) to match Tango's numbers a bit better. Instead of:

exponent = 1.5*(log total R/G) + .45

you can use:

exponent = 0.93*(log total R/G) + .85

This matches up a lot better with Tango's results. Here's an example:

4 runs scored, 3 runs allowed:

Exponent of 2: 0.640
Exponent of 1.83: .629
Davenport Exponent (1.72 in this case): .621
Revised Exponent (1.64 in this case): .616
Tango's: .607
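
For anyone who wants to check the figures above, here is a minimal sketch. It assumes "log" means base-10 log and that "total R/G" is runs scored plus runs allowed per game (7 in this example); the function names are made up for illustration. Under those assumptions the two formulas evaluate to about 1.72 and 1.64, and those exponents reproduce the .621 and .616 winning percentages.

```python
import math

def pythag_win_pct(rs, ra, exponent):
    """Pythagorean win% from runs scored and allowed per game."""
    return rs ** exponent / (rs ** exponent + ra ** exponent)

def davenport_exponent(total_rpg):
    # Constants as quoted in the post; base-10 log is an assumption.
    return 1.5 * math.log10(total_rpg) + 0.45

def revised_exponent(total_rpg):
    # Warren's re-fit constants; same base-10 assumption.
    return 0.93 * math.log10(total_rpg) + 0.85

rs, ra = 4.0, 3.0
candidates = [
    ("fixed 2", 2.0),
    ("fixed 1.83", 1.83),
    ("Davenport", davenport_exponent(rs + ra)),
    ("revised", revised_exponent(rs + ra)),
]
for label, x in candidates:
    print(f"{label:>10}: exponent {x:.2f} -> W% {pythag_win_pct(rs, ra, x):.3f}")
```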

Now, I believe he generated the constants in the original formula by fitting the equation to team winning percentages (based on runs scored and runs allowed). So in some sense the original is more accurate, in that we *know* it fits actual data.




tangotiger posted August 13th, 2001 05:43 PM
All Star
Member Since: May 2000
Location:

Interesting.

However, the empirical data at this point now suffers from sample size. No team scores exactly 4 runs and gives up 3 runs. You can find a class of teams that scores 3.8-4.2 etc. And of course, there are some combinations that you won't find anywhere near enough. In any case, you have to take some liberties with the empirical data.

The great thing with Woolner's real-life data of runs/inn distribution is that he used 20 years, plus each game has 18 innings, and therefore you have tons of data.


David Smyth posted August 13th, 2001 07:43 PM
All Star
Member Since: Dec 1999
Location: Lake Vostok

Tango and Patriot, thanx for the kind words.

IMO, it's Tango whose "stuff" deserves more attention. Hey Tango, how about an article for the Baseball Primer?

One small disagreement. A few posts ago Tango said that you could use his run per win formula to produce marginal wins from lwts (marginal runs). The only problem is that lwts isn't really marginal runs. It's a sort of "compressed" marginal runs. Your formula should work with a full BsR (for pitchers, say) or a lineup-added BsR procedure (for batters). But not with lwts.


tangotiger posted August 13th, 2001 10:18 PM
All Star
Member Since: May 2000
Location:

I find it interesting you say that.

"Marginal" refers to "what would the effect be if you add 1 more item". The term is used in economics for supply and demand. For example, a bottle of beer may cost $1. But if you buy 10 bottles, you may only want to pay 40 cents for that 11th bottle. If you buy 24 bottles, you may only want to pay 20 cents for that 25th bottle. That is how I am using marginal.

So, for baseball, what is the marginal effect to a team for adding one more single? Well, if the team scores 1000 runs before the single, and scores 1000.47 runs after the single, the marginal effect of that single is .47 runs.

Linear Weights IS marginal runs.

David?


Patriot posted August 14th, 2001 10:59 AM
All Star
Member Since: Jul 2000
Location: Ohio

Actually Tango, I wrote it before all of the sim stuff on here, so no, it is not included. Really, we should have MORE serious sabermetricians on this board. I remember getting on the net 3-4 years ago, and there was always some new interesting material on Stathead or Baseball Stuff or rsbb, and now the only place I go where there is any new research and studies is here. Most of the other sites are just writing about transactions or whatever (BPx2), which is fine because I care about that too, but it's not advancing sabermetrics.


Riverfront76 posted August 14th, 2001 01:39 PM
Member
Member Since: Apr 2001
Location: CT

Tango: After looking at your Win% chart based on runs scored/gm, would the next step be to create a 3-dimensional chart?

i.e. Team A scores 4.5 r/g and gives up 4.3 r/g. Facing team B, who scores 5.0 r/g and gives up 4.8 r/g, what is each team's win expectation?


tangotiger posted August 14th, 2001 05:09 PM
All Star
Member Since: May 2000
Location:

Actually, no, that would be another chart. Pat talked about this with his spreadsheet, so you would do that as step 1. Then you would take that and use my chart.

I have not looked at Pat's chart, but I know "how" you can determine the r/g. First thing you have to realize is that runs, unlike all other events are not success/non-success. By that I mean you can figure out EXACTLY what a .340 hitter would hit against a .270 hitter, if the league was .250. I've described this method in the past, and the mathematicians tell me it's called log5. In any case you CAN'T use this with runs.

What you have to do is DECONSTRUCT the r/g into the h/hr/bb etc components, and THEN apply the log 5 method. Then you can reconstruct the r/g based on the new components. From there, you can determine the new rs ra for the 2 teams.

From there you can plug into my win % chart.
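
A minimal sketch of the log5 step for a success/non-success event, using the .340 / .270 / .250 numbers from the post. This is one common way of writing log5 with a league baseline (an odds-ratio form); the post does not spell out the formula, so treat the exact form and the function name as assumptions.

```python
def log5(a, b, league):
    """Expected success rate when rate `a` meets opposing rate `b`
    in a league whose average rate is `league` (odds-ratio form)."""
    success = a * b / league
    failure = (1 - a) * (1 - b) / (1 - league)
    return success / (success + failure)

# .340 hitter against a .270 opponent in a .250 league
print(round(log5(0.340, 0.270, 0.250), 3))
```

A quick sanity check on the form: when both sides are exactly league average, the result is the league average.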


David Smyth posted August 14th, 2001 08:37 PM
All Star
Member Since: Dec 1999
Location: Lake Vostok

I was thinking of "marginal" as meaning "above the baseline", as in runs above average (which is what lwts is supposed to be). But as we know, lwts falls short in that regard. Custom lwts is what is correct in that framework. The difference between standard lwts and custom lwts is zilch for an average hitter, but for a Babe Ruth it's about 20%. Because of that discrepancy, if you want to use standard lwts you have to be careful about which R/W construct you choose.

I think part of the problem in what I've been saying on this topic is the use of the term "wins". The adjustment to lwts that I endorse (dividing by the total R/G) does not actually compute wins. It merely adjusts lwts runs for the frequency of runs in the league (to permit out-of-league comparisons). It just so happens that there is a constant relationship between the number of runs in the league and the number of wins in the league (R/W = total R/G). Still, the term "wins" means different things in different contexts and might be the stumbling block here.

Just remember this: using standard lwts results in the most "contracted" (towards the average) run estimate possible. To most properly counteract this, you must use the most "expanded" win factor (away from the average) possible. And that is always the total R/G.

Tango's quick formula for R/W produces 10.75 for a 4.5 R/G context. But if you look at his W% chart, a team which scores 4.6 and allows 4.4 has a W% of .521. .521 means 84.4 wins, which is 3.4 above average. The run difference between these 2 teams in 162 G is 32.4. So the (net) runs per win is 32.4/3.4, which is 9.52. As the run difference between the teams increases (while averaging 4.5 R), the R/W also increases. But it doesn't reach Tango's 10.75 value until somewhere around 5.4 vs 3.6, which is a W% in the upper .600s. So I don't understand what his formula is supposed to be showing. The 4.6-4.4 example is the closest to .500 that Tango's breakdown shows, but as you get even closer to .500 the R/W would go down from 9.52--until it converged to 9.00 (the total R/G in a 4.5 context) at exactly .500.
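
The arithmetic in the paragraph above can be reproduced with a short sketch. The quick formula is written as 0.75 x total R/G + 4, which is the form that yields 10.75 at 9 total R/G; the function names are hypothetical.

```python
def tango_quick_rpw(total_rpg):
    # 0.75 * total R/G + 4 gives 10.75 at 9 total R/G.
    return 0.75 * total_rpg + 4

def net_runs_per_win(rs, ra, win_pct, games=162):
    """Season run differential divided by wins above .500."""
    run_diff = (rs - ra) * games
    wins_above_avg = (win_pct - 0.5) * games
    return run_diff / wins_above_avg

print(tango_quick_rpw(9.0))                            # quick formula, 4.5 R/G context
print(round(net_runs_per_win(4.6, 4.4, 0.521), 2))     # from the W% chart entry
```

Note that the season length cancels out, so the same ratio comes out on a per-game basis.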

Why is it important to know the R/W at exactly .500? Because the standard lwts values are only correct at .500 (an average team). Using standard lwts with a non-.500 R/W factor (such as Tango's) is mixing apples and oranges.





Patriot posted August 15th, 2001 09:03 AM
All Star
Member Since: Jul 2000
Location: Ohio

quote:
originally posted by tangotiger
Actually, no, that would be another chart. Pat talked about this with his spreadsheet, so you would do that as step 1. Then you would take that and use my chart.

I have not looked at Pat's chart, but I know "how" you can determine the r/g. First thing you have to realize is that runs, unlike all other events are not success/non-success. By that I mean you can figure out EXACTLY what a .340 hitter would hit against a .270 hitter, if the league was .250. I've described this method in the past, and the mathematicians tell me it's called log5. In any case you CAN'T use this with runs.

What you have to do is DECONSTRUCT the r/g into the h/hr/bb etc components, and THEN apply the log 5 method. Then you can reconstruct the r/g based on the new components. From there, you can determine the new rs ra for the 2 teams.

From there you can plug into my win % chart.



You are correct, Tango, in that breaking it down into components would be the way to go. All teams do not match up equally, and so the offensive shape is an important consideration. But you can answer this as a general question (i.e. 5 r/g, no specific team), like Riverfront did, by using Log5 in combination with your method.

What you do is figure each team's prob. of scoring 0 runs and the league prob of scoring 0, and put this in Log5. Do this at every run possibility(0-20 has been the standard all thread), multiply the probability by the number of runs it represents, sum and multiply by 9. That is the expected r/g.

In Riverfront's example, assuming a league average of 4.5, Team A would score 4.5 runs per game and allow 4.77. Looking at Tango's chart, Team A would have a W% of .474.

The reason this method works for the general case is because we are dealing with probabilities of run scoring. The Log5 method will work with probabilities of any two forces meeting, if I remember correctly, and so it works with f0, f6, etc.
[edited by Patriot on August 15th, 2001 at 09:09 AM]
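
A rough sketch of the procedure described in this post: apply Log5 at each runs-per-inning value, weight by the number of runs, sum, and multiply by 9. The per-inning distribution below is a made-up placeholder, not the Tango Distribution or real data, and the function names are hypothetical. The sketch does not renormalize the combined probabilities, since the post does not mention doing so.

```python
def log5(a, b, league):
    """Odds-ratio form of log5 with a league baseline (assumed form)."""
    success = a * b / league
    failure = (1 - a) * (1 - b) / (1 - league)
    return success / (success + failure)

def expected_rpg(offense, defense, league, innings=9):
    """Each argument is a list where index k holds P(k runs in an inning)."""
    total = 0.0
    for runs, (a, b, lg) in enumerate(zip(offense, defense, league)):
        if lg <= 0.0 or lg >= 1.0:
            continue  # log5 is undefined at a league rate of 0 or 1
        total += runs * log5(a, b, lg)
    return innings * total

# Placeholder distribution: P(0 runs), P(1 run), ... P(4 runs). Invented numbers.
placeholder = [0.73, 0.15, 0.07, 0.03, 0.02]

# An average offense against an average defense should return the league r/g.
print(round(expected_rpg(placeholder, placeholder, placeholder), 2))
```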


tangotiger posted August 15th, 2001 09:37 AM
All Star
Member Since: May 2000
Location:

Pat, excellent point!

David, your statement about converging to the 4.5 r/g implying the r/w of 9.0 is intriguing!

However, that being said, I am talking about marginal runs/marginal wins, and therefore there is no linear relationship.

I do not believe that custom LWTS + my chart would yield the same value as standard LWTS + your r/w.



David Smyth posted August 15th, 2001 10:00 AM
All Star
Member Since: Dec 1999
Location: Lake Vostok

Tango, I'm not claiming that custom lwts + your chart exactly equals standard lwts + my R/W. They should be close, though. As to which is "better", that's a complicated question.

I still would like an answer to my question: what does your 10.75 "quick" R/W represent (for a 4.5 R/G context), or how should it be used? An example would be helpful.


tangotiger posted August 15th, 2001 01:27 PM
All Star
Member Since: May 2000
Location:

That should give you the win% chart of mine, similar to that exponent formula proposed earlier.

For example, a team of 4rpg v 3rpg gives 7 total runs, or 7*.75+4=9.18 runs/win.

So, the 1 run difference yields 0.109 wins over .500, or .609 overall.

(Coincidentally, I think this is what my chart shows. This breaks down at the far extremes of course, which is probably where that custom-exponential formula comes into play.)
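
A small sketch of the per-game calculation, assuming the quick formula is runs/win = 0.75 x total R/G + 4 (the form that gives 10.75 at 9 total R/G). The function names are hypothetical. Evaluated literally for 4 v 3, this gives 9.25 runs/win and a win% near .608, which is close to, but not exactly, the 9.18 and .609 quoted in the post.

```python
def tango_rpw(rs, ra):
    """Quick runs-per-win estimate from both teams' runs per game."""
    return 0.75 * (rs + ra) + 4

def win_pct_from_rpw(rs, ra):
    """Per-game run differential converted to wins above .500."""
    return 0.5 + (rs - ra) / tango_rpw(rs, ra)

rs, ra = 4.0, 3.0
print(tango_rpw(rs, ra), round(win_pct_from_rpw(rs, ra), 3))
```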


David Smyth posted August 15th, 2001 09:35 PM
All Star
Member Since: Dec 1999
Location: Lake Vostok

"So, the 1 run difference yields .109 wins, or .609 overall."

Tango, I frankly have no idea what you're talking about. A yield of .109 wins does not mean that you add .109 to .500 to get .609. Adding .109 wins to .500 means 81.109 wins, which is .5007, not .609.

You really must explain this. Either you're making a silly mistake, or I don't understand what's going on. (Probably the latter.)

For your example of a 7 R/G context, your formula says 9.18 R/W. Mine says 7 R/W. There is really no discrepancy there since we mean different things with our stats. But Palmer's and James' R/W would be around 7.8 R/W, and I believe that their R/W and yours are supposed to be the same thing. Why is yours so different?



David Smyth posted August 16th, 2001 07:07 AM
All Star
Member Since: Dec 1999
Location: Lake Vostok

OK, I see it now. It's just a matter of orientation. You are using marginal runs per game. I'm used to looking at these things on a per season basis, so it's 162 runs per season you're talking about.

Gotta go. I'll be back later.


tangotiger posted August 16th, 2001 09:27 AM
All Star
Member Since: May 2000
Location:

Yes everything is on a per-game basis. Win% of .609 is of course .609 wins per game, etc, etc.



Patriot posted August 16th, 2001 10:13 AM
All Star
Member Since: Jul 2000
Location: Ohio

If you use Tango's RPW formula and apply it to 1961-2000 teams, the RMSE of predicted wins is 4.15. Using a constant RPW of 9.35 gives 4.08. This is puzzling: a formula that uses a custom RPW should perform better than the fixed method. I can get 4.05 from an RPW of 1/(.158-.06logRPG). So I don't think Tango's formula for RPW is very reliable.
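
A sketch of how such a comparison could be set up. No team data is included here, so it does not reproduce the 4.15 / 4.08 / 4.05 figures. It assumes RPG means both teams' combined runs per game, that "log" is base 10, and that predicted wins are games/2 plus run differential divided by RPW; the function names and the tuple layout are hypothetical.

```python
import math

def rpw_tango(rpg):
    return 0.75 * rpg + 4

def rpw_constant(rpg):
    return 9.35

def rpw_log(rpg):
    # 1/(.158 - .06 log RPG), base-10 log assumed
    return 1 / (0.158 - 0.06 * math.log10(rpg))

def rmse_of_predicted_wins(teams, rpw_fn):
    """teams: iterable of (runs, runs_allowed, games, wins) tuples."""
    squared_errors = []
    for runs, runs_allowed, games, wins in teams:
        rpg = (runs + runs_allowed) / games
        predicted = games / 2 + (runs - runs_allowed) / rpw_fn(rpg)
        squared_errors.append((predicted - wins) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# The three estimators side by side at a 9 R/G (both teams) context.
for fn in (rpw_tango, rpw_constant, rpw_log):
    print(fn.__name__, round(fn(9.0), 2))
```

Feeding `rmse_of_predicted_wins` a list of real team seasons, once per estimator, would give the kind of comparison described in the post.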


tangotiger posted August 16th, 2001 12:37 PM
All Star
Member Since: May 2000
Location:

If you look at the 67 teams that had a run differential of at least +150 runs, you get the following numbers:
5.04 rs/g, 3.83ra/g, .625 winning record.

Against my chart though, a team scoring 5.0 and allowing 3.8 has a record of .611

Using my r/w formula, you get 0.75*(5.04+3.83)+4=10.65

Therefore, you get a record of (5.04-3.83)/10.65 + .500=.614

So, my formula matches pretty well with the "expected" winning record.

The question, therefore, is why a team wins more than its scoring distribution says it should. Either I have a problem with my program and maybe I should increase my sample size in the sim (though I doubt that's the problem), or I cannot extrapolate the r/i directly into r/g.

================
Note: I have re-run the r/i to r/g program, but this time to create 100,000 games as opposed to 10,000 for each event. The win% for a 5.0/3.8 competition is .617. Can someone tell me how probable it is that a .617 team will play .625 over 10,582 games? If it is improbable, then it is possible you have in-game strategies that cause these .617 teams to play at .625.
[edited by tangotiger on August 16th, 2001 at 12:56 PM]
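
For readers who want to try the r/i to r/g idea themselves, here is a toy Monte Carlo sketch. It is not Tango's program. It assumes innings are independent draws from a fixed per-inning run distribution, that both teams always bat 9 times, and that ties are settled one extra inning at a time. The two distributions are invented placeholders, and the function name is hypothetical.

```python
import random

def simulate_win_pct(dist_a, dist_b, n_games=100_000, seed=0):
    """Share of simulated games won by team A.
    dist_x[k] is the probability of scoring k runs in one inning."""
    rng = random.Random(seed)

    def runs_in(dist, innings):
        # Draw `innings` independent innings and add up the runs.
        return sum(rng.choices(range(len(dist)), weights=dist, k=innings))

    wins_a = 0
    for _ in range(n_games):
        a, b = runs_in(dist_a, 9), runs_in(dist_b, 9)
        while a == b:  # extra innings until someone leads
            a += runs_in(dist_a, 1)
            b += runs_in(dist_b, 1)
        wins_a += a > b
    return wins_a / n_games

# Invented placeholder distributions (P of 0, 1, 2, 3, 4 runs in an inning).
average_team = [0.73, 0.15, 0.07, 0.03, 0.02]
better_team = [0.70, 0.16, 0.08, 0.04, 0.02]

print(simulate_win_pct(better_team, average_team, n_games=20_000))
```

Raising `n_games` shrinks the sampling noise in the win%, which is the reason for going from 10,000 to 100,000 games per matchup.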


Vinay posted August 16th, 2001 04:15 PM
Member
Member Since: Nov 2000
Location:

quote:
originally posted by tangotiger
Can someone tell me how probable it is that a .617 team will play .625 over 10,582 games? If it is improbable, then it is possible you have in-game strategies that causes these .617 teams to play at .625



It's fairly probable. I get the std. dev. of a .617 team over 10,582 games to be 50 games. (.625-.617)*10,582 is about 85 games. So that's within two standard deviations, which is the commonly used interval for "statistically insignificant" (there's roughly a 68% chance of finishing within one SD, and a 95% chance of finishing within two SDs).

(Excel can't handle binomial distributions with that many trials, unfortunately, so I can't give you a precise probability).


Vinay posted August 16th, 2001 04:19 PM
Member
Member Since: Nov 2000
Location:

quote:
originally posted by Vinay
(Excel can't handle binomial distributions with that many trials, unfortunately, so I can't give you a precise probability).


Excel can use normal distributions, though. There's a 4.5% chance that a .617 team could play .625 ball or better over 10,582 games.
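
The same normal-approximation check can be done outside Excel. A minimal sketch using only the Python standard library; it treats the 10,582 games as independent trials with a fixed .617 win probability, which is the same simplification the posts above make. The function names are hypothetical.

```python
import math

def binomial_sd(n, p):
    """Standard deviation of the win count over n independent games."""
    return math.sqrt(n * p * (1 - p))

def normal_upper_tail(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

n, p_true, p_observed = 10_582, 0.617, 0.625
sd = binomial_sd(n, p_true)
z = (p_observed - p_true) * n / sd
print(f"SD = {sd:.1f} games, z = {z:.2f}, P(.625 or better) = {normal_upper_tail(z):.3f}")
```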


tangotiger posted August 16th, 2001 06:06 PM
All Star
Member Since: May 2000
Location:

I always had this question about probabilities. It's always ".625 OR BETTER". That really means an AVERAGE of .632 or something.

Since we have 67 teams AVERAGING .625 (say they all finished between .575 and .675), then why do we look for .625 or better?

If you look at the distribution probability of .600-.605, .605-.610, etc, etc, wouldn't we find then that working backwards you have 1 slice at .700-.705, 3 slices at .695-.700, etc down to 1000 slices at .615-.620, etc, etc, so that the WEIGHTED AVERAGE of all these slices is .625? So, really, we want ".615 or better", knowing full well that the weighted average of this class will be .625.

Always bugged me...
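
The point about ".625 or better" averaging something higher than .625 can be put in numbers with the mean of a normal upper tail. A sketch under the same normal approximation (a true .617 team over 10,582 independent games); the function name is hypothetical. Under these assumptions the average of all outcomes at .625 or better comes out a little above .625.

```python
import math

def upper_tail_mean(mu, sigma, cutoff):
    """Mean of a normal(mu, sigma) variable given that it is >= cutoff."""
    z = (cutoff - mu) / sigma
    density = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    tail = 0.5 * math.erfc(z / math.sqrt(2))
    return mu + sigma * density / tail

n, p = 10_582, 0.617
sigma = math.sqrt(p * (1 - p) / n)  # SD of the winning percentage itself
print(round(upper_tail_mean(p, sigma, 0.625), 4))
```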


David Smyth posted August 16th, 2001 09:10 PM
All Star
Member Since: Dec 1999
Location: Lake Vostok

Tango, are you gonna redo your W% chart using a larger sample of games? I always thought that 10,000 G might be too low. 50,000 should be fine.


tangotiger posted August 17th, 2001 09:47 AM
All Star
Member Since: May 2000
Location:

Actually, I posted the new chart with 100,000 games.

I think when I get the chance I'll run 1 million games.


Patriot posted August 18th, 2001 12:47 PM
All Star
Member Since: Jul 2000
Location: Ohio
Good pitching beats good hitting?

Using my spreadsheet, I tried putting in an offense that was 50% above league average and a defense that was 50% above league average. In a .5 RI league, the RIs were .75 and 1/3 respectively. And the resulting runs scored was just 4.36. I would have bet the farm that the result would be 4.5.

Anyway, it is close enough that it could just be because I'm using the wrong formula for the Tango Distribution (the original rather than the revised or something), or did something else wrong, but I am very surprised by the result.



tangotiger posted August 21st, 2001 12:07 PM
All Star
Member Since: May 2000
Location:

win% chart has been updated with 1,000,000 games run per pair matchups.

http://www.geocities.com/tmasc/winsrpg.txt



voros posted September 3rd, 2001 01:03 AM
Senior Member
Member Since: Jan 2001
Location: Chicago, IL

quote:
originally posted by Vinay
quote:
originally posted by Vinay
(Excel can't handle binomial distributions with that many trials, unfortunately, so I can't give you a precise probability).



Excel can use normal distributions, though. There's a 4.5% chance that a .617 team could play .625 ball or better over 10,582 games.


There's a freeware Excel Add-In (or file, I forget which), which indeed can handle Binomials up to very large numbers. I use it often and it is invaluable.



Copyright ©1999-2001, FanHome.com LLC. All rights reserved. Terms of Use and Privacy Policy.
FanHome, the FanHome logo, and 'Where Fans Connect' are service marks of FanHome.com LLC.