How To Calculate Runs/Inning distribution


tangotiger posted July 25th, 2001 05:00 PM
All Star
Member Since: May 2000
Location:

Given that a team scores .500 runs/inning, how often will they score 0,1,2 runs, etc?

Unlike goal scoring in hockey, run scoring in baseball does not follow a Poisson distribution.

Anyway, here's how you can figure it:
RI = runs/inning = .500
a = RI*RI*.73 = .183
f0 = RI/(RI+a) = .733 = frequency of innings with 0 runs
dropRate = 1 - .73*f0 = .465 = f2/f1 = f3/f2 = f4/f3 ...

f0 + f1 + f2 + ... + f20 = 1.00
solve for f1
(a lot easier to do with Excel, and in this case it equals .143)

Anyway, here's the breakdown for the .500 example:
0 73.26%
1 14.30%
2 6.65%
3 3.09%
4 1.44%
5 0.67%
6 0.31%
7 0.14%
8 0.07%
9 0.03%
10 0.01%
11 0.01%

If you multiply each run total by its frequency and add up the results, you get exactly .500. Now, I've tested this from .200 to 1.000, and it works perfectly. I don't know why mathematically this works out, but maybe I've stumbled upon the Tango Distribution?
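The recipe above can be sketched in a few lines of Python. This is only an illustration, not the Excel sheet from the thread: the function name tango_distribution and its parameters are made up here, and f1 is taken as (1 - f0) * (1 - dropRate), which is the value a geometric tail needs in order for the frequencies to sum to 1. It assumes RI is greater than 0.

```python
def tango_distribution(ri, control=0.73, max_runs=20):
    """Return [f0, f1, ..., f_max_runs] for a team scoring `ri` runs per inning.

    Hypothetical helper; `control` is the .73 multiplier from the post.
    """
    a = ri * ri * control             # a = RI*RI*.73
    f0 = ri / (ri + a)                # frequency of innings with 0 runs
    drop_rate = 1 - control * f0      # f2/f1 = f3/f2 = f4/f3 ...
    f1 = (1 - f0) * (1 - drop_rate)   # value that makes the tail sum to 1 - f0
    freqs = [f0]
    for n in range(1, max_runs + 1):
        freqs.append(f1 * drop_rate ** (n - 1))
    return freqs


freqs = tango_distribution(0.500)
for runs, f in enumerate(freqs[:12]):
    print(f"{runs:2d} {f:7.2%}")

# Weighted sum of runs times frequency; should land very close to 0.500.
print(sum(n * f for n, f in enumerate(freqs)))
```

The tail is cut off at 20 runs, as in the post, so the totals are off from 1 and .500 only by a negligible amount.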


Warren posted July 25th, 2001 05:17 PM
Senior Member
Member Since: Dec 1999
Location:

Keith Woolner looked at this a couple of years ago, if you're interested...

http://www.baseballprospectus.com/news/20000304woolner.html


CRS posted July 25th, 2001 06:09 PM
Senior Member
Member Since: Feb 2000
Location:

How about something like a team's record in 1-run games given their RS/g and RA/g?

Extra credit if you can figure out the likelihood of each element of the score-matrix.

I don't know the answer, I just think that would be an interesting question and you seem good with numbers.


tangotiger posted July 26th, 2001 10:07 AM
All Star
Member Since: May 2000
Location:

Yes, I think that was some of Keith's best work. No offense to him, but I think my formula is a little easier to remember and handle.

(I based my probability of 0 runs/inning on my sim, and so my formula matches that. However, seeing that he has actual data, I will modify my formula slightly to match his data.)

I was hoping that the math majors out there would have saved me the trouble of turning r/inn into r/gp. Any takers?

From that point, it is a simple matter to get a win% probability matrix of rs/gp vs ra/gp.


Patriot posted July 31st, 2001 09:41 AM
All Star
Member Since: Jul 2000
Location: Ohio

Another extremely useful tool from Tango! I have fooled around with this...please tell me if I'm wrong Tango, but based on your example it seems as if:
f1 = (1-f0)*(1-Droprate)

If this is correct, then this little test I did has some basis. James printed 1982 AL and 1983 NL data for x runs in innings. I think Jarvis also has some available. Anyway, using this I checked the predicted probability of scoring v. actual, and the predicted probability of a big inning (James defined as >=3 runs). Anyway, it SEEMED as if the probability of scoring was a little low, and the probability of a big inning was a little high. Please note the word seemed, as I didn't do a real careful analysis of it. Also, I didn't have actual innings batted, so I used G/9 to get RI, which could hurt the accuracy.

Also, Tango, what is the significance of the .73 multiplier?

Anyway, great work.


bamadan posted July 31st, 2001 11:04 AM
All Star
Member Since: Aug 1999
Location:

It has been posited by Bill James and others that assuming two teams of equal gross runs scored, the team that scores runs 1 at a time rather than in bunches will have a substantial benefit. As such, one run strategies such as the sacrifice or steal may be less negative (or somewhat positive at times) than previously thought.

Tango's theoretical work above apparently assumes random distribution of runs. Woolner's empirical data isn't broken down into Mauchian and Weaverite strategies. Does anyone know of the availability of a study showing distribution of runs as impacted by strategic decisions? Is the Bill James scenario a meaningful construct or a hypothetical which exists only in computer modeling?

As an aside, Jim Baker has a daily e-mail newsletter which often contains sabermetric commentary of that day's games. The following piece on run distribution is lifted from the 7/30 edition:

quote:

ALL RUNS ARE NOT CREATED EQUALLY
a guest column by Jeff Fogle

Hi folks.

With the dearth of games today, Jim thought it might be fun to make public a topic that he and I have been discussing on the phone and in emails the past few weeks. On my end, it has gotten to the point that I see almost any baseball issue through this particular prism. It has changed my view of things to a surprising degree.

We start with a premise put forth by Bill James in one of the old Abstracts, and add in a semi-related corollary from an article on bunting in the BILL JAMES GUIDE TO BASEBALL MANAGERS, (pages 132-133 specifically for those of you with the book).

Part 1: Each run has value, but runs one through five in a game are the most important and each run thereafter has diminishing value. In other words, all runs are important in a 4-3 game, but some are superfluous in an 8-3 game. It has been more than a decade since I read the exact article, but a general point was made that a team scoring 5-5-5-5-5 each day would be a lot more successful than one scoring 10-0-10-0-10-0. They would have the same average but the first would win well over half their games as long as they had an adequate pitching staff, while the other would fail to even reach .500 because every shutout is a loss.

Part 2: From the bunting article: Runs scored one at a time are more valuable than runs scored in bunches. Because pinpointing a way to measure the issue exactly is difficult, the article put forth the notion that runs scored one at a time could be worth as much as 11-24%, or even 50% more than runs scored in bunches. James then explained a study he ran which eventually concluded that a team averaging 4.5 rpg scoring one at a time would go 90-72 in a full season of games facing a team that scored 4.5 rpg in three run bursts. Same scoring average, but a clear head to head edge.

Those points offer the basic backdrop for the following issues:

*THE SEATTLE MARINERS FIRST HALF EXPLOSION: The Mariners had an unbelievable first few months. And, the key was their ability to score consistently. Up through July 3 (their first 82 games), they reached five runs 72% of the time. While most of baseball was struggling to score with the new strike zone and fighting some cool weather in the Northeast and Midwest, Seattle was on fire. No team could truly go 5-5-5-5-5, of course. But Seattle got close in the first half of the season, and proved that such consistency equals dramatic success.

*IS ICHIRO SUZUKI AN MVP CANDIDATE?: A more appropriate question back when he was hitting .350 (before he either did or did not hit a wall). But, at the time of the amazing Seattle run, he was arguably the linchpin to their consistency. Around the All-Star break he was the team leader in what you could call OFFENSIVE BASES, which is the sum of Total Bases, Walks, and Stolen Bases. (Those of you in positions of influence, please feel free to push this stat because it is amazing how it really emphasizes the difference between hitters and non-hitters). The team race was close, because Seattle features a four- to five-horse merry-go-round offense (with big contributions from Bret Boone, Edgar Martinez, and John Olerud too). But, as the lead horse of the most amazing offensive run in years, Ichiro certainly could have -- at that time -- been considered an MVP candidate.

*WHY DO SUPERSTAR SLUGGERS FAIL TO MAKE MORE OF AN IMPACT AFTER MARQUEE TRADES? Because they do things in bunches, and runs scored in bunches are less valuable than runs scored one at a time. At the team level, 5-5-5-5-5 is better than 10-0-10-0-10-0. The daily production of the superstar sluggers looks more like the latter. Imagine offensive bases here instead of runs. Maybe the daily offensive bases category looks something like 0-6-0-6-0-6 over the long haul. Another vote for Ichiro would be that he was doing something more like 3-3-3-3-3-3 during the first half of the season. So, his consistent production helped Seattle get to five runs every night, while the guys who are most often considered for MVP are pushing their teams closer to the 10-0-10-0 framework by the very nature of their production.

*WHY IS THERE SO MUCH COMPETITIVE BALANCE IN BASEBALL? Because most teams are built identically, with their version of the slugger in the heart of the lineup. Those guys produce in random bursts so the teams are scoring runs in random bursts making it hard to careen out of the .420 to .580 range. Nobody is going to really go 10-0-10-0-10-0, but there are a lot of 3-7-3-7-3-7 type teams. And shuttling around the sluggers fails to change that, no matter how much you pay the guy.

...

Okay, this is becoming a novel. The idea I hope you will consider is that CONSISTENCY should be the goal of offenses. And the marquee players who are generally considered to be the offensive stars are ironically anti-consistency, because they produce in fits and starts. This locks their team on day-to-day roller coasters that actually prevent the teams from reaching greatness.

This is why sluggers only end up being worth a few games a year. And why a healthy Seattle found a relatively slugger-less synergy that saw them standing at 61-21 on July 4th.



Not that I agree with everything the author posits (OFFENSIVE BASES without context of outs consumed is, at best, a half-arsed stat), but interesting nonetheless.


tangotiger posted July 31st, 2001 11:44 AM
All Star
Member Since: May 2000
Location:

Pat,

f1 = (1-f0)*(1-Droprate)

I didn't even realize that it worked out that way! I'll have to look into this.

As for the .73 thing, it's trial-and-error. I "know" the chances that an inning should have 0 runs, and started off with that. I "realized" the relationship between R/I and % of innings with 0 runs followed the relationship I mentioned.

The dropoff rate was also another thing that I noticed from looking at high and low scoring teams (based on my sim).

Just trial-and-error, really....

I'll update it soon based on Woolner's "actual" rates.


tangotiger posted July 31st, 2001 01:06 PM
All Star
Member Since: May 2000
Location:

Thanks to Pat for that bit of insight. Here is a functional version of the runs/inn calculator (you'll need Excel):

http://www.geocities.com/tmasc/runsinn.xls

I've run tests against Woolner's data. The "control" value of .74 that I use also changes slightly with run environment; it can be anywhere from .73 to .78. It does NOT alter the results that much in any case. I suggest that you use .74, and you'll get close enough data.

If someone can explain the mathematics of why it all adds up, I'd be thankful.

In any case, all you have to do is edit cell E2, and you'll get your distribution.

In my next release, I'll update it so that it matches exactly with real-life, including a dynamic value for the "control" value.
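One possible explanation of why it all adds up, offered as a sketch rather than anything stated in the thread: the frequencies for 1, 2, 3, ... runs form a geometric series with ratio dropRate. Writing c for the control value and d for dropRate, the series sums give total = f0 + f1/(1-d) and runs/inning = f1/(1-d)^2. With f1 = (1-f0)*(1-d), the total is exactly 1 and runs/inning is (1-f0)/(1-d) = (1-f0)/(c*f0). Since f0 = RI/(RI + c*RI*RI) = 1/(1 + c*RI), the ratio (1-f0)/f0 equals c*RI, so the control value cancels and runs/inning comes back as RI. The function name implied_totals below is made up for this check, which treats the tail as infinite.

```python
def implied_totals(ri, control):
    """Return (sum of frequencies, implied runs/inning) using the series sums."""
    f0 = 1 / (1 + control * ri)        # same as RI / (RI + control*RI*RI)
    drop = 1 - control * f0            # dropRate
    f1 = (1 - f0) * (1 - drop)
    total = f0 + f1 / (1 - drop)       # f0 + f1 + f2 + ... (geometric series)
    mean = f1 / (1 - drop) ** 2        # 1*f1 + 2*f2 + 3*f3 + ...
    return total, mean


for control in (0.73, 0.74, 0.761, 0.78):
    for tenths in range(2, 11):        # RI from .200 to 1.000
        ri = tenths / 10
        total, mean = implied_totals(ri, control)
        assert abs(total - 1) < 1e-9
        assert abs(mean - ri) < 1e-9

print("frequencies sum to 1 and runs/inning is recovered in every case tried")
```

This also suggests why changing the control value reshapes the curve without breaking the totals: it only moves weight between scoreless innings and the tail.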


tangotiger posted July 31st, 2001 08:21 PM
All Star
Member Since: May 2000
Location:

Ok, I input all of Woolner's real-life data, which is 20 YEARS worth of R/I data.

The best "control" value to use is .761. If anyone downloaded my file, make sure that you use that value.

Using the data that Woolner had, for the 3.0-3.5, 3.5-4.0, etc "classes", this is the "control" value to use:
.768
.772
.769
.757
.763
.751
.764

Overall, this works out to .761.

Here is how it works out for the entire group:
R/I Actual Expected
0 73.05% 73.06%
1 14.81% 14.99%
2 6.76% 6.65%
3 3.05% 2.95%
4 1.37% 1.31%
5 0.57% 0.58%
6 0.24% 0.26%
7 0.09% 0.11%
8 0.04% 0.05%
9 0.02% 0.02%
10 0.01% 0.01%

Here's how Keith Woolner and I see the 3.0-3.5 class of teams:

R/I Actual Tango Woolner

0 78.13% 78.28% 79.00%
1 13.07% 12.95% 13.20%
2 5.22% 5.23% 4.90%
3 2.13% 2.11% 1.80%
4 0.93% 0.85% 0.70%
5 0.30% 0.34% 0.30%
6 0.12% 0.14% 0.10%
7 0.07% 0.06% 0.00%
8 0.01% 0.02% 0.00%

How about the 4.5-5.0 class?

R/I Actual Tango Woolner
0 71.95% 71.85% 72.70%
1 14.99% 15.40% 14.90%
2 7.14% 6.97% 6.80%
3 3.31% 3.16% 3.10%
4 1.51% 1.43% 1.40%
5 0.65% 0.65% 0.60%
6 0.27% 0.29% 0.30%
7 0.10% 0.13% 0.10%
8 0.05% 0.06% 0.10%


Finally, the 5.5-6.0 class?

R/I Actual Tango Woolner
0 66.29% 66.39% 66.80%
1 16.29% 16.99% 16.50%
2 8.71% 8.40% 8.30%
3 4.85% 4.15% 4.20%
4 2.24% 2.05% 2.10%
5 0.86% 1.02% 1.10%
6 0.49% 0.50% 0.50%
7 0.20% 0.25% 0.30%
8 0.08% 0.12% 0.10%

Except for the 1 run estimation, practically a clean sweep for my formula.
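The whole-group comparison in this post can be roughly reproduced as follows. This is a sketch under stated assumptions: the runs/inning input is not given in the post, so it is backed out here from the rounded "Actual" column of the whole-group table, and the RMSE line is an added convenience rather than a figure from the thread.

```python
# "Actual" column of the whole-group table, for 0 through 10 runs in an inning.
actual = [0.7305, 0.1481, 0.0676, 0.0305, 0.0137, 0.0057,
          0.0024, 0.0009, 0.0004, 0.0002, 0.0001]

# Assumption: runs/inning implied by the rounded table, not a number from the post.
ri = sum(n * f for n, f in enumerate(actual))

control = 0.761                        # overall control value quoted in the post
f0 = 1 / (1 + control * ri)            # same as RI / (RI + control*RI*RI)
drop = 1 - control * f0
f1 = (1 - f0) * (1 - drop)
expected = [f0] + [f1 * drop ** (n - 1) for n in range(1, len(actual))]

# Root mean squared error between the two columns.
rmse = (sum((e - a) ** 2 for e, a in zip(expected, actual)) / len(actual)) ** 0.5

for n, (a, e) in enumerate(zip(actual, expected)):
    print(f"{n:2d} {a:7.2%} {e:7.2%}")
print(f"implied RI = {ri:.4f}, RMSE = {rmse:.5f}")
```

Because the input is reconstructed from rounded percentages, small differences from the "Expected" column in the last decimal place would not be surprising.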







tangotiger posted August 2nd, 2001 12:35 PM
All Star
Member Since: May 2000
Location:

I am in the middle of writing a program that will accept two inputs: runs per inning of away team and home team. Based on the run distribution formula I have presented, it will then run through a million simulations to calculate the win% for every pair of team-inputs.

(This simulator will be much more accurate than my other one, because I am already starting off with the run distribution by inning. It will also be a lot faster because I have fewer things to "simulate".)

I will make the executable available on my website sometime within the next few days. This should answer, once and for all, the relationship between runs and wins.


David Smyth posted August 2nd, 2001 09:19 PM
All Star
Member Since: Dec 1999
Location: Lake Vostok

What about different types of offenses? What would the distribution by inning be for an '80s Cardinals team, averaging .5 runs per inning with tons of speed, meager power, and decent OBAs, vs a slow power team which also averages .5 runs per inning?


tangotiger posted August 3rd, 2001 01:02 AM
All Star
Member Since: May 2000
Location:

Now you're asking for a lot!

I'll come up with different run distribution curves. You input the "control value" as well. This lets you shape the curve any way you like..... I doubt there'd be much if any difference.


Patriot posted August 8th, 2001 09:39 AM
All Star
Member Since: Jul 2000
Location: Ohio

Tango's formula is great, but I was playing around with regression to find the optimum formula for X in the form a*RI^b, which is what Tango used. I got .641(RI)^1.733. Now this is a lot harder to do than .73(RI)^2, but it is 71% more accurate with Woolner's data (RMSE). If you are using a spreadsheet anyway, you might as well use a more accurate formula. Of course, it is a small test; maybe it wouldn't do as well without the data it was tested on. Anyway, now I am having trouble improving on the drop rate, as my regression caused the predicted RI to stop adding up to the actual RI.


tangotiger posted August 8th, 2001 09:40 PM
All Star
Member Since: May 2000
Location:

Pat, glad you are enjoying that formula.

I have to caution you that any "improvement" must hold to the basic principle that your "input" of r/inn MUST match the output of r/inn AND the total freq add up to 1.

Having said that, how can your formula be 71% more accurate? Both mine and Woolner's are already pretty darn accurate (90 or 95%). Any alterations would only be 1 or 2% more accurate.

If you have not noticed, Woolner's classes do not average out exactly. For example, his 4.5-5.0 run class is not 4.75 rpg. You actually have to figure it out exactly to get the rate. That's what I had to do.

In any case, using the chart I listed, add a column for your rates, and let's see how they compare. As well, my control value is .76 to match Woolner's real-life data.

But before you do, make sure that your formula is mathematically sound.


Patriot posted August 9th, 2001 09:45 AM
All Star
Member Since: Jul 2000
Location: Ohio

Thanks, Tango. I didn't catch that the class labels in Woolner's chart weren't the exact averages. Reading Woolner's article again now, I see that that is the case. That basically makes what I just posted worthless. Anyway, by 71% I meant that the RMSE of your formula for f0 was .0203 and mine was .0119, so yours is 71% higher (put the other way, mine is about 41% lower). Anyway, that is ruined by the fact that it is not based on actual RI.

You will notice that in the last sentence of my last post I mention that the predicted RI stopped adding up to the actual RI, which messed it up. Anyway, I will need to do some stuff over again. But I think that the X=.73RI^2 formula can be improved upon, as it is unlikely that the actual relationship involves the square of RI. But I'm not sure it is worth our time to pursue this any further; yours and Woolner's seem to work fine.


Patriot posted August 10th, 2001 04:01 PM
All Star
Member Since: Jul 2000
Location: Ohio

Score one for Tango. I went back and figured what the actual RI for Woolner's groups were, and the X equation came out as .746(RI)^1.951, very close to Tango's original .73(RI)^2. Close enough that simplicity is to be preferred to accuracy (which has only been tested with 6 data points; Tango's could be more accurate anyway). A Drop Rate of 1-(.848-.208*RI+.109*RI^2)*f0 gives percentages that add up to 100 and pretty much right on recalculated RI (.803 for a .8 team, for example). But Tango's is right on the money. So it isn't worth the extra time and slight loss in theoretical accuracy.
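The fitted variant described in this post can be checked with a short sketch. The function name patriot_distribution is made up, and using f1 = (1-f0)*(1-DropRate) with the fitted drop rate is an assumption based on the identity posted earlier in the thread; the spreadsheet itself is not shown.

```python
def patriot_distribution(ri, max_runs=40):
    """Hypothetical helper: the fitted X and Drop Rate from the post."""
    x = 0.746 * ri ** 1.951                       # fitted X, in place of .73*RI^2
    f0 = ri / (ri + x)                            # frequency of scoreless innings
    k = 0.848 - 0.208 * ri + 0.109 * ri ** 2      # replaces the fixed control value
    drop = 1 - k * f0                             # Drop Rate from the post
    f1 = (1 - f0) * (1 - drop)                    # assumption: same f1 identity
    return [f0] + [f1 * drop ** (n - 1) for n in range(1, max_runs + 1)]


freqs = patriot_distribution(0.8)
total = sum(freqs)
implied_ri = sum(n * f for n, f in enumerate(freqs))
print(f"sum = {total:.4f}, recalculated RI = {implied_ri:.3f}")
```

Under these assumptions the recalculated runs/inning for a .8 team comes out at about .803, in line with the figure quoted in the post, while the frequencies still sum to 1.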


tangotiger posted August 10th, 2001 04:16 PM
All Star
Member Since: May 2000
Location:

Pat, thanks for the "peer review". One thing that is missing in our "business" is such a thing. This is why I think DIPS and BaseRuns have great potential. I've tested them in my own way, independent of Voros and David, and have come out supremely impressed.


tangotiger posted August 11th, 2001 12:39 AM
All Star
Member Since: May 2000
Location:

I wrote a little program to give out the r/game distribution based on my r/inn distribution. I then ran this for 2.5, 3.0, 3.5... 7.5 r/gp.

I have another program that takes the r/gp distribution, and converts that into win%.

(I'll publish all this data in a post tomorrow.)

Anyway, the most accurate formula in determining the win% based on rs and ra is to take 70% of pythagorean (with a power of 1.8), and 30% of runs/win of 11.

Also, if you are looking for a good short-hand of runs/win, the following worked out pretty good:

0.75 * rEnv + 4 (where rEnv is the TOTAL runs/gp of both teams). So, scoring 5 runs, and allowing 3 would yield a runs/win of 10.

Therefore, if you want to convert marginal runs (LWTS) into marginal wins, this is how you can do it. The rEnv is the AFTER-THE-FACT runs. Therefore, if you are worth +1 r/gp, then the rEnv will be 1 run higher than average.
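The two shortcuts in this post can be written out as follows. This is one reading of the post, not a confirmed implementation: "30% of runs/win of 11" is interpreted here as a linear estimate of .500 plus the run differential per game divided by 11, and all function names are made up for illustration.

```python
def runs_per_win(rs, ra):
    """Shorthand runs/win: 0.75 * rEnv + 4, with rEnv the total runs/gp of both teams."""
    r_env = rs + ra
    return 0.75 * r_env + 4


def pythagorean(rs, ra, power=1.8):
    """Pythagorean win% with the power of 1.8 mentioned in the post."""
    return rs ** power / (rs ** power + ra ** power)


def linear_win_pct(rs, ra, rpw=11.0):
    # Assumption: .500 plus run differential per game over a fixed 11 runs/win.
    return 0.5 + (rs - ra) / rpw


def blended_win_pct(rs, ra):
    """70% Pythagorean plus 30% linear, as one reading of the post."""
    return 0.7 * pythagorean(rs, ra) + 0.3 * linear_win_pct(rs, ra)


print(runs_per_win(5, 3))                 # 10.0, the example in the post
print(round(blended_win_pct(5, 3), 3))
```

For marginal runs, the post's advice amounts to calling runs_per_win with the after-the-fact totals, so a player worth +1 r/gp on an otherwise average team raises rs by 1 before the call.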


Patriot posted August 11th, 2001 11:25 AM
All Star
Member Since: Jul 2000
Location: Ohio

Tango's upcoming work looks great because it appears that you can figure the near-exact probability of winning. I have something that's interesting IMO, but not as useful.

Using Bill James' Log5 method, I made a spreadsheet to figure out the expected outcome when a defense that allows 3 r/g plays an offense that scores 6 r/g in a league that scores 4.5 r/g. You input RI for the offense, defense, and league, and it spits out the new scoring breakdown (% of time scoring 0, 1, 2, 3, etc. and r/g).

A slight problem is that the percentages in the recalculated Tango distribution don't always add to 1, but they are usually very close (I may solve this problem by rescaling them). I can send the spreadsheet to anyone who's interested; patriot@csuvikings.com

Anyway, for the example above (a meeting of a 6 offense and 3 defense in a 4.5 league) the offense should score about 3.88 r/g (again the percentages don't quite add up, only to .992 in this case, so the result would be a smidge higher). What about the flip side, a 3 offense versus 6 defense? Same thing, 3.88. What about a 5 offense versus a 4 defense? 4.43. I find this stuff fascinating, now if I could only find an application for it.


tangotiger posted August 11th, 2001 05:35 PM
All Star
Member Since: May 2000
Location:

http://www.geocities.com/tmasc/winsrpg.txt

This will give you the expected win% of any 2 teams. A 2.40 RPG team v a 1.20 RPG team? Expected win % is 71.0

I used my r/i distribution, ran this through 10,000 simulations to generate a r/g distribution. Then with all these r/g distributions for all the teams in the 0.10 to 10.00 RPG range, I generate the win %. All of this (except for the simulator, but that is close enough) is mathematically modeled.
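The pipeline described here (runs/inning distribution, then runs/game distribution, then win%) can be sketched without a simulator by convolving nine independent innings. This swaps exact convolution in for the 10,000 simulations and is not the program behind the linked chart. All names are made up, the control value of .761 is taken from earlier in the thread, and the handling of ties (one extra inning at a time) and the lack of any home-team or walk-off adjustment are assumptions, so the output should not be expected to match the chart exactly.

```python
def tango_distribution(ri, control=0.761, max_runs=20):
    """Runs-per-inning frequencies [f0, f1, ...] from the thread's formula."""
    f0 = 1 / (1 + control * ri)            # same as RI / (RI + control*RI*RI)
    drop = 1 - control * f0
    f1 = (1 - f0) * (1 - drop)
    return [f0] + [f1 * drop ** (n - 1) for n in range(1, max_runs + 1)]


def convolve(p, q):
    """Distribution of the sum of two independent run totals."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def game_distribution(rpg, innings=9):
    """Runs-per-game distribution, assuming independent, identical innings."""
    inning = tango_distribution(rpg / innings)
    game = [1.0]
    for _ in range(innings):
        game = convolve(game, inning)
    return game


def beats(p, q):
    """Return (P(first total > second total), P(tie)) for independent totals."""
    win = sum(pi * qj for i, pi in enumerate(p)
              for j, qj in enumerate(q) if i > j)
    tie = sum(pi * qi for pi, qi in zip(p, q))
    return win, tie


def win_pct(rpg_a, rpg_b, innings=9):
    """Win probability for team A; needs at least one team with rpg > 0."""
    win, tie = beats(game_distribution(rpg_a, innings),
                     game_distribution(rpg_b, innings))
    # Assumption: ties are settled one extra inning at a time.
    x_win, x_tie = beats(tango_distribution(rpg_a / innings),
                         tango_distribution(rpg_b / innings))
    return win + tie * x_win / (1 - x_tie)


print(f"{win_pct(2.40, 1.20):.3f}")
```

Running win_pct over a grid of runs-per-game pairs would give a table in the same spirit as the linked chart.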


tangotiger posted August 11th, 2001 05:58 PM
All Star
Member Since: May 2000
Location:

As well, if you have 2 teams each scoring 4.5 rpg, but the first team scores more 3+ runs, while the second team scores more 1-2 runs, who will win more? The second team will win 51% of the games. It's not that big of a deal, but for anyone who wanted to know...


David Smyth posted August 12th, 2001 09:18 AM
All Star
Member Since: Dec 1999
Location: Lake Vostok

Tango, I like the Win% chart. Good work!

Do you know if it's more accurate than Woolner's "custom" Pythag. exponent method?

exponent = 1.5*(log total R/G) + .45
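For reference, the custom exponent quoted above can be computed like this. It is a small sketch: the post does not state the base of the logarithm, so base 10 is an assumption here, and the function names are made up.

```python
import math


def custom_exponent(total_runs_per_game):
    """exponent = 1.5 * log(total R/G) + .45, with log assumed to be base 10."""
    return 1.5 * math.log10(total_runs_per_game) + 0.45


def custom_pythag(rs, ra):
    """Pythagorean win% using the run-environment-dependent exponent."""
    x = custom_exponent(rs + ra)       # rs and ra are runs per game
    return rs ** x / (rs ** x + ra ** x)


print(round(custom_exponent(9.0), 3))  # 1.881 for a 9 R/G environment
print(round(custom_pythag(5.0, 4.0), 3))
```

The exponent grows with the run environment, so each run is worth less in high-scoring settings.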


tangotiger posted August 12th, 2001 02:59 PM
All Star
Member Since: May 2000
Location:

My chart is more accurate, because it is mathematically-based. You can consider that chart to be "true". Woolner's custom method would be considered a short-hand along the lines of your BsR being a short-hand to a simulator/real-life.

I have not looked into whether his custom-pythag matches to my chart, though I wouldn't be surprised if it did. When I get a chance, I'll see how it actually does.

As a side-note, I am surprised how little discussion your BsR has generated. I think it's great and tremendous work, but only you, me, and Patriot really think so.


Patriot posted August 13th, 2001 12:04 PM
All Star
Member Since: Jul 2000
Location: Ohio

Speaking of BsR, I hope David doesn't care, but I wrote a little bit about BsR for By the Numbers, the SABR Stat Anal Committee newsletter. Proper credit is given to David of course, with links to his article on Fraser's site and an invitation to check out the discussions about it here. I termed it a "review"; I just gave some examples of extreme teams and how BsR did better than RC or XR. I hope that it will help David's work get more attention from the sabermetric community. Actually, I don't know if it will be published yet, but I think it will be in the August edition. If anyone wants to read it, I will gladly send it to you.

Tango, just out of curiosity's sake, since David has not had exposure from the mainstream sabermetric sites (BPx2, Neyer, etc.), who would see it? There's only like 10 regulars on this board, so it's not really a surprise that no one knows about BsR.


tangotiger posted August 13th, 2001 01:04 PM
All Star
Member Since: May 2000
Location:

Pat, in your article, I hope you also showed how well BsR did against the simulator, since that is the true validator.

As for who would see it, I would have hoped that other researchers also pop into this site every now and then, and take the cause.

