
Tangotiger Blog


Friday, February 02, 2024

Draft Function

By Tangotiger 11:08 AM

This is a mostly math post, and I'll be using draft data. If you don't care about either, you won't like this post.

I needed some data. What that data is isn't important for the purpose of this post; I just needed to convey the general point that the earlier the round, the more value. Anyway, this is total future WAR by draft round. Again, not important whether this is career WAR, or WAR through age 30, or WAR before reaching free agency, or whatnot. Y'all can do that heavy lifting after I go thru what I want to show.

Ok, no surprise in terms of the general shape, but maybe there's surprise in the steepness? I dunno. Anyway, so the objective is to create a function to connect all those points.

What helps is if we turn all those values into a "share" of the total WAR. In this data, we have 5261 total WAR. Players in the first round have a total of 2613 WAR, which conveniently is almost exactly 50%. Round 2 players have 11%, and it goes down from there. The total is obviously 100%. This is how it looks.

We instinctively knew that a 2nd and a 3rd round pick together are worth less than a 1st and a 4th. Given the choice, we'd take 1+4 over 2+3. This is a good example of where 1+4 is not equal to 2+3. You get a similar thing with exit velocity, where 110+60 is worth more than 90+80.

Indeed, given that the 1st round has 50% of all the WAR, this chart suggests that 1 = 2+3+4...+19+20. That's right, having a 1st round pick is worth the same as having a pick in each of the other 19 rounds combined. I'd bet you didn't know that! Well, at least that's what this data is saying. You gotta tease it to figure out what else it might be saying.

Back to math. When I look at this data, the first place I go to is 1/x. So, it's a question of what constant to put in the numerator, and how to represent the denominator. Let's start with a simple function of: 0.278/Round. This is how that looks.

As you could have guessed, that first round is woefully undervalued by our first attempt. 0.278/1 is obviously 27.8%, and we needed to have 50%. In addition, the dropoff just isn't there either.
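One way to arrive at that 0.278, as a rough sketch: pick the numerator so that the shares across all rounds sum to 100%. The 20-round cutoff is taken from the "2 thru 20" wording elsewhere in the post, and the variable names are made up for illustration.

```python
# Assumption: 20 rounds, with shares that must sum to 1 (100%).
ROUNDS = range(1, 21)

# Share for each round is numerator / Round, so the numerator is
# whatever makes the 20 shares add up to 1.
numerator = 1 / sum(1 / r for r in ROUNDS)
shares = {r: numerator / r for r in ROUNDS}

print(f"numerator     = {numerator:.3f}")
print(f"Round 1 share = {shares[1]:.1%}")
print(f"Round 2 share = {shares[2]:.1%}")
```

Round 1 lands well short of the 50% we need, which is the point of the paragraph above.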

Let's try another attempt: this time, instead of x = Round, let's make it Round-squared. The numerator in this case is 0.626, so naturally, the 1st round will come out to 62.6%. So, the 1st round should be somewhere between 1/x and 1/x-squared. However, look at Round 2: in either scheme, the value is above the data.
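To see the Round 2 problem in numbers, here is a small sketch that puts the two schemes side by side. It assumes 20 rounds and shares normalized to sum to 1; `normalized_shares` is a hypothetical helper, not something from the post.

```python
# Assumption: 20 rounds, shares normalized to sum to 1.
ROUNDS = range(1, 21)

def normalized_shares(power):
    """Shares proportional to 1 / Round**power, scaled to sum to 1."""
    numerator = 1 / sum(1 / r**power for r in ROUNDS)
    return numerator, {r: numerator / r**power for r in ROUNDS}

for power in (1, 2):
    numerator, shares = normalized_shares(power)
    print(f"power={power}: numerator={numerator:.4f}, "
          f"Round 1={shares[1]:.1%}, Round 2={shares[2]:.1%}")
```

The two schemes bracket Round 1 (one too low, one too high), but both put Round 2 above what the data shows, so changing the exponent alone won't fix it.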

So, there's something that is still off. We've been treating Round 1 as a value of 1, and Round 2 as a value of 2. But what if we made Round 1 a value of 0.5 and Round 2 a value of 1.5? In other words, the scheme would be 1 / (Round - 0.5). In this case, the numerator is 0.2. This makes Round 1 worth 40% and Round 2 worth 13.3%. You can see how we're on the right track here.
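A quick check of the half-round shift, again assuming 20 rounds and shares that sum to 1 (the names here are illustrative only):

```python
# Assumption: 20 rounds, shares normalized to sum to 1.
ROUNDS = range(1, 21)
OFFSET = 0.5  # Round 1 is treated as 0.5, Round 2 as 1.5, and so on

numerator = 1 / sum(1 / (r - OFFSET) for r in ROUNDS)

def share(rnd):
    """Share of total WAR for a round under numerator / (Round - OFFSET)."""
    return numerator / (rnd - OFFSET)

print(f"numerator = {numerator:.3f}")
print(f"Round 1   = {share(1):.1%}")
print(f"Round 2   = {share(2):.1%}")
```

The unrounded numerator comes out a hair above 0.2, so the shares are a touch higher than the 40% and 13.3% quoted with the rounded value.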


Indeed, our best-fit has the numerator at 0.16 and the denominator as Round - 0.68. That sets Round 1 worth 50%, Round 2 worth 12.1%. This is how the final chart looks.
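The post doesn't spell out how the best fit was found. One way to land near these numbers, as a sketch: pin Round 1 at 50%, choose the numerator so the 20 rounds sum to 1, and search for the offset that satisfies both. The 20-round cutoff, the bisection search, and the function names are all assumptions here, not necessarily the method actually used.

```python
# Assumptions: 20 rounds, shares sum to 1, Round 1 forced to 50%.
ROUNDS = range(1, 21)
TARGET_ROUND1_SHARE = 0.5

def numerator_for(offset):
    """Numerator that makes numerator / (Round - offset) sum to 1."""
    return 1 / sum(1 / (r - offset) for r in ROUNDS)

def round1_share(offset):
    return numerator_for(offset) / (1 - offset)

# Bisection: Round 1's share rises as the offset moves from 0 toward 1.
lo, hi = 0.0, 0.99
for _ in range(60):
    mid = (lo + hi) / 2
    if round1_share(mid) < TARGET_ROUND1_SHARE:
        lo = mid
    else:
        hi = mid

offset = (lo + hi) / 2
numerator = numerator_for(offset)
print(f"offset    = {offset:.3f}")
print(f"numerator = {numerator:.3f}")
print(f"Round 2   = {numerator / (2 - offset):.1%}")
```

A least-squares fit against the observed shares would be another reasonable route; with data this smooth the two should land close to each other.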

Given that we've come up with a simple and smooth function, we are now in a position to say how much each Round is worth relative to other Rounds. Round 1 we already knew is worth the same as Round 2 thru 20.

How about Round 2? That's worth about the same as Round 3+4. Or the same as Round 5 thru 8 combined. Or 9 thru 17 combined.
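These equivalences can be checked directly from the fitted function. A minimal sketch using the rounded constants 0.16 and 0.68 from the post (`share` and `total` are hypothetical helper names):

```python
def share(rnd):
    """Fitted share of total WAR for a draft round: 0.16 / (Round - 0.68)."""
    return 0.16 / (rnd - 0.68)

def total(first, last):
    """Combined share for rounds first..last, inclusive."""
    return sum(share(r) for r in range(first, last + 1))

print(f"Round 1       = {share(1):.1%}")
print(f"Rounds 2-20   = {total(2, 20):.1%}")
print(f"Round 2       = {share(2):.1%}")
print(f"Rounds 3-4    = {total(3, 4):.1%}")
print(f"Rounds 5-8    = {total(5, 8):.1%}")
print(f"Rounds 9-17   = {total(9, 17):.1%}")
```

The groupings aren't exact matches for Round 2, but each lands within about a percentage point of it.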

I'd love to see similar charts in the other major sports from the AspiringSaberist.

#1    Alex Boisvert 2024/02/02 (Fri) @ 13:38

Since the values sum to 1, I’m a little surprised you didn’t try fitting a probability distribution. A Zipf distribution (https://en.wikipedia.org/wiki/Zeta_distribution) looks like it might be appropriate.


#2    Tangotiger 2024/02/02 (Fri) @ 14:22

Thank you, never heard of it!

Adapting that scheme, here’s how Zipf compares to my function and the observed data.

I came up with this Zipf function:
=1 / ( (Pick-1)*3 + 1) *LN(1.633)

Pick   Zipf    Tango   Observed
1      49.0%   50.0%   49.9%
2      12.3%   12.1%   12.1%
3       7.0%    6.9%    6.9%
4       4.9%    4.8%    4.8%
5       3.8%    3.7%    3.7%
6       3.1%    3.0%    3.0%
7       2.6%    2.5%    2.5%
8       2.2%    2.2%    2.2%
9       2.0%    1.9%    1.9%
10      1.8%    1.7%    1.7%
11      1.6%    1.6%    1.5%
12      1.4%    1.4%    1.4%
13      1.3%    1.3%    1.3%
14      1.2%    1.2%    1.2%
15      1.1%    1.1%    1.1%
16      1.1%    1.0%    1.0%
17      1.0%    1.0%    1.0%
18      0.9%    0.9%    0.9%
19      0.9%    0.9%    0.9%
20      0.8%    0.8%    0.8%
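For anyone who wants to reproduce the first two columns outside a spreadsheet, here is a sketch that translates the formula from this comment and the 0.16/(Round - 0.68) function from the post. The function names are made up, and the Observed column is not recomputed since the raw WAR by round isn't listed.

```python
import math

def zipf_style(pick):
    """Spreadsheet formula from the comment: 1/((Pick-1)*3 + 1) * LN(1.633)."""
    return 1 / ((pick - 1) * 3 + 1) * math.log(1.633)

def tango(pick):
    """Function from the post: 0.16 / (Pick - 0.68)."""
    return 0.16 / (pick - 0.68)

print("Pick   Zipf    Tango")
for pick in range(1, 21):
    print(f"{pick:<6} {zipf_style(pick):>5.1%}   {tango(pick):>5.1%}")
```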

#3    Tangotiger 2024/02/02 (Fri) @ 15:05

Ok, so updating that Zipf function comes out to this:

0.5/((Pick-1)*3.135+1)

So, I like that I can basically force in the 0.5 for the first pick.

Then it's just a matter of choosing that constant, 3.135, so that the whole series from 2 to 20 adds up to the remaining 0.5.

Anyway, expanding the above, we get:
0.5/(Pick*3.135-2.135)

Dividing the numerator and denominator by 3.135 gives us this:
0.16/(Pick - .68)

The Tango version was this:
0.16/(Pick-0.68)

So, I ended up unwittingly matching Zipf!
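Written out step by step, the algebra in this comment goes as follows (constants shown to one more decimal than the rounded 0.16 and 0.68):

```latex
\begin{align*}
\frac{0.5}{3.135\,(\mathrm{Pick}-1)+1}
  &= \frac{0.5}{3.135\,\mathrm{Pick}-2.135} \\
  &= \frac{0.5/3.135}{\mathrm{Pick}-2.135/3.135} \\
  &\approx \frac{0.1595}{\mathrm{Pick}-0.681}
   \approx \frac{0.16}{\mathrm{Pick}-0.68}
\end{align*}
```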

 


#4    Tangotiger 2024/02/06 (Tue) @ 23:42

Here’s how it looks for the Prospect Rankings:

https://www.mlb.com/news/mlb-pipeline-20-years-of-prospect-rankings

The function is this:
0.74 / (Rank + 17)
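The comment doesn't say what units the output is in, so the sketch below sticks to ratios between ranks, which don't depend on the units. `prospect_value` is a hypothetical name for the function above.

```python
def prospect_value(rank):
    """Function from the comment: 0.74 / (Rank + 17). Units are not stated."""
    return 0.74 / (rank + 17)

# Ratios between ranks are unit-free, so they can be compared directly.
for rank in (1, 10, 19, 50, 100):
    ratio = prospect_value(rank) / prospect_value(1)
    print(f"Rank {rank:>3}: {ratio:.0%} of the #1 prospect")
```

The +17 in the denominator makes this curve far flatter at the top than the draft-round curve: by this function it takes until rank 19 to fall to half the value of rank 1.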

