Tangotiger Blog

Friday, February 02, 2024

Draft Function

By Tangotiger 11:08 AM

This is a mostly math post, and I'll be using draft data. If you don't care about either, you won't like this post.

I needed some data. For the purpose of this post, it isn't important what that data is; I just needed to convey the general point that the earlier the round, the more value. Anyway, so this is total future WAR by draft round. Again, not important whether this is career WAR, or WAR through age 30, or WAR before reaching free agency, or whatnot. Y'all can do that heavy lifting after I go thru what I want to show.

Ok, no surprise in terms of the general shape, but maybe there's surprise in the steepness? I dunno. Anyway, so the objective is to create a function to connect all those points.

What helps is if we turn all those values into a "share" of the total WAR. In this data, we have 5261 total WAR. Players in the first round have a total of 2613 WAR, which conveniently is almost exactly 50%. Round 2 players have 11%, and it goes down from there. The total is obviously 100%. This is how it looks.

We instinctively knew that a 2nd and 3rd round pick is worth less than a 1st and 4th. Given the choice, we'd take 1+4 over 2+3. This is a good example of where 1+4 <> 2+3. You get a similar thing with exit velocity, where 110+60 is worth more than 90+80.

Indeed, given that the 1st round pick has 50% of all the WAR, this chart suggests that 1 = 2+3+4...+19+20. That's right, having a 1st round pick is worth the same as all other 19 picks combined. I'd bet you didn't know that! Well, at least that's what this data is saying. You gotta tease it to figure out what else it might be saying.

Back to math. When I look at this data, the first place I go to is 1/x. So, it's a question of what constant to put in the numerator, and how to represent the denominator. Let's start with a simple function of: 0.278/Round. This is how that looks.

As you could have guessed, that first round is woefully undervalued by our first attempt. 0.278/1 is obviously 27.8%, and we needed to have 50%. In addition, the dropoff just isn't there either.

Let's try another attempt: this time, instead of x = Round, let's make it Round-squared. The numerator in this case is 0.626, so naturally, the 1st round will come out to 62.6%. So, the 1st round pick should be somewhere between 1/x and 1/x-squared. However, look at Round 2. In either scheme (0.278/2 = 13.9% and 0.626/4 = 15.7%), the value is above the data.

So, there's something that is still off. We've been treating Round 1 as a value of 1, and Round 2 as a value of 2. But what if we made Round 1 a value of 0.5 and Round 2 a value of 1.5? In other words, the scheme would be 1 / (Round - 0.5). In this case, the numerator is 0.2. This makes Round 1 worth 40% and Round 2 worth 13.3%. You can see how we're on the right track here.
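The numerators quoted so far (0.278, 0.626, 0.2) look like whatever constant makes the round shares add up to 100%. Here is a minimal sketch of that normalization, assuming a 20-round draft; the helper names are made up for illustration and are not from the post.

```python
# Sketch: for each candidate denominator, find the numerator that makes
# the shares over all rounds sum to 1. Assumes 20 rounds; names are illustrative.

ROUNDS = range(1, 21)

def numerator_for(denominator):
    """Constant c such that sum(c / denominator(r)) over all rounds equals 1."""
    return 1.0 / sum(1.0 / denominator(r) for r in ROUNDS)

candidates = {
    "1/Round": lambda r: r,
    "1/Round^2": lambda r: r ** 2,
    "1/(Round-0.5)": lambda r: r - 0.5,
}

results = {}
for name, denom in candidates.items():
    c = numerator_for(denom)
    # Store (numerator, Round 1 share, Round 2 share) for each scheme.
    results[name] = (c, c / denom(1), c / denom(2))
    print(f"{name:14s} numerator={c:.3f}  R1={c / denom(1):.1%}  R2={c / denom(2):.1%}")
```

The unrounded numerator for the last scheme comes out a touch above 0.2, so its Round 1 and Round 2 shares print slightly higher than the 40% and 13.3% you get from the rounded 0.2.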

Indeed, our best fit has the numerator at 0.16 and the denominator as Round - 0.68. That puts Round 1 at 50% and Round 2 at 12.1%. This is how the final chart looks.

Given that we've come up with a simple and smooth function, we are now in a position to say how much each Round is worth relative to other Rounds. Round 1 we already knew is worth the same as Round 2 thru 20.

How about Round 2? That's worth about the same as Round 3+4. Or the same as Round 5 thru 8 combined. Or 9 thru 17 combined.
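These bundle comparisons can be checked straight from the fitted curve. This is a rough sketch using the 0.16 / (Round - 0.68) function from the post; `share` and `bundle` are hypothetical helper names.

```python
# Sketch: compare bundles of rounds using the fitted curve 0.16 / (Round - 0.68).
# Function names are illustrative, not from the post.

def share(rnd, a=0.16, b=0.68):
    """Estimated share of total WAR for a given round."""
    return a / (rnd - b)

def bundle(first, last):
    """Combined share of rounds first..last, inclusive."""
    return sum(share(r) for r in range(first, last + 1))

round2 = share(2)
bundles = {"3-4": bundle(3, 4), "5-8": bundle(5, 8), "9-17": bundle(9, 17)}

print(f"Round 2: {round2:.1%}")
for label, value in bundles.items():
    print(f"Rounds {label}: {value:.1%}")

# The 1+4 versus 2+3 comparison from earlier in the post.
one_plus_four = share(1) + share(4)
two_plus_three = share(2) + share(3)
print(f"1+4 = {one_plus_four:.1%}, 2+3 = {two_plus_three:.1%}")
```

Each bundle lands within about a percentage point of Round 2, which is all "about the same" is claiming.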

I'd love to see similar charts in the other major sports from the AspiringSaberist.

#1 Alex Boisvert 2024/02/02 (Fri) @ 13:38

Since the values sum to 1, I’m a little surprised you didn’t try fitting a probability distribution. A Zipf distribution (https://en.wikipedia.org/wiki/Zeta_distribution) looks like it might be appropriate.

#2 Tangotiger 2024/02/02 (Fri) @ 14:22

Thank you, never heard of it!

Adapting that scheme, here’s how ZIPf compares to my function and the observed data.

I came up with this ZIPf function:
=1 / ( (Pick-1)*3 + 1) *LN(1.633)

Pick   ZIPf    Tango   Observed
   1   49.0%   50.0%   49.9%
   2   12.3%   12.1%   12.1%
   3    7.0%    6.9%    6.9%
   4    4.9%    4.8%    4.8%
   5    3.8%    3.7%    3.7%
   6    3.1%    3.0%    3.0%
   7    2.6%    2.5%    2.5%
   8    2.2%    2.2%    2.2%
   9    2.0%    1.9%    1.9%
  10    1.8%    1.7%    1.7%
  11    1.6%    1.6%    1.5%
  12    1.4%    1.4%    1.4%
  13    1.3%    1.3%    1.3%
  14    1.2%    1.2%    1.2%
  15    1.1%    1.1%    1.1%
  16    1.1%    1.0%    1.0%
  17    1.0%    1.0%    1.0%
  18    0.9%    0.9%    0.9%
  19    0.9%    0.9%    0.9%
  20    0.8%    0.8%    0.8%
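For anyone who wants to reproduce the ZIPf and Tango columns, here is a small sketch that translates the spreadsheet formula above into Python. The function names are my own, and the Observed column is not recomputed since the underlying WAR data isn't given.

```python
import math

# Sketch: reproduce the ZIPf and Tango columns. Function names are illustrative.

def zipf_share(pick):
    # Spreadsheet formula from the comment: =1 / ((Pick-1)*3 + 1) * LN(1.633)
    return 1.0 / ((pick - 1) * 3 + 1) * math.log(1.633)

def tango_share(pick):
    # Fitted curve from the post: 0.16 / (Round - 0.68)
    return 0.16 / (pick - 0.68)

print("Pick   ZIPf    Tango")
for pick in range(1, 21):
    print(f"{pick:4d}  {zipf_share(pick):6.1%}  {tango_share(pick):6.1%}")
```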

#3 Tangotiger 2024/02/02 (Fri) @ 15:05

Ok, so updating that ZIPf function comes out to this:

0.5/((Pick-1)*3.135+1)

So, I like that I can basically force in the 0.5 for the first pick.

Then it's just a matter of choosing that constant, 3.135, so that the whole series from 2 to 20 adds up to the remaining 0.5.
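One way to pin down that constant is a simple bisection search. The comment doesn't say how 3.135 was actually found, so treat this as an assumed approach, with hypothetical function names and an assumed search bracket of 1 to 10.

```python
# Sketch: solve for the constant k in 0.5 / ((Pick-1)*k + 1) so that picks
# 2 through 20 sum to 0.5. Bisection is an assumption, not the author's stated method.

def tail_sum(k, first=2, last=20):
    """Combined share of picks first..last for a given constant k."""
    return sum(0.5 / ((p - 1) * k + 1) for p in range(first, last + 1))

def solve_constant(target=0.5, lo=1.0, hi=10.0, tol=1e-9):
    # tail_sum shrinks as k grows, so the root is bracketed between lo and hi.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if tail_sum(mid) > target:
            lo = mid  # sum still too large: k must be bigger
        else:
            hi = mid
    return (lo + hi) / 2

k = solve_constant()
print(f"k = {k:.3f}")
print(f"equivalent form: {0.5 / k:.3f} / (Pick - {(k - 1) / k:.3f})")
```

The last line prints the same curve rewritten in the numerator / (Pick - offset) form, which makes it easy to compare against the 0.16 and 0.68 from the post.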

Anyway, expanding the above, we get:
0.5/(Pick*3.135-2.135)

Dividing the numerator and denominator by 3.135 gives us this:
0.16/(Pick - 0.68)
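Written out step by step, with the rounding made explicit (the symbol k for the constant is my own notation, not the comment's):

```latex
\frac{0.5}{(\text{Pick}-1)\cdot 3.135 + 1}
  = \frac{0.5}{3.135\,\text{Pick} - 2.135}
  = \frac{0.5/3.135}{\text{Pick} - 2.135/3.135}
  \approx \frac{0.1595}{\text{Pick} - 0.681}
  \approx \frac{0.16}{\text{Pick} - 0.68}

% General form for any constant k:
\frac{0.5}{(\text{Pick}-1)\,k + 1} = \frac{0.5/k}{\text{Pick} - (k-1)/k}
```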

The Tango version was this:
0.16/(Pick-0.68)

So, I ended up unwittingly matching to ZIPf!

#4 Tangotiger 2024/02/06 (Tue) @ 23:42

Here’s how it looks for the Prospect Rankings:

https://www.mlb.com/news/mlb-pipeline-20-years-of-prospect-rankings