🔨 Update FastAPI People Experts script, refactor and optimize data fetching to handle rate limits #13267
FastAPI People Experts have not been updated in a few months because the script was hitting GitHub API rate limits.
The first step was to separate the logic into different scripts. I had already started on that before, so now contributors, reviewers, translators, and sponsors are computed in separate scripts and GitHub Actions.
After that I was still hitting rate limits. I thought I would need to store the data in an external DB and update it gradually and continuously to avoid the rate limits, but that would have been a big side project just to get this working. 😅
Instead, for this PR, I played a bit with different ways to get the data and limit the results, and also analyzed the distribution of results a bit to optimize the queries.
I discovered that (it seems) GitHub rate limits the GraphQL API based on an estimate of how much it would cost to compute each query in a worst-case scenario, not what it actually takes to compute. So, with a query that would fetch 100 comments per discussion, I was quickly consuming the hourly rate limit. Nevertheless, only a handful of discussions have more than 50 comments. When I updated the query to fetch only the first 50 comments per discussion, that solved the rate limit problem. That's how I concluded that GitHub assigns the rate limit "points" based on the worst-case cost of computing each query, before executing it, so I was charged rate limit points before they were actually used.
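As a reference, one way to see this behavior is to request the `rateLimit` object together with the query, which reports how many points GitHub charged for that request. This is just a minimal sketch to illustrate the idea, not code from the actual scripts (the `httpx` usage, token handling, and exact fields are assumptions):

```python
# Sketch: ask GitHub to report the rate limit cost it charged for a query.
# Not part of the actual scripts; the token handling here is simplified.
import httpx

query = """
query {
  rateLimit {
    cost
    remaining
    resetAt
  }
  repository(owner: "fastapi", name: "fastapi") {
    discussions(first: 100) {
      nodes {
        number
        # Requesting up to 100 comments per discussion drives the estimated
        # (worst-case) cost up, even if most discussions have far fewer.
        comments(first: 100) {
          totalCount
        }
      }
    }
  }
}
"""

response = httpx.post(
    "https://api.github.com/graphql",
    headers={"Authorization": "Bearer <your GitHub token>"},
    json={"query": query},
)
print(response.json()["data"]["rateLimit"])
```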
And all this is just to say that the main thing that solved the problem was setting the query to fetch the first 50 comments instead of the first 100, even though most discussions have fewer than 50.
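For illustration, this is a simplified sketch of what that part of the query looks like (an approximation of the structure, not a copy of the real query):

```python
# Simplified sketch of the relevant part of the discussions query.
discussions_query = """
query ($after: String) {
  repository(owner: "fastapi", name: "fastapi") {
    discussions(first: 100, after: $after) {
      nodes {
        number
        # Before: comments(first: 100), charged for the worst case.
        # Now: only the first 50, which still covers almost all discussions.
        comments(first: 50) {
          totalCount
          nodes {
            author {
              login
            }
          }
        }
      }
    }
  }
}
"""
```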
I also had a temporary snippet to do some quick stats on the fetched data and calculate how many discussions had how many comments:
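Roughly something like this (a reconstruction, not the exact snippet; the variable names and the placeholder data are made up, in the real run the counts came from each fetched discussion's comment total):

```python
# Rough reconstruction of the throwaway stats snippet. The list below is
# placeholder data; in practice it would be each discussion's total number
# of comments taken from the fetched data.
from collections import Counter

comments_per_discussion = [0, 2, 1, 0, 7, 55, 3, 1]  # placeholder, not real results

counts = Counter(comments_per_discussion)
for number_of_comments, number_of_discussions in sorted(counts.items()):
    print(f"{number_of_discussions} discussions have {number_of_comments} comments")

# How many discussions exceed the new page size of 50 comments.
over_50 = sum(
    discussions for comments, discussions in counts.items() if comments > 50
)
print(f"Discussions with more than 50 comments: {over_50}")
```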
This was run in an interactive window, so it's not even a full script.
The results as of the moment of making this PR:
and: