
🔨 Update FastAPI People Experts script, refactor and optimize data fetching to handle rate limits #13267

Merged
tiangolo merged 8 commits into master from people on Jan 28, 2025

Conversation

tiangolo (Member) commented Jan 28, 2025


FastAPI People Experts have not been updated in a few months because the script was hitting GitHub API rate limits.

The first step was to separate the logic into different scripts. I had started on that earlier, so contributors, reviewers, translators, and sponsors are now each computed in their own script and GitHub Action.

After that I was still hitting rate limits. I thought I would need to store the data in an external DB and update it gradually and continuously to stay under the limits, which would have been a big side project just to get this working. 😅

Instead, with this PR, I experimented with different ways to fetch the data and limit the results, and analyzed the distribution of the results to optimize the queries.

I discovered that (it seems) GitHub rate-limits the GraphQL API based on an estimate of the worst-case cost of computing each query, not on what the query actually takes to compute. So, with a query that requested 100 comments per discussion, I was quickly burning through the hourly rate limit, even though only a handful of discussions have more than 50 comments. When I updated the query to fetch only the first 50 comments per discussion, that solved the rate limit problem. That's how I concluded that GitHub assigns the rate limit "points" based on the worst-case cost of each query, computed before executing it, so I was being charged rate limit points for data that was never actually fetched.
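
For anyone who wants to observe this, the GraphQL API exposes a rateLimit object that reports the cost a query is charged. A minimal sketch of checking it, assuming httpx and a token in the GITHUB_TOKEN environment variable (this is not the actual script):

    import os

    import httpx

    # Minimal sketch: attach GitHub's documented rateLimit object to a query
    # to see the cost charged for it up front.
    # Assumes a personal access token in the GITHUB_TOKEN environment variable.
    query = """
    {
        rateLimit {
            cost
            remaining
            resetAt
        }
        repository(owner: "fastapi", name: "fastapi") {
            discussions(first: 100) {
                nodes {
                    comments(first: 100) {
                        totalCount
                    }
                }
            }
        }
    }
    """
    response = httpx.post(
        "https://api.github.com/graphql",
        headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
        json={"query": query},
    )
    print(response.json()["data"]["rateLimit"])

The reported cost reflects the requested page sizes (here up to 100 × 100 comments), which is consistent with the behavior described above.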

And all this is just to say that the main thing that solved the problem was setting the query to fetch the first 50 comments instead of the first 100, even though most discussions have fewer than 50.
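
Concretely, the fix amounts to a query with this shape (a sketch of the structure only; the real query in the script selects more fields, and the page sizes other than the first: 50 for comments are illustrative):

    # Sketch of the query shape described above, with illustrative page sizes
    # except for the `first: 50` on comments, which is the actual fix.
    discussions_query = """
    query ($after: String) {
        repository(owner: "fastapi", name: "fastapi") {
            discussions(first: 100, after: $after) {
                pageInfo {
                    hasNextPage
                    endCursor
                }
                nodes {
                    number
                    comments(first: 50) {
                        totalCount
                        nodes {
                            author {
                                login
                            }
                            replies(first: 50) {
                                totalCount
                                nodes {
                                    author {
                                        login
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
    """

Requesting first: 50 instead of first: 100 comments halves the worst-case node count that GitHub charges the query for before running it.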

I also had a temporary snippet to compute some quick stats on the fetched data and count how many discussions had how many comments:

    from collections import Counter, defaultdict

    # Track the discussion with the most comments and the comment with the most
    # replies, plus the full distributions of comment and reply counts.
    max_comments = 0
    max_replies = 0
    max_comments_discussion = None
    max_replies_discussion = None
    comments_counter = Counter[int]()
    comments_by_counts = defaultdict(list)
    replies_counter = Counter[int]()
    replies_by_counts = defaultdict(list)
    for discussion in discussion_nodes:
        # Distribution of comment counts, and which discussions have each count
        comments_counter[discussion.comments.totalCount] += 1
        comments_by_counts[discussion.comments.totalCount].append(discussion)
        if discussion.comments.totalCount > max_comments:
            max_comments = discussion.comments.totalCount
            max_comments_discussion = discussion
        for comment in discussion.comments.nodes:
            # Same, but for replies per comment
            replies_counter[comment.replies.totalCount] += 1
            replies_by_counts[comment.replies.totalCount].append(comment)
            if comment.replies.totalCount > max_replies:
                max_replies = comment.replies.totalCount
                max_replies_discussion = discussion

This was run in an interactive window, so it's not even a full script.

The results as of the moment of making this PR:

comments_counter == Counter({1: 1272,
         2: 851,
         3: 640,
         4: 463,
         0: 461,
         5: 346,
         6: 251,
         7: 173,
         8: 129,
         9: 120,
         10: 86,
         11: 53,
         12: 50,
         13: 45,
         14: 40,
         18: 26,
         15: 25,
         16: 21,
         17: 19,
         21: 19,
         19: 15,
         20: 13,
         24: 8,
         23: 7,
         22: 7,
         25: 6,
         29: 4,
         28: 4,
         31: 3,
         34: 3,
         27: 3,
         33: 3,
         43: 2,
         40: 2,
         42: 2,
         30: 2,
         105: 1,
         46: 1,
         36: 1,
         26: 1,
         64: 1,
         53: 1,
         39: 1,
         35: 1,
         55: 1,
         98: 1,
         66: 1,
         37: 1,
         77: 1,
         52: 1})

and:

replies_counter == Counter({0: 19342,
         1: 991,
         2: 332,
         3: 201,
         4: 101,
         5: 65,
         6: 45,
         7: 19,
         8: 16,
         9: 16,
         10: 4,
         13: 4,
         12: 3,
         15: 2,
         22: 1,
         18: 1,
         14: 1})
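
A quick check of what those numbers imply for the first: 50 page size, run against the same counters:

    # Fraction of discussions fully covered when fetching only the first 50
    # comments, computed from the comments_counter above.
    total = sum(comments_counter.values())
    covered = sum(
        count for num_comments, count in comments_counter.items() if num_comments <= 50
    )
    print(f"{covered} of {total} discussions ({covered / total:.2%}) have at most 50 comments")

With the counts above, that comes out to only 8 of 5,188 discussions (about 0.15%) having more than 50 comments.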

github-actions bot added the docs (Documentation about how to use FastAPI) label Jan 28, 2025
tiangolo added the internal label and removed the docs label Jan 28, 2025
tiangolo marked this pull request as ready for review January 28, 2025 20:33
tiangolo merged commit ff68d08 into master Jan 28, 2025
56 checks passed
tiangolo deleted the people branch January 28, 2025 20:34