8000 code review WIP by blefaudeux · Pull Request #120 · Photoroom/datago · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

code review WIP #120

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 26, 2025
Merged

code review WIP #120

merged 1 commit into from
May 26, 2025

Conversation

blefaudeux
Copy link
Contributor
@blefaudeux blefaudeux commented May 26, 2025

storing ongoing work / following up to code review

@blefaudeux blefaudeux marked this pull request as draft May 26, 2025 20:28

// If we have enough tasks, we'll wait for the older one to finish
if tasks.len() >= max_tasks && consume_oldest_task(&mut tasks).await.is_ok() {
if tasks.len() >= max_tasks && tasks.join_next().await.unwrap().is_ok() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is actually better I believe @photoroman, because tasks have uneven duration and waiting for the first one which is done will be faster (on average) than waiting for the older one.
Moving all the workers to this, also removes this dedicated function shared across the files, which was not very clean, I agree

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, thanks!

@blefaudeux blefaudeux marked this pull request as ready for review May 26, 2025 20:35
@blefaudeux blefaudeux merged commit 81ebfaf into ben/webdataset May 26, 2025
blefaudeux added a commit that referenced this pull request Jun 2, 2025
* Better error messages on http path

- async tarball pull, but behavior is clunky
- general arch could be simpler and using tokio more
- handling jpg/png/jpeg/cls/txt/json types
- some shuffling handling

missing unit tests, and better behavior, doing pauses at the moment

better documentation

big rewrite, nicer and smaller code I believe (#117)

Co-authored-by: Benjamin Lefaudeux <ben@photoroom.com>

Async tarball pull and dispatch

Random_sampling in the config, at least for now. Thanks for the review Roman !

* Code review (#120)

Some missing items (would be good to propagate the archive name for instance), but most fixes should be there

* second round, hoopefully good to go. Perf could probably be improved, competing sample pull

* handling multi image samples (#121)

bugfixing the previous PR, ideally we should unit test more

* final update round

* second review, not perfect but feels like we can land this and carry on

---------

Co-authored-by: Benjamin Lefaudeux <ben@photoroom.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0