[fix] Filesystem path: hash function not robust enough (#129)
* lucky catch: a unit test was failing at random, and it was right to
* switching to murmurhash
* back to default hash + seed
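The log does not show how the path hash is computed or what it is used for. The sketch below assumes Rust and assumes "default hash + seed" means feeding a seed into the standard library's `DefaultHasher` before the path, so results stay deterministic for a given seed. `seeded_path_hash` and `bucket_for` are hypothetical names, and the bucketing use case (shards, ranks) is only an illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::Path;

/// Hypothetical helper: hash a filesystem path together with a seed.
/// The seed goes into the hasher before the path, so a different seed
/// changes every hash while a fixed seed gives reproducible results.
fn seeded_path_hash(path: &Path, seed: u64) -> u64 {
    // DefaultHasher::new() always starts from the same keys, so results are
    // reproducible within one build (not guaranteed across Rust releases).
    let mut hasher = DefaultHasher::new();
    seed.hash(&mut hasher);
    path.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical use: assign a file to one of `n` buckets (e.g. shards or ranks).
fn bucket_for(path: &Path, seed: u64, n: u64) -> u64 {
    seeded_path_hash(path, seed) % n
}

fn main() {
    let p = Path::new("/data/images/0001.jpg");
    println!("bucket: {}", bucket_for(p, 42, 8));
}
```

Hashing the whole `Path` rather than a hand-rolled string mix delegates the mixing quality to SipHash. That is one plausible reason a randomly failing test would pass again after the switch.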
[perf] faster image processing (#125)
* needs some good benchmarking
* Should be good to go. A bit scary to change a working formula, but this is much faster (x2 when image processing is involved)
[feat] Webdataset support (#111)
* Better error messages on http path
- async tarball pull, but behavior is clunky
- general architecture could be simpler and use tokio more
- handling jpg/png/jpeg/cls/txt/json types
- some shuffling handling
still missing: unit tests, better behavior (it pauses at the moment), and better documentation
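In WebDataset shards, the tar entries that share a key (the path up to the first dot of the file name) make up one sample, and the extension says what each entry holds. The sketch below groups member names that way and keeps only the extensions listed above. It works on a plain list of names instead of reading a real tarball, and `split_key_ext`, `group_samples` and `HANDLED` are hypothetical names, not the project's API.

```rust
use std::collections::BTreeMap;

/// Extensions handled according to the commit message.
const HANDLED: [&str; 6] = ["jpg", "png", "jpeg", "cls", "txt", "json"];

/// Split a tar member name into (key, extension) following the WebDataset
/// convention: the key runs up to the first dot of the file name, and the
/// extension is everything after that dot.
fn split_key_ext(name: &str) -> Option<(&str, &str)> {
    let file_start = name.rfind('/').map_or(0, |i| i + 1);
    let dot = name[file_start..].find('.')? + file_start;
    Some((&name[..dot], &name[dot + 1..]))
}

/// Group member names into samples by key, skipping unhandled extensions.
fn group_samples<'a>(names: &[&'a str]) -> BTreeMap<&'a str, Vec<&'a str>> {
    let mut samples: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for &name in names {
        if let Some((key, ext)) = split_key_ext(name) {
            if HANDLED.contains(&ext) {
                samples.entry(key).or_default().push(ext);
            }
        }
    }
    samples
}

fn main() {
    let names = ["shard/0001.jpg", "shard/0001.cls", "shard/0002.png", "shard/0002.npy"];
    for (key, exts) in group_samples(&names) {
        println!("{key}: {exts:?}");
    }
}
```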
big rewrite, nicer and smaller code, I believe (#117)
Co-authored-by: Benjamin Lefaudeux <ben@photoroom.com>
Async tarball pull and dispatch
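The commits describe pulling tarballs asynchronously (with tokio) and dispatching their samples. The sketch below shows the same pull-and-dispatch shape using only std threads and channels, so it runs without external crates. It is a stand-in for the tokio version, not the project's code. `pull_tarball` is a hypothetical placeholder that fabricates sample names instead of downloading anything, and the uppercase "processing" step is dummy work.

```rust
use std::sync::mpsc::{channel, sync_channel, Receiver};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical placeholder for fetching one tarball: it just returns the
/// names of the samples such a tarball might contain.
fn pull_tarball(url: &str) -> Vec<String> {
    (0..3).map(|i| format!("{url}#{i}")).collect()
}

/// One thread pulls tarballs and pushes their samples into a bounded channel.
/// `n_workers` threads share the receiver and process samples as they arrive.
fn pull_and_dispatch(urls: Vec<String>, n_workers: usize, capacity: usize) -> Vec<String> {
    // Bounded channel: the puller blocks when the workers fall behind.
    let (tx, rx) = sync_channel::<String>(capacity);
    let rx: Arc<Mutex<Receiver<String>>> = Arc::new(Mutex::new(rx));
    let (out_tx, out_rx) = channel();

    let puller = thread::spawn(move || {
        for url in urls {
            for sample in pull_tarball(&url) {
                if tx.send(sample).is_err() {
                    return;
                }
            }
        }
        // `tx` is dropped here, closing the channel so the workers can exit.
    });

    let workers: Vec<_> = (0..n_workers)
        .map(|_| {
            let rx = Arc::clone(&rx);
            let out_tx = out_tx.clone();
            thread::spawn(move || loop {
                // The lock is held only while receiving, not while processing.
                let msg = rx.lock().unwrap().recv();
                match msg {
                    Ok(sample) => out_tx.send(sample.to_uppercase()).unwrap(), // dummy work
                    Err(_) => break, // channel closed and drained
                }
            })
        })
        .collect();
    drop(out_tx);

    puller.join().unwrap();
    for w in workers {
        w.join().unwrap();
    }
    out_rx.into_iter().collect()
}

fn main() {
    let urls = vec!["shard-000.tar".to_string(), "shard-001.tar".to_string()];
    let processed = pull_and_dispatch(urls, 4, 8);
    println!("{} samples processed", processed.len());
}
```

With tokio, the threads would become tasks and the channels `tokio::sync::mpsc`, but the bounded-queue backpressure idea carries over. Workers competing for the shared receiver is one way the "competing sample pull" mentioned later could show up.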
Random_sampling in the config, at least for now. Thanks for the review, Roman!
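The log does not say how shuffling is implemented. A common approach for streamed tar data is a shuffle buffer: keep up to N items, and for each new item emit a random buffered one and put the new item in its slot. The sketch below assumes that technique and models the `random_sampling` option as a boolean that bypasses it. `buffered_shuffle` and `XorShift64` are hypothetical names, and a small xorshift PRNG stands in for a real RNG crate.

```rust
/// Minimal xorshift64 PRNG so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift must not start from an all-zero state.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Index in 0..n (slightly biased by the modulo, acceptable for a sketch).
    fn below(&mut self, n: usize) -> usize {
        (self.next_u64() % n as u64) as usize
    }
}

/// Shuffle a stream through a fixed-size buffer. With `random_sampling`
/// off (a hypothetical mirror of the config flag), items pass through in order.
fn buffered_shuffle<T>(
    items: impl IntoIterator<Item = T>,
    buffer_size: usize,
    random_sampling: bool,
    seed: u64,
) -> Vec<T> {
    if !random_sampling || buffer_size == 0 {
        return items.into_iter().collect();
    }
    let mut rng = XorShift64::new(seed);
    let mut buffer: Vec<T> = Vec::with_capacity(buffer_size);
    let mut out = Vec::new();
    for item in items {
        if buffer.len() < buffer_size {
            buffer.push(item);
        } else {
            // Emit a random buffered item and put the new one in its slot.
            let i = rng.below(buffer.len());
            out.push(std::mem::replace(&mut buffer[i], item));
        }
    }
    // Drain what is left in random order.
    while !buffer.is_empty() {
        let i = rng.below(buffer.len());
        out.push(buffer.swap_remove(i));
    }
    out
}

fn main() {
    let out = buffered_shuffle(0..10, 4, true, 42);
    println!("{out:?}");
}
```

A larger buffer gives better mixing at the cost of memory. An item can never be emitted more than about `buffer_size` positions earlier than it arrived, so shuffling only within a shard's stream stays fairly local.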
* Code review (#120)
Some items are still missing (it would be good to propagate the archive name, for instance), but most fixes should be there
* second round, hopefully good to go. Perf could probably be improved (competing sample pulls)
* handling multi image samples (#121)
bugfixing the previous PR; ideally we should unit test more
* final update round
* second review, not perfect, but it feels like we can land this and carry on
---------
Co-authored-by: Benjamin Lefaudeux <ben@photoroom.com>