ENH: Make `vak.prep.data_dir` option more flexible · Issue #790 · vocalpy/vak · GitHub

ENH: Make vak.prep.data_dir option more flexible #790


Open
henricombrink opened this issue Apr 16, 2025 · 2 comments
Labels: ENH: enhancement (enhancement; new feature or request)

Comments

@henricombrink

I have .wav files stored in a deep directory structure. VAK fails to find the files if the directory structure is more than two layers deep. I edited the files.py file at: ....\Lib\site-packages\vak\common\files\files.py which solved the issue for me.

I provide the edited code if anybody else encounters this problem:
files.py.txt

henricombrink added the BUG (Something isn't working) label Apr 16, 2025
NickleDave changed the title from "BUG:Data with a deep directory structure not found (suggested fix)" to "ENH: Make vak.prep.data_dir option more flexible" Apr 21, 2025
NickleDave self-assigned this Apr 21, 2025
NickleDave added the ENH: enhancement (enhancement; new feature or request) label and removed the BUG (Something isn't working) label Apr 21, 2025
@NickleDave
Collaborator
NickleDave commented Apr 21, 2025

Hi @henricombrink sorry this isn't working like you need it to, and thank you for taking the time to share your solution.

This is not a bug per se, it's just the way the code is designed right now.

I agree with you that it would be convenient to have a way to set the data_dir option in the config so that it does a recursive glob.

One of the reasons I have avoided doing so is because I don't want to make it really easy for someone with a very large dataset to accidentally "prep" their entire dataset and fill up their disk with spectrograms, or something like that. (The other reason is just because I have left it in the backlog of features to add 😅)

Instead of recursion by default, here's what I propose:

  • change the data_dir option in the prep table of the config file so it can work one of three ways
    1. the current way: if it's a string representing a single directory, everything works as it always has
    2. opt-in recursive glob (see "allow data_dir to be recursive", #282): if the string ends in "/**", this tells vak to do a recursive glob, looking in all subdirectories
    3. a list of directory names (see "allow data_dir option in PREP section to be a list of directories", #79): if data_dir is a list ("array" in TOML config-land), we treat each item as if it's 1 or 2
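
A minimal sketch of how those three forms might be told apart, for illustration only. The helper name `expand_data_dir` is hypothetical and not part of vak; it assumes the recursive marker is written with a forward slash ("/**") and simply returns the directories that would be searched.

```python
from pathlib import Path
from typing import List, Union


def expand_data_dir(data_dir: Union[str, list]) -> List[Path]:
    """Hypothetical helper (not vak's API): list the directories to search."""
    # Form 3: a list (TOML array); each item is treated as form 1 or form 2.
    if isinstance(data_dir, list):
        dirs: List[Path] = []
        for item in data_dir:
            dirs.extend(expand_data_dir(item))
        return dirs

    # Form 2: opt-in recursion, signalled by a trailing "/**".
    if data_dir.endswith("/**"):
        root = Path(data_dir[: -len("/**")] or "/")
        # The root itself plus every directory below it.
        return [root] + sorted(p for p in root.rglob("*") if p.is_dir())

    # Form 1: a single directory, searched non-recursively (current behavior).
    return [Path(data_dir)]
```

With this shape, recursion only happens when the config asks for it, which keeps the guard against accidentally prepping a very large dataset.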

Please let me know if that sounds like it would address your use case.

If so, I can probably find time to start work on this towards the end of the week.

I need to think about how to actually implement this. Once you confirm, I'll stare at the code as it is now and reply to myself with a plan for how to do it. I do appreciate you sharing your fix, but reading through this module again, I'm wondering if there's a way we can get pathlib to do most of the work for us, instead of writing more code to maintain. (IIRC, the code in files.py is written in a specific way to handle weird edge cases like Windows not handling file extensions in a case-sensitive way)
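
As a rough illustration of letting pathlib do most of the work, here is a sketch that matches extensions by comparing lowercased suffixes, so the result does not depend on whether the file system is case-sensitive. The function `find_files` and its parameters are hypothetical; this is not the code in files.py, and it may not cover every edge case that module handles.

```python
from pathlib import Path
from typing import List


def find_files(root: str, ext: str, recursive: bool = False) -> List[Path]:
    """Hypothetical helper: files under root whose extension is ext, ignoring case."""
    root_path = Path(root)
    wanted = "." + ext.lstrip(".").lower()
    # rglob("*") walks every subdirectory; iterdir() looks at one level only.
    candidates = root_path.rglob("*") if recursive else root_path.iterdir()
    # Comparing lowercased suffixes matches "a.wav" and "B.WAV" alike,
    # and each file is visited once, so nothing is listed twice.
    return sorted(
        p for p in candidates if p.is_file() and p.suffix.lower() == wanted
    )
```

For example, `find_files("path/to/data", "wav", recursive=True)` would pick up .wav files at any depth, while the default only looks in the top-level directory.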

@henricombrink
Author

@NickleDave

Thank you for the detailed reply.

I also don't see this as a bug, I just included the code here as it might help someone in the same situation. My "fix" as posted above solved my specific problem, so no need to change anything right now.

I think option two above (adding "/**" at the end of the data_dir string) would be the easiest to allow for recursive file discovery.
