kundajelab/DART-Eval

Clarification on Task 1 Zero-Shot Batch Size and Memory Usage #9


Open
HaniyehBarghi opened this issue Apr 1, 2025 · 3 comments

@HaniyehBarghi

Hi,

I'm running Task 1 zero-shot evaluation on an NVIDIA A100 (80 GB VRAM) using the default batch_size=1024 in gena_lm.py and encountering CUDA OOM errors. Could you please clarify what batch sizes you used for this task? If 1024 was used, how did you avoid running into memory issues on similar hardware?
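For reference, here's roughly how I'm watching per-batch memory while debugging. The `model` and `batches` names are placeholders for the GENA-LM model and the tokenized Task 1 batches, not the actual DART-Eval code:

```python
import torch

# Sketch of the per-batch peak-memory check I'm using while debugging.
# `model` and `batches` are placeholders, not the DART-Eval pipeline itself.
def run_with_memory_trace(model, batches, device="cuda"):
    model.eval()
    with torch.inference_mode():
        for i, batch in enumerate(batches):
            torch.cuda.reset_peak_memory_stats(device)
            batch = {k: v.to(device) for k, v in batch.items()}
            _ = model(**batch)
            peak_gb = torch.cuda.max_memory_allocated(device) / 1e9
            print(f"batch {i}: peak memory {peak_gb:.1f} GB")
```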

Thanks for your help!

@austintwang
Member

Hi!

We did run our evaluations on similar hardware using the parameters we specified in the code. However, we noticed that GENA-LM's tokenizer has a tendency to produce large variations in sequence lengths, resulting in significant fluctuations in memory usage from batch to batch.

I'd expect that decreasing batch size would mitigate this, but let us know if issues persist.
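If you want to keep the batch size larger, one general mitigation (not what our script does; just a sketch assuming a HuggingFace-style tokenizer) is to sort sequences by tokenized length so that each batch pads to a similar size:

```python
def length_bucketed_batches(seqs, tokenizer, batch_size):
    # Tokenize once (no padding) to get each sequence's token length.
    lengths = [len(tokenizer(s)["input_ids"]) for s in seqs]
    # Sort indices by length so padded batches have similar shapes,
    # which keeps per-batch memory usage much more predictable.
    order = sorted(range(len(seqs)), key=lambda i: lengths[i])
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield [seqs[i] for i in idx]
```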

@HaniyehBarghi
Author

Hi again,

Reducing the batch size does indeed prevent the OOM errors, but the evaluation becomes very slow. The estimated time to finish this specific task is about 16 days on a single A100.

Did you observe a similar runtime in your experiments, or did you apply any additional optimizations to speed things up?

@austintwang
Member

16 days does seem like a long time. What batch size are you using?
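If you haven't already, running the forward passes under `inference_mode` with bf16 autocast usually helps a lot on an A100. This is a general sketch, not a change we've made in the repo:

```python
import torch

# General throughput sketch for A100 inference (not specific to DART-Eval):
# bf16 autocast plus inference_mode typically reduces both memory use and
# runtime for transformer forward passes.
def forward_fast(model, batch, device="cuda"):
    batch = {k: v.to(device) for k, v in batch.items()}
    with torch.inference_mode(), torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        return model(**batch)
```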
