I'm running Task 1 zero-shot evaluation on an NVIDIA A100 (80 GB VRAM) using the default batch_size=1024 in gena_lm.py and encountering CUDA OOM errors. Could you please clarify what batch sizes you used for this task? If 1024 was used, how did you avoid running into memory issues on similar hardware?
Thanks for your help!
We did run our evaluations on similar hardware using the parameters we specified in the code. However, we noticed that GENA-LM's tokenizer has a tendency to produce large variations in sequence lengths, resulting in significant fluctuations in memory usage from batch to batch.
I'd expect that decreasing batch size would mitigate this, but let us know if issues persist.
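In case it helps anyone hitting the same issue: one way to keep memory roughly constant despite the variable sequence lengths is to group sequences by a token budget instead of a fixed `batch_size`. This is a minimal sketch, not the logic in `gena_lm.py`; the checkpoint name and the budget value are placeholders.

```python
# Sketch: token-budget batching for sequences with highly variable lengths.
# The checkpoint name and max_tokens_per_batch are illustrative assumptions,
# not the values used in gena_lm.py.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "AIRI-Institute/gena-lm-bert-base", trust_remote_code=True
)

def token_budget_batches(sequences, max_tokens_per_batch=200_000):
    """Group sequences so each padded batch stays under a rough token budget.

    Sorting by tokenized length first keeps padding (and therefore peak memory)
    much more uniform than a fixed batch_size when lengths vary a lot.
    """
    lengths = [len(tokenizer(seq)["input_ids"]) for seq in sequences]
    order = sorted(range(len(sequences)), key=lambda i: lengths[i])

    batch, batch_max_len = [], 0
    for i in order:
        new_max = max(batch_max_len, lengths[i])
        # Padded size of the batch if this sequence were added.
        if batch and (len(batch) + 1) * new_max > max_tokens_per_batch:
            yield [sequences[j] for j in batch]
            batch, batch_max_len = [], 0
            new_max = lengths[i]
        batch.append(i)
        batch_max_len = new_max
    if batch:
        yield [sequences[j] for j in batch]
```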
Reducing the batch size does indeed prevent the OOM errors, but the evaluation becomes very slow: the estimated time for this specific task is about 16 days on a single A100.
Did you observe a similar runtime in your experiments, or did you apply any additional optimizations to speed things up?
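For reference, two generic PyTorch-level optimizations that often help on an A100 are bf16 autocast and `torch.inference_mode()`. Neither is specific to this repository or confirmed as the authors' setup; the checkpoint name, `max_length`, and model head below are placeholder assumptions.

```python
# Sketch only: generic inference-side speedups on Ampere GPUs (not the authors' setup).
# bf16 autocast roughly halves activation memory and speeds up matmuls; inference_mode
# skips autograd bookkeeping entirely.
import torch
from transformers import AutoModel, AutoTokenizer

device = "cuda"
name = "AIRI-Institute/gena-lm-bert-base"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, trust_remote_code=True).to(device).eval()

@torch.inference_mode()
def forward_batch(seqs):
    enc = tokenizer(
        seqs, padding=True, truncation=True, max_length=512, return_tensors="pt"
    ).to(device)
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        return model(**enc).last_hidden_state
```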