Memory leak in multi-thread inference #64412
Comments
@mrshenli and I talked about this a little bit: (1) nailing this down to the specific operator that's leaking would be a big help, (2) verifying it doesn't leak on CUDA would also help, and (3) we strongly suspect it's MKL/MKLDNN related, because we don't really do heavy thread caching inside the library itself. |
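For point (2), a rough sketch of how that check could look (my own, not from the thread): run a small GRU like the one in the repro below on CPU and, if available, on CUDA, and watch host RSS alongside `torch.cuda.memory_allocated()`. The sizes and iteration counts here are arbitrary assumptions.

```python
import os

import psutil
import torch


def run(device: str) -> None:
    input_size, hidden_size = 16, 128
    gru = torch.nn.GRU(input_size, hidden_size).to(device)
    inputs = torch.randn(1, 1, input_size, device=device)
    process = psutil.Process(os.getpid())
    for counter in range(1, 2001):
        _, out = gru(inputs)
        if counter % 500 == 0:
            rss_mb = process.memory_info().rss // 1_000_000
            dev_mb = torch.cuda.memory_allocated() // 1_000_000 if device == "cuda" else 0
            print(f"{device} round {counter}: rss {rss_mb} MB, cuda allocated {dev_mb} MB")


if __name__ == "__main__":
    with torch.no_grad():
        run("cpu")
        if torch.cuda.is_available():
            run("cuda")
```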
Hey @cloudhan I wasn't able to reproduce the leak. My CPU info is:

```python
import os

import psutil
import torch


def run():
    # input_size * hidden_size >= 2048 causes the leak
    # also observed extreme performance degradation
    input_size = 16
    hidden_size = 128
    gru = torch.nn.GRU(input_size, hidden_size)
    inputs = torch.randn(1, 1, input_size)
    counter = 0
    process = psutil.Process(os.getpid())
    while True:
        counter += 1
        _, out = gru(inputs)
        if (counter % 100) == 0:
            print(f"Round {counter}: memory {process.memory_info().rss // 1000000} MB")


if __name__ == "__main__":
    with torch.no_grad():
        run()
```

Here is what I observed. And I also tried setting
|
Fair enough, 6138 supports AVX512, so no leak is expected. |
Experiencing similar mem leak with libtorch 1.7. |
@cloudhan I could reproduce the memory leak using the above code even with a Xeon 8280 (AVX512 supported). Though I still saw some leakage when MKL (linear op) or oneDNN/ideep (conv op) were used, so I'm still checking those. @ezyang I wonder if there's any update on this issue from the PyTorch team. Thanks. |
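A minimal way to isolate those two paths (my sketch, not the commenter's script): loop each op by itself and watch process RSS. Which backend each op dispatches to depends on the build; the sizes and iteration counts below are arbitrary assumptions.

```python
import os

import psutil
import torch


def watch(name: str, module: torch.nn.Module, example: torch.Tensor, iters: int = 2000) -> None:
    """Run a single op in a loop and print process RSS periodically."""
    process = psutil.Process(os.getpid())
    with torch.no_grad():
        for i in range(1, iters + 1):
            module(example)
            if i % 500 == 0:
                print(f"{name} iter {i}: rss {process.memory_info().rss // 1_000_000} MB")


if __name__ == "__main__":
    # Per the comment above, nn.Linear exercises the MKL GEMM path and
    # nn.Conv2d exercises oneDNN/ideep on typical CPU builds.
    watch("linear (MKL path)", torch.nn.Linear(1024, 1024), torch.randn(64, 1024))
    watch("conv (oneDNN path)", torch.nn.Conv2d(3, 64, kernel_size=3), torch.randn(1, 3, 224, 224))
```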
I have a similar issue in a custom dataloader with threads and image transforms. As long as I move the data out of the thread as a numpy array and apply the transforms in the main thread, it runs fine; when I move the transforms into the thread, the leak appears and memory usage slowly grows over the epoch, leading to OOM (VRAM or RAM, depending on whether I apply the transforms on the GPU or CPU).
|
Symptom
The memory usage of a process significantly increases when using more threads. It's likely that PyTorch or some third-party dependencies create expensive (much more than the ~8 MB stack size) thread-local state. See #61920 for more discussion. I tried building without MKLDNN; it doesn't help. This issue seems to have existed for a long time (https://discuss.pytorch.org/t/heap-size-increase-constantly-when-inference-with-new-thread/57621).
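As a rough way to quantify that per-thread cost (a sketch of mine, not part of the original report): spawn a batch of threads that each run one tiny op, join them, and see how much RSS remains per thread after they exit. Allocator caching can inflate the number, so treat it as an upper bound.

```python
import os
import threading

import psutil
import torch


def tiny_op() -> None:
    # Any CPU op is enough to materialize whatever thread-local state
    # PyTorch and its backends allocate for this thread.
    torch.mm(torch.randn(64, 64), torch.randn(64, 64))


def rss_mb() -> int:
    return psutil.Process(os.getpid()).memory_info().rss // 1_000_000


if __name__ == "__main__":
    base = rss_mb()
    threads = [threading.Thread(target=tiny_op) for _ in range(20)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    grown = rss_mb() - base
    print(f"RSS grew by {grown} MB across 20 threads (~{grown / 20:.1f} MB per thread)")
```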
Repro
The following code creates 20 threads and runs inference through the same ResNet50 model instance using the same input tensor. The process memory consumption keeps increasing as more threads are used.
The same ResNet50 inference code with a single thread does not hit this issue.
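The repro script itself was not captured above, so the following is a best-guess reconstruction of the described setup (20 threads, one shared ResNet50 instance, one shared input tensor); the use of torchvision and the iteration/print counts are my assumptions, not the original code.

```python
import os
import threading

import psutil
import torch
import torchvision

NUM_THREADS = 20
ITERS_PER_THREAD = 50

model = torchvision.models.resnet50().eval()
inputs = torch.randn(1, 3, 224, 224)
process = psutil.Process(os.getpid())


def worker(tid: int) -> None:
    # Each thread runs inference on the shared model and input.
    with torch.no_grad():
        for i in range(ITERS_PER_THREAD):
            model(inputs)
            if i % 10 == 0:
                rss = process.memory_info().rss // 1_000_000
                print(f"thread {tid}, iter {i}: rss {rss} MB")


if __name__ == "__main__":
    threads = [threading.Thread(target=worker, args=(t,)) for t in range(NUM_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"final rss: {process.memory_info().rss // 1_000_000} MB")
```

Watching RSS while varying NUM_THREADS should show whether the growth scales with thread count, as described in the Symptom section.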
cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser @anjali411 @VitalyFedyunin @ngimel @heitorschueroff