Memory leak in C++ when running module in separate thread · Issue #24237 · pytorch/pytorch · GitHub

Memory leak in C++ when running module in separate thread #24237

Closed

ghost opened this issue Aug 13, 2019 · 18 comments
Labels

module: cpp (Related to C++ API)
module: memory usage (PyTorch is using more memory than it should, or it is leaking memory)
module: mkl (Related to our MKL support)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@ghost
ghost commented Aug 13, 2019

🐛 Bug

When calling the forward function of a Module, some memory is allocated that is not de-allocated at the end of the thread.

To Reproduce

Steps to reproduce the behavior:

Module scripted from Python as in the tutorial:

import torchvision
import torch

model = torchvision.models.resnet18()
example = torch.rand(1,3,224,224)
my_torchscript_module = torch.jit.trace(model, example)
torch.jit.save(my_torchscript_module, "sciptedModule.pt")

Loaded and ran in C++ in separate thread:

#include "torch/script.h"
#include "torch/torch.h"


void runModel(at::Tensor, torch::jit::script::Module);

int main()
{
	torch::NoGradGuard no_guard;
	torch::jit::script::Module m_module = torch::jit::load("./sciptedModule.pt");
	m_module.eval();
	at::Tensor testTensor = torch::rand({ 1,3,224,224}, at::kFloat);
	testTensor = testTensor.div(testTensor.norm());
	// A fresh thread is spawned (and joined) for every inference call.
	for (int i = 0; i < 10000; i++) {
		std::thread newThread(&runModel, testTensor, m_module);
		newThread.join();
	}
}

void runModel(at::Tensor testTensor, torch::jit::script::Module m_module) {
	torch::NoGradGuard no_guard;
	at::Tensor out = m_module.forward({ testTensor }).toTensor().detach();
}

[Screenshot: MemoryIncrease]

Expected behavior

Inference runs in a separate thread with no increase in memory.

Environment

PyTorch version: 1.2.0
Is debug build: No
CUDA used to build PyTorch: None

OS: Microsoft Windows 10 Home
GCC version: Could not collect
CMake version: version 3.12.2

Python version: 3.6
Is CUDA available: No
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA

Versions of relevant libraries:
[pip] numpy==1.16.2
[pip] numpydoc==0.8.0
[pip] torch==1.2.0
[pip] torchvision==0.4.0
[conda] _tflow_1100_select 0.0.3 mkl
[conda] _tflow_select 2.3.0 mkl
[conda] blas 1.0 mkl
[conda] cpuonly 1.0 0 pytorch
[conda] libmklml 2019.0.3 0
[conda] mkl 2019.1 144
[conda] mkl-include 2019.1 144
[conda] mkl-service 1.1.2 py36hb782905_5
[conda] mkl_fft 1.0.10 py36h14836fe_0
[conda] mkl_random 1.0.2 py36h343c172_0
[conda] pytorch 1.2.0 py3.6_cpu_1 [cpuonly] pytorch
[conda] tensorflow-base 1.10.0 mkl_py36h81393da_0
[conda] torchvision 0.4.0 py36_cpu [cpuonly] pytorch

Additional context

When running on the main thread, the memory seems to be allocated once on the first call and then re-used.
Python threading doesn't have this problem.
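
For comparison, here is a minimal sketch of the main-thread variant described above (same assumptions as the repro, including the traced sciptedModule.pt); in this form memory appears to be allocated on the first forward() call and re-used on later iterations:

#include "torch/script.h"
#include "torch/torch.h"

int main()
{
	torch::NoGradGuard no_guard;
	torch::jit::script::Module m_module = torch::jit::load("./sciptedModule.pt");
	m_module.eval();
	at::Tensor testTensor = torch::rand({ 1, 3, 224, 224 }, at::kFloat);
	testTensor = testTensor.div(testTensor.norm());
	// Same loop as the repro, but without spawning a new thread per call.
	for (int i = 0; i < 10000; i++) {
		at::Tensor out = m_module.forward({ testTensor }).toTensor().detach();
	}
}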

@pietern
Contributor
pietern commented Aug 13, 2019

This is likely caused by some thread-local state that isn't cleaned up.

Could you try running without MKL and see what happens?
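
To illustrate the hypothesis (a generic, PyTorch-agnostic sketch; this is not the actual allocation path inside libtorch or MKL): if a library parks a per-thread scratch buffer in a global registry and never evicts entries for threads that have exited, then spawning and joining a fresh thread per inference grows memory on every iteration.

#include <mutex>
#include <thread>
#include <vector>

// Hypothetical global list of per-thread scratch buffers; nothing ever frees them.
std::mutex g_mutex;
std::vector<std::vector<char>> g_scratch;

void doWork()
{
	// Each new thread allocates its own 10 MB scratch buffer and parks it in the
	// global registry, emulating per-thread state that is never cleaned up.
	std::vector<char> buffer(10 * 1024 * 1024);
	std::lock_guard<std::mutex> lock(g_mutex);
	g_scratch.push_back(std::move(buffer));
}

int main()
{
	for (int i = 0; i < 10000; i++) {
		std::thread t(doWork);
		t.join(); // The thread exits, but its registry entry (and buffer) remains.
	}
}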

@pietern pietern added module: cpp Related to C++ API module: mkl Related to our MKL support module: memory usage PyTorch is using more memory than it should, or it is leaking memory triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels Aug 13, 2019
@ghost
Author
ghost commented Aug 13, 2019

Hi @pietern, thanks for the quick answer. Sure, do you mean in the Python part when tracing the model? I don't think I use MKL in the C++ part, unless it's inside the torch lib.

@pietern
Contributor
pietern commented Aug 13, 2019

I mean on the C++ side. PyTorch compiled with MKL support will transparently use it, I think.
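
One way to check whether the libtorch build being linked against was compiled with MKL is to query the ATen context at runtime. A minimal sketch, assuming at::hasMKL() and at::show_config() are available in the libtorch version in use:

#include <iostream>
#include "torch/torch.h"

int main()
{
	// Reports whether this libtorch build can dispatch CPU kernels to MKL.
	std::cout << "MKL available: " << std::boolalpha << at::hasMKL() << std::endl;
	// Prints the full build configuration (compiler, BLAS backend, flags, ...).
	std::cout << at::show_config() << std::endl;
}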

@ghost
Author
ghost commented Aug 13, 2019

I'm not sure how to check if it uses it or how to run without it. Could you walk me through it or point me to some documentation?
I only downloaded the latest stable version and link against it.

@ghost
Author
ghost commented Aug 22, 2019

I found a way to not use std::thread in my application, so this is not a problem for me anymore.
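
The comment above doesn't say what the workaround was. One commonly used alternative, sketched here as an assumption rather than as what was actually done, is to keep a single long-lived inference thread and feed it work through a queue, so any per-thread state is allocated once instead of once per spawned thread:

#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include "torch/script.h"
#include "torch/torch.h"

int main()
{
	torch::jit::script::Module module = torch::jit::load("./sciptedModule.pt");
	module.eval();

	std::mutex mutex;
	std::condition_variable cv;
	std::queue<at::Tensor> jobs;
	bool done = false;

	// One long-lived worker thread: per-thread state is allocated once and re-used.
	std::thread worker([&] {
		torch::NoGradGuard no_grad;
		for (;;) {
			std::unique_lock<std::mutex> lock(mutex);
			cv.wait(lock, [&] { return done || !jobs.empty(); });
			if (jobs.empty() && done) break;
			at::Tensor input = jobs.front();
			jobs.pop();
			lock.unlock();
			at::Tensor out = module.forward({ input }).toTensor();
		}
	});

	for (int i = 0; i < 10000; i++) {
		{
			std::lock_guard<std::mutex> lock(mutex);
			jobs.push(torch::rand({ 1, 3, 224, 224 }, at::kFloat));
		}
		cv.notify_one();
	}
	{
		std::lock_guard<std::mutex> lock(mutex);
		done = true;
	}
	cv.notify_one();
	worker.join();
}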

@pietern
Contributor
pietern commented Aug 26, 2019

While this still might be an issue by itself, I'll close the issue since you found a workaround.

@pietern pietern closed this as completed Aug 26, 2019
@Bsting
Bsting commented Oct 15, 2019

@tiberiusferreira

I'm facing the same problem using the Rust bindings.
The thread-local variables don't seem to be cleaned up after inference.

@jingxil
jingxil commented Jan 18, 2020

Hi @pietern, I am facing a similar problem. The memory usage keeps going up when I do inference in separate threads, and without the MKL lib the memory usage is stable. I tried setting the env variable MKL_DISABLE_FAST_MM=1, but it did not work out.
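
For reference, MKL_DISABLE_FAST_MM tells MKL to use plain malloc/free instead of its internal memory manager, and it generally needs to be in the environment before MKL initializes. A minimal sketch of setting it from the process itself rather than the shell (standard C runtime calls; as noted above, this did not resolve the leak for the reporter):

#include <cstdlib>
#include "torch/script.h"

int main()
{
	// Must be set before any MKL code runs (i.e. before the first forward pass).
#ifdef _WIN32
	_putenv_s("MKL_DISABLE_FAST_MM", "1");
#else
	setenv("MKL_DISABLE_FAST_MM", "1", /*overwrite=*/1);
#endif

	torch::jit::script::Module module = torch::jit::load("./sciptedModule.pt");
	// ... run inference as before ...
}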

@Junan007

I am facing a similar problem.

@WilliamTambellini
Contributor

Still the same issue with libtorch 1.7.

@ofaucoz
ofaucoz commented Oct 12, 2022

I'm having the exact same issue using libtorch called in a thread from Unity.
This is code derived from my problem:

C++ script

torch::NoGradGuard no_grad;
at::Tensor tensor_image = torch::from_blob(...);
tensor_image.set_requires_grad(false);

std::vector<torch::jit::IValue> inputs;
inputs.push_back(tensor_image);
at::Tensor output;
output = model.forward(inputs).toTensor();
...

(Unity) C# script calling the libtorch script

void Update(){
    ...
    ThreadMl = new Thread(Action);
    ThreadMl.Start();
}

private void Action(){ // launched in a thread
    ...
    ScriptML(...); // Memory being leaked
    ...
}

libtorch version : (CPU - Windows) 1.11

@RVirmoors

This is very much still an active issue and should probably be reopened.

@joshhansen

I seem to be getting this - coming from Rust like @tiberiusferreira. In a multithreaded async environment, inference appears to leak memory.

@ofaucoz
ofaucoz commented Sep 6, 2023

Any news on fixing this issue?

@ezorita
ezorita commented Feb 1, 2024

Also having the same issue running a model for inference in a Thread class.

@thecargocultnz

Why is this closed???
This is still an issue in Feb 2025, and renders torch non-viable in a commercially released product.
Any serious inference is gonna cause this massive memory leak.

To be clear: we cannot use torch any longer. I don't know how anyone does? Is it just for academics?

@thecargocultnz

@pietern can we get it reopened please?
