[libtorch] Loading in Java two different libtorch_cpu.so from different versions fails · Issue #70191 · pytorch/pytorch · GitHub

Open
carlosuc3m opened this issue Dec 20, 2021 · 5 comments
Labels
feature A request for a proper, new feature. module: binaries Anything related to official binaries that we release to users oncall: java triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

carlosuc3m commented Dec 20, 2021

🐛 Describe the bug

Hello, I am creating a Java application that loads different PyTorch versions to run inference. To do that I have to dynamically load different versions of libtorch_cpu.so. However, this is not possible: the program crashes with different error messages depending on the version. I load the native libraries through JNI from dynamically created classloaders, but even after the classloaders are garbage collected, the native library is not unloaded. Some of the errors produced when trying to dynamically load a different libtorch_cpu.so are:
Exception in thread "main" Unknown device: 51. If you have recently updated the caffe2.proto file to add a new device type, did you forget to update the DeviceTypeName() function to reflect such recent changes?

The following JVM fatal error:

A fatal error has been detected by the Java Runtime Environment:

  SIGSEGV (0xb) at pc=0x00007f5712723ad1, pid=22628, tid=0x00007f5794d1f700

 JRE version: OpenJDK Runtime Environment (8.0_292-b10) (build 1.8.0_292-8u292-b10-0ubuntu1~18.04-b10)
 Java VM: OpenJDK 64-Bit Server VM (25.292-b10 mixed mode linux-amd64 compressed oops)
 Problematic frame:
 C  [libtorch_cpu.so+0xbd4ad1]  c10::impl::OperatorEntry::updateDispatchTableEntry_(c10::Dispatcher const&, c10::DispatchKey)+0x51

 Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

or

Caused by: java.lang.UnsatisfiedLinkError: /home/carlos/.djl.ai/pytorch/1.8.1-cpu-linux-x86_64/libtorch_cpu.so: /home/carlos/.djl.ai/pytorch/1.8.1-cpu-linux-x86_64/libtorch_cpu.so: undefined symbol: _ZNK3c1010TensorImpl23shallow_copy_and_detachERKNS_15VariableVersionEb

Looking at the already existing issues I have found something similar in:
#60341
#13541
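The undefined symbol in the UnsatisfiedLinkError above is a mangled C++ name. A minimal sketch for decoding it, assuming a GCC or Clang toolchain that provides `abi::__cxa_demangle` from `<cxxabi.h>`; the helper name `demangle` is hypothetical and not part of libtorch:

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <string>

// Hypothetical helper: returns the human-readable form of a mangled
// Itanium-ABI C++ symbol, or the input unchanged if it cannot be demangled.
std::string demangle(const char *mangled) {
  int status = 0;
  // __cxa_demangle allocates the result with malloc when no buffer is given.
  char *out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  if (status != 0 || out == nullptr) {
    return mangled;
  }
  std::string result(out);
  std::free(out);
  return result;
}
```

Calling `demangle("_ZNK3c1010TensorImpl23shallow_copy_and_detachERKNS_15VariableVersionEb")` yields `c10::TensorImpl::shallow_copy_and_detach(c10::VariableVersion const&, bool) const`. The symbol belongs to the `c10` namespace, so one possible reading (an assumption, not confirmed in this thread) is that a c10 library from a previously loaded version is still resident and does not export the signature that the newer libtorch_cpu.so expects.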

Versions

For every version I have tried: 1.7.0, 1.7.1, 1.8.1, 1.9.0 and 1.9.1

cc @ezyang @seemethere @malfet

@malfet malfet added feature A request for a proper, new feature. module: binaries Anything related to official binaries that we release to users oncall: java labels Dec 20, 2021
malfet (Contributor) commented Dec 20, 2021

Loading two versions of libtorch_cpu.so into the same address space is [not] supported (and is unlikely ever to be). But dlopening and dlclosing libtorch should be possible.

Can you please run something like the following in your environment, and let me know if it works for you:

#include <dlfcn.h>
#include <iostream>

int main() {
  for (int i = 0; i < 3; ++i) {
    std::cout << "i=" << i << std::endl;
    // dlopen needs RTLD_LAZY or RTLD_NOW; RTLD_GLOBAL alone is an invalid mode.
    void *handle = dlopen("libtorch_cpu.so", RTLD_LAZY | RTLD_GLOBAL);
    if (handle == nullptr) {
      std::cout << "Failed to dlopen " << dlerror() << std::endl;
      continue;  // never pass a null handle to dlclose
    }
    int rc = dlclose(handle);
    if (rc != 0) {
      std::cout << "Failed to dlclose " << dlerror() << std::endl;
    }
  }
  return 0;
}
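A successful dlclose does not by itself prove that the library left the process. As a rough sketch of how that could be checked, the helper below loads a library, closes it, and then probes with `RTLD_NOLOAD`, which returns a handle only if the library is still resident. This assumes glibc (where `RTLD_NOLOAD` is available as an extension); the names `UnloadResult` and `load_then_unload` are hypothetical:

```cpp
#include <dlfcn.h>

// Hypothetical result type for the check below.
enum class UnloadResult { LoadFailed, Unloaded, StillResident };

// Loads `path`, closes it again, then asks the dynamic loader whether the
// library is still mapped into the process.
UnloadResult load_then_unload(const char *path) {
  void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
  if (handle == nullptr) {
    return UnloadResult::LoadFailed;
  }
  dlclose(handle);

  // RTLD_NOLOAD never loads anything: it returns a handle only when the
  // library is already resident, and nullptr otherwise.
  void *probe = dlopen(path, RTLD_LAZY | RTLD_NOLOAD);
  if (probe == nullptr) {
    return UnloadResult::Unloaded;
  }
  dlclose(probe);  // release the reference taken by the probe itself
  return UnloadResult::StillResident;
}
```

If this returned `StillResident` for a libtorch_cpu.so path, that would be consistent with the report that the library is not unloaded, although it would not explain why.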

carlosuc3m (Author) commented

Thanks for your answer @malfet! I guess you meant that it is not supported, didn't you?
Even though I am not very familiar with C, I will try it and get back to you. However, my application is in Java. Do you think I could write a JNI wrapper that works around the issue by using dlopen and dlclose to load the native library from Java?

@malfet malfet added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Dec 28, 2021
carlosuc3m (Author) commented

Hello, back again. I have done some more tests and it seems that libtorch_cpu.so leaves something open. On Windows I am able to load and unload .dlls from different versions with no problem, but on Linux I always get the error:
libtorch_cpu.so: undefined symbol: followed by some seemingly random characters.
That is for versions >1.7.1.
For versions <=1.7.0 I get a fatal error.
Regards,
Carlos

carlosuc3m (Author) commented

I also want to mention that this issue only happens on Linux; on Windows I am able to load and unload without any problem.
Regards,
Carlos

carlosuc3m (Author) commented

This is the program that I used to test it on Linux. Note that the same thing can be done without error on Windows:

#include <dlfcn.h>
#include <iostream>

int main() {
  // Load and unload the libtorch that ships with PyTorch 1.8.1.
  void *handle = dlopen("/home/carlos/.djl.ai/pytorch/1.8.1-cpu-linux-x86_64/libtorch_cpu.so", RTLD_NOW);
  if (handle == nullptr) {
    std::cout << "Failed to dlopen " << dlerror() << std::endl;
  } else if (dlclose(handle) != 0) {  // only close a handle that was opened
    std::cout << "Failed to dlclose " << dlerror() << std::endl;
  }
  std::cout << "Loaded first native library\n";

  // Load and unload the libtorch that ships with PyTorch 1.9.0.
  void *handle2 = dlopen("/home/carlos/.djl.ai/pytorch/1.9.0-cpu-linux-x86_64/libtorch_cpu.so", RTLD_NOW);
  if (handle2 == nullptr) {
    std::cout << "Failed to dlopen " << dlerror() << std::endl;
  } else if (dlclose(handle2) != 0) {
    std::cout << "Failed to dlclose " << dlerror() << std::endl;
  }
  std::cout << "End\n";

  return 0;
}

The first dlopen loads the libtorch corresponding to PyTorch 1.8.1 and the second loads the libtorch corresponding to PyTorch 1.9.0.
Regards,
Carlos
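Since the maintainer's reply says that two versions cannot share one address space, one way to test each version in isolation is to load it in a separate child process. The following is only a sketch under that assumption, using POSIX `fork` and `waitpid`; the helper name `loads_in_child` is hypothetical, and the parent must not have loaded any libtorch itself, because the child starts as a copy of the parent:

```cpp
#include <dlfcn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: returns true if `path` can be dlopen'ed in a fresh
// child process. Nothing the child loads is mapped into the parent.
bool loads_in_child(const char *path) {
  pid_t pid = fork();
  if (pid < 0) {
    return false;  // fork itself failed
  }
  if (pid == 0) {
    // Child: report the dlopen outcome through the exit status.
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    _exit(handle != nullptr ? 0 : 1);
  }
  // Parent: wait for the child and decode its exit status.
  int status = 0;
  if (waitpid(pid, &status, 0) < 0) {
    return false;
  }
  return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Calling `fork` from inside a multi-threaded process such as a JVM is not safe for this purpose, so a Java application would more plausibly start a separate process per PyTorch version and talk to it over some form of inter-process communication. That design is a suggestion, not something confirmed in this thread.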
