Failed to load delegate from libedgetpu.so.1.0 with tflite_runtime 1.14 · Issue #32743 · tensorflow/tensorflow · GitHub

Failed to load delegate from libedgetpu.so.1.0 with tflite_runtime 1.14 #32743


Closed
Namburger opened this issue Sep 23, 2019 · 33 comments
Labels
comp:lite TF Lite related issues stat:awaiting tensorflower Status - Awaiting response from tensorflower TF 1.14 for issues seen with TF 1.14 type:bug Bug


@Namburger
Namburger commented Sep 23, 2019

System information

  • Have I written code (based on the docs):
from tflite_runtime.interpreter import Interpreter
from tflite_runtime.interpreter import load_delegate
model_path='my_compiled_model.tflite'
interpreter = Interpreter(model_path,
  experimental_delegates=[load_delegate('libedgetpu.so.1.0')])
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
  Operating System: Ubuntu 18.04.3 LTS
            Kernel: Linux 4.15.0-60-generic
      Architecture: x86-64
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: laptop
  • TensorFlow installed from (source or binary):
    pip3 install tflite_runtime-1.14.0-cp36-cp36m-linux_x86_64.whl
  • TensorFlow version (use command below): tflite_runtime 1.14
  • Python version: Python 3.6.5 :: Anaconda, Inc.
  • Bazel version (if compiling from source): n/a
  • GCC/Compiler version (if compiling from source): n/a
  • CUDA/cuDNN version: n/a
  • GPU model and memory: n/a

This is the code that I ran:

from tflite_runtime.interpreter import Interpreter
from tflite_runtime.interpreter import load_delegate
model_path='my_compiled_model.tflite'
interpreter = Interpreter(model_path,
  experimental_delegates=[load_delegate('libedgetpu.so.1.0')])

following this tutorial:
https://www.tensorflow.org/lite/guide/python

This was working before, but it somehow broke with this error:

Traceback (most recent call last):
  File "/home/nam/anaconda3/lib/python3.6/site-packages/tflite_runtime/interpreter.py", line 165, in load_delegate
    delegate = Delegate(library, options)
  File "/home/nam/anaconda3/lib/python3.6/site-packages/tflite_runtime/interpreter.py", line 119, in __init__
    raise ValueError(capture.message)
ValueError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "evaluate_edgetpu_cifar10.py", line 51, in <module>
    interpreter = Interpreter(file_name,experimental_delegates=[load_delegate('libedgetpu.so.1.0')])
  File "/home/nam/anaconda3/lib/python3.6/site-packages/tflite_runtime/interpreter.py", line 168, in load_delegate
    library, str(e)))
ValueError: Failed to load delegate from libedgetpu.so.1.0

I have been messing around a lot with my machine since then, installing different versions of TF. But for the purpose of using tflite_runtime.interpreter's load_delegate function, shouldn't the pip install alone be enough?
Very weird behavior :/
Also, I do have libedgetpu.so.1.0 installed here:

% ls /usr/lib/x86_64-linux-gnu/libedgetpu.so.1.0
/usr/lib/x86_64-linux-gnu/libedgetpu.so.1.0

Thanks in advance for the help!

[EDIT]
I'll update the issue here with a solution so that anybody else can reference it:
ValueError: Failed to load delegate from libedgetpu.so.1.0 really is just due to the delegate library not being able to communicate with the Edge TPU. This is a very standard Linux problem and has nothing to do with the TensorFlow library or libedgetpu. The failure most likely stems from some kind of errno from the kernel, which is returned to user space as a failure.

So the easiest fix is to run with sudo:

$ sudo python your_script.py

But the most permanent fix is to add your Linux user to the plugdev group, which allows you to access the device without sudo (this requires a reboot afterward):

$ sudo usermod -aG plugdev $USER
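Whether the plugdev change has actually taken effect can be checked from Python. This is a minimal sketch, assuming a standard Linux host where the group is literally named `plugdev`. Because `os.getgroups()` reports the groups of the *current process*, a membership freshly added with `usermod` only shows up in a new login session, which is consistent with the reboot requirement above.

```python
import grp
import os


def process_in_group(group_name: str) -> bool:
    """True if the *current process* carries `group_name` among its groups.

    `usermod -aG` only affects new login sessions, so this can still be False
    in a shell that was opened before the group was added.
    """
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        return False  # no such group on this system
    return gid == os.getegid() or gid in os.getgroups()


if __name__ == "__main__":
    if process_in_group("plugdev"):
        print("plugdev is active for this session")
    else:
        print("plugdev is not active yet: run usermod, then log in again or reboot")
```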
@gadagashwini-zz self-assigned this Sep 24, 2019
@gadagashwini-zz added comp:lite TF Lite related issues type:bug Bug TF 1.14 for issues seen with TF 1.14 labels Sep 24, 2019
@gadagashwini-zz added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Sep 24, 2019
@EdjeElectronics

Hi @Namburger, I had the same issue!

Make sure your Coral USB Accelerator is plugged in when you run your code. If the USB Accelerator isn't plugged in when you call the 'load_delegate' function, it will result in that error. If it IS plugged in, that error won't occur.
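The "is it plugged in?" check can be done from Python before calling `load_delegate`, without shelling out to `lsusb`. This is a minimal sketch that scans Linux sysfs for the two vendor:product pairs reported for the USB Accelerator elsewhere in this thread (1a6e:089a and 18d1:9302). The sysfs layout is standard on Linux; treat the ID list as an assumption if your hardware differs. Matching on the full pair, rather than on vendor 18d1 alone, avoids picking up other Google USB devices.

```python
from pathlib import Path

# vendor:product pairs reported for the Coral USB Accelerator in this thread.
CORAL_USB_IDS = {("1a6e", "089a"), ("18d1", "9302")}


def find_coral_usb(sysfs_root: str = "/sys/bus/usb/devices") -> list:
    """Return the sysfs device directories that look like a Coral USB Accelerator."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    found = []
    for dev in sorted(root.iterdir()):
        try:
            vendor = (dev / "idVendor").read_text().strip().lower()
            product = (dev / "idProduct").read_text().strip().lower()
        except OSError:
            # Interface entries such as "1-1:1.0" have no idVendor/idProduct files.
            continue
        if (vendor, product) in CORAL_USB_IDS:
            found.append(str(dev))
    return found


if __name__ == "__main__":
    devices = find_coral_usb()
    if not devices:
        print("No Coral USB Accelerator found; plug it in before calling load_delegate().")
    for dev in devices:
        print("Found Coral at", dev)
```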

@programmer290399

I am facing the same issue, even when the USB accelerator is plugged in and its LED is shining bright...

@Namburger
Author

@EdjeElectronics wow... you were correct... thanks.

@tensorflow-bot
tensorflow-bot bot commented Oct 9, 2019

Are you satisfied with the resolution of your issue?
Yes
No

@Namburger
Author
Namburger commented Oct 18, 2019

> I am facing the same issue , even when the USB accelerator is plugged in and the LED in it is shining bright ........

@programmer290399 you might need to add your Linux user to the plugdev group:

$ sudo usermod -aG plugdev [your username]

@krishna-nag

> > I am facing the same issue , even when the USB accelerator is plugged in and the LED in it is shining bright ........
>
> @programmer290399 you might need to add your linux user to plugdev group:
>
> $ sudo usermod -aG plugdev [your username]

I plugged in the device and did this too, but it still shows the same error. I am using Ubuntu installed in VirtualBox on a Mac. The USB device is getting attached, but load_delegate gives the error.

@Namburger
Author

@krishna-nag Ah, I see. Most likely the USB device isn't detected in your VM. Possibly this will help: https://dev.to/kojikanao/coral-edgetpu-usb-with-virtualbox-57e1

@jiayiliu
jiayiliu commented Nov 12, 2019

I had the same problem on native Ubuntu. Rebooting helped.

@cruzzer
cruzzer commented Jan 4, 2020

There were two issues that needed resolving on my setup, coming from a freshly installed Pi 4 and a freshly installed Edge TPU.

  1. A udev rule needs to be added for the TPU, which according to the "Getting Started" page should have happened when installing the libedgetpu1-* package.
     /etc/udev/rules.d/99-edgetpu-accelerator.rules
     SUBSYSTEM=="usb",ATTRS{idVendor}=="1a6e",GROUP="plugdev"
     SUBSYSTEM=="usb",ATTRS{idVendor}=="18d1",GROUP="plugdev"
  2. The user needs to be part of the plugdev group, as mentioned by @Namburger.
     sudo usermod -aG plugdev [your username]
  3. Reboot.

This seems to be a bug with the libedgetpu1-* packages. I tried both the -std and -max versions (v12-1).
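Once the udev rule and the plugdev membership are in place, what ultimately matters is whether the process can open the accelerator's device node. This is a minimal sketch, assuming the standard Linux layout where a USB device's sysfs directory contains `busnum` and `devnum` files and its node lives at `/dev/bus/usb/BBB/DDD`. The sysfs path in the commented example is hypothetical. libusb normally opens the node read/write, so the check asks for both.

```python
import os
from pathlib import Path


def usb_node_for(sysfs_dev: str) -> str:
    """Map a USB device's sysfs directory to its /dev/bus/usb/BBB/DDD node."""
    dev = Path(sysfs_dev)
    bus = int((dev / "busnum").read_text())
    num = int((dev / "devnum").read_text())
    return f"/dev/bus/usb/{bus:03d}/{num:03d}"


def node_is_accessible(node: str) -> bool:
    """True if this process may open `node` for reading and writing."""
    return os.access(node, os.R_OK | os.W_OK)


# Example (not run here; "2-1" is a hypothetical sysfs entry, find yours via idVendor/idProduct):
# node = usb_node_for("/sys/bus/usb/devices/2-1")
# print(node, "OK" if node_is_accessible(node) else "permission denied: check the udev rule and plugdev")
```

Under `sudo` the access check passes regardless of group ownership, which matches the observation earlier in the thread that running as root works.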

@adr-arroyo

Thanks @cruzzer for the suggestions!

Has anyone tried to use the TPU from a program in a Docker container?
I am getting the same Failed to load delegate from libedgetpu.so.1.0 error when running my Docker container.

@Namburger
Author
Namburger commented Jan 10, 2020

@adr-arroyo you can check this out :)
google-coral/tflite#3 (comment)
tl;dr: you most likely just have to pass it the --privileged flag

@Wonfee
Wonfee commented Jan 12, 2020

OS env:

# lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 14.04.6 LTS
Release:	14.04
Codename:	trusty

# python3 -V
Python 3.5.3

Runtime error message:

# python3 coral/tflite/python/examples/classification/classify_image.py --model edgetpu/classify/models/output_tflite_graph_edgetpu.tflite --labels edgetpu/classify/models/labels.txt --input edgetpu/classify/flower.jpg 
Traceback (most recent call last):
  File "e-AI/coral/tflite/python/examples/classification/classify_image.py", line 118, in <module>
    main()
  File "e-AI/coral/tflite/python/examples/classification/classify_image.py", line 95, in main
    interpreter = make_interpreter(args.model)
  File "e-AI/coral/tflite/python/examples/classification/classify_image.py", line 69, in make_interpreter
    {'device': device[0]} if device else {})
  File "/usr/local/lib/python3.5/dist-packages/tflite_runtime/interpreter.py", line 165, in load_delegate
    delegate = Delegate(library, options)
  File "/usr/local/lib/python3.5/dist-packages/tflite_runtime/interpreter.py", line 89, in __init__
    self._library = ctypes.pydll.LoadLibrary(library)
  File "/usr/lib/python3.5/ctypes/__init__.py", line 425, in LoadLibrary
    return self._dlltype(name)
  File "/usr/lib/python3.5/ctypes/__init__.py", line 347, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /usr/lib/x86_64-linux-gnu/libedgetpu.so.1: symbol _ZTTNSt7__cxx1119basic_ostringstreamIcSt11char_traitsIcESaIcEEE, version GLIBCXX_3.4.21 not defined in file libstdc++.so.6 with link time reference
Exception ignored in: <bound method Delegate.__del__ of <tflite_runtime.interpreter.Delegate object at 0x7f7a5ff57438>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tflite_runtime/interpreter.py", line 124, in __del__
    if self._library is not None:
AttributeError: 'Delegate' object has no attribute '_library'

I checked the GLIBCXX versions as below:

# strings /usr/lib/x86_64-linux-gnu/libedgetpu.so.1.0 |grep GLIBCXX
GLIBCXX_3.4
GLIBCXX_3.4.11
GLIBCXX_3.4.21
GLIBCXX_3.4.9
GLIBCXX_3.4.14
GLIBCXX_3.4.15
GLIBCXX_3.4.18
GLIBCXX_3.4.17
GLIBCXX_3.4.5
GLIBCXX_3.4.20
GLIBCXX_3.4.19

# strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 |grep GLIBCXX
GLIBCXX_3.4
GLIBCXX_3.4.1
GLIBCXX_3.4.2
GLIBCXX_3.4.3
GLIBCXX_3.4.4
GLIBCXX_3.4.5
GLIBCXX_3.4.6
GLIBCXX_3.4.7
GLIBCXX_3.4.8
GLIBCXX_3.4.9
GLIBCXX_3.4.10
GLIBCXX_3.4.11
GLIBCXX_3.4.12
GLIBCXX_3.4.13
GLIBCXX_3.4.14
GLIBCXX_3.4.15
GLIBCXX_3.4.16
GLIBCXX_3.4.17
GLIBCXX_3.4.18
GLIBCXX_3.4.19
GLIBCXX_3.4.20
GLIBCXX_3.4.21
GLIBCXX_3.4.22
GLIBCXX_3.4.23
GLIBCXX_3.4.24
GLIBCXX_3.4.25
GLIBCXX_3.4.26
GLIBCXX_3.4.27
GLIBCXX_3.4.28
GLIBCXX_DEBUG_MESSAGE_LENGTH
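The `strings | grep GLIBCXX` comparison above can be automated. This is a rough sketch that lists GLIBCXX version tags appearing in one file but not in another. Like the manual check, it only looks at version strings. It cannot prove that a particular symbol is defined in the library, and the process may load a different libstdc++.so.6 than the file you inspect. In this very report libstdc++ does list GLIBCXX_3.4.21, so the sketch would report nothing missing even though loading still failed.

```python
import re
from pathlib import Path

# Matches version tags like GLIBCXX_3.4.21 but not GLIBCXX_DEBUG_MESSAGE_LENGTH.
_GLIBCXX_TAG = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)+)")


def glibcxx_versions(path: str) -> set:
    """Version tuples of every GLIBCXX_x.y[.z] string in a binary file."""
    data = Path(path).read_bytes()
    return {
        tuple(int(part) for part in m.group(1).split(b"."))
        for m in _GLIBCXX_TAG.finditer(data)
    }


def missing_versions(consumer: str, provider: str) -> list:
    """Tags mentioned in `consumer` (e.g. libedgetpu) that `provider` (libstdc++) lacks."""
    return sorted(glibcxx_versions(consumer) - glibcxx_versions(provider))


# Example with the paths from this report (not run here):
# print(missing_versions("/usr/lib/x86_64-linux-gnu/libedgetpu.so.1.0",
#                        "/usr/lib/x86_64-linux-gnu/libstdc++.so.6"))
```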

@adr-arroyo

Thanks @Namburger!

Now my program is able to use the TPU from Docker. I had to add -v /dev/bus/usb:/dev/bus/usb along with the --privileged flag to the docker run command to make it work.
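For scripted launches, the `docker run` invocation just described can be assembled as an argument list. This is a minimal sketch that only builds and prints the command. The image name and container command are hypothetical placeholders, and the two flags are exactly the ones reported to work here.

```python
import shlex


def edgetpu_docker_args(image: str, command: list) -> list:
    """Build a `docker run` argv with the two settings reported to work in this thread."""
    return [
        "docker", "run",
        "--privileged",                      # give the container access to host devices
        "-v", "/dev/bus/usb:/dev/bus/usb",   # bind-mount the host's USB device nodes
        image,
        *command,
    ]


if __name__ == "__main__":
    # Hypothetical image and script names; substitute your own.
    argv = edgetpu_docker_args("my-edgetpu-image", ["python3", "classify.py"])
    print(shlex.join(argv))
    # To launch it: import subprocess; subprocess.run(argv, check=True)
```

Passing a list rather than a single shell string avoids quoting problems if paths contain spaces. `shlex.join` needs Python 3.8+.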

@adr-arroyo

I have another question for you guys.
Regarding the output of the TPU: it is returned as type uint8 (mandatory and set during post-training quantization, as is the input). How do you cast it back to your original type?

In my case I cast float32 values in the range 0 to 1 to uint8, and now I need to cast them back to proper float32 values so I can use my scaler to descale the results.
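On the uint8 question: TFLite's quantized tensors use an affine mapping, real = scale * (q - zero_point). The `tflite_runtime` Interpreter exposes each tensor's `(scale, zero_point)` pair under the `quantization` key of `get_input_details()` / `get_output_details()`. This is a minimal NumPy sketch for converting in both directions. It assumes 8-bit unsigned tensors as described above. The helper names are illustrative, not part of any library.

```python
import numpy as np


def dequantize(q_values, quantization):
    """uint8 model output -> float32, using real = scale * (q - zero_point)."""
    scale, zero_point = quantization
    q = np.asarray(q_values)
    if scale == 0:
        # TFLite reports (0.0, 0) for tensors that are not quantized.
        return q.astype(np.float32)
    # Widen before subtracting so uint8 arithmetic cannot wrap around.
    return (scale * (q.astype(np.int32) - zero_point)).astype(np.float32)


def quantize(real_values, quantization):
    """float input -> uint8, the inverse mapping (rounded and clipped to 0..255)."""
    scale, zero_point = quantization
    q = np.round(np.asarray(real_values, dtype=np.float32) / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)


# Typical use with a tflite_runtime Interpreter (not run here):
# out = interpreter.get_output_details()[0]
# raw = interpreter.get_tensor(out["index"])
# values = dequantize(raw, out["quantization"])  # then undo your own scaler
```

Always read the scale and zero point from the model's tensor details rather than assuming 1/255 and 0. The converter chooses them from the calibration data.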

@taf2
taf2 commented Jan 18, 2020

Same issue on a Raspberry Pi:

pi@raspberrypi:~/coral/google-coral/examples-camera/raspicam \> lsb_release -a
No LSB modules are available.
Distributor ID:	Raspbian
Description:	Raspbian GNU/Linux 10 (buster)
Release:	10
Codename:	buster

pi@raspberrypi:~/coral \> ls
tflite  tflite_runtime-1.14.0-cp37-cp37m-linux_armv7l.whl
pi@raspberrypi:~/coral \> pip3 install tflite_runtime-1.14.0-cp37-cp37m-linux_armv7l.whl
Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
Requirement already satisfied: tflite-runtime==1.14.0 from file:///home/pi/coral/tflite_runtime-1.14.0-cp37-cp37m-linux_armv7l.whl in /usr/local/lib/python3.7/dist-packages (1.14.0)
pi@raspberrypi:~/coral \> ls
tflite  tflite_runtime-1.14.0-cp37-cp37m-linux_armv7l.whl
pi@raspberrypi:~/coral \> git clone https://github.com/tensorflow/examples --depth 1
Cloning into 'examples'...
remote: Enumerating objects: 1293, done.
remote: Counting objects: 100% (1293/1293), done.
remote: Compressing objects: 100% (836/836), done.
remote: Total 1293 (delta 369), reused 994 (delta 250), pack-reused 0
Receiving objects: 100% (1293/1293), 8.28 MiB | 2.15 MiB/s, done.
Resolving deltas: 100% (369/369), done.
pi@raspberrypi:~/coral \> cd examples/lite/examples/image_classification/raspberry_pi
pi@raspberrypi:~/coral/examples/lite/examples/image_classification/raspberry_pi \> bash download.sh /tmp
Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
Requirement already satisfied: numpy in /usr/lib/python3/dist-packages (from -r requirements.txt (line 2)) (1.16.2)
Requirement already satisfied: picamera in /usr/lib/python3/dist-packages (from -r requirements.txt (line 3)) (1.13)
Requirement already satisfied: Pillow in /usr/lib/python3/dist-packages (from -r requirements.txt (line 4)) (5.4.1)
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2997k  100 2997k    0     0  1761k      0  0:00:01  0:00:01 --:--:-- 1762k
Archive:  mobilenet_v1_1.0_224_quant_and_labels.zip
  inflating: /tmp/labels_mobilenet_quant_v1_224.txt  
   creating: /tmp/__MACOSX/
  inflating: /tmp/__MACOSX/._labels_mobilenet_quant_v1_224.txt  
  inflating: /tmp/mobilenet_v1_1.0_224_quant.tflite  
  inflating: /tmp/__MACOSX/._mobilenet_v1_1.0_224_quant.tflite  
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 4428k  100 4428k    0     0  1917k      0  0:00:02  0:00:02 --:--:-- 1916k
Downloaded files are in /tmp
pi@raspberrypi:~/coral/examples/lite/examples/image_classification/raspberry_pi \> python3 classify_picamera.py \
>   --model /tmp/mobilenet_v1_1.0_224_quant.tflite \
>   --labels /tmp/labels_mobilenet_quant_v1_224.txt
INFO: Initialized TensorFlow Lite runtime.
^CTraceback (most recent call last):
  File "classify_picamera.py", line 96, in <module>
    main()
  File "classify_picamera.py", line 82, in main
    Image.ANTIALIAS)
  File "/usr/lib/python3/dist-packages/PIL/Image.py", line 1806, in resize
    return self._new(self.im.resize(size, resample, box))
KeyboardInterrupt
pi@raspberrypi:~/coral/examples/lite/examples/image_classification/raspberry_pi \> cd
pi@raspberrypi:~ \> cd coral/
pi@raspberrypi:~/coral \> ls
examples  tflite  tflite_runtime-1.14.0-cp37-cp37m-linux_armv7l.whl
pi@raspberrypi:~/coral \> mv examples/ obj-example
pi@raspberrypi:~/coral \> mkdir google-coral && cd google-coral
pi@raspberrypi:~/coral/google-coral \> git clone https://github.com/google-coral/examples-camera.git --depth 1
Cloning into 'examples-camera'...
remote: Enumerating objects: 37, done.
remote: Counting objects: 100% (37/37), done.
remote: Compressing objects: 100% (35/35), done.
remote: Total 37 (delta 12), reused 14 (delta 1), pack-reused 0
Unpacking objects: 100% (37/37), done.
pi@raspberrypi:~/coral/google-coral \> cd examples-camera
pi@raspberrypi:~/coral/google-coral/examples-camera \> sh download_models.sh
--2020-01-17 20:23:10--  https://dl.google.com/coral/canned_models/all_models.tar.gz
Resolving dl.google.com (dl.google.com)... 172.217.7.206, 2607:f8b0:4004:802::200e
Connecting to dl.google.com (dl.google.com)|172.217.7.206|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 184961153 (176M) [application/octet-stream]
Saving to: ‘all_models.tar.gz’

all_models.tar.gz               100%[=======================================================>] 176.39M  3.01MB/s    in 77s     

2020-01-17 20:24:27 (2.30 MB/s) - ‘all_models.tar.gz’ saved [184961153/184961153]

./
./mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite
./mobilenet_v2_1.0_224_inat_bird_quant.tflite
./inception_v3_299_quant_edgetpu.tflite
./mobilenet_v2_1.0_224_quant_edgetpu.tflite
./inat_insect_labels.txt
./inception_v2_224_quant_edgetpu.tflite
./mobilenet_v1_1.0_224_quant.tflite
./mobilenet_v2_1.0_224_inat_insect_quant_edgetpu.tflite
./mobilenet_v1_1.0_224_quant_embedding_extractor_edgetpu.tflite
./mobilenet_v2_1.0_224_quant.tflite
./inception_v2_224_quant.tflite
./imagenet_labels.txt
./mobilenet_ssd_v2_face_quant_postprocess_edgetpu.tflite
./coco_labels.txt
./inception_v1_224_quant_edgetpu.tflite
./mobilenet_v2_1.0_224_inat_plant_quant.tflite
./inat_bird_labels.txt
./mobilenet_v1_1.0_224_quant_edgetpu.tflite
./mobilenet_ssd_v1_coco_quant_postprocess.tflite
./inception_v4_299_quant.tflite
./pet_labels.txt
./inception_v3_299_quant.tflite
./mobilenet_ssd_v2_coco_quant_postprocess_edgetpu.tflite
./mobilenet_ssd_v2_coco_quant_postprocess.tflite
./mobilenet_ssd_v1_coco_quant_postprocess_edgetpu.tflite
./inat_plant_labels.txt
./mobilenet_v2_1.0_224_inat_insect_quant.tflite
./inception_v1_224_quant.tflite
./inception_v4_299_quant_edgetpu.tflite
./mobilenet_ssd_v2_face_quant_postprocess.tflite
./mobilenet_v1_1.0_224_quant_embedding_extractor.tflite
./mobilenet_v2_1.0_224_inat_plant_quant_edgetpu.tflite
pi@raspberrypi:~/coral/google-coral/examples-camera \> cd raspicam
pi@raspberrypi:~/coral/google-coral/examples-camera/raspicam \> bash install_requirements.sh
Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
Requirement already satisfied: picamera in /usr/lib/python3/dist-packages (1.13)
pi@raspberrypi:~/coral/google-coral/examples-camera/raspicam \> python3 classify_capture.py
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/tflite_runtime/interpreter.py", line 165, in load_delegate
    delegate = Delegate(library, options)
  File "/usr/local/lib/python3.7/dist-packages/tflite_runtime/interpreter.py", line 119, in __init__
    raise ValueError(capture.message)
ValueError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "classify_capture.py", line 91, in <module>
    main()
  File "classify_capture.py", line 55, in main
    interpreter = common.make_interpreter(args.model)
  File "/home/pi/coral/google-coral/examples-camera/raspicam/common.py", line 27, in make_interpreter
    {'device': device[0]} if device else {})
  File "/usr/local/lib/python3.7/dist-packages/tflite_runtime/interpreter.py", line 168, in load_delegate
    library, str(e)))
ValueError: Failed to load delegate from libedgetpu.so.1

pi@raspberrypi:~/coral/google-coral/examples-camera/raspicam \> sudo usermod -aG plugdev pi
pi@raspberrypi:~/coral/google-coral/examples-camera/raspicam \> python3 classify_capture.py
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/tflite_runtime/interpreter.py", line 165, in load_delegate
    delegate = Delegate(library, options)
  File "/usr/local/lib/python3.7/dist-packages/tflite_runtime/interpreter.py", line 119, in __init__
    raise ValueError(capture.message)
ValueError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "classify_capture.py", line 91, in <module>
    main()
  File "classify_capture.py", line 55, in main
    interpreter = common.make_interpreter(args.model)
  File "/home/pi/coral/google-coral/examples-camera/raspicam/common.py", line 27, in make_interpreter
    {'device': device[0]} if device else {})
  File "/usr/local/lib/python3.7/dist-packages/tflite_runtime/interpreter.py", line 168, in load_delegate
    library, str(e)))
ValueError: Failed to load delegate from libedgetpu.so.1

@taf2
taf2 commented Jan 18, 2020

Rebooting my Pi fixed the issue for me.

@gblue1223

In my case, the Coral USB Accelerator works on USB 3.0, so I changed the USB compatibility option to USB 3.0, and that solved it.

@Namburger
Author

By the way, everyone, please also upgrade the tflite_runtime package.

@Syirrus
Syirrus commented Feb 25, 2020

> USB compatibility option

Can you explain how you did that?
How did you set it (assuming a Raspberry Pi 4) to the USB 3.0 compatibility mode/option?

@Namburger
Author

@Syirrus are you still having the same issue?
This should all be fixed now if you upgrade to tflite_runtime-2.1.0.post1 instead of 1.14.
I don't think USB 2.0 vs. USB 3.0 matters, because it works consistently on my RPi 3 B+ (USB 2.0).
Anyway, to answer your question: the Pi 4 has two USB 2.0 ports and two USB 3.0 ports; just plug it into the correct port.
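To confirm which tflite_runtime is actually installed before and after upgrading, something like the following should work on Python 3.8+ (for `importlib.metadata`). It assumes the wheel registers its distribution name as `tflite-runtime`. `parse_release` is a deliberately simple, hypothetical helper that keeps only the leading numeric components, so it ignores suffixes such as `.post1` and does not handle pre-releases.

```python
from importlib import metadata


def parse_release(version: str) -> tuple:
    """Leading numeric components only: '2.1.0.post1' -> (2, 1, 0)."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as 'post1' (pre-releases are not handled)
        parts.append(int(piece))
    return tuple(parts)


def tflite_runtime_at_least(minimum=(2, 1, 0), dist_name="tflite-runtime"):
    """True/False comparing the installed version to `minimum`; None if not installed."""
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
    return parse_release(installed) >= minimum


if __name__ == "__main__":
    status = tflite_runtime_at_least()
    if status is None:
        print("tflite_runtime is not installed in this environment")
    elif status:
        print("tflite_runtime", metadata.version("tflite-runtime"), "is new enough")
    else:
        print("tflite_runtime is older than 2.1.0; consider upgrading")
```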

@Syirrus
Syirrus commented Feb 25, 2020

> tflite_runtime-2.1.0.post1

I installed tflite_runtime-2.1.0 (tflite_runtime-2.1.0.post1-cp37-cp37m-linux_armv7l.whl). I still get the same behavior on an RPi 4. However, if I plug the Coral USB stick into a USB 2.0 port instead of the USB 3.0 port, it works. When I switch to the native USB 3.0 port on the RPi 4, I get the same error as above.

In fact, after I run the classification script and then run lsusb, the Coral USB stick is no longer logically attached to the system, though it is still physically plugged in.

@Syirrus
Syirrus commented Feb 25, 2020

> @Syirrus are you still having the same issue?
> this all fixed now if you upgrade to tflite_runtime-2.1.0.post1 instead of 1.14.
> I don't think USB2.0 or USB3.0 matter because it works consistently on my RPI3 b+ (2.0).
> Anyways, to answer your question, the pi4 has 2 USB2.0 ports and 2 USB3.0 ports, just switch it to the correct port.

Does that make sense?

@Namburger
Author

@Syirrus Ah, I see.
So it's working on USB 2.0 but not 3.0? That's annoying, because with 2.0 the data transfer speed is going to be a bottleneck. But this looks like an RPi 4 issue; check here.
For my Pi 3 B+, this isn't even an option since it only has 2.0 :/ Good luck with yours.

@Syirrus
Syirrus commented Feb 25, 2020

> @Syirrus Ahh, I see.
> So it's working in USB 2.0 but not 3.0? That's annoying, because with 2.0, data transferring speed is going to be a bottle neck. But this looks like a rpi4 issue, check here.
> For my pi3 b+, this isn't even an option since it only has 2.0 :/, good luck with yours

Exactly: it is working on USB 2.0 but NOT on USB 3.0, which is a bottleneck :(. I will check out the link about RPi 4 issues. Thank you so much!

@Namburger
Author

No problems!

@Syirrus
Syirrus commented Feb 27, 2020

> No problems!

After spending many hours on and off thinking about this problem and combing the net, I finally solved this "Failed to load delegate from libedgetpu.so.1.0" error. Essentially, the cable that Google provided with my Coral USB TPU stinks. I broke down and purchased a USB 3.1 (10 Gbps, NOT 5 Gbps) cable, and everything worked perfectly on the USB 3.0 port of my Raspberry Pi 4B (4GB). I hope this helps someone else in the same situation I was in.

@bmachin
bmachin commented Dec 30, 2020

Weird as it may seem, I can confirm what @Syirrus reports. After many hours of fighting this error, it finally worked when I changed the USB cable. Thanks!

@pliablepixels
pliablepixels commented Jan 23, 2021

I recently moved from a USB 2.0 system to USB 3.0 and had the same issue. It is intermittent, so it's not a plugdev issue.

As suggested above, it does seem to be an issue with Google's cable. I bought this cable and replaced the Google-provided cable with it. That seems to have completely eliminated the issue, and I'm getting good inference speed (averaging 19ms for an 800px image using MobileDet and the pycoral wrappers).

@TurboTronix
TurboTronix commented Mar 21, 2021

I am getting the same error running a Docker container on Windows 10, using the Google Coral Edge TPU PCI version.

@pliablepixels
pliablepixels commented Mar 28, 2021

(last edit: May 28, 2021, to handle the situation where lsusb works but Coral doesn't load the library)

I'm going to leave another finding here (this only applies to intermittent libedgetpu delegate loading issues, unrelated to plugdev/permissions):

  1. Replacing the cable eliminates most problems related to intermittent libedgetpu loading.
  2. However, it still happens at times, especially if I interrupt an ongoing TPU operation.
  3. When this happens, a reboot usually fixes it, but I'd prefer not to have to reboot. There are two approaches I found.
    • (Works for me) Reset the full USB ecosystem; this script does that (tested on Ubuntu 20). Depending on your OS it may require different paths; see the comments in that link. I use this approach.

    • (Does not work for me) If you prefer not to reset the entire USB ecosystem, this is a more focused way (you can get the Coral device details with lsusb | grep -i google). However, in my case, when the TPU actually fails, this method does not work: it hangs for a while and then errors out.

So the rest of the post is how I go about detecting failure and then I reset the full USB system.
Here is what I observed:

  • Sometimes, when Coral fails, lsusb also fails to show the Google device.
  • I realized later that there are times when Coral fails to load but lsusb continues to show the device.

So to make sure the device always works, I do the following:

  • Try lsusb: if it fails, Coral is obviously not working, so restart USB.
  • If lsusb shows Google, try to load the delegate using pycoral; if that fails, restart USB.
  • If that passes, there is nothing to do.
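The three checks above can be expressed as a small decision function. This is a sketch of the same logic as the `is_coral_working` shell function, not a drop-in replacement. The delegate loader is passed in as a callable so the logic can be tested without hardware. pycoral's `load_edgetpu_delegate`, which the script itself imports, is one such callable. The `"ok"` / `"restart_usb"` return values are made-up labels. Unlike the script, which loads the delegate in a separate `python` process, this version calls it in-process.

```python
def coral_action(lsusb_output: str, try_load_delegate) -> str:
    """Return "ok" if Coral looks healthy, else "restart_usb" (labels are made up).

    `try_load_delegate` is any zero-argument callable that raises on failure,
    e.g. pycoral.utils.edgetpu.load_edgetpu_delegate.
    """
    # Check 1: does lsusb list a Google device at all? (case-insensitive, like grep -i)
    if "google" not in lsusb_output.lower():
        return "restart_usb"
    # Check 2: lsusb can show the device even when the delegate will not load.
    try:
        try_load_delegate()
    except Exception:
        return "restart_usb"
    # Both checks passed: nothing to do.
    return "ok"


# Wiring it to the real tools (not run here; needs lsusb, pycoral and the device):
# import subprocess
# from pycoral.utils.edgetpu import load_edgetpu_delegate
# out = subprocess.run(["lsusb"], capture_output=True, text=True).stdout
# print(coral_action(out, load_edgetpu_delegate))
```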

Specifically, I have set up a cron file that checks every hour that the Coral device is detected:

pp@homeserver:~$ cat /etc/cron.d/coral-usb-checker 
SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

# every hour
0 * * * * root usb_reset_all.sh

The modified script (credit above in my notes):

pp@homeserver:~$ cat /usr/local/bin/usb_reset_all.sh 
#!/bin/bash

filename='/tmp/coral_status.txt'
now=$(date)
if [[ $EUID != 0 ]] ; then
  echo "${now}:$0  must be run as root!" 
  echo "${now}:$0  must be run as root!" > ${filename}
  exit 1
fi

# keep log file to 100 lines
if [ -f "${filename}" ] ;then
        echo "$(tail -100 ${filename})" > ${filename}
fi

 [[ "${1}" == "--force" ]] &&  IS_FORCED=true || IS_FORCED=false

is_coral_working() {
        # Note if you have multiple results from this lsusb command
        # You can specifically check with lsusb -d <deviceid>
        # Example: lsusb -d 18d1:9302 in my case which is what lsusb prints as the "ID" for my device

        if [[ -z `lsusb | grep -i google` ]] ; then
                echo "${now}:lsusb check failed" >> ${filename}
                # non-zero means "not working", so the caller restarts USB
                return 1
        fi
        echo "${now}:lsusb check passed, checking load_edgetpu_delegate()" >> ${filename}
        res=`python << HEREDOC
from pycoral.utils.edgetpu import load_edgetpu_delegate
try:
        load_edgetpu_delegate()
        print ('success')
except Exception as e:
        print('error')
HEREDOC
`
        if [[ "${res}" = "success" ]]; then
                return 0
        else
                return 1
        fi
}
restart_usb() {
        # credit: http://billauer.co.il/blog/2013/02/usb-reset-ehci-uhci-linux/
        echo "${now}:Restarting USB" >> ${filename}
        for xhci in /sys/bus/pci/drivers/?hci_hcd ; do
          if ! cd $xhci ; then
            echo "${now}:Weird error. Failed to change directory to $xhci" >> ${filename}
            exit 1
          fi

          echo "${now}:Resetting devices from $xhci..." >> ${filename}

          for i in ????:??:??.? ; do
            echo -n "$i" > unbind 2>/dev/null
            echo -n "$i" > bind 2>/dev/null
          done
        done
        echo "${now}:Completed operation" >> ${filename}
}


echo "${now}--------------------------------------------------------" >> ${filename}

if [[ ${IS_FORCED} == true ]] ; then
        echo "${now}:Forcing restart as user specified --force" >> ${filename}
        restart_usb

elif is_coral_working ; then 
        echo "${now}:Coral working fine" >> ${filename}

else
        echo "${now}:ERROR: Coral detection failed" >> ${filename}
        restart_usb
fi

The nice part is that my resident services that depend on Coral automatically get restarted (not sure how, but they do), so I don't need to restart them manually.

@joefernandez
Member
joefernandez commented May 28, 2021

USB Accelerator connected to a Proxmox VM failure scenario (FIXED)

I experienced a similar but different problem when connecting the USB Accelerator to a VM hosted on proxmox:

When I plugged the USB Accelerator into the physical machine (proxmox physical host), it showed up like this:

root@proxmox:~# lsusb
...
Bus 001 Device 008: ID 1a6e:089a Global Unichip Corp. 
...

So added device to the VM where I want to work with Coral:

root@proxmox:~# qm set 202 -usb0 host=1a6e:089a
update VM 202: -usb0 host=1a6e:089a

And rebooted the VM. The 1a6e:089a device showed up in the lsusb command inside the VM. Yay! Now the fun begins! I try to run the sample, and I get a loooooong pause, then the error at the top of this bug:

jfernandez@docker:~/dev/coral/pycoral$ python3 examples/classify_image.py --model test_data/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite --labels test_data/inat_bird_labels.txt --input test_data/parrot.jpg
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 152, in load_delegate
    delegate = Delegate(library, options)
  File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 111, in __init__
    raise ValueError(capture.message)
ValueError
 
During handling of the above exception, another exception occurred:
 
Traceback (most recent call last):
  File "examples/classify_image.py", line 84, in <module>
    main()
  File "examples/classify_image.py", line 61, in main
    interpreter = make_interpreter(*args.model.split('@'))
  File "/usr/lib/python3/dist-packages/pycoral/utils/edgetpu.py", line 66, in make_interpreter
    delegates = [load_edgetpu_delegate({'device': device} if device else {})]
  File "/usr/lib/python3/dist-packages/pycoral/utils/edgetpu.py", line 42, in load_edgetpu_delegate
    return tflite.load_delegate(_EDGETPU_SHARED_LIB, options or {})
  File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 154, in load_delegate
    raise ValueError('Failed to load delegate from {}\n{}'.format(
ValueError: Failed to load delegate from libedgetpu.so.1

After trying a bunch of fixes from this thread, I noticed the USB Accelerator device had changed:

root@proxmox:~# lsusb
...
Bus 001 Device 012: ID 18d1:9302 Google Inc.
...

What?! Looks like running the example changed the device host ID of the USB Accelerator (was: 1a6e:089a, is now:18d1:9302). A side effect of loading a USB driver for it? Only the USB driver daemons know... Point is, now the VM can't talk to the USB device anymore, so I have to re-connect the USB device to the VM with the new device ID:

root@proxmox:~# qm set 202 -usb0 host=18d1:9302
update VM 202: -usb0 host=18d1:9302

Reboot my VM again. And the sample runs!

Also, at one point, I ran this to add myself to the plugdev group, but I'm not sure if this had any effect:

sudo usermod -aG plugdev [your username]

Hope this saves someone a bit of time.

\\Joe
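The ID change Joe describes can be spotted by parsing `lsusb` output. This is a minimal sketch that uses only the two IDs reported in this thread. The switch presumably happens when the Edge TPU runtime first initializes the device and it re-enumerates, though that is not confirmed here. Either way, a VM or container passthrough rule keyed on only one of the IDs can stop matching. The `"uninitialized"` / `"initialized"` labels are made up for illustration.

```python
import re

# USB vendor:product IDs for the Coral USB Accelerator as reported in this thread.
CORAL_STATES = {
    ("1a6e", "089a"): "uninitialized",  # shows up as "Global Unichip Corp."
    ("18d1", "9302"): "initialized",    # shows up as "Google Inc."
}

_LSUSB_ID = re.compile(r"\bID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\b")


def coral_usb_states(lsusb_output: str) -> list:
    """One state label per Coral accelerator found in `lsusb` output."""
    states = []
    for vendor, product in _LSUSB_ID.findall(lsusb_output):
        state = CORAL_STATES.get((vendor.lower(), product.lower()))
        if state is not None:
            states.append(state)
    return states
```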

@kiteklan

I have the same setup, using the TPU in a Proxmox VM, but added as a PCI device with an adapter.
In my case, resetting the device works.
Get the device ID with lspci:
02:00.0 System peripheral: Device 1ac1:089a
Find the corresponding device folder:
sys/bus/pci/devices/0000:02:00.0/uevent:PCI_ID=1AC1:089A
And reset the device:
echo 1 >/sys/bus/pci/devices/0000:02:00.0/remove
echo 1 >/sys/bus/pci/rescan

I believe resetting the USB as mentioned above will also work.
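The PCI remove/rescan sequence above can also be scripted. This is a minimal sketch, assuming the standard `/sys/bus/pci` layout. When a short lspci address such as `02:00.0` is given, it assumes PCI domain `0000`. On a real system it must run as root, and removing the wrong address will detach that device from the host. The `sysfs_pci` parameter exists only so the function can be exercised against a scratch directory.

```python
from pathlib import Path


def reset_pci_device(address: str, sysfs_pci: str = "/sys/bus/pci") -> None:
    """Remove one PCI function via sysfs, then rescan the bus so it is re-probed.

    Equivalent to:
      echo 1 > /sys/bus/pci/devices/<address>/remove
      echo 1 > /sys/bus/pci/rescan
    """
    if address.count(":") == 1:
        # lspci prints "02:00.0"; sysfs wants the domain too. Assumes domain 0000.
        address = "0000:" + address
    root = Path(sysfs_pci)
    (root / "devices" / address / "remove").write_text("1")
    (root / "rescan").write_text("1")
```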

@aslanpour

> > I am facing the same issue , even when the USB accelerator is plugged in and the LED in it is shining bright ........
>
> @programmer290399 you might need to add your linux user to plugdev group:
>
> $ sudo usermod -aG plugdev [your username]

This works for me when the application inside my container is not running as the root user.
