smaller model runs slower than a larger one when compiled for edgetpu · Issue #50951 · tensorflow/tensorflow · GitHub

smaller model runs slower than a larger one when compiled for edgetpu #50951


Closed
Drulludanni opened this issue Jul 26, 2021 · 7 comments
Assignees
Labels
comp:lite TF Lite related issues comp:micro Related to TensorFlow Lite Microcontrollers stale This label marks the issue/pr stale - to be closed automatically if no activity stat:awaiting response Status - Awaiting response from author TF 2.5 Issues related to TF 2.5 type:performance Performance Issue

Comments

@Drulludanni

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: coral usb edgetpu
  • TensorFlow version (use command below): v1.12.1-49562-gee58e600bfc 2.5.0-dev20210125
  • Python version: 3.7

Describe the current behavior
So I have two models (U-Nets) that are nearly identical, except that one of them uses fewer filters in some of the convolutional layers, which makes that network strictly smaller. When running the tflite versions of the models, the smaller one is indeed faster than the larger one; however, when compiled for and run on the edgetpu, the smaller network runs slower than the larger one.
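
For reference, a rough hypothetical sketch (not the actual models, which are in the linked folder) of two U-Net-style networks that differ only in the number of filters in some convolutional layers, assuming the 256x256x3 input used in the benchmark below; build_unet and base_filters are made-up names for illustration:

from tensorflow import keras

def build_unet(base_filters):
    # Hypothetical helper: a tiny U-Net-like model with one down/up-sampling step.
    inputs = keras.Input(shape=(256, 256, 3))
    c1 = keras.layers.Conv2D(base_filters, 3, padding="same", activation="relu")(inputs)
    p1 = keras.layers.MaxPooling2D()(c1)
    c2 = keras.layers.Conv2D(base_filters * 2, 3, padding="same", activation="relu")(p1)
    u1 = keras.layers.UpSampling2D()(c2)
    u1 = keras.layers.Concatenate()([u1, c1])  # skip connection from the encoder
    outputs = keras.layers.Conv2D(3, 1, activation="sigmoid")(u1)
    return keras.Model(inputs, outputs)

large_model = build_unet(base_filters=32)  # "large" variant
small_model = build_unet(base_filters=16)  # "small" variant: strictly fewer filters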

Describe the expected behavior
The performance gain from the smaller model seen with tflite should also show up on the edgetpu.

Standalone code to reproduce the issue
https://drive.google.com/drive/folders/1-u9GpNwRdbCAxtaMuAdDZazMWqeMIt_n?usp=sharing

Other info / logs
I already made an issue on the Google Coral edgetpu page, seen here; they said the issue was with interpreter.invoke() in the script ../lib/python3.8/site-packages/tflite_runtime/interpreter.py and that I should contact the TensorFlow team.

@Saduf2019
Contributor

@Drulludanni
We are unable to open the files in the shared drive. Could you reproduce the performance numbers in a Colab gist and share it for us to analyse?

@Saduf2019 Saduf2019 added the stat:awaiting response Status - Awaiting response from author label Jul 26, 2021
@Drulludanni
Author

That is very weird; the folder is shared with anyone with the link (if I open the link in incognito I can still view all the files).

But here is the code:

from pycoral.utils import edgetpu
import numpy as np
import time

models = ['large.tflite', 'small.tflite', 'large_edgetpu.tflite', 'small_edgetpu.tflite']

for model_path in models:
    print(model_path)
    interpreter = edgetpu.make_interpreter(model_path)
    interpreter.allocate_tensors()
    input_index = interpreter.get_input_details()[0]['index']
    output_index = interpreter.get_output_details()[0]['index']

    n_trials = 10
    total = 0
    x = np.zeros((1, 256, 256, 3), dtype=np.uint8)

    # first call is usually slower so we skip it (warm-up run)
    interpreter.set_tensor(input_index, x)
    interpreter.invoke()
    pred = interpreter.get_tensor(output_index)

    # time set_tensor + invoke + get_tensor for each trial
    for i in range(n_trials):
        t = time.perf_counter()
        interpreter.set_tensor(input_index, x)
        interpreter.invoke()
        pred = interpreter.get_tensor(output_index)
        delta = time.perf_counter() - t
        print(delta)
        total += delta

    print("inference time:", total / n_trials)

and this is the output:

large.tflite
8.4517966
8.4927569
8.474044400000004
8.477497300000003
8.472683700000005
8.491142599999996
8.470729400000003
8.475646299999994
8.474072800000002
8.462358199999997
inference time: 8.47427282
small.tflite
5.282427299999995
5.293356000000003
5.279340199999993
5.272431900000001
5.2895331
5.280650100000003
5.273774800000012
5.278881600000005
5.27215240000001
5.2793834
inference time: 5.280193080000002
large_edgetpu.tflite
0.01768240000001242
0.017120800000014924
0.016798199999982444
0.016576700000001665
0.016357799999980216
0.016318699999999353
0.016506899999995994
0.01634260000000154
0.01632789999999318
0.016752200000013318
inference time: 0.016678419999999507
small_edgetpu.tflite
0.021253099999995584
0.021638700000011113
0.02142359999999144
0.020244700000006333
0.01973069999999666
0.019953200000003335
0.019520999999997457
0.01960610000000429
0.02138159999998379
0.02115660000001185
inference time: 0.020590930000000184

I tried to make a Google Colab to run the code, but I have no idea how to make it run, since an edgetpu is required and I don't know whether it is possible to somehow make a virtual one in Google Colab. Here it is anyway: https://colab.research.google.com/drive/1YipG-DUlg0MzGOHV_y4_zadd38YI3wlz?usp=sharing and it should also include the models from my test if you want to download them and use them locally.
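
For what it's worth, the plain .tflite models can at least be timed on the Colab CPU with the stock TFLite interpreter; only the *_edgetpu.tflite files need the physical accelerator. A minimal sketch, assuming large.tflite and small.tflite have been downloaded into the working directory:

import time
import numpy as np
import tensorflow as tf

for model_path in ["large.tflite", "small.tflite"]:
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    input_index = interpreter.get_input_details()[0]["index"]
    output_index = interpreter.get_output_details()[0]["index"]

    x = np.zeros((1, 256, 256, 3), dtype=np.uint8)
    interpreter.set_tensor(input_index, x)
    interpreter.invoke()  # warm-up, first call is usually slower

    n_trials = 10
    start = time.perf_counter()
    for _ in range(n_trials):
        interpreter.set_tensor(input_index, x)
        interpreter.invoke()
        _ = interpreter.get_tensor(output_index)
    print(model_path, "average:", (time.perf_counter() - start) / n_trials)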

@Saduf2019
Contributor

@Drulludanni

Could you please refer to these links: link, link1, and let us know.

@Drulludanni
Author

Neither of those links is helpful. The reason the code won't run is that there is no edgetpu connected to the Colab, and that is the problem: I don't know how either to attach an edgetpu to the Colab or to fake one with some kind of edgetpu emulation. As far as I'm aware nobody has done or tried that, which is why I don't think I can ever make the Google Colab work for my problem.

@Saduf2019 Saduf2019 added comp:tpus tpu, tpuestimator TF 2.5 Issues related to TF 2.5 and removed stat:awaiting response Status - Awaiting response from author labels Jul 29, 2021
@Saduf2019 Saduf2019 assigned ymodak and unassigned Saduf2019 Jul 29, 2021
@ymodak ymodak added comp:lite TF Lite related issues stat:awaiting tensorflower Status - Awaiting response from tensorflower and removed comp:tpus tpu, tpuestimator labels Jul 30, 2021
@ymodak ymodak assigned petewarden and unassigned ymodak Jul 30, 2021
@mohantym mohantym self-assigned this Jul 5, 2022
@mohantym mohantym removed their assignment Aug 24, 2022
@mohantym mohantym added the comp:micro Related to TensorFlow Lite Microcontrollers label Oct 6, 2022
@mohantym
Contributor
mohantym commented Oct 6, 2022

Hi @Drulludanni !
We are checking to see whether you still need help with this issue.
You can now check for quantization issues in the above models with the quantization debugger (a rough sketch follows below).

There might be some operations that are not leveraging the accelerator on your Edge TPU. You can find those operations using the flag below.
tf.lite.experimental.Analyzer.analyze(model_content=fb_model, gpu_compatibility=True)
Ref.
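
A minimal sketch of running the quantization debugger, assuming the original float model and a small calibration dataset are still available (the path and dataset below are placeholders, not part of this issue):

import tensorflow as tf

def representative_dataset():
    # Placeholder calibration data matching the 256x256x3 input.
    for _ in range(8):
        yield [tf.random.uniform((1, 256, 256, 3), 0.0, 1.0, dtype=tf.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("path/to/float_unet")  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset

debugger = tf.lite.experimental.QuantizationDebugger(
    converter=converter, debug_dataset=representative_dataset)
debugger.run()
with open("layer_stats.csv", "w") as f:
    debugger.layer_statistics_dump(f)  # per-layer quantization error statistics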

Thank you!

@mohantym mohantym added stat:awaiting response Status - Awaiting response from author and removed stat:awaiting tensorflower Status - Awaiting response from tensorflower labels Oct 6, 2022
@mohantym mohantym self-assigned this Oct 6, 2022
@google-ml-butler

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

@google-ml-butler google-ml-butler bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Oct 14, 2022
@google-ml-butler

Closing as stale. Please reopen if you'd like to work on this further.
