Description
I am deploying a custom neural network on a DepthAI device: a RegNet backbone followed by two fully connected layers. Given this architecture, I expected better frame-rate performance than traditional object detectors; for context, MobileNet SSD reaches around 30 FPS on the same setup. However, my custom model is far slower, yielding only about 2 FPS.
I suspect the bottleneck is in the model's output handling: the device appears to wait for the entire neural network (NN) message before proceeding. This leads me to ask whether there is a way to predefine the layers of interest, so that only those layers' data is serialized and sent through the XLinkOut node instead of the full NN message. That should reduce per-frame transfer and processing time and improve the frame rate.
Is there an existing feature in the DepthAI API, or a workaround, that enables this kind of selective output processing? Any guidance on improving FPS by limiting the output to specific layers would be greatly appreciated.