Do you have any performance metrics for the compiled OBB models in the DeGirum model zoo? Something with details like the output of the parse-hef command using hailortcli, or stats like FPS, which host CPU was used during testing, and post-processing times (IoU, bbox decoding, etc.)?
Hi @arinjay4756
Welcome to the DeGirum community. Here is the benchmark info for the requested model. Please note that the model is compiled for Hailo8l but can run on Hailo8 as well. We benchmarked it on Hailo8. If there is a specific configuration for which you want to see the benchmark, please let us know.
Thanks a lot Shashi, this is really detailed and helpful. I am surprised that so much of the pre-processing and post-processing has been offloaded to the neural core as well. What are the outputs after the post-processing in the core for a single frame? Do they still need host-side post-processing like IoU, bbox decoding, or NMS? Or is that the part that has been offloaded to the neural core?
Hi @arinjay4756
Actually, not much has been offloaded to the neural core in this model. The "core" in the results refers to the PySDK core, not the neural core of Hailo. Only the inference part runs on the Hailo accelerator; the rest runs on the CPU. For YOLO models, there is no pre-processing once the input is resized to the size expected by the model (the benchmarks do not measure resize overhead). The post-processing is written by us in C++: it starts with the output tensors (bboxes, scores, angles) at different resolutions and performs bbox decoding and NMS. Since this is a powerful CPU and the code is in C++, the overhead is negligible.
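For readers curious what the NMS step involves, here is a minimal Python sketch of greedy non-maximum suppression. Note this is illustrative only: the actual PySDK post-processor is C++, and OBB models use rotated-box IoU, whereas this sketch uses simple axis-aligned boxes.

```python
import numpy as np

def iou(box, boxes):
    # IoU of one (x1, y1, x2, y2) box against each row of `boxes`
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def nms(boxes, scores, iou_thresh=0.5):
    # Greedy NMS: keep the highest-scoring box, drop overlapping boxes
    # whose IoU with it exceeds the threshold, then repeat.
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) < iou_thresh]
    return keep
```

For example, two heavily overlapping detections of the same object collapse to the higher-scoring one, while a distant detection survives.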
Okay 👍 Thank you for the clarification. How exactly is the FPS calculated in the benchmarks you mentioned? Assuming a batch size of 1, if the average total frame duration is around 65 ms, shouldn't that come to around 15 FPS? The relevant stat for me is how many inference results I can get per second, where one inference result = the 2D position and orientation of the object I want to detect.
Hi @arinjay4756
The way AI accelerators work is that multiple inference jobs are in the pipeline at once. So even though the latency is 65 ms, the throughput is much higher because multiple jobs are submitted in a queue: as soon as frame 1 finishes some layers, frame 2 gets started, and so on. In Hailo systems, approximately 6 frames can be in the pipeline (our estimate from extensive experimentation), hence the FPS numbers are much higher than 1/latency. To answer your question: if there are no other bottlenecks in the system, you should see the FPS we reported. Hope this clarifies your issue.