Yolov8 FPS on Hailo8

Hey,
I am compiling on the DeGirum platform and running on a Hailo8 + i.MX8 Plus board that has Hailo TAPPAS v3.29.1, which is more than a year old.

When I run:

hailortcli run model_path

I get around 24 FPS for YOLOv8m and 21 for YOLOv11m. If I upgrade the Hailo TAPPAS version to the latest release, will there be any improvement in FPS?

Hi,

With my Hailo8 Hat in a RPI5 8Gb I get:

Yolov8m: 76.13 fps with the hailo8 model (yolov8m_coco--640x640_quant_hailort_hailo8_1) vs 58.59 fps with the multidevice version (yolov8m_coco--640x640_quant_hailort_multidevice_1)
Yolov8s: 220.05 fps with hailo8 model
Yolov8n: 227.85 fps with hailo8 model

Yolo11s: 99.37 fps with hailo8 model
Yolo11n: 189.32 fps with hailo8 model

This is with HailoRT CLI version 4.20.0.

Hi @suraj.upadhyay

The TAPPAS version has no effect on FPS.

@dario

We need to check your yolov8n model; the performance seems lower than expected.

@shashi thanks for your answer, I hadn’t noticed I had something wrong…

What should be the normal result for Yolov8n then?
I must say that I didn’t use hailortcli run model_path, as it gave me an error:

$ hailortcli run yolov8n_coco--640x640_quant_hailort_hailo8_1.hef
Running streaming inference (yolov8n_coco--640x640_quant_hailort_hailo8_1.hef):
  Transform data: true
    Type:      auto
    Quantized: true
[HailoRT] [error] CHECK failed - Failed to create vdevice. there are not enough free devices. requested: 1, found: 0
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_OUT_OF_PHYSICAL_DEVICES(74)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_OUT_OF_PHYSICAL_DEVICES(74)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_OUT_OF_PHYSICAL_DEVICES(74)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_OUT_OF_PHYSICAL_DEVICES(74)
[HailoRT CLI] [error] CHECK_SUCCESS failed with status=HAILO_OUT_OF_PHYSICAL_DEVICES(74) - Failed creating vdevice
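
(This error usually means the device is present but already held by another process. A quick check, assuming hailortcli is on PATH, is to scan for devices, e.g.:)

import subprocess

# "hailortcli scan" lists the physical Hailo devices the driver can see;
# HAILO_OUT_OF_PHYSICAL_DEVICES(74) typically means a device exists but is
# already in use by another process (e.g. a running AIServer or hailort service).
print(subprocess.run(["hailortcli", "scan"], capture_output=True, text=True).stdout)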

I used this script instead:

import degirum as dg
import degirum_tools

iterations = 2500 # Number of iterations to run with the model

# For testing the local hardware:
hw_location = "@local"
# For testing inference on an AIServer running locally or on your LAN, uncomment:
# hw_location = "localhost" or AIServer IP

# For testing a model file from the DeGirum AI Hub:
# Any model from https://hub.degirum.com/degirum/hailo
#model_name = "yolov8n_relu6_face--640x640_quant_hailort_hailo8_1"
model_name = "yolo11n_coco--640x640_quant_hailort_hailo8_1"
#model_name = "yolo11s_coco--640x640_quant_hailort_hailo8_1"
#model_name = "yolov8n_coco--640x640_quant_hailort_hailo8_1"
#model_name = "yolov8s_coco--640x640_quant_hailort_hailo8_1"
#model_name = "yolov8m_coco--640x640_quant_hailort_hailo8_1"


# Load the model
model = dg.load_model(
    model_name=model_name,
    inference_host_address=hw_location,
    token="<>",
    zoo_url="https://hub.degirum.com/degirum/hailo",
    device_type="HAILORT/HAILO8"
)

# If instead, you want to test a local model file, say:
# model_name = "local_model_name"
# You must ensure that the model .hef file is adjacent to its corresponding model parameter JSON file.
# For information on PySDK model parameter JSON file formats, look at examples for similar models in the DeGirum AI Hub
# or refer to: https://docs.degirum.com/pysdk/user-guide-pysdk/model-json-structure

# Specify zoo_url parameter as either a path to a local model zoo directory
# or a direct path to a model's .json configuration file.
# model = dg.load_model(
#     model_name=model_name,
#     inference_host_address="@local",
#     zoo_url="path/to/your/model_name.json",
# )

# Set the model's thread pack size for maximum performance.
# Note: _model_parameters is an internal PySDK attribute, so this knob may
# change between PySDK versions.
model._model_parameters.ThreadPackSize = 6

# Turn off C++-based post-processing (Does not affect models with a 'PythonFile' python-based postprocessor!)
model.output_postprocess_type = "None"

results = degirum_tools.model_time_profile(model, iterations)
print(f"Observed FPS: {results.observed_fps:5.2f}")

What should I do to diagnose the malfunction you see?

@shashi I did a few more runs of the benchmark script, and the average value is 220 for yolov8n.

So I stopped the hailort service and tried again, but this time with hailortcli run; this is the result:

DeGirum $ hailortcli run ./zoo/yolov8n_coco--640x640_quant_hailort_hailo8_1/yolov8n_coco--640x640_quant_hailort_hailo8_1.hef
Running streaming inference (./zoo/yolov8n_coco--640x640_quant_hailort_hailo8_1/yolov8n_coco--640x640_quant_hailort_hailo8_1.hef):
  Transform data: true
    Type:      auto
    Quantized: true
Network yolov8n_coco_v8/yolov8n_coco_v8: 100% | 1596 | FPS: 318.83 | ETA: 00:00:00
> Inference result:
 Network group: yolov8n_coco_v8
    Frames count: 1596
    FPS: 318.84
    Send Rate: 3134.29 Mbit/s
    Recv Rate: 3114.70 Mbit/s
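
(As a sanity check, assuming a 640×640×3 uint8 input, the send rate is consistent with the reported FPS:)

# One RGB frame is 640*640*3 bytes; at 318.84 fps that is ~3134 Mbit/s,
# matching the reported send rate.
frame_bits = 640 * 640 * 3 * 8
print(f"{318.84 * frame_bits / 1e6:.0f} Mbit/s")  # -> 3134 Mbit/s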

Hi @dario

Can you try model.output_postprocess_type = "Null"?

Hi @shashi ,

The benchmark script already has model.output_postprocess_type = "Null" in it…

Hi @dario

Can you update to the latest degirum_tools and add one more parameter to the model_time_profile call:

model_time_profile(model, iterations, input_image_format="RAW")

I tried with and without that input_image_format declaration in benchmark.py after updating (Successfully installed degirum-0.19.0 degirum_tools-0.22.4). The FPS went down drastically for the Yolo11 models, while some Yolov8 models went up:

Lower speeds:

Yolov8m -> Observed FPS: 31.93 (previously 76.13 fps)
Yolov8n (multidevice) -> Observed FPS: 115.74 (previously 217.53 fps)
Yolo11s -> Observed FPS: 51.85 (previously 99.37 fps)
Yolo11n -> Observed FPS: 104.02 (previously 189.32 fps)

High speeds:

Yolov8s -> Observed FPS: 318.19 (previously 220.05 fps)
Yolov8n -> Observed FPS: 318.99 (previously 227.85 fps)

Hi @dario

Thanks for the info. We will take a look at these numbers. If possible, can you run model_time_profile with and without input_image_format="RAW" using the same versions of degirum and degirum_tools? The new release of degirum has some changes that could be causing this.
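
(Something like this minimal sketch, reusing the model and iterations from your benchmark script, so that only the input format changes between runs:)

# Profile the same loaded model with and without input_image_format="RAW".
for fmt in (None, "RAW"):
    kwargs = {"input_image_format": fmt} if fmt is not None else {}
    res = degirum_tools.model_time_profile(model, iterations, **kwargs)
    print(f"input_image_format={fmt}: {res.observed_fps:5.2f} FPS")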

Hi @shashi
Just a suggestion: could you add optimization and compression levels as options on the DeGirum platform? With those we can get better FPS. For example, for the YOLOv8l model compiled on the DeGirum platform I got 14 FPS, but locally, with compression and optimization level set to 3, I got 24 FPS. Same with YOLOv8m: from DeGirum I got 24, and locally I have achieved 40+. A rough sketch of my local setup is below.
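
(For reference, roughly how I set those levels locally via the Hailo Dataflow Compiler's Python API; the exact model-script commands and signatures vary by DFC version, so treat the details as assumptions to check against the DFC docs:)

import numpy as np
from hailo_sdk_client import ClientRunner

# Load an already-parsed model (.har) targeting Hailo-8.
runner = ClientRunner(har="yolov8m.har", hw_arch="hailo8")

# Model-script lines requesting higher optimization/compression levels;
# the compiler performance setting is what drives the long compile times.
runner.load_model_script(
    "model_optimization_flavor(optimization_level=3, compression_level=3)\n"
    "performance_param(compiler_optimization_level=max)\n"
)

calib = np.zeros((64, 640, 640, 3), dtype=np.float32)  # placeholder: use real calibration images
runner.optimize(calib)
hef = runner.compile()  # can take hours at these settings
with open("yolov8m_opt3.hef", "wb") as f:
    f.write(hef)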

Hi @suraj.upadhyay

Thanks for the suggestion. Much appreciated. We will evaluate this internally and keep you posted.

Those were the results I posted, with the latest versions installed: degirum-0.19.0 degirum_tools-0.22.4.

There is almost no difference in the results when using input_image_format = "RAW" or not, besides getting slightly lower values in this round of executions…

| model name                                         | FPS with format RAW | FPS without format RAW |
|----------------------------------------------------|--------------------:|------------------------:|
| yolo11n_coco--640x640_quant_hailort_hailo8_1       |              103.99 |                   104.01 |
| yolo11n_coco--640x640_quant_hailort_multidevice_1  |               94.36 |                    94.35 |
| yolo11s_coco--640x640_quant_hailort_hailo8_1       |               51.85 |                    51.84 |
| yolo11s_coco--640x640_quant_hailort_multidevice_1  |               45.33 |                    45.32 |
| yolov8n_coco--640x640_quant_hailort_hailo8_1       |              245.71 |                   224.87 |
| yolov8n_coco--640x640_quant_hailort_multidevice_1  |              115.72 |                   115.71 |
| yolov8s_coco--640x640_quant_hailort_hailo8_1       |              239.36 |                   222.83 |
| yolov8s_coco--640x640_quant_hailort_multidevice_1  |               55.85 |                    55.86 |
| yolov8m_coco--640x640_quant_hailort_hailo8_1       |               31.93 |                    31.93 |
| yolov8m_coco--640x640_quant_hailort_multidevice_1  |               31.44 |                    31.44 |

Hi @dario

So, when you run again, you are not seeing these numbers?

Hi @suraj.upadhyay

We had a discussion regarding your suggestions, and unfortunately, at this time, we are not going to be able to implement them, as they are prohibitively expensive from a compute point of view. Just curious: when you compile locally, how long does it take with compression and optimization level 3 enabled?

Hi @shashi
my server configuration is:

Model: AMD Ryzen 9 9950X3D 16-Core Processor (32 logical cores), 64 GB RAM.
Depending on the level of optimization and compression, it takes somewhere between 1 and 12 hours. It increases further if I use the --performance flag.

I don’t know exactly why, but the numbers vary…

At first glance I thought maybe the new version had bumped up the FPS when I got 318 FPS in the first round (each round I run the script 5 times and post the average value), but after answering the first question, in the second round the results were back at 22x instead of 31x.

Today I’m getting 318.28, and I also observed something at least curious: when using the server, after the client stops its inference, the server does not clean the model from memory…

When I took the screenshot, no inference was running on the device. Even after closing and reopening the monitor, it displayed the models, which makes me think the issue isn’t with the monitor itself. However, I’m unsure if this behavior is expected.

Hi @suraj.upadhyay

Thanks for sharing your setup, which confirms how compute intensive these options can be. We cannot provide such options at scale (1000s of users). We have paid options for enterprise customers that we offer on a case-by-case basis where we help with custom compilation and optimization.