Yolov8m: 76.13 fps with hailo8 model (yolov8m_coco--640x640_quant_hailort_hailo8_1) vs 58.59 fps with multidevice version model (yolov8m_coco--640x640_quant_hailort_multidevice_1)
Yolov8s: 220.05 fps with hailo8 model
Yolov8n: 227.85 fps with hailo8 model
Yolo11s: 99.37 fps with hailo8 model
Yolo11n: 189.32 fps with hailo8 model
@shashi thanks for your answer, I hadn’t noticed I had something wrong…
What should be the normal result for Yolov8n then?
I must say that I didn’t use hailortcli run model_path, as it gave me an error:
$ hailortcli run yolov8n_coco--640x640_quant_hailort_hailo8_1.hef
Running streaming inference (yolov8n_coco--640x640_quant_hailort_hailo8_1.hef):
Transform data: true
Type: auto
Quantized: true
[HailoRT] [error] CHECK failed - Failed to create vdevice. there are not enough free devices. requested: 1, found: 0
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_OUT_OF_PHYSICAL_DEVICES(74)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_OUT_OF_PHYSICAL_DEVICES(74)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_OUT_OF_PHYSICAL_DEVICES(74)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_OUT_OF_PHYSICAL_DEVICES(74)
[HailoRT CLI] [error] CHECK_SUCCESS failed with status=HAILO_OUT_OF_PHYSICAL_DEVICES(74) - Failed creating vdevice
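For what it’s worth, HAILO_OUT_OF_PHYSICAL_DEVICES usually means the device is already held by another process (for example, a running DeGirum AIServer), not that the board is missing. A hedged way to check, assuming the standard HailoRT CLI and a PCIe device node at /dev/hailo0:

$ hailortcli scan        # lists the Hailo devices HailoRT can see
$ sudo lsof /dev/hailo0  # shows which process currently holds the device

If the AIServer is holding the device, stopping it should free it up for hailortcli run.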
I used this script instead:
import degirum as dg
import degirum_tools
iterations = 2500 # Number of iterations to run with the model
# For testing the local hardware:
hw_location = "@local"
# For testing inference on an AIServer running locally or on your LAN, uncomment:
# hw_location = "localhost"  # or the AIServer's IP address
# For testing a model file from the DeGirum AI Hub:
# Any model from https://hub.degirum.com/degirum/hailo
#model_name = "yolov8n_relu6_face--640x640_quant_hailort_hailo8_1"
model_name = "yolo11n_coco--640x640_quant_hailort_hailo8_1"
#model_name = "yolo11s_coco--640x640_quant_hailort_hailo8_1"
#model_name = "yolov8n_coco--640x640_quant_hailort_hailo8_1"
#model_name = "yolov8s_coco--640x640_quant_hailort_hailo8_1"
#model_name = "yolov8m_coco--640x640_quant_hailort_hailo8_1"
# Load the model
model = dg.load_model(
    model_name=model_name,
    inference_host_address=hw_location,
    token="<>",
    zoo_url="https://hub.degirum.com/degirum/hailo",
    device_type="HAILORT/HAILO8",
)
# If instead, you want to test a local model file, say:
# model_name = "local_model_name"
# You must ensure that the model .hef file is adjacent to its corresponding model parameter JSON file.
# For information on PySDK model parameter JSON file formats, look at examples for similar models in the DeGirum AI Hub
# or refer to: https://docs.degirum.com/pysdk/user-guide-pysdk/model-json-structure
# Specify zoo_url parameter as either a path to a local model zoo directory
# or a direct path to a model's .json configuration file.
# model = dg.load_model(
#     model_name=model_name,
#     inference_host_address="@local",
#     zoo_url="path/to/your/model_name.json",
# )
# Set the model's thread pack size for maximum performance
model._model_parameters.ThreadPackSize = 6
# Turn off C++-based post-processing (Does not affect models with a 'PythonFile' python-based postprocessor!)
model.output_postprocess_type = "None"
results = degirum_tools.model_time_profile(model, iterations)
print(f"Observed FPS: {results.observed_fps:5.2f}")
What should I do to diagnose that malfunction you see?
I tried with and without that image_format declaration in benchmark.py, and after updating (Successfully installed degirum-0.19.0 degirum_tools-0.22.4) the FPS went down drastically for Yolo11 while for Yolov8 it went up:
Thanks for the info. We will take a look at these numbers. If possible, can you run model_time_profile with and without input_image_format="RAW" using the same versions of degirum and degirum_tools? The new release of degirum has some changes that could be causing this.
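For anyone following along, a minimal sketch of that A/B comparison, assuming the input_image_format model property and the same zoo/token settings as the script above:

import degirum as dg
import degirum_tools

for fmt in ("JPEG", "RAW"):
    model = dg.load_model(
        model_name="yolo11n_coco--640x640_quant_hailort_hailo8_1",
        inference_host_address="@local",
        zoo_url="https://hub.degirum.com/degirum/hailo",
        token="<>",
    )
    model.input_image_format = fmt  # "RAW" passes frames as raw arrays, skipping JPEG encode/decode
    res = degirum_tools.model_time_profile(model, 1000)
    print(f"{fmt:>4}: {res.observed_fps:6.2f} FPS")

Running both settings in the same process with the same package versions keeps the comparison apples-to-apples.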
Hi @shashi
Just a suggestion: could you add optimization and compression level as options on the DeGirum platform? With those we can get better FPS. For example, for the YOLOv8l model on the DeGirum platform I got 14 FPS, but locally, with compression and optimization level set to 3, I got 24 FPS. Same with YOLOv8m: from DeGirum I got 24, and locally I have achieved 40+.
Those were the results I posted with the latest version installed (degirum-0.19.0, degirum_tools-0.22.4).
There is almost no difference in the results with or without input_image_format = "RAW", besides getting slightly lower values in this round of executions…
We had a discussion regarding your suggestions and, unfortunately, at this time we are not going to be able to implement them, as they are prohibitively expensive from a compute point of view. Just curious: when you compile locally, how long does it take when you enable compression and optimization level 3?
Model: AMD Ryzen 9 9950X3D 16-Core Processor, number of cores: 32, 64 GB RAM.
Depending on the level of optimization and compression, it takes somewhere between 1 and 12 hours. It increases further if I use the --performance flag.
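For context, a hedged sketch of what such a local compilation flow can look like with the Hailo Model Zoo CLI; exact flags and model-script syntax depend on your DFC/Model Zoo version:

$ hailomz compile yolov8m --hw-arch hailo8 --performance

with the model script (.alls) setting something along the lines of:

model_optimization_flavor(optimization_level=3, compression_level=3)

The --performance flag asks the compiler to search much harder for a good mapping, which is what drives the multi-hour compile times mentioned above.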
At first glance I thought maybe the new version had bumped up the FPS when I got 318 FPS in the first round (each round I run the script 5 times and post the average value), but after answering the first question, in the second round the results were back at 22x instead of 31x.
Today I’m getting 318.28, and I also observed something at least curious: when using the server, after the client stops its inference the server does not clean the model from memory…
When I took the screenshot, no inference was running on the device. Even after closing and reopening the monitor, it displayed the models, which makes me think the issue isn’t with the monitor itself. However, I’m unsure if this behavior is expected.
Thanks for sharing your setup; it confirms how compute-intensive these options can be. We cannot provide such options at scale (thousands of users). We have paid options for enterprise customers that we offer on a case-by-case basis, where we help with custom compilation and optimization.