DeGirum YOLOv11s object detection evaluation: accuracy metrics decreased compared to Hailo Model Zoo results

Hi,

I’d like to share some notes about benchmarking object detection with yolov11s.hef on the Hailo-8:

Setup:

Hardware: UP AI Edge board with a Hailo-8 accelerator.

Benchmark model: yolov11s.hef from the Hailo Model Zoo GitHub repository (pretrained, 80 classes, default release by Hailo).

YOLO model download Path: link.

Dataset: COCO-2017-val (5000 images)

Libraries used: from the DeGirum evaluation guide “Hailo guide: Evaluating model accuracy after compilation”.

Library versions:
Ubuntu: 22.04 (Jammy)
Python: 3.10.19
HailoRT: 4.20.0
Hailo DFC: 3.31.0
Hailo Model Zoo: 2.16.0
DeGirum Tools: 0.22.4

Results after running evaluation:

FPS: 77, vs. the Hailo Model Zoo’s 111 FPS (measured with the model_time_profile module from DeGirum Tools).

mAP 50-95: 38%, vs. the Hailo Model Zoo’s 45.2%.

FPS dropped about 30% compared to the value stated in the Hailo Model Zoo. On investigation, I found the PCIe Gen3 link is running only 2 of the 4 available lanes, which might be part of the reason the FPS is lower.
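As a rough sanity check on the lane count (illustrative arithmetic, not a measurement on this board): PCIe Gen3 signals at 8 GT/s per lane with 128b/130b encoding, so dropping from x4 to x2 halves the theoretical link bandwidth. A minimal sketch:

```python
def pcie_gen3_bandwidth_gbps(lanes: int) -> float:
    """Theoretical one-direction PCIe Gen3 bandwidth in GB/s for a given lane count.

    PCIe Gen3 runs at 8 GT/s per lane and uses 128b/130b encoding,
    so usable bits are 128/130 of the raw transfer rate.
    """
    raw_gtps = 8.0                  # giga-transfers per second per lane (Gen3)
    encoding = 128.0 / 130.0        # 128b/130b line-code efficiency
    usable_gbits = raw_gtps * encoding * lanes  # usable Gbit/s across all lanes
    return usable_gbits / 8.0       # convert to GB/s

print(f"x2: {pcie_gen3_bandwidth_gbps(2):.2f} GB/s")  # ~1.97 GB/s
print(f"x4: {pcie_gen3_bandwidth_gbps(4):.2f} GB/s")  # ~3.94 GB/s
```

Note this is peak link bandwidth only; whether the narrower link actually limits FPS depends on per-frame transfer sizes, latency, and the transfer pattern, not just the peak rate.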

Accuracy also dropped, even though it should be approximately the same as the value stated by the Hailo Model Zoo.

Question 1: Are there any other factors that might cause the FPS to drop during inference?

Question 2: What are the possible causes of the accuracy decrease when evaluating the yolov11s.hef model with DeGirum Tools?

Please help! Thank you.

Hi @LongVu-Tr

Welcome to DeGirum community.

Just to be sure: is the HEF file from the Hailo Model Zoo, or did you compile another yolo11s yourself? Can you please share how you obtained the 111 FPS and 45.2% mAP? Are these just the numbers from the Hailo Model Zoo GitHub page?

Hi @LongVu-Tr

We ran the evaluation of yolo11s and obtained the following results:

yolo11s_coco--640x640_quant_hailort_hailo8_1
[array([0.45358596, 0.62901933, 0.48389093, 0.2788012 , 0.4938611 ,
       0.624764  , 0.35266808, 0.57291141, 0.61078248, 0.43059618,
       0.66046639, 0.774312  ])]

So the mAP of 45.35% is very close to the reported 45.2%. If you share the exact script you used to evaluate the model (along with model JSON), we can see if there is any discrepancy.
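For reference, the 12 numbers above follow the standard pycocotools summary order, so the first entry is the headline mAP@[0.50:0.95]. A quick way to label them (a sketch, using the array printed above):

```python
# The 12-element vector from COCO evaluation, in pycocotools' fixed summary order
coco_stats = [0.45358596, 0.62901933, 0.48389093, 0.2788012, 0.4938611, 0.624764,
              0.35266808, 0.57291141, 0.61078248, 0.43059618, 0.66046639, 0.774312]

labels = ["AP@[0.50:0.95]", "AP@0.50", "AP@0.75", "AP_small", "AP_medium", "AP_large",
          "AR@1", "AR@10", "AR@100", "AR_small", "AR_medium", "AR_large"]

# Pair each metric name with its value for readable reporting
stats = dict(zip(labels, coco_stats))
print(f"mAP@[0.50:0.95] = {stats['AP@[0.50:0.95]']:.4f}")  # the headline mAP
```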

Regarding FPS: it is host-dependent. If you believe PySDK is slower than the Hailo benchmark, please run the hailortcli benchmark command on the HEF file. On our system, we get the following result with PySDK:

degirum 0.19.0: observed FPS = 83.7

With hailortcli, we get the numbers below:

FPS     (hw_only)                 = 83.55
        (streaming)               = 84.4837
Latency (hw)                      = 10.1392 ms
Device 0000:03:00.0:
Power in streaming mode (average) = 1.97945 W
                        (max)     = 1.99689 W

Hi,
Sorry for my late reply.

I confirm that the HEF file is from the Hailo Model Zoo. I re-evaluated the model with hailomz eval and obtained the following result:
yolov11s.hef: mAP50-95 = 45%

When I run hailomz eval, I get 45% mAP, which matches the number shown in the Hailo Model Zoo. However, when the model is evaluated with the DeGirum library, it shows 38% mAP, as mentioned in my question above. I assume something might be missing or different in the JSON configuration. Could you please check the attached JSON and script files?

Additionally, for custom model compilations with a different number of classes, what exactly needs to be changed in order to enable proper evaluation after compilation?

Thank you.

JSON:

{
  "ConfigVersion": 11,
  "Checksum": "da96ad3b3730500d56c8e13d164d44a78eb6f062516717d4c4195f7995a8c391",
  "DEVICE": [
    {
      "DeviceType": "HAILO8",
      "RuntimeAgent": "HAILORT",
      "SupportedDeviceTypes": "HAILORT/HAILO8L, HAILORT/HAILO8"
    }
  ],
  "PRE_PROCESS": [
    {
      "InputN": 1,
      "InputH": 640,
      "InputW": 640,
      "InputC": 3,
      "InputQuantEn": true
    }
  ],
  "MODEL_PARAMETERS": [
    {
      "ModelPath": "yolov11s-80class.hef"
    }
  ],
  "POST_PROCESS": [
    {
      "OutputPostprocessType": "DetectionYoloHailo",
      "OutputNumClasses": 80,
      "LabelsPath": "labels_yolov11.json"
    }
  ]
}

import degirum as dg
import degirum_tools
from degirum_tools.detection_eval import ObjectDetectionModelEvaluator
import numpy as np 
# Load the detection model
model = dg.load_model(
    model_name="yolov11s-80class",
    inference_host_address="@local",
    zoo_url="/home/Downloads/vutl1-hailo-test/models/yolov11s-80class/yolov11s-80class.json",
    token=''
)

# Optional class ID remapping: model → COCO
classmap = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
            27, 28, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 46, 47, 48, 49, 50, 51,
            52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 67, 70, 72, 73, 74, 75, 76, 77,
            78, 79, 80, 81, 82, 84, 85, 86, 87, 88, 89, 90]

# Create evaluator
evaluator = ObjectDetectionModelEvaluator(model, classmap=classmap)

# Evaluation inputs
image_dir = "/home/Downloads/vutl1-hailo-test/datasets/vutlval2017/val2017"
coco_json = "/home/Downloads/vutl1-hailo-test/datasets/vutlval2017/annotations/instances_val2017.json"

# Evaluate and return mAP results
results = evaluator.evaluate(image_dir, coco_json, max_images=0)

# Print COCO-style mAP results
# print("COCO mAP stats:", results[0])

metric_labels = [
    "mAP@[IoU=0.50:0.95]",
    "mAP@0.50",
    "mAP@0.75",
    "mAP_small",
    "mAP_medium",
    "mAP_large",
    "AR@1",
    "AR@10",
    "AR@100",
    "AR_small",
    "AR_medium",
    "AR_large"
]

# Extract and print with metric names
print("COCO mAP/Eval Results:\n")
for label, value in zip(metric_labels, results[0]):
    print(f"{label:<20}: {value:.4f}")

# Compute overall statistics
mean_val = np.mean(results[0])
max_val = np.max(results[0])
min_val = np.min(results[0])

print("\nSummary Statistics:")
print(f"Mean: {mean_val:.4f}")
print(f"Max:  {max_val:.4f}")
print(f"Min:  {min_val:.4f}")


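One detail worth noting in the script above: COCO category IDs are not contiguous (IDs 12, 26, 29, 30, and several others are unused), so the 80-entry classmap translates the model's contiguous class index into the official COCO category ID. A minimal illustration of the mapping (same list as in the script):

```python
# Same 80-entry mapping as in the evaluation script:
# index = model class (0..79), value = official COCO category ID (1..90, with gaps)
classmap = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
            27, 28, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 46, 47, 48, 49, 50, 51,
            52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 67, 70, 72, 73, 74, 75, 76, 77,
            78, 79, 80, 81, 82, 84, 85, 86, 87, 88, 89, 90]

def to_coco_category(model_class_id: int) -> int:
    """Map the model's contiguous class index (0..79) to the COCO category ID."""
    return classmap[model_class_id]

print(to_coco_category(0))   # 1  (person)
print(to_coco_category(79))  # 90 (toothbrush)
```

Getting this mapping wrong (or omitting it) silently mismatches predictions against ground-truth category IDs and drags mAP down, so it is worth double-checking whenever the class set changes.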

Hi @LongVu-Tr

COCO is usually evaluated at specific thresholds, which have not been set in your evaluation script.

After loading the model, you need to set the output confidence threshold, the NMS threshold, and the maximum number of detections:

model.output_confidence_threshold = 0.001
model.output_nms_threshold = 0.7
model.output_max_detections = 300
model.output_max_detections_per_class = 300

This changes the mAP50:95 from 0.3860027 to 0.44955649 on my end.
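As a toy illustration of why the low confidence threshold matters (hypothetical scores and a hypothetical 0.3 runtime default, not this model's actual settings): COCO AP integrates the full precision-recall curve, so low-score detections still contribute recall, and filtering them out before evaluation caps the achievable mAP.

```python
# Hypothetical detection scores for one image (illustration only, not real model output)
scores = [0.92, 0.81, 0.40, 0.15, 0.03]

def kept(scores, threshold):
    """Detections that survive a confidence cutoff and reach the COCO evaluator."""
    return [s for s in scores if s >= threshold]

print(len(kept(scores, 0.3)))    # 3 detections reach the evaluator
print(len(kept(scores, 0.001)))  # all 5 reach it; low-score hits can still add recall
```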

This worked for my script. Thank you all for the assistance!
