Issues After Compiling Fine-Tuned YOLO11L Model with DeGirum for Hailo-8

I compiled my fine-tuned YOLO11L model using DeGirum, but the model's behavior has become abnormal. The model worked correctly in a CUDA environment, but after conversion to an HEF file it started producing an excessive number of detections.

My hardware setup uses Hailo-8. During compilation, I set the input image width and height to 640, and configured the runtime and device as HAILORT and HAILO8, respectively. For calibration, I randomly selected 100 images from the training dataset.

The generated JSON file after compilation looked correct, so I expected the HEF model to run normally. However, when I ran the real-time detection module with a Raspberry Pi camera, the screen was filled with a large number of random bounding boxes.
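For reference, the minimal check I run on a single frame looks roughly like this (a sketch only; the zoo path, model name, and threshold are placeholders from my setup, and exact property names may differ slightly between PySDK versions):

```python
import degirum as dg

# Connect to the local inference host and the local model zoo folder that
# holds the compiled HEF and its generated JSON (paths/names are placeholders).
zoo = dg.connect(dg.LOCAL, "/home/pi/model_zoo")
model = zoo.load_model("yolo11l_custom")

# Raise the confidence threshold to see whether the flood of boxes is just
# low-confidence noise or genuinely high-confidence false positives.
model.output_confidence_threshold = 0.5

result = model("/home/pi/test_images/sample.jpg")
print(len(result.results), "detections")
for det in result.results[:10]:
    print(det["label"], det["score"], det["bbox"])
```

Even with the higher threshold, the frame is still covered with boxes.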

I would greatly appreciate your technical support and guidance on this matter.

I will also attach the following files for reference:

    1. An image showing the corrupted detections after converting to HEF

Thank you.

Thanks for reporting this. To help us reproduce the issue and debug quickly, could you please share:

  • The set of ~100 calibration images used in your compilation pipeline, since the original dataset is 1 TB

  • The exact sample image mentioned in your comment

  • Access to the Google Drive link you referenced (ensure sharing is enabled), or send the .pt model file directly

Once we have these, we’ll replicate your setup and follow up with a fix or clear workaround. Appreciate your help!

Hello!

I’m sending you the 100 calibration images we used.
These images are part of the same dataset that was used for fine-tuning.

Please let me know if you need any additional information.

Thank you!

On Wed, Oct 29, 2025 at 3:35 AM, Mehrdad via DeGirum Community <notifications@degirum.discoursemail.com> wrote:

(Attachment calib640JPEG.zip is missing)

I was unable to send the compressed file, so I am sharing it again via a Google Drive link.

https://drive.google.com/drive/folders/1QHTsOJ-rpfY02sUuU2t5GdjTNAmho6YC?usp=drive_link

On Wed, Oct 29, 2025 at 10:49 AM, Changjae Lee (이창재) <changjae.lee@rovoroad.com> wrote:

When I click on the link, I see a ‘Request Access’ button.
Please grant access so I can download the .pt file and the images.

(original image attached)

On Wed, Oct 29, 2025 at 10:57 AM, Changjae Lee (이창재) <changjae.lee@rovoroad.com> wrote:

I have uploaded the .pt file, 100 calibration images, and the test images to the Google Drive with granted access.

On Wed, Oct 29, 2025 at 11:11 AM, Mehrdad via DeGirum Community <notifications@degirum.discoursemail.com> wrote:

We reproduced the behavior: the model shows the same failures under OpenVINO INT8 (while FP32 works), which indicates the network is highly sensitive to quantization rather than a Hailo-specific issue. Note that OpenVINO INT8 quantization is less restrictive than Hailo’s full-INT8 flow, so sensitivity here is a strong signal.
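For reference, a rough way to reproduce this check on your side without any Hailo hardware (a sketch, not our exact pipeline; file names and the dataset YAML are placeholders, and export arguments may vary with your Ultralytics version):

```python
# Export the fine-tuned .pt model to FP32 and INT8 OpenVINO and compare the
# number of detections on the same image.
from ultralytics import YOLO

model = YOLO("yolo11l_finetuned.pt")                  # the fine-tuned weights
fp32_path = model.export(format="openvino")           # FP32 OpenVINO export
int8_path = model.export(format="openvino", int8=True,
                         data="dataset.yaml")         # INT8 export, calibrated on your data

img = "test.jpg"                                      # any representative test image
print("FP32 boxes:", len(YOLO(fp32_path)(img)[0].boxes))
print("INT8 boxes:", len(YOLO(int8_path)(img)[0].boxes))
```

If the box count explodes under INT8 but not FP32, the issue is the network's quantization sensitivity itself rather than anything in the Hailo toolchain.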

Recommendations

Hello,

So, does this mean that with the current model there’s no way to solve the issue?

Do you have any recommended model sizes? For example, would the M size work, and have you seen any cases where it was used successfully?

But one thing I’m curious about: the base YOLO11-L model quantized and ran well on Hailo-8, and its performance was good. Why is it that once we fine-tune the model, quantization doesn’t work as well?

On Thu, Oct 30, 2025 at 4:33 AM, Mehrdad via DeGirum Community <notifications@degirum.discoursemail.com> wrote:

Hi @changjae.lee

This behavior is very dependent on the dataset, and there is no easy rule to predict which models it will happen to. Do you know the size of the training data for this model, and how many epochs you trained it for?

The training dataset is a total of 500GB.
We trained the model for 3 epochs, repeated 3 times.

On Thu, Oct 30, 2025 at 11:24 AM, Shashi via DeGirum Community <notifications@degirum.discoursemail.com> wrote:

Hi @changjae.lee

Sorry, by dataset size I meant the number of images. So you started with COCO-trained weights of YOLO11-L and ran 3 epochs of training? What does 'repeated 3 times' mean?

Sorry for the confusion. The dataset contains about 700,000 images in total.
And for training, you can consider it as having run for a total of 9 epochs.

On Thu, Oct 30, 2025 at 11:43 AM, Shashi via DeGirum Community <notifications@degirum.discoursemail.com> wrote:

To be precise,
the dataset was stored on Google Drive and I trained it in a Colab environment.
Since I couldn’t upload the entire 700GB dataset at once,
I split it into 100GB chunks for training.

So the process was: upload 100GB → train for 3 epochs → upload the next 100GB → train again … → repeat this process over the entire dataset 3 times.
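In code terms, the loop was roughly equivalent to this (a simplified sketch; chunk paths, run names, and hyperparameters are placeholders for what I actually used):

```python
# Simplified sketch of the chunked training loop I ran in Colab.
from ultralytics import YOLO

chunk_yamls = [f"/content/chunks/chunk_{i:02d}/data.yaml" for i in range(1, 8)]

weights = "yolo11l.pt"                      # start from the COCO-pretrained checkpoint
for dataset_pass in range(3):               # repeat over the entire dataset 3 times
    for data_yaml in chunk_yamls:           # one ~100GB chunk at a time
        model = YOLO(weights)
        model.train(data=data_yaml, epochs=3, imgsz=640,
                    project="/content/runs", name="chunked", exist_ok=True)
        # the next chunk continues from the weights just produced
        weights = "/content/runs/chunked/weights/last.pt"
```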

Could training in this way have affected the quantization results?

On Thu, Oct 30, 2025 at 11:43 AM, Shashi via DeGirum Community <notifications@degirum.discoursemail.com> wrote:

Hi @changjae.lee

Thanks for all the information. We will discuss internally and let you know if we have any suggestions.

Yes, thank you!

On Thu, Oct 30, 2025 at 1:06 PM, Shashi via DeGirum Community <notifications@degirum.discoursemail.com> wrote:

Hi Changjae

This is mostly a matter of dataset size and overfitting. The default training hyperparameters for yolov8n.pt and yolov8l.pt differ in their augmentation settings precisely so that overfitting does not happen.
The yolov8l and yolo11l models use more aggressive augmentation (attached):
--scale 0.9 --mixup 0.15 --copy-paste 0.3
vs. the yolov8n and yolov8s augmentation:
--scale 0.5 --mixup 0 --copy-paste 0

Using the more aggressive settings will help with overfitting and generalization, which in turn helps with quantization sensitivity.

My suggestions are:

  1. Train a smaller model (S or M)
  2. Train with ReLU6
  3. When training larger models, use more aggressive augmentation (see the sketch below)
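
A minimal sketch of what suggestion 3 looks like with the Ultralytics trainer (dataset path, model size, and epoch count are placeholders):

```python
# Minimal sketch for suggestion 3: training with the more aggressive
# augmentation values listed above.
from ultralytics import YOLO

model = YOLO("yolo11m.pt")      # suggestion 1: start from a smaller model (S or M)
model.train(
    data="dataset.yaml",        # your dataset definition
    epochs=100,
    imgsz=640,
    scale=0.9,                  # stronger scale jitter
    mixup=0.15,                 # enable MixUp
    copy_paste=0.3,             # enable Copy-Paste
)
```

(Suggestion 2, ReLU6, requires changing the activation in the model definition itself and is not shown here.)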

Hope this helps
Mehrdad

Thank you! I’ll test it as you suggested and get back to you afterward.

On Fri, Oct 31, 2025 at 3:59 AM, Mehrdad via DeGirum Community <notifications@degirum.discoursemail.com> wrote: