Hailo guide: Evaluating model accuracy after compilation

This guide explains how to evaluate model accuracy after you’ve compiled and deployed your model in PySDK format to a Hailo accelerator (or cloud/local server).
Evaluating accuracy helps ensure your model performs as expected.

Tip: You can compile your model using the AI Hub Cloud Compiler.
If you’re new to the tool, check out the Cloud Compiler Quickstart Guide.

DeGirum PySDK provides built-in tools in degirum_tools to compute:

  • mAP (mean Average Precision) for object detection, pose estimation, and segmentation models
  • Top-K Accuracy for image classification models

This guide shows you how to use these utilities with clean Python snippets.

Setting up your environment

This guide assumes that you have installed PySDK, the Hailo AI runtime and driver, and DeGirum Tools.

For more information about installing PySDK, see the PySDK installation guide.
For information about installing the Hailo runtime and driver, see the Hailo installation documentation.

To install degirum_tools, run:

pip install degirum_tools
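To confirm the installation, check the package metadata:

pip show degirum_tools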

General evaluation steps

Follow these general steps to evaluate your model:

  1. Load the compiled model:

model = dg.load_model("your_model_name")

  2. Select the appropriate evaluation class:
  • ObjectDetectionModelEvaluator – for object detection, pose estimation, or segmentation
  • ImageClassificationModelEvaluator – for image classification
  3. Provide the necessary inputs (a minimal annotation example follows this list):
  • A directory of images
  • COCO-style annotations (required for detection tasks)
  4. Run the evaluation and review the results:

results = evaluator.evaluate(image_dir, annotations_path, max_images=0)
print(results)
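For detection tasks, the annotation file uses the standard COCO format. A minimal illustrative example (file names, IDs, and box values are placeholders; bbox is [x, y, width, height]):

{
  "images": [
    {"id": 1, "file_name": "000000000139.jpg", "width": 640, "height": 426}
  ],
  "annotations": [
    {"id": 1, "image_id": 1, "category_id": 1, "bbox": [100, 50, 80, 120], "area": 9600, "iscrowd": 0}
  ],
  "categories": [
    {"id": 1, "name": "person"}
  ]
}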

Object detection mAP evaluation example

For detection models, including Hailo-compiled YOLO variants:

import degirum as dg
import degirum_tools
from degirum_tools.detection_eval import ObjectDetectionModelEvaluator

# Load the detection model
model = dg.load_model(
    model_name="yolov8n_relu6_face--640x640_quant_hailort_multidevice_1",
    inference_host_address="@local",
    zoo_url="degirum/hailo",
    token=''
)

# Evaluation-style post-processing: a very low confidence threshold and high
# detection caps so mAP is computed over the full precision-recall curve
model.output_confidence_threshold = 0.001
model.output_nms_threshold = 0.7
model.output_max_detections = 300
model.output_max_detections_per_class = 300

# Optional class ID remapping: entry i is the COCO category ID for model class i
classmap = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
            27, 28, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 46, 47, 48, 49, 50, 51,
            52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 67, 70, 72, 73, 74, 75, 76, 77,
            78, 79, 80, 81, 82, 84, 85, 86, 87, 88, 89, 90]

# Create evaluator
evaluator = ObjectDetectionModelEvaluator(model, classmap=classmap)

# Evaluation inputs
image_dir = "/path/to/val2017/images"
coco_json = "/path/to/annotations/instances_val2017.json"

# Evaluate over the full dataset (max_images=0 means no limit) and return mAP results
results = evaluator.evaluate(image_dir, coco_json, max_images=0)

# Print COCO-style mAP results
print("COCO mAP stats:", results[0])

When to use a classmap

Use a classmap if your model’s category IDs differ from the standard COCO category IDs, or if you’re evaluating a custom label set against a standard COCO dataset.

As in the detection example above, the classmap is a list in which entry i gives the COCO category ID for the model’s class index i. For instance, a three-class model whose outputs are ordered car, person, dog would map to the official COCO IDs like this:

classmap = [3, 1, 18]  # class 0 → "car" (COCO ID 3), class 1 → "person" (1), class 2 → "dog" (18)

This ensures accurate mapping between predicted labels and ground truth.

COCO category examples

Category     COCO ID
person       1
bicycle      2
car          3
toothbrush   90

You can find the full COCO category map in the official annotation JSON file.
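For example, you can print the full category map with a few lines of Python:

import json

# Load the official COCO annotation file and list its categories
with open("/path/to/annotations/instances_val2017.json") as f:
    categories = json.load(f)["categories"]

for cat in categories:
    print(cat["id"], cat["name"])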

Image classification Top-K accuracy evaluation example

Use this if your image directory has subfolders per class:

import degirum as dg
import degirum_tools
from degirum_tools.classification_eval import ImageClassificationModelEvaluator

# Load classification model
model = dg.load_model(
    model_name="yolov8s_imagenet--224x224_quant_hailort_hailo8_1",
    inference_host_address="@local",
    zoo_url="degirum/hailo",
    token=''
)

# Create evaluator
evaluator = ImageClassificationModelEvaluator(
    model,
    top_k=[1, 5],
    show_progress=True
)

# Folder structure should be: /images/cat/, /images/dog/, etc.
image_dir = "/path/to/classification_test"

# Run evaluation; ground truth is taken from the folder names, so no annotation file is required
results = evaluator.evaluate(image_dir, ground_truth_annotations_path="", max_images=0)

# Print top-k accuracy
print("Top-K Accuracies:", results[0])

Metric breakdown

Detection output (results[0])

  • AP: Overall mean Average Precision
  • AP50: Precision at IoU ≥ 0.5
  • AP75: Precision at IoU ≥ 0.75
  • AP_small, AP_medium, AP_large: Size-specific precision
  • AR: Recall statistics
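As a sketch, if results[0] follows the standard 12-value pycocotools COCOeval.stats ordering (an assumption worth verifying against your own output), you can label the values like this:

# Assumption: results[0] is the 12-element pycocotools COCOeval.stats vector
stat_names = [
    "AP", "AP50", "AP75", "AP_small", "AP_medium", "AP_large",
    "AR@1", "AR@10", "AR@100", "AR_small", "AR_medium", "AR_large",
]
for name, value in zip(stat_names, results[0]):
    print(f"{name}: {value:.3f}")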

Classification output (results[0])

  • Top-1 Accuracy: Ground truth label is the top prediction
  • Top-5 Accuracy: Ground truth label is among top 5 predictions
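A minimal pretty-printing sketch, assuming results[0] lists the accuracies in the same order as the top_k values passed to the evaluator:

# Assumption: results[0] pairs up with top_k=[1, 5] from the example above
for k, acc in zip([1, 5], results[0]):
    print(f"Top-{k} accuracy: {acc}")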

Summary

  • Use ObjectDetectionModelEvaluator for detection models
  • Use ImageClassificationModelEvaluator for classification models
  • For COCO-style evaluation, use a classmap if category IDs don’t match
  • Use the max_images parameter to limit evaluation size for faster testing
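For example, to run a quick smoke test on a subset instead of the full dataset (the examples above use max_images=0, which evaluates every image):

# Evaluate only the first 100 images for a fast sanity check
results = evaluator.evaluate(image_dir, coco_json, max_images=100)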

You’re ready to evaluate!

With these tools, you can accurately assess your model’s performance after compilation.

Try evaluating your own models using these steps, and share your experience or questions by replying below.
