This guide explains how to evaluate model accuracy after you’ve compiled your model and deployed it with PySDK to a Hailo accelerator (or a cloud or local server).
Evaluating accuracy helps you confirm that the compiled model performs as expected.
Tip: You can compile your model using the AI Hub Cloud Compiler.
If you’re new to the tool, check out the Cloud Compiler Quickstart Guide.
DeGirum PySDK provides built-in tools in degirum_tools to compute:
- mAP (mean Average Precision) for object detection, pose, and segmentation models
- Top-K Accuracy for image classification models
This guide shows you how to use these utilities with clean Python snippets.
Setting up your environment
This guide assumes that you have installed PySDK, the Hailo AI runtime and driver, and DeGirum Tools.
Click here for more information about installing PySDK.
Click here for information about installing the Hailo runtime and driver.
To install degirum_tools, run:
pip install degirum_tools
General evaluation steps
Follow these general steps to evaluate your model:
- Load the compiled model:
model = dg.load_model(
    model_name="your_model_name",
    inference_host_address="@local",  # or "@cloud", or your AI server address
    zoo_url="<model_zoo_url>",
    token=""  # AI Hub token, if required
)
- Select the appropriate evaluation class:
  - ObjectDetectionModelEvaluator – for object detection, pose estimation, or segmentation
  - ImageClassificationModelEvaluator – for image classification
- Provide the necessary inputs:
- A directory of images
- COCO-style annotations (required for detection tasks)
- Run evaluation and review results:
result = evaluator.evaluate()
print(result)
Object detection mAP evaluation example
For detection models, including Hailo-compiled YOLO variants:
import degirum as dg
import degirum_tools
from degirum_tools.detection_eval import ObjectDetectionModelEvaluator
# Load the detection model
model = dg.load_model(
model_name="yolov8n_relu6_face--640x640_quant_hailort_multidevice_1",
inference_host_address="@local",
zoo_url="degirum/hailo",
token=''
)
model.output_confidence_threshold = 0.001
model.output_nms_threshold = 0.7
model.output_max_detections = 300
model.output_max_detections_per_class = 300
# Optional class ID remapping: the i-th entry is the COCO category ID for model
# class index i (the standard 80-class → 91-ID COCO mapping)
classmap = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
27, 28, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 46, 47, 48, 49, 50, 51,
52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 67, 70, 72, 73, 74, 75, 76, 77,
78, 79, 80, 81, 82, 84, 85, 86, 87, 88, 89, 90]
# Create evaluator
evaluator = ObjectDetectionModelEvaluator(model, classmap=classmap)
# Evaluation inputs
image_dir = "/path/to/val2017/images"
coco_json = "/path/to/annotations/instances_val2017.json"
# Evaluate on the full dataset and return mAP results (max_images=0 means no limit)
results = evaluator.evaluate(image_dir, coco_json, max_images=0)
# Print COCO-style mAP results
print("COCO mAP stats:", results[0])
When to use a classmap
Use a classmap if your model’s class indices differ from the standard COCO category IDs, or if you’re evaluating a custom label set against a standard COCO dataset. In the detection example above, the classmap is a list whose i-th entry is the COCO category ID assigned to model class index i.
Example classmap for a hypothetical three-class model (the category IDs come from the standard COCO list):
classmap = [1, 3, 18]  # model class 0 → person (1), class 1 → car (3), class 2 → dog (18)
This ensures accurate mapping between predicted labels and ground-truth category IDs.
COCO category examples
| Category | Model class index | COCO category ID |
|---|---|---|
| person | 0 | 1 |
| bicycle | 1 | 2 |
| car | 2 | 3 |
| … | … | … |
| toothbrush | 79 | 90 |
You can find the full category map in the categories section of the official annotation JSON file.
Image classification Top-K accuracy evaluation example
Use this if your image directory has subfolders per class:
import degirum as dg
import degirum_tools
from degirum_tools.classification_eval import ImageClassificationModelEvaluator
# Load classification model
model = dg.load_model(
model_name="yolov8s_imagenet--224x224_quant_hailort_hailo8_1",
inference_host_address="@local",
zoo_url="degirum/hailo",
token=''
)
# Create evaluator
evaluator = ImageClassificationModelEvaluator(
model,
top_k=[1, 5],
show_progress=True
)
# Folder structure should be: /images/cat/, /images/dog/, etc.
image_dir = "/path/to/classification_test"
# Run evaluation (no annotation file required)
results = evaluator.evaluate(image_dir, ground_truth_annotations_path="", max_images=0)
# Print top-k accuracy
print("Top-K Accuracies:", results[0])
Metric breakdown
Detection output (results[0])
- AP: mean Average Precision averaged over IoU thresholds 0.50–0.95
- AP50: mean Average Precision at an IoU threshold of 0.50
- AP75: mean Average Precision at an IoU threshold of 0.75
- AP_small, AP_medium, AP_large: AP broken down by object size
- AR: Average Recall statistics
Classification output (results[0])
- Top-1 Accuracy: fraction of images whose ground-truth label is the top prediction
- Top-5 Accuracy: fraction of images whose ground-truth label is among the top 5 predictions
Summary
- Use ObjectDetectionModelEvaluator for detection models
- Use ImageClassificationModelEvaluator for classification models
- For COCO-style evaluation, use a classmap if category IDs don’t match
- Use the max_images parameter to limit evaluation size for faster testing
You’re ready to evaluate!
Try evaluating your own models using these steps, and share your experience or questions by replying below.