This guide shows how to build multi-stream, multi-model apps on Hailo using DeGirum PySDK and DeGirum Tools.
We’ll cover three common patterns:
- 3 models, 3 video streams — each in its own thread
- 3 models, 1 video stream — fuse results with a compound model (plus a manual fusion variant)
- 3 models, 3 video streams — iterate results together in one loop (single-threaded)
Let’s get started by installing the degirum and degirum_tools packages!
Setting up your environment
This guide assumes that you have installed PySDK, the Hailo AI runtime and driver, and DeGirum Tools.
Click here for more information about installing PySDK.
Click here for information about installing the Hailo runtime and driver. To install degirum_tools, run:

```shell
pip install degirum_tools
```
Prerequisites
- `degirum` and `degirum_tools` installed
- A Hailo device (e.g., HAILO8L) with drivers/runtime installed
- Access to the DeGirum public Hailo model zoo (no token required), or a private zoo (token required)
Tip: This guide uses DeGirum’s Hailo zoo by default. Swap model names/zoo to match your project as needed.
ModelSpec: One place to define models
Use ModelSpec from degirum_tools to declare models once and load them consistently. Keep device/runtime details in model_properties.
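Conceptually, a ModelSpec is just a declarative record of everything needed to load a model. If you want to see the "declare once, load consistently" idea in isolation, here is a hypothetical stand-in built from a plain dataclass — not the real `degirum_tools.ModelSpec` API, whose `load_model()` actually connects to the zoo and runtime:

```python
from dataclasses import dataclass, field

# Hypothetical stand-in illustrating the spec pattern; the real
# degirum_tools.ModelSpec has more fields and a load_model() that
# talks to the model zoo and inference runtime.
@dataclass(frozen=True)
class SimpleModelSpec:
    model_name: str
    zoo_url: str
    inference_host_address: str
    model_properties: dict = field(default_factory=dict)

    def describe(self) -> str:
        # Summarize what this spec would load and where.
        return f"{self.model_name} @ {self.inference_host_address} ({self.zoo_url})"

spec = SimpleModelSpec(
    model_name="yolov8n_relu6_face--640x640_quant_hailort_multidevice_1",
    zoo_url="degirum/hailo",
    inference_host_address="@local",
    model_properties={"device_type": ["HAILORT/HAILO8L"]},
)
print(spec.describe())
```

Keeping device/runtime details inside one immutable record means every part of your app loads the model the same way.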
```python
# === Specify where to run the inference ===
# hw_location: where you want to run inference
#   "@cloud" to use DeGirum cloud
#   "@local" to run on local machine
#   IP address for AI server inference
# model_zoo_url: URL/path for model zoo
#   cloud zoo URL: valid for @cloud, @local, and AI server inference options
#   '': AI server serving models from a local folder
#   path to JSON file: single model zoo in case of @local inference
from degirum_tools import ModelSpec  # adjust import if needed
from degirum_tools import remote_assets

hw_location = "@local"
model_zoo_url = "degirum/hailo"

# === Sources (define once, reuse everywhere) ===
src1 = 0  # webcam (or your device index)
src2 = remote_assets.person_face_hand  # sample clip; replace with your path/URL
src3 = remote_assets.person_face_hand  # another source (replace as needed)

# === Model specs ===
model1_spec = ModelSpec(
    model_name="yolov8n_relu6_face--640x640_quant_hailort_multidevice_1",
    zoo_url=model_zoo_url,
    inference_host_address=hw_location,
    model_properties={"device_type": ["HAILORT/HAILO8L", "HAILORT/HAILO8"]},
)
model2_spec = ModelSpec(
    model_name="yolov8n_relu6_hand--640x640_quant_hailort_multidevice_1",
    zoo_url=model_zoo_url,
    inference_host_address=hw_location,
    model_properties={"device_type": ["HAILORT/HAILO8L", "HAILORT/HAILO8"]},
)
model3_spec = ModelSpec(
    model_name="yolov8n_relu6_person--640x640_quant_hailort_multidevice_1",
    zoo_url=model_zoo_url,
    inference_host_address=hw_location,
    model_properties={"device_type": ["HAILORT/HAILO8L", "HAILORT/HAILO8"]},
)

# === Load model objects from specs (simple) ===
model1 = model1_spec.load_model()
model2 = model2_spec.load_model()
model3 = model3_spec.load_model()
```
Use Case 1 — 3 models, 3 video streams, each in a separate thread
When to use: Each model runs independently on its own source. You want maximum concurrency with minimal coordination.
```python
import threading

import degirum_tools

# Map models to sources and labels
configurations = [
    {"model": model1, "source": src1, "display_name": "Model 1 (Face)"},
    {"model": model2, "source": src2, "display_name": "Model 2 (Hand)"},
    {"model": model3, "source": src3, "display_name": "Model 3 (Person)"},
]

# Single-stream runner
def run_inference(model, source, display_name):
    with degirum_tools.Display(display_name) as output_display:
        for inference_result in degirum_tools.predict_stream(model, source):
            output_display.show(inference_result)
    print(f"✅ Stream '{display_name}' has finished.")

# Launch independent threads
threads = []
for cfg in configurations:
    t = threading.Thread(
        target=run_inference,
        args=(cfg["model"], cfg["source"], cfg["display_name"]),
        daemon=True,
    )
    threads.append(t)
    t.start()

# Wait for all threads to complete
for t in threads:
    t.join()

print("🎉 All inference streams have been processed.")
```
Notes
- `degirum_tools.predict_stream()` is optimized for streaming; it handles capture and inference efficiently.
- Each thread owns its model and source; there's no cross-thread state.
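That "each thread owns everything it touches" structure is what makes the design safe without locks. A stripped-down sketch of the same pattern with plain Python generators standing in for model/stream pairs (all names here are illustrative, not degirum_tools APIs):

```python
import threading

def fake_stream(name, n):
    # Stand-in for a model bound to its own video source:
    # yields labeled "results" for n frames.
    for i in range(n):
        yield f"{name}-frame{i}"

def run_worker(source, sink):
    # Each thread owns its generator and its output list,
    # so no synchronization is needed between workers.
    for result in source:
        sink.append(result)

outputs = {name: [] for name in ("face", "hand", "person")}
threads = [
    threading.Thread(target=run_worker, args=(fake_stream(name, 3), outputs[name]))
    for name in outputs
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(outputs["face"])
```

Because no two threads share a model, a source, or an output container, the workers can run fully concurrently and finish in any order.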
Use Case 2 — 3 models, 1 video stream: combine results
Two options here:
A) Compound model (simplest)
Let the tooling fuse results for you using CombiningCompoundModel.
```python
import degirum_tools

# Use the first source for the single-stream case
video_source = src1

# Compose a compound model from your three models
combined_model = degirum_tools.CombiningCompoundModel(
    degirum_tools.CombiningCompoundModel(model2, model1),
    model3,
)

# Stream + display
with degirum_tools.Display("Compound: Models 1+2+3") as display:
    for inference_result in degirum_tools.predict_stream(combined_model, video_source):
        display.show(inference_result)
```
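The nested construction fuses results pairwise: the inner pair (model2 + model1) acts as a single model whose output is then combined with model3. A toy illustration of that composition using plain callables — not the real `CombiningCompoundModel` internals, which also handle frame plumbing and result objects:

```python
class Combine:
    # Toy pairwise combiner: runs both "models" on the same frame
    # and concatenates their detection lists, mirroring the nesting.
    def __init__(self, a, b):
        self.a, self.b = a, b

    def __call__(self, frame):
        return self.a(frame) + self.b(frame)

# Stand-in "models" that each return one detection per frame
det_face = lambda frame: [("face", frame)]
det_hand = lambda frame: [("hand", frame)]
det_person = lambda frame: [("person", frame)]

# Same shape as the PySDK snippet: (hand + face) combined with person
combined = Combine(Combine(det_hand, det_face), det_person)
print(combined("f0"))
```

Because each `Combine` is itself callable like a model, you can nest as many levels as you need; the result list is simply the concatenation of all leaves.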
B) Manual fusion (more control)
Run three predictors off the same video stream and merge results yourself.
```python
from itertools import zip_longest

import degirum_tools

with degirum_tools.Display("Manual Fusion (Single Stream)") as display, \
        degirum_tools.open_video_stream(src1) as video_stream:

    # Create prediction generators bound to the same underlying stream
    p1 = model1.predict_batch(degirum_tools.video_source(video_stream))
    p2 = model2.predict_batch(degirum_tools.video_source(video_stream))
    p3 = model3.predict_batch(degirum_tools.video_source(video_stream))

    # Iterate in lockstep; guard against None frames
    for r1, r2, r3 in zip_longest(p1, p2, p3):
        if r1 is None or r2 is None or r3 is None:
            continue

        # Merge detections into one result; reuse r1 as the carrier
        r1.results.extend(r2.results)
        r1.results.extend(r3.results)
        display.show(r1.image_overlay)
```
When to choose which
- Compound model: cleanest code, good default for most apps.
- Manual fusion: choose this if you need per-model thresholds, class remapping, or custom merging logic.
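For example, per-model confidence thresholds drop naturally into the manual loop: filter each model's detections before merging. A hedged sketch, assuming each detection is a dict with a `score` key (adjust the keys to your actual result schema):

```python
# Hypothetical per-model thresholds; keys and schema are illustrative.
THRESHOLDS = {"face": 0.5, "hand": 0.6, "person": 0.4}

def filter_by_score(detections, threshold):
    # Keep only detections at or above the model's threshold.
    return [d for d in detections if d["score"] >= threshold]

def merge_with_thresholds(per_model_results):
    # per_model_results: {model_label: list of detection dicts}
    merged = []
    for label, detections in per_model_results.items():
        merged.extend(filter_by_score(detections, THRESHOLDS[label]))
    return merged

# Example: raw results as they might come from three models on one frame
results = {
    "face": [{"label": "face", "score": 0.9}, {"label": "face", "score": 0.3}],
    "hand": [{"label": "hand", "score": 0.7}],
    "person": [{"label": "person", "score": 0.2}],
}
print(merge_with_thresholds(results))
```

In the manual fusion loop above, you would apply the same filtering to `r1.results`, `r2.results`, and `r3.results` before extending the carrier result.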
Use Case 3 — 3 models, 3 video streams, iterated together (single thread)
When to use: You want a single control loop (no threads) that advances all streams step‑by‑step. This is handy for deterministic playback or when you want explicit ordering.
```python
from itertools import zip_longest

import degirum_tools

# Use a separate display per stream
with degirum_tools.Display("Model 1 (src1)") as d1, \
        degirum_tools.Display("Model 2 (src2)") as d2, \
        degirum_tools.Display("Model 3 (src3)") as d3, \
        degirum_tools.open_video_stream(src1) as s1, \
        degirum_tools.open_video_stream(src2) as s2, \
        degirum_tools.open_video_stream(src3) as s3:

    # Create prediction generators
    p1 = model1.predict_batch(degirum_tools.video_source(s1))
    p2 = model2.predict_batch(degirum_tools.video_source(s2))
    p3 = model3.predict_batch(degirum_tools.video_source(s3))

    # Advance all three streams in lockstep
    for r1, r2, r3 in zip_longest(p1, p2, p3):
        if r1 is not None:
            d1.show(r1)
        if r2 is not None:
            d2.show(r2)
        if r3 is not None:
            d3.show(r3)
```
Notes
- `zip_longest` avoids stalling if one stream ends earlier; we skip `None` frames.
- If your sources have very different frame rates, consider threads (Use Case 1) or a queue-based multiplexer.
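A queue-based multiplexer decouples producers (one thread per stream) from a single consumer loop, so the consumer always handles whichever stream has a frame ready and a slow stream never stalls the others. A minimal stdlib sketch — the stream names and the `None` sentinel convention are illustrative, not a degirum_tools API:

```python
import queue
import threading

def producer(name, frames, out_q):
    # One producer per stream pushes (stream, frame) tuples at its own pace.
    for f in frames:
        out_q.put((name, f))
    out_q.put((name, None))  # sentinel: this stream is done

q = queue.Queue()
streams = {"src1": ["a", "b"], "src2": ["x"], "src3": ["p", "q", "r"]}
for name, frames in streams.items():
    threading.Thread(target=producer, args=(name, frames, q), daemon=True).start()

received = []
done = set()
while len(done) < len(streams):
    name, frame = q.get()  # blocks until any stream has something ready
    if frame is None:
        done.add(name)     # stream finished; keep draining the others
    else:
        received.append((name, frame))

print(sorted(received))
```

In a real app the consumer body would run inference or display the frame; the key property is that frame order across streams is determined by readiness, not by lockstep iteration.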
Additional Resources
The code snippets provided above are also available as a Jupyter Notebook.
For more detailed examples and instructions on deploying models with Hailo hardware, visit the Hailo Examples Repository. This repository includes tailored scripts for optimizing AI workloads on edge devices.