This guide shows how to build multi-stream, multi-model apps on Hailo using DeGirum PySDK and DeGirum Tools.
We’ll cover three common patterns:
- 3 models, 3 video streams — each in its own thread
- 3 models, 1 video stream — fuse results with a compound model (plus a manual fusion variant)
- 3 models, 3 video streams — iterate results together in one loop (single-threaded)
Let’s get started by installing the degirum and degirum_tools packages!
Setting up your environment
This guide assumes that you have installed PySDK, the Hailo AI runtime and driver, and DeGirum Tools.
Click here for more information about installing PySDK.
Click here for information about installing the Hailo runtime and driver. To install degirum_tools, run:

```shell
pip install degirum_tools
```
Prerequisites
- `degirum` and `degirum_tools` installed
- A Hailo device (e.g., HAILO8L) with drivers/runtime installed
- Access to the DeGirum public Hailo model zoo (no token required), or a private zoo (token required)
Tip: This guide uses DeGirum’s Hailo zoo by default. Swap model names/zoo to match your project as needed.
ModelSpec: One place to define models
Use ModelSpec from degirum_tools to declare models once and load them consistently. Keep device/runtime details in model_properties.
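Conceptually, a ModelSpec is just a declarative record of everything needed to load a model. If you want to see the "declare once, load consistently" idea in isolation, here is a hypothetical stand-in built from a plain dataclass — not the real `degirum_tools.ModelSpec` API, whose `load_model()` actually connects to the zoo and runtime:

```python
from dataclasses import dataclass, field

# Hypothetical stand-in illustrating the spec pattern; the real
# degirum_tools.ModelSpec has more fields and a load_model() that
# talks to the model zoo and inference runtime.
@dataclass(frozen=True)
class SimpleModelSpec:
    model_name: str
    zoo_url: str
    inference_host_address: str
    model_properties: dict = field(default_factory=dict)

    def describe(self) -> str:
        # Summarize what this spec would load and where.
        return f"{self.model_name} @ {self.inference_host_address} ({self.zoo_url})"

spec = SimpleModelSpec(
    model_name="yolov8n_relu6_face--640x640_quant_hailort_multidevice_1",
    zoo_url="degirum/hailo",
    inference_host_address="@local",
    model_properties={"device_type": ["HAILORT/HAILO8L"]},
)
print(spec.describe())
```

Keeping device/runtime details inside one immutable record means every part of your app loads the model the same way.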
```python
# === Specify where to run the inference ===
# hw_location: where you want to run inference
#   "@cloud" to use DeGirum cloud
#   "@local" to run on local machine
#   IP address for AI server inference
# model_zoo_url: URL/path for model zoo
#   cloud zoo URL: valid for @cloud, @local, and AI server inference options
#   '': AI server serving models from a local folder
#   path to JSON file: single model zoo in case of @local inference
from degirum_tools import ModelSpec  # adjust import if needed
from degirum_tools import remote_assets

hw_location = "@local"
model_zoo_url = "degirum/hailo"

# === Sources (define once, reuse everywhere) ===
src1 = 0  # webcam (or your device index)
src2 = remote_assets.person_face_hand  # sample clip; replace with your path/URL
src3 = remote_assets.person_face_hand  # another source (replace as needed)

# === Model specs ===
model1_spec = ModelSpec(
    model_name="yolov8n_relu6_face--640x640_quant_hailort_multidevice_1",
    zoo_url=model_zoo_url,
    inference_host_address=hw_location,
    model_properties={"device_type": ["HAILORT/HAILO8L", "HAILORT/HAILO8"]},
)
model2_spec = ModelSpec(
    model_name="yolov8n_relu6_hand--640x640_quant_hailort_multidevice_1",
    zoo_url=model_zoo_url,
    inference_host_address=hw_location,
    model_properties={"device_type": ["HAILORT/HAILO8L", "HAILORT/HAILO8"]},
)
model3_spec = ModelSpec(
    model_name="yolov8n_relu6_person--640x640_quant_hailort_multidevice_1",
    zoo_url=model_zoo_url,
    inference_host_address=hw_location,
    model_properties={"device_type": ["HAILORT/HAILO8L", "HAILORT/HAILO8"]},
)

# === Load model objects from specs (simple) ===
model1 = model1_spec.load_model()
model2 = model2_spec.load_model()
model3 = model3_spec.load_model()
```
Use Case 1 — 3 models, 3 video streams, each in a separate thread
When to use: Each model runs independently on its own source. You want maximum concurrency with minimal coordination.
```python
import threading

import degirum_tools

# Map models to sources and labels
configurations = [
    {"model": model1, "source": src1, "display_name": "Model 1 (Face)"},
    {"model": model2, "source": src2, "display_name": "Model 2 (Hand)"},
    {"model": model3, "source": src3, "display_name": "Model 3 (Person)"},
]

# Single-stream runner
def run_inference(model, source, display_name):
    with degirum_tools.Display(display_name) as output_display:
        for inference_result in degirum_tools.predict_stream(model, source):
            output_display.show(inference_result)
    print(f"✅ Stream '{display_name}' has finished.")

# Launch independent threads
threads = []
for cfg in configurations:
    t = threading.Thread(
        target=run_inference,
        args=(cfg["model"], cfg["source"], cfg["display_name"]),
        daemon=True,
    )
    threads.append(t)
    t.start()

# Wait for all threads to complete
for t in threads:
    t.join()

print("🎉 All inference streams have been processed.")
```
Notes
- `degirum_tools.predict_stream()` is optimized for streaming; it handles capture and inference efficiently.
- Each thread owns its model and source; there's no cross-thread state.
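That "each thread owns everything it touches" structure is what makes the design safe without locks. A stripped-down sketch of the same pattern with plain Python generators standing in for model/stream pairs (all names here are illustrative, not degirum_tools APIs):

```python
import threading

def fake_stream(name, n):
    # Stand-in for a model bound to its own video source:
    # yields labeled "results" for n frames.
    for i in range(n):
        yield f"{name}-frame{i}"

def run_worker(source, sink):
    # Each thread owns its generator and its output list,
    # so no synchronization is needed between workers.
    for result in source:
        sink.append(result)

outputs = {name: [] for name in ("face", "hand", "person")}
threads = [
    threading.Thread(target=run_worker, args=(fake_stream(name, 3), outputs[name]))
    for name in outputs
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(outputs["face"])
```

Because no two threads share a model, a source, or an output container, the workers can run fully concurrently and finish in any order.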
Use Case 2 — 3 models, 1 video stream: combine results
Two options here:
A) Compound model (simplest)
Let the tooling fuse results for you using CombiningCompoundModel.
```python
import degirum_tools

# Use the first source for the single-stream case
video_source = src1

# Compose a compound model from your three models
combined_model = degirum_tools.CombiningCompoundModel(
    degirum_tools.CombiningCompoundModel(model2, model1),
    model3,
)

# Stream + display
with degirum_tools.Display("Compound: Models 1+2+3") as display:
    for inference_result in degirum_tools.predict_stream(combined_model, video_source):
        display.show(inference_result)
```
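The nested construction fuses results pairwise: the inner pair (model2 + model1) acts as a single model whose output is then combined with model3. A toy illustration of that composition using plain callables — not the real `CombiningCompoundModel` internals, which also handle frame plumbing and result objects:

```python
class Combine:
    # Toy pairwise combiner: runs both "models" on the same frame
    # and concatenates their detection lists, mirroring the nesting.
    def __init__(self, a, b):
        self.a, self.b = a, b

    def __call__(self, frame):
        return self.a(frame) + self.b(frame)

# Stand-in "models" that each return one detection per frame
det_face = lambda frame: [("face", frame)]
det_hand = lambda frame: [("hand", frame)]
det_person = lambda frame: [("person", frame)]

# Same shape as the PySDK snippet: (hand + face) combined with person
combined = Combine(Combine(det_hand, det_face), det_person)
print(combined("f0"))
```

Because each `Combine` is itself callable like a model, you can nest as many levels as you need; the result list is simply the concatenation of all leaves.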
B) Manual fusion (more control)
Run three predictors off the same video stream and merge results yourself.
```python
from itertools import zip_longest

import degirum_tools

with degirum_tools.Display("Manual Fusion (Single Stream)") as display, \
        degirum_tools.open_video_stream(src1) as video_stream:

    # Create prediction generators bound to the same underlying stream
    p1 = model1.predict_batch(degirum_tools.video_source(video_stream))
    p2 = model2.predict_batch(degirum_tools.video_source(video_stream))
    p3 = model3.predict_batch(degirum_tools.video_source(video_stream))

    # Iterate in lockstep; guard against None frames
    for r1, r2, r3 in zip_longest(p1, p2, p3):
        if r1 is None or r2 is None or r3 is None:
            continue

        # Merge detections into one result; reuse r1 as the carrier
        r1.results.extend(r2.results)
        r1.results.extend(r3.results)
        display.show(r1.image_overlay)
```
When to choose which
- Compound model: cleanest code, good default for most apps.
- Manual fusion: choose this if you need per-model thresholds, class remapping, or custom merging logic.
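For example, per-model confidence thresholds drop naturally into the manual loop: filter each model's detections before merging. A hedged sketch, assuming each detection is a dict with a `score` key (adjust the keys to your actual result schema):

```python
# Hypothetical per-model thresholds; keys and schema are illustrative.
THRESHOLDS = {"face": 0.5, "hand": 0.6, "person": 0.4}

def filter_by_score(detections, threshold):
    # Keep only detections at or above the model's threshold.
    return [d for d in detections if d["score"] >= threshold]

def merge_with_thresholds(per_model_results):
    # per_model_results: {model_label: list of detection dicts}
    merged = []
    for label, detections in per_model_results.items():
        merged.extend(filter_by_score(detections, THRESHOLDS[label]))
    return merged

# Example: raw results as they might come from three models on one frame
results = {
    "face": [{"label": "face", "score": 0.9}, {"label": "face", "score": 0.3}],
    "hand": [{"label": "hand", "score": 0.7}],
    "person": [{"label": "person", "score": 0.2}],
}
print(merge_with_thresholds(results))
```

In the manual fusion loop above, you would apply the same filtering to `r1.results`, `r2.results`, and `r3.results` before extending the carrier result.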
Use Case 3 — 3 models, 3 video streams, iterated together (single thread)
When to use: You want a single control loop (no threads) that advances all streams step‑by‑step. This is handy for deterministic playback or when you want explicit ordering.
```python
from itertools import zip_longest

import degirum_tools

# Use a separate display per stream
with degirum_tools.Display("Model 1 (src1)") as d1, \
        degirum_tools.Display("Model 2 (src2)") as d2, \
        degirum_tools.Display("Model 3 (src3)") as d3, \
        degirum_tools.open_video_stream(src1) as s1, \
        degirum_tools.open_video_stream(src2) as s2, \
        degirum_tools.open_video_stream(src3) as s3:

    # Create prediction generators
    p1 = model1.predict_batch(degirum_tools.video_source(s1))
    p2 = model2.predict_batch(degirum_tools.video_source(s2))
    p3 = model3.predict_batch(degirum_tools.video_source(s3))

    # Advance all three streams in lockstep
    for r1, r2, r3 in zip_longest(p1, p2, p3):
        if r1 is not None:
            d1.show(r1)
        if r2 is not None:
            d2.show(r2)
        if r3 is not None:
            d3.show(r3)
```
Notes
- `zip_longest` avoids stalling if one stream ends earlier; we skip `None` frames.
- If your sources have very different frame rates, consider threads (Use Case 1) or a queue-based multiplexer.
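A queue-based multiplexer decouples producers (one thread per stream) from a single consumer loop, so the consumer always handles whichever stream has a frame ready and a slow stream never stalls the others. A minimal stdlib sketch — the stream names and the `None` sentinel convention are illustrative, not a degirum_tools API:

```python
import queue
import threading

def producer(name, frames, out_q):
    # One producer per stream pushes (stream, frame) tuples at its own pace.
    for f in frames:
        out_q.put((name, f))
    out_q.put((name, None))  # sentinel: this stream is done

q = queue.Queue()
streams = {"src1": ["a", "b"], "src2": ["x"], "src3": ["p", "q", "r"]}
for name, frames in streams.items():
    threading.Thread(target=producer, args=(name, frames, q), daemon=True).start()

received = []
done = set()
while len(done) < len(streams):
    name, frame = q.get()  # blocks until any stream has something ready
    if frame is None:
        done.add(name)     # stream finished; keep draining the others
    else:
        received.append((name, frame))

print(sorted(received))
```

In a real app the consumer body would run inference or display the frame; the key property is that frame order across streams is determined by readiness, not by lockstep iteration.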
Additional Resources
The code snippets provided above are also available as a Jupyter Notebook.
For more detailed examples and instructions on deploying models with Hailo hardware, visit the Hailo Examples Repository. This repository includes tailored scripts for optimizing AI workloads on edge devices.