Real-time object detection on RTSP streams with DeGirum PySDK

DeGirum PySDK enables seamless integration of AI models into real-time applications, including live video analysis from RTSP-enabled cameras. This guide walks you through processing and displaying AI inference results dynamically from an RTSP stream using the YOLOv8 object detection model.

Prerequisites

  1. DeGirum PySDK: Installed and configured on your system. See DeGirum/hailo_examples for instructions (a quick import check is sketched after this list).
  2. RTSP camera stream: Obtain the RTSP URL of your camera. Replace username, password, ip, and port in the script with your camera’s credentials and network address.
  3. Token: If using cloud inference, ensure you have a valid token. For local inference, leave the token empty.
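
As a quick sanity check of the installation, you can confirm that both packages import cleanly before wiring up the camera. This is a minimal sketch; the version attribute is assumed to follow the usual Python convention:

import degirum as dg
import degirum_tools

# confirm both packages are importable and report the PySDK version
# (__version__ is assumed to be present in your PySDK release)
print(dg.__version__)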

Script overview

This script:

  1. Loads the YOLOv8 object detection model.
  2. Processes the RTSP video stream to detect objects in real time.
  3. Displays the inference results dynamically in a dedicated window.

Code example

import degirum as dg, degirum_tools

# Choose inference host address
inference_host_address = "@cloud"
# inference_host_address = "@local"

# Choose zoo_url
zoo_url = "degirum/models_hailort"
# zoo_url = "../models"

# Set token
token = degirum_tools.get_token()
# token = '' # Leave empty for local inference

# Specify the AI model and video source
model_name = "yolov8n_relu6_coco--640x640_quant_hailort_hailo8l_1"
video_source = "rtsp://username:password@ip:port/cam/realmonitor?channel=1&subtype=0"  # Replace with your camera RTSP URL

# Load the AI model
model = dg.load_model(
    model_name=model_name, 
    inference_host_address=inference_host_address,
    zoo_url=zoo_url,
    token=token,
)

# Run AI inference on the video stream and display the results
with degirum_tools.Display("AI Camera") as output_display:
    for inference_result in degirum_tools.predict_stream(model, video_source):
        output_display.show(inference_result)

Steps to run the script

  1. Set up the RTSP stream:
  • Replace the video_source string with the RTSP URL of your camera.
  • Example format: rtsp://username:password@ip:port/cam/realmonitor?channel=1&subtype=0.
  2. Configure inference:
  • Use @cloud for cloud inference or @local for local device inference.
  • Specify the appropriate zoo_url for accessing your model zoo.
  3. Load the model:
  • Replace model_name with your desired model if you want to detect classes other than the default YOLOv8 COCO configuration.
  4. Run the script:
  • Execute the script to process the RTSP feed in real time.
  • The detected objects will be displayed dynamically in the window labeled “AI Camera” (a sketch for consuming the detections programmatically follows this list).
  5. Stop the display:
  • Press x or q to exit the display window.
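
Beyond the on-screen overlay, you may want to consume the detections programmatically, for example to log them or trigger actions. The sketch below reuses model and video_source from the code example above and assumes each PySDK result exposes a results list of detection dictionaries with label, score, and bbox keys; verify these names against your PySDK version:

# iterate over detections instead of (or in addition to) displaying them
for inference_result in degirum_tools.predict_stream(model, video_source):
    # assumed detection keys: label, score, bbox
    for det in inference_result.results:
        print(f"{det['label']}: {det['score']:.2f} at {det['bbox']}")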

Applications

  • Surveillance: Monitor live feeds for security and safety.
  • Traffic analysis: Analyze vehicles and pedestrians in real time.
  • Industrial monitoring: Detect objects in manufacturing or warehouse operations.

Additional resources

For more examples and advanced use cases, visit our Hailo Examples Repository. This repository provides scripts and guidance for deploying AI models on various hardware configurations.

Hello @shashi, good day to you.
Thank you for sharing the above guide.

I am trying to implement the same; however, I get the following errors while executing it:

error while decoding
left block unavailable for requested intra mode
cabac decode of qscale diff failed at

The code that I have used is as follows:

import degirum as dg, degirum_tools

# Basic setup
inference_host_address = "@local"
zoo_url = "/home/pi/tests/models"
model_name = "yolov8n_relu6_human_head--640x640_quant_hailort_hailo8l_1"
video_source = "rtsp://192.168.100.54:8554/cam"

print("Load Model..")
model = dg.load_model(
    model_name=model_name,
    inference_host_address=inference_host_address,
    zoo_url=zoo_url,
)

print("Press 'x' or 'q' to stop.")

# show results of inference
with degirum_tools.Display("AI Camera") as output_display:
    for inference_result in degirum_tools.predict_stream(model, video_source):
        output_display.show(inference_result)

Setup:

  1. The RTSP server is a Raspberry Pi Zero 2 W connected to the Raspberry Pi Camera Module 3.
  2. The RTSP stream was validated in VLC media player on a laptop running Windows, and it played fine.
  3. The above code runs on a Raspberry Pi 5 with the AI HAT containing the Hailo8L chip.

NOTE: The Raspberry Pi Zero 2 W, the Raspberry Pi 5, and the laptop are all connected to the same WiFi and are therefore on the same LAN, in the same network segment.

Observations:

  1. While the video output in the VLC media player was smooth, there were issues when visualising the output through output_display.
  2. It looked like the video stream was not getting completely rendered.
  3. However, when I stand near the camera for a few seconds and wait for the video/frame to stabilise, the face/head is detected and the rectangle is visible; the entire image, though, is never completely rendered.

Would you kindly let me know the reason for the above errors and the corresponding corrections I must make on my side in order to perform the inference correctly?

Hi @Akshay.Kumar
We found a thread on Stack Overflow suggesting that this is related to frame buffer overflow, something we have not observed before: ffmpeg - opencv read error:[h264 @ 0x8f915e0] error while decoding MB 53 20, bytestream -7 - Stack Overflow. This could be because we have been using more powerful hardware and have not encountered such bottlenecks. Here are our suggestions for debugging this further:

  1. Check if the RTSP stream renders smoothly without any inference; you can use the code below for this:

import degirum as dg, degirum_tools

video_source = "<your rtsp url>"

# show video
with degirum_tools.Display("Camera") as display:
    with degirum_tools.open_video_stream(video_source) as video_stream:
        for frame in degirum_tools.video_source(video_stream):
            display.show(frame)

  2. Check if you can decrease the FPS and/or resolution of your RTSP stream, as video decoding and resizing can take up a lot of compute.
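
Additionally, these H.264 decode errors are often caused by packet loss when RTSP runs over UDP, and forcing TCP transport frequently clears them up. Assuming the stream is read through OpenCV’s FFmpeg backend (an assumption about the degirum_tools internals, not something we have verified on your setup), you can request TCP via an environment variable set before the stream is opened:

import os

# ask OpenCV's FFmpeg backend to use TCP for RTSP instead of UDP
# (assumes degirum_tools reads the stream through OpenCV/FFmpeg)
os.environ["OPENCV_FFMPEG_CAPTURE_OPTIONS"] = "rtsp_transport;tcp"

import degirum as dg, degirum_tools

video_source = "<your rtsp url>"

# show video, now requested over TCP transport
with degirum_tools.Display("Camera") as display:
    with degirum_tools.open_video_stream(video_source) as video_stream:
        for frame in degirum_tools.video_source(video_stream):
            display.show(frame)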

Hello @shashi,
I highly appreciate the guidance provided, thank you.

  1. I followed your instructions and observed that the behaviour did not change in any way. To reduce the computation, I even changed the code to just print; the errors started appearing even before the print statement was executed.

  2. I have reduced the fps of the video to 15 (the original was 30); however, there was no change. I will try changing the resolution and lowering the fps further to see if that helps.

It looks like the reference link you provided has a good explanation for it. I will go through it more deeply and try to work on it.

Thank you once again.

Our team implemented a new method based on the suggestions in the thread:

import degirum as dg, degirum_tools
import threading
import degirum_tools.streams

# Basic setup
inference_host_address = "@cloud"
zoo_url = "degirum/hailo"
model_name = "yolov8n_relu6_human_head--640x640_quant_hailort_hailo8l_1"
video_source = 0

print("Load Model..")
model = dg.load_model(
    model_name=model_name,
    inference_host_address=inference_host_address,
    zoo_url=zoo_url,
    token=degirum_tools.get_token(),
)


class BufferedStream:
    """Producer/consumer buffer that decouples frame capture from inference:
    a background thread keeps draining the video source into a bounded queue,
    and allow_drop=True discards stale frames when inference falls behind,
    so the decoder is never stalled by slow downstream processing."""

    def __init__(self, source):
        self.abort = False
        self._source = source
        # bounded frame queue; drops frames instead of blocking the reader when full
        self._buffer = degirum_tools.streams.Stream(maxsize=10, allow_drop=True)
        self._idx = 0
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def _run(self):
        # producer thread: read frames from the source as fast as they arrive
        with degirum_tools.open_video_stream(self._source) as video_stream:
            for frame in degirum_tools.video_source(video_stream):
                print("put ", self._idx)
                self._buffer.put((frame, self._idx))
                self._idx += 1
                if self.abort:
                    self._buffer.put(None)  # signal end of stream to the consumer
                    break

    def __call__(self):
        # consumer generator: yields buffered (frame, index) tuples to predict_batch
        for frame in self._buffer:
            print("get ", frame[1])
            yield frame


print("Press 'x' or 'q' to stop.")

# show results of inference
buffered_source = BufferedStream(video_source)
with degirum_tools.Display("AI Camera") as output_display:
    for inference_result in model.predict_batch(buffered_source()):
        output_display.show(inference_result)
buffered_source.abort = True

You can try to see if this helps.

Also, do you have a USB camera that you can attach directly to the RPi5 with the Hailo8L? You can test how the performance looks with that.


Hello @shashi,
Thanks a million!

I tried the above code snippet using a USB webcam, and it was definitely very smooth. It could process around 24 fps, and there were no errors.
However, when the RTSP source was used, it still produced errors and the rendering was still not okay, which affected the object detection.

Based on the above results, I think I will work with a webcam instead of RTSP, but I will also try to understand why the RTSP stream simply won’t work even though the resolution was decreased and the fps was lowered to 5.

As always, I appreciate the help and guidance provided.
Thank you very much.