Two processes created by one script

Hi everyone!

Today was a very bad day. It started when I noticed very high CPU consumption on my test PC, and once I started digging I saw many, many instances of my script running, all calling the DeGirum service.

At first I thought it was due to my lack of knowledge and to the lack of locking mechanisms in the API I’d created to start the script. When a request is made to the API, it popens a new process with the inference script… that was the idea, anyway…
The problem: many processes were being created at the same time.

After spending the whole day implementing locking systems, because everything pointed to some sort of race condition between workers, I tried everything from local locks with variables, to system-wide locking using files, and even system-agnostic locking using Redis, with no luck.

I’m lying, there was some luck: at first, when I sent a request to the API, 4 or 5 processes were created; after implementing the first lock_process system using variables, the number of processes per request went down to only 2. But that is still a problem, because the API and Redis only detected 1 process being created.
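To give an idea, the Redis attempt was along these lines (a simplified sketch, not my exact code; the key name and TTL are illustrative):

import redis

r = redis.Redis()

def try_acquire_start_lock(stream_key: str, ttl_s: int = 30) -> bool:
    # SET with nx=True and ex=ttl_s is a single atomic operation:
    # only one worker can create the key per TTL window, so only
    # that worker should be allowed to launch the script
    return bool(r.set(f"start_lock:{stream_key}", "1", nx=True, ex=ttl_s))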

I was about to surrender when I did the test I should have done first: running the script directly in the terminal, without the API call. My soul dropped to the floor:

running the script directly in the terminal creates 2 different processes.

The script I’m running is basically the ‘smart_nvr’ demo, with arguments added to be able to ‘personalize’ the inference (I use the API call to change the object to detect, the confidence threshold, etc.).

 python3 ./stream.py --input rtmp://input.server/live/livestream --output rtmp://output.server/live/livestream --model_zoo_url aiserver://home/pi/DeGirum/zoo --model_name yolov8s_coco--640x640_quant_hailort_hailo8_1 --device HAILORT/HAILO8 --confidence 0.3 --classes clock

After running the script in the terminal, ps shows two different processes:

livestream # ps -aux | grep stream
root     2100842 95.3  0.7 2721372 504864 pts/0  Sl   22:31   0:34 python3 ./stream.py --input rtmp://input.server/live/livestream --output rtmp://output.server/live/livestream --model_zoo_url aiserver://home/pi/DeGirum/zoo --model_name yolov8s_coco--640x640_quant_hailort_hailo8_1 --device HAILORT/HAILO8 --confidence 0.3 --classes clock 
root     2100870  0.5  0.1 995444 100684 pts/0   Sl   22:31   0:00 python3 ./stream.py --input rtmp://input.server/live/livestream --output rtmp://output.server/live/livestream --model_zoo_url aiserver://home/pi/DeGirum/zoo --model_name yolov8s_coco--640x640_quant_hailort_hailo8_1 --device HAILORT/HAILO8 --confidence 0.3 --classes clock
root     2101117  0.0  0.0   6612  2196 pts/0    S+   22:32   0:00 grep --color=auto stream

The main problem is that when I call the API to ‘kill’ the program, it is only able to kill one of them, so after a few calls there are enough scripts running to completely kill the Hailo device’s performance.
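(The ‘stop’ endpoint does essentially this with the saved pid - a simplified sketch of the idea, not the exact handler:)

import os, signal

# pid was saved from subprocess.Popen when the script was launched;
# this signals only that one process - the second python3 survives
os.kill(pid, signal.SIGTERM)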

This is my version of the smart_nvr (it takes one RTMP or RTSP stream and forwards the inference, with tracking IDs and red bounding boxes, to another server):

import degirum as dg, degirum_tools, time
from degirum_tools import streams as dgstreams
import argparse

parser = argparse.ArgumentParser(description="Stream video with object detection.")

parser.add_argument('--input', type=str, default="rtmp://input.server/live/livestream", help='The video source URL.')
parser.add_argument('--output', type=str, default="rtmp://output.server/live/livestream", help='The output URL path.')
parser.add_argument('--model_name', type=str, default="yolo11s_coco--640x640_quant_hailort_hailo8_1", help='The model chosen for the inference')
parser.add_argument('--confidence', type=float, default=0.5, help='Confidence threshold value')
parser.add_argument('--classes', type=str, default="people", help='Comma-separated class labels to search for')
parser.add_argument('--model_zoo_url', type=str, default="aiserver://home/pi/DeGirum/zoo", help='URL path of the model_zoo.')
parser.add_argument('--device', type=str, default="HAILORT/HAILO8", help='Neural Chip type')

args = parser.parse_args()


dg.log.DGLog.set_verbose_state('DEBUG')

hw_location = "@local"
model_name = args.model_name
model_zoo_url = args.model_zoo_url
video_source = args.input
video_output = args.output
classes = set(args.classes.split(','))
device_type = args.device
confidence = args.confidence

# connect to the inference host and the model zoo
model_manager = dg.connect(
    inference_host_address=hw_location,
    zoo_url=model_zoo_url
)

# load the detection model with the requested settings
model = model_manager.load_model(
    model_name=model_name,
    device_type=device_type,
    output_confidence_threshold=confidence,
    input_pad_method="letterbox",
    image_backend='opencv',
    overlay_color=[255,0,0],
    output_class_set=classes
)

anchor = degirum_tools.AnchorPoint.CENTER

# create object tracker
tracker = degirum_tools.ObjectTracker(
    class_list=classes,
    track_thresh=0.35,
    track_buffer=100,
    match_thresh=0.9999,
    trail_depth=20,
    anchor_point=anchor,
    show_only_track_ids = True,
    annotation_color = [255,0,0]
)

# video source gizmo reading the input stream
cam_source = dgstreams.VideoSourceGizmo(video_source)

# attach the tracker analyzer to the model
degirum_tools.attach_analyzers(model, [tracker])

# inference gizmo wrapping the model
detector = dgstreams.AiSimpleGizmo(model)

# streamer gizmo pushing the annotated frames to the output URL
streamer = dgstreams.VideoStreamerGizmo(video_output, show_ai_overlay=True)

# build and start the pipeline: source >> detector >> streamer
dgstreams.Composition(cam_source >> detector >> streamer).start()

Does anyone know why this ‘smart_nvr.py’ is spawning two processes instead of only one? And how can I fix that?

Hi @dario ,

Indeed, VideoStreamerGizmo creates a subprocess, which runs ffmpeg. This is needed because we use the system ffmpeg for video encoding - we do not want to link ffmpeg into degirum_tools, to avoid system compatibility issues. It is also nice to offload the heavy task of video encoding from your main Python process.
This ffmpeg process is terminated when the VideoStreamerGizmo.run() method exits: VideoStreamerGizmo.run() opens the streamer in a with VideoStreamer(…) block, and VideoStreamer.__exit__() calls VideoStreamer.stop(), which cleanly closes the ffmpeg process.
So if you stop your composition the normal way, that ffmpeg process should be closed gracefully.

To stop the composition, you can modify your code like this:

# create named composition object
c = dgstreams.Composition(cam_source >> detector >> streamer)

# start the composition in a non-blocking way
c.start(wait=False)
# all composition gizmos are now running in their own threads

#
# here you do what you want - wait for stop event, do your other jobs etc.
#

# and when it is time to stop, just call stop:
c.stop()

Hi @vladk

I tried your ‘fix’, but now the script is not working at all: when I run it, it still opens 2 processes, but now it simply does nothing :sleepy_face:

I also tried putting the c = dgstreams.Composition(cam_source >> detector >> streamer) in a try/finally block with the c.stop() in the finally, but that does not close the ‘secondary’ process when the first one is closed/killed.
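Roughly like this (a simplified sketch; the waiting loop stands in for whatever keeps the main thread alive in my real script):

c = dgstreams.Composition(cam_source >> detector >> streamer)
try:
    c.start(wait=False)
    while True:
        time.sleep(1)  # placeholder: keep the main thread running until killed
finally:
    c.stop()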

Not to mention that I also tried, hoping to make it close gracefully, putting it in a with block like this one:

with dgstreams.Composition(cam_source >> detector >> streamer).start():
    pass

that’s interesting, never thought of that…

What would be the ‘normal way’ of closing the script when it is called by an ‘external’ app?

I mean, I have a stream already ‘running’ on the input server, and when I want to start the inference I make a request to the API’s endpoint to start the script with the config I want:

command = ['python3', './stream.py', '--input', video_source, '--output', video_output,
           '--model_zoo_url', model_zoo_url,
           '--model_name', model_name, '--device', device_type,
           '--confidence', str(confidence_threshold),
           '--classes', classes_str]

logger.info(f"Executing command: {' '.join(map(str, command))}")
try:
    proc = subprocess.Popen(command, shell=False)
except Exception as e:
    logger.exception("Failed to start process for %s: %s", process_key, e)
    return jsonify({"error": 12, "msg": f"Error launching process: {str(e)}"}), 500

# Save the pid for closing the script
pid = proc.pid

I save the pid to be able to close the script later, when the API’s ‘stop’ endpoint is called, but at that moment I only know the pid of the first process created: 2154601, not the secondary one: 2154617…

EDIT

Found one possible solution while I was writing this reply.

I’m now using the pgid to kill the process and all its children…

import os, signal, subprocess, psutil

# open the script as the leader of a new process group (os.setsid),
# so the ffmpeg child it spawns shares the same pgid
proc = subprocess.Popen(command, preexec_fn=os.setsid, shell=False)

# later, at the moment of the killing: signal the whole group
pgid = os.getpgid(pid)
os.killpg(pgid, signal.SIGTERM)
p = psutil.Process(pid)
try:
    p.wait(timeout=10)  # wait for SIGTERM to take effect
    logger.info(f"Process group pgid={pgid} terminated gracefully")
except psutil.TimeoutExpired:
    logger.warning(f"Process group pgid={pgid} did not terminate, killing...")
    os.killpg(pgid, signal.SIGKILL)  # force kill

Hi @dario ,

I did the following simple script, which streams to an RTSP server (you may need to adjust hw_location, model_zoo_url, model_name, and video_source to fit your environment).

import degirum as dg, degirum_tools, time
from degirum_tools import streams as dgstreams


hw_location = "@cloud"
model_zoo_url = "degirum/public"
model_name = "yolo_v5s_person_det--512x512_quant_n2x_orca1_1"
video_source = "rtsp://user:password@192.168.0.xx:554"
url_path = "/my-ai-stream"


# load model
model = dg.load_model(
    model_name,
    hw_location,
    model_zoo_url,
    overlay_show_probabilities=True,
    overlay_line_width=1,
)

# video source gizmo
cam_source = dgstreams.VideoSourceGizmo(video_source)

# detection gizmo
detector = dgstreams.AiSimpleGizmo(model)

# video streamer gizmo
streamer = dgstreams.VideoStreamerGizmo(f"rtsp://localhost:8554{url_path}", show_ai_overlay=True)

# start media server to serve RTSP streams
with degirum_tools.MediaServer():
    c = dgstreams.Composition(cam_source >> detector >> streamer)
    c.start(wait=False)
    time.sleep(10)
    c.stop()


And it works as expected: it starts the ffmpeg and mediamtx processes, then streams for 10 seconds (time.sleep(10)), then exits normally. ps shows no processes left:

This is my bash output:

degirum@dgsrv105:~/vladk/pp-bench$ python3 pptest.py &
[1] 759282
degirum@dgsrv105:~/vladk/pp-bench$ ps
    PID TTY          TIME CMD
 757620 pts/12   00:00:00 bash
 759282 pts/12   00:00:04 python3
 759330 pts/12   00:00:00 mediamtx
 759370 pts/12   00:00:00 ps
degirum@dgsrv105:~/vladk/pp-bench$ ps
    PID TTY          TIME CMD
 757620 pts/12   00:00:00 bash
 759282 pts/12   00:00:06 python3
 759330 pts/12   00:00:00 mediamtx
 759399 pts/12   00:00:04 ffmpeg
 759486 pts/12   00:00:00 ps
degirum@dgsrv105:~/vladk/pp-bench$ ps
    PID TTY          TIME CMD
 757620 pts/12   00:00:00 bash
 759282 pts/12   00:00:08 python3
 759330 pts/12   00:00:00 mediamtx
 759399 pts/12   00:00:08 ffmpeg
 759516 pts/12   00:00:00 ps
degirum@dgsrv105:~/vladk/pp-bench$ ps
[1]+  Done                    python3 pptest.py
    PID TTY          TIME CMD
 757620 pts/12   00:00:00 bash
 759528 pts/12   00:00:00 ps

@vladk, thanks for the tests, but as you can see, when you use VideoStreamerGizmo it behaves differently: in your case you have bash, then python3, and then mediamtx (due to the local media server).
In my case I don’t use a local media server, so my ps output shows the same process (with the same arguments) duplicated; I’m only able to ‘see’ the ffmpeg process if I use pstree:

# ps -aux | grep stream
root     2475670 69.6  0.8 2572888 530264 ?      Ssl  12:44   0:17 python3 ./stream.py --input rtmp://input.server/live/livestream --output rtmp://output.server/live/livestream --hw_location 10.0.0.2:8778 --model_zoo_url aiserver://home/pi/DeGirum/zoo --model_name yolov8s_coco--640x640_quant_hailort_hailo8_1 --device HAILORT/HAILO8 --confidence 0.3 --classes clock --notification_config mailtos://[censored]&from='AI notification <noreply@mail.com>'&to=destination@mail.com --clip_save --clip_duration 1 --bucket_name bucket
root     2475687  1.1  0.1 1001108 106656 ?      Sl   12:44   0:00 python3 ./stream.py --input rtmp://input.server/live/livestream --output rtmp://output.server/live/livestream --hw_location 10.0.0.2:8778 --model_zoo_url aiserver://home/pi/DeGirum/zoo --model_name yolov8s_coco--640x640_quant_hailort_hailo8_1 --device HAILORT/HAILO8 --confidence 0.3 --classes clock --notification_config mailtos://[censored]&from='AI notification <noreply@mail.com>'&to=destination@mail.com --clip_save --clip_duration 1 --bucket_name bucket
           ├─gunicorn(2210062)─┬─gunicorn(2210065)
           │                   ├─gunicorn(2210066)───python3(2475670)─┬─ffmpeg(2475729)─┬─{ffmpeg}(2475757)
           │                   │                                      │                 ├─{ffmpeg}(2475758)
           │                   │                                      │                 ├─{ffmpeg}(2475759)
           │                   │                                      │                 ├─{ffmpeg}(2475760)
           │                   │                                      │                 ├─{ffmpeg}(2475761)
           │                   │                                      │                 ├─{ffmpeg}(2475762)
           │                   │                                      │                 ├─{ffmpeg}(2475763)
           │                   │                                      │                 ├─{ffmpeg}(2475764)
           │                   │                                      │                 ├─{ffmpeg}(2475766)
           │                   │                                      │                 ├─{ffmpeg}(2475767)
           │                   │                                      │                 ├─{ffmpeg}(2475768)
           │                   │                                      │                 ├─{ffmpeg}(2475769)
           │                   │                                      │                 ├─{ffmpeg}(2475770)
           │                   │                                      │                 ├─{ffmpeg}(2475771)
           │                   │                                      │                 ├─{ffmpeg}(2475772)
           │                   │                                      │                 ├─{ffmpeg}(2475773)
           │                   │                                      │                 ├─{ffmpeg}(2475774)
           │                   │                                      │                 ├─{ffmpeg}(2475775)
           │                   │                                      │                 ├─{ffmpeg}(2475776)
           │                   │                                      │                 ├─{ffmpeg}(2475777)
           │                   │                                      │                 ├─{ffmpeg}(2475778)
           │                   │                                      │                 ├─{ffmpeg}(2475779)
           │                   │                                      │                 ├─{ffmpeg}(2475780)
           │                   │                                      │                 └─{ffmpeg}(2475781)
           │                   │                                      ├─python3(2475687)───{python3}(2475688)

Your example works perfectly when you know the duration of the stream and when you boot up a local media server with degirum_tools.MediaServer, but in my case I don’t know the duration, the stream is pushed to an ‘external’ video server, and the execution of the script is controlled externally, started/stopped by a primary script.

But as I said, using the pgid to kill it from the ‘caller’ script solved the orphan-process problem when it is closed by the caller.

Anyway, thank you, because now I know how to properly close it when fixed-duration streams are being used :smiley:

Hi @dario ,

It is good that you solved the problem.
But for the sake of clarity I would like to add that the script I provided works fine even when MediaMTX is started separately: just remove the with degirum_tools.MediaServer(): line (and dedent its body).
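I.e., with MediaMTX already running, the end of the example simply becomes:

c = dgstreams.Composition(cam_source >> detector >> streamer)
c.start(wait=False)
time.sleep(10)
c.stop()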

Also, when you call c.stop(), all gizmos will be stopped. And it works with any stream, including infinite ones. BTW, my example uses an RTSP camera as a source, so it is an infinite stream.
After script completion, the ffmpeg process is closed OK - I just checked with mediamtx started manually.
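For your case of an infinite stream controlled by an external app, one possible pattern (just a sketch, reusing the gizmos from your script and assuming your API stops the process with SIGTERM) is to wait for a termination signal instead of sleeping:

import signal, threading

stop_event = threading.Event()

# translate SIGTERM/SIGINT from the caller into an event
signal.signal(signal.SIGTERM, lambda signum, frame: stop_event.set())
signal.signal(signal.SIGINT, lambda signum, frame: stop_event.set())

c = dgstreams.Composition(cam_source >> detector >> streamer)
c.start(wait=False)

stop_event.wait()  # block until the caller signals us to stop
c.stop()           # stops all gizmos; the ffmpeg subprocess is closed cleanly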

I don’t know what is making it behave differently, but I tried your approach and, in my tests, it simply does nothing. It is as if it were trying to ‘read’ the stream, but ffmpeg is not even started: the ‘secondary’ process is being created, but the ffmpeg output is not shown (as it normally is), and the DeGirum server is never called (I watch the workload with hailo monitor, and the inference never actually starts).

It’s like the script stops right after starting…