How can I get the track_id of a model (yolov8n_relu6_coco_pose) when it is combined with another cropped model?

Hi, I’m combining three models. One of them (person_model) is tracked; the other two (hands, landmarks) are combined in a cropping operation. When I receive “results” for the first one (person_model), they include its track_id, bbox, keypoints, etc., but after that I receive results for the other one (landmarks in cropped hands) where track_id is “0”. Here are the important parts of my code:

person_model = dg.load_model(
    model_name=f"yolov8n_relu6_coco_pose--640x640_quant_hailort_hailo{HAILO_VERSION}_1",
    inference_host_address="@local",
    zoo_url="/home/jchidalgo/zoo",
    overlay_color=[(255,255,0), (0,255,0), (255,0,0)]
)
hand_det_model = dg.load_model(
    model_name=f"yolov8n_relu6_hand--640x640_quant_hailort_hailo{HAILO_VERSION}_1",
    inference_host_address="@local",
    zoo_url="/home/jchidalgo/zoo",
    overlay_show_labels=False,
    overlay_show_probabilities=False,
    overlay_line_width=1,
    overlay_color=[(255,0,0)]
)

# load palm landmarks detection model
palm_model = dg.load_model(
    model_name=f"hand_landmark_lite--224x224_quant_hailort_hailo{HAILO_VERSION}_1",
    inference_host_address="@local",
    zoo_url="/home/jchidalgo/zoo",
    overlay_show_probabilities=False,
    overlay_show_labels=False,
    overlay_line_width=1,
)
model_handLM = degirum_tools.CroppingAndDetectingCompoundModel(
    hand_det_model,
    palm_model,
    crop_extent=30.0,
    add_model1_results=True,  # add the hand detection model results
)

tracker = degirum_tools.ObjectTracker(
    class_list=['person'], #['face'], 
    track_thresh=0.5,
    track_buffer=50, 
    match_thresh=0.9,
    trail_depth=50,
    show_overlay=True,
    anchor_point=degirum_tools.AnchorPoint.BOTTOM_CENTER
)
degirum_tools.attach_analyzers(person_model, [tracker])  # the analyzer is attached to the person detector
# Create a compound model that combines the three models
#combined_model=degirum_tools.CombiningCompoundModel( model_handLM,person_model )
combined_model=degirum_tools.CombiningCompoundModel(person_model,model_handLM )

VIDEO_WIDTH =  1280 #  640 #960 # 
VIDEO_HEIGHT = 720 #  480 #540 # 

Here I receive results:

    for result in combined_model.predict_batch(frame_source()):
        T1 = time.perf_counter() * 1000  

        if stop_thread:
            print("Stopping execution...")
            break
        overlay = result.image_overlay() if callable(result.image_overlay) else result.image_overlay

        # Send the unpacked 'result' dictionary to the other Raspberry Pi.
        # The Ethernet connection between the two Raspberry Pis must be set up first: assign static IPs.

        send_result_to_vue3(result)

“result” only includes “track_id” when the information relates to “person_model”; when “result” is generated from “model_handLM”, it includes track_id=0. I need to know whose hand it is — to whom does it belong? How can I achieve this? The only way I can see right now is to check whether the hand’s bbox is contained in a person’s bbox, and then take that person’s track_id…
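For reference, the containment fallback described above could be sketched like this. It is a minimal, hypothetical sketch: the helper names are invented, and it assumes each result dict carries a "bbox" as [x1, y1, x2, y2] in the same image coordinate system.

```python
# Hypothetical fallback: assign a person's track_id to each hand by
# checking whether the hand bbox center lies inside a person bbox.
# Assumes result dicts with "bbox" = [x1, y1, x2, y2] in image coordinates.

def center(bbox):
    x1, y1, x2, y2 = bbox
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def contains(bbox, point):
    x1, y1, x2, y2 = bbox
    px, py = point
    return x1 <= px <= x2 and y1 <= py <= y2

def assign_track_ids(persons, hands):
    """Copy track_id from the first person whose bbox contains the hand center."""
    for hand in hands:
        for person in persons:
            if contains(person["bbox"], center(hand["bbox"])):
                hand["track_id"] = person["track_id"]
                break
    return hands

persons = [{"bbox": [0, 0, 100, 200], "track_id": 7}]
hands = [{"bbox": [40, 50, 60, 70]}]
assign_track_ids(persons, hands)
```

This breaks down when bboxes of two people overlap, which is one reason the crop_index approach below is more robust.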

Hi @jhidalgo,

Please be advised that CombiningCompoundModel just runs two models in parallel and combines their results without altering any of them. Your object tracker is attached to the person detection model, which is then used in CombiningCompoundModel, so the tracker results have no relation to the model_handLM results.

To achieve what you want, you need to use CroppingAndDetectingCompoundModel twice: first to combine the hand and wrist-landmark models, and then to combine the person detector model with that hand-wrist compound model.

Then you need to use the "crop_index" result key. When add_model1_results=True, CroppingAndDetectingCompoundModel adds it to each second-model result to refer to the first-model result that was used as the crop.

Please be advised that when you set add_model1_results=True, the first-model results are placed at the beginning of the result list, followed by the second-model results. Since your second model is also a CroppingAndDetectingCompoundModel, the same applies to it. So your result list will contain person bboxes, then hand bboxes, then wrist landmarks, in that order. In the wrist-landmark results, the "crop_index" key contains the index of the hand counted from the hands-part offset, not from the beginning of the list.
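To illustrate that layout with mock data (all values are invented; only the list structure follows the description above), the index chain from a landmark back to a person looks like this:

```python
# Mock result list for one frame: persons first, then hands (crop_index
# referring to person indices), then landmarks (crop_index counted from
# the hands offset). Values are illustrative, not real model output.
results = [
    {"label": "person", "track_id": 11},    # index 0
    {"label": "person", "track_id": 12},    # index 1
    {"label": "hand", "crop_index": 0},     # index 2 -> cropped from person 0
    {"label": "hand", "crop_index": 1},     # index 3 -> cropped from person 1
    {"landmarks": "...", "crop_index": 0},  # index 4 -> hand at offset 2 + 0
    {"landmarks": "...", "crop_index": 1},  # index 5 -> hand at offset 2 + 1
]

# Offset of the first result that carries a crop_index (the hands part).
hands_offset = next(i for i, r in enumerate(results) if "crop_index" in r)  # -> 2

landmark = results[4]
hand = results[hands_offset + landmark["crop_index"]]  # hand this landmark came from
person = results[hand["crop_index"]]                   # person this hand came from
print(person["track_id"])  # -> 11
```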

The working example below adjusts "crop_index" for the wrist landmarks (the hands_offset logic) and propagates the "track_id" value into all cropped results.

After the for i, res in enumerate(inference_result.results) loop, every element of inference_result.results will contain the proper "track_id".

Hope this helps.

import degirum as dg, degirum_tools

person_model = dg.load_model(
    model_name=f"yolov8n_relu6_coco_pose--640x640_quant_hailort_hailo8_1",
    inference_host_address="@cloud",
    zoo_url="degirum/hailo",
    token=degirum_tools.get_token(),
    overlay_line_width=1,
    overlay_show_labels=False,
    overlay_color=[(255, 255, 0), (0, 255, 0), (255, 0, 0)],
)
hand_det_model = dg.load_model(
    model_name=f"yolov8n_relu6_hand--640x640_quant_hailort_hailo8_1",
    inference_host_address="@cloud",
    token=degirum_tools.get_token(),
    zoo_url="degirum/hailo",
    overlay_color=[(255, 0, 0)],
)

# load palm landmarks detection model
palm_model = dg.load_model(
    model_name=f"hand_landmark_lite--224x224_quant_hailort_hailo8_1",
    inference_host_address="@cloud",
    zoo_url="degirum/hailo",
    token=degirum_tools.get_token(),
)

tracker = degirum_tools.ObjectTracker(
    class_list=["person"],
    track_thresh=0.5,
    track_buffer=50,
    match_thresh=0.9,
    trail_depth=50,
    show_overlay=True,
    anchor_point=degirum_tools.AnchorPoint.BOTTOM_CENTER,
)
degirum_tools.attach_analyzers(person_model, [tracker])

model_handLM = degirum_tools.CroppingAndDetectingCompoundModel(
    hand_det_model,
    palm_model,
    crop_extent=30.0,
    add_model1_results=True,
)

combined_model = degirum_tools.CroppingAndDetectingCompoundModel(
    person_model,
    model_handLM,
    add_model1_results=True,
)

with degirum_tools.Display("Test") as display:
    for inference_result in degirum_tools.predict_stream(combined_model, 0):
        hands_offset = -1
        for i, res in enumerate(inference_result.results):
            if (ci := res.get("crop_index")) is not None:
                if hands_offset < 0:
                    hands_offset = i
                if "label" not in res:
                    ci += hands_offset
                    res["crop_index"] = ci
                if (tid := inference_result.results[ci].get("track_id")) is not None:
                    res["track_id"] = tid

        display.show(inference_result)

Hi Vladk, thank you very much for your support. Now, I get the track_id with every Hand information. Best regards.

Hi @jhidalgo
Thanks for confirming that Vlad’s code solves your issue. Is it ok to mark Vlad’s reply as solution?

Yes, of course. Thanks again.