How to improve box-fusion on tile boundaries

Hi everyone,

I’m currently fine-tuning the parameters involved in box fusion when using tiled inference with the DeGirum SDK. I’m running into an issue where overlapping boxes are not being merged correctly, especially near tile borders.

I’m using the following setup:

# detection model
model = dg.load_model(
    model_name,
    hw_location,
    model_zoo_url,
    "",
    overlay_show_labels=True,
    overlay_show_probabilities=True,
    overlay_line_width=1,
    output_confidence_threshold=.10,
    output_class_set={"person", "car", "truck", "bus", "motorcycle", "bicycle"}
)

# tiling (inside)
tile_extractor = TileExtractorPseudoModel(
    cols=3,
    rows=1,
    overlap_percent=0.05,
    model2=model,
    global_tile=True,  # required for the BoxFusion
)
tiled_detector = BoxFusionLocalGlobalTileModel(
    model1=tile_extractor,
    model2=model,
    nms_options=NmsOptions(
        threshold=0.50,
        box_select=NmsBoxSelectionPolicy.MERGE,
    ),
    edge_threshold=0.02,
    fusion_threshold=0.95,
    large_object_threshold=0.02,
    add_model1_results=True,
)

# extract only the region of interest
roi_pseudo = RegionExtractionPseudoModel(
    roi_list=[bounding_extent(rois)],
    model2=tiled_detector,
)
# crop to the region of interest and use tiling inside that region
tiled_model = CroppingAndDetectingCompoundModel(
    roi_pseudo,
    tiled_detector,
    add_model1_results=True,
)

tiled_ai_model = dgstreams.AiSimpleGizmo(tiled_model, stream_depth=stream_depth, allow_drop=allow_drop)

Problem Description

In the image, several detections should clearly be fused, but aren’t:

➀ & ➁ Duplicate detections from local and global tiles

Bullets ➀ and ➁ show that the same vehicle is detected twice: once as a truck and once as a car.
I suspect one detection comes from a local tile and the other from the global tile (or vice versa).

The IoU between these two boxes is approximately 1.0, so they should be merged.


➂ Small partial tile detection should be merged

Bullet ➂ shows a tiny bounding box from a neighboring tile that falls completely inside the larger box from bullets ➀ or ➁.
This box is not being fused either, although logically it should merge into the parent box.


➃ Sheriff car: green box should merge with orange box ➄

In bullet ➃, the bright green box (local tile) and the orange box ➄ (another tile/global) are clear duplicates.
Again, these should be fused given their near-total overlap.


My question

How should I tune the box-fusion parameters so that:

  1. The local/global duplicate detections in 1 and 2 are merged properly (high IoU ~1.0).
  2. The small inner box in 3 becomes fused into the larger parent box.
  3. The two sheriff-car boxes in 4 and 5 also merge into a single box.

Any help or best practices for optimizing tile-aware box fusion would be greatly appreciated!

Hi @Hobbes1987

Thanks for the detailed description. Can you please share the original image (without annotations) so that we can reproduce the results?

I didn’t manage to get the exact same image (it was a screenshot from a camera stream), but I have one that shows similar issues.

Fully working code:

import degirum as dg
from degirum_tools import (
    RegionExtractionPseudoModel,
    CroppingAndDetectingCompoundModel,
    NmsOptions,
    NmsBoxSelectionPolicy,
    Display
)

from degirum_tools.tile_compound_models import (
    TileExtractorPseudoModel,
    BoxFusionLocalGlobalTileModel,
)

hw_location = "@local"
model_zoo_url = "models"
model_name = "yolo11n_coco--640x640_float_tensorrt_gpu_1"

model = dg.load_model(
    model_name,
    hw_location,
    model_zoo_url,
    "",
    overlay_show_labels=True,
    overlay_show_probabilities=True,
    overlay_line_width=1,
    output_confidence_threshold=.03,
    output_class_set={"person", "car", "truck", "bus", "motorcycle", "bicycle"}
)

extent = (100, 100, 1800, 1000)

# tiling (inside)
tile_extractor = TileExtractorPseudoModel(
    cols=3,
    rows=2,
    overlap_percent=0.05,
    model2=model,
    global_tile=True,  # required for the BoxFusion
)
tiled_detector = BoxFusionLocalGlobalTileModel(
    model1=tile_extractor,
    model2=model,
    nms_options=NmsOptions(
        threshold=0.50,
        box_select=NmsBoxSelectionPolicy.MERGE,
    ),
    edge_threshold=0.02,
    fusion_threshold=0.95,
    large_object_threshold=0.02,
    add_model1_results=True,
)

# extract only the region of interest
roi_pseudo = RegionExtractionPseudoModel(
    roi_list=[extent],
    model2=tiled_detector,
)
# crop to the region of interest and use tiling inside that region
tiled_model = CroppingAndDetectingCompoundModel(
    roi_pseudo,
    tiled_detector,
    add_model1_results=True,
)

inference_result = tiled_model("cars.png")
with Display("Cars:") as display:
    display.show_image(inference_result)

Image:

Result:

As you can see, ➀, ➁ and ➂ should be merged because of edge overlaps.

I think ➃ and ➄ should be merged because of a global and local tile overlap.

Edit: with cols=2 and rows=2, it’s even more visible:

Almost every vehicle is detected as both a truck and a car, even when it is in the middle of a tile (and thus not near the edges).

Edit 2: Oops! I think 2x2 tiling is not suitable for this image. Use cols=2, rows=1 and the problem is very clearly visible:

First, I’d like to clear up how box fusion works. Two boxes are fused when their 1-dimensional IoU shows significant overlap, i.e. reaches the fusion threshold. For boxes that overlap there are two 1-D IoUs, one per axis, and if either of them reaches the threshold, the boxes are fused. This is a simple heuristic that will not catch all cases. Furthermore, box fusion occurs before traditional NMS: the model first looks for all fusion candidates, and then all of the boxes, including the fused ones, undergo traditional NMS.
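To make the heuristic above concrete, here is a minimal pure-Python sketch. It is only an illustration of the 1-D IoU rule as described in this post, not the actual degirum_tools implementation (which also takes the edge threshold into account). The function names `iou_1d` and `should_fuse` and the box coordinates are made up for the example.

```python
def iou_1d(a_min, a_max, b_min, b_max):
    """IoU of two 1-D intervals [a_min, a_max] and [b_min, b_max]."""
    inter = max(0.0, min(a_max, b_max) - max(a_min, b_min))
    union = (a_max - a_min) + (b_max - b_min) - inter
    return inter / union if union > 0 else 0.0


def should_fuse(box_a, box_b, fusion_threshold=0.95):
    """Boxes are (x1, y1, x2, y2).

    Fuse when the boxes overlap and the 1-D IoU along either axis
    reaches the fusion threshold.
    """
    iou_x = iou_1d(box_a[0], box_a[2], box_b[0], box_b[2])
    iou_y = iou_1d(box_a[1], box_a[3], box_b[1], box_b[3])
    if iou_x == 0.0 or iou_y == 0.0:
        return False  # the boxes do not overlap in 2-D at all
    return max(iou_x, iou_y) >= fusion_threshold


# Two halves of one vehicle, split by a vertical tile border:
# almost identical vertical extent, small horizontal overlap.
left_half = (100, 200, 310, 300)
right_half = (300, 201, 520, 299)

# A tiny partial detection sitting inside a much larger box.
large_box = (100, 200, 520, 300)
small_box = (300, 260, 330, 300)

print(should_fuse(left_half, right_half))  # True  (1-D IoU along y is 0.98)
print(should_fuse(large_box, small_box))   # False (best 1-D IoU is 0.40)
```

The second case is the situation from the question: the small box is fully contained in the large one, yet neither 1-D IoU comes anywhere near 0.95, so the heuristic leaves both boxes in place.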

Regarding boxes 1 and 2: you can eliminate the duplicate by setting a very high threshold in NmsOptions and enabling the agnostic option for NMS. Currently the agnostic flag is not accessible via NmsOptions, but we can make the appropriate changes to expose it. Unfortunately, agnostic NMS may suppress collateral boxes. For example, if a non-vehicle object (e.g. a person) overlaps a vehicle with an IoU over the NMS threshold, either the vehicle or the person would be suppressed. If you expect this case to occur frequently, you could implement a conditionally “agnostic” NMS by defining a hierarchy of mutually suppressible classes. However, degirum_tools does not currently have this capability.
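For illustration, a conditionally agnostic NMS along those lines could look roughly like the sketch below. This is plain Python written for this post, not part of degirum_tools. The function `grouped_nms`, the `groups` argument, and the choice of which classes may suppress each other are all assumptions; the detections are assumed to be dicts with `bbox`, `label` and `score` keys.

```python
def iou(a, b):
    """2-D IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def grouped_nms(detections, groups, iou_threshold=0.5):
    """Greedy NMS in which a box can only be suppressed by a box whose
    label belongs to the same group of mutually suppressible classes."""

    def group_of(label):
        for group in groups:
            if label in group:
                return group
        return frozenset({label})  # ungrouped labels only suppress themselves

    kept = []
    for det in sorted(detections, key=lambda d: d["score"], reverse=True):
        det_group = group_of(det["label"])
        suppressed = any(
            group_of(k["label"]) == det_group
            and iou(det["bbox"], k["bbox"]) >= iou_threshold
            for k in kept
        )
        if not suppressed:
            kept.append(det)
    return kept


# Hypothetical grouping: vehicle classes may suppress each other.
vehicle_group = frozenset({"car", "truck", "bus"})

detections = [
    {"bbox": (0, 0, 100, 100), "label": "car", "score": 0.8},
    {"bbox": (1, 0, 100, 100), "label": "truck", "score": 0.6},
    {"bbox": (0, 0, 100, 95), "label": "person", "score": 0.7},
]

result = grouped_nms(detections, [vehicle_group], iou_threshold=0.5)
print([d["label"] for d in result])  # ['car', 'person']
```

The truck duplicate is suppressed by the higher-scoring car, while the person survives even though it overlaps the car with an IoU of 0.95.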

Regarding the box pairs 4/5, 1/3 or 1/2: because the larger box is much larger than the smaller box, the 1D IoU will actually be very small and will not exceed the threshold. Bear in mind that the 2D IoU is not ~1.0 either. Just looking at box pair 4/5, I would estimate that the IoU is actually about 1/20, while the intersection over smaller area (IoS) is approximately 1.0. To solve this you could apply three steps, which unfortunately will increase your processing time. First, fuse edge boxes. Second, apply NMS using IoS, only on edge boxes, after fusion. Finally, apply traditional NMS using IoU (agnostic or not). There are certainly cases where this strategy will also fail. For example, imagine a child being held by a large adult. A detector would produce a small person box for the child that is highly overlapped (IoS ~1.0) with the adult’s box, and the IoS-based NMS in the second step would suppress the child.
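A quick numeric sketch of the IoU vs. IoS difference, plus a bare-bones IoS-based suppression pass for the second step. The box coordinates are invented to give roughly the 1/20 ratio mentioned above, `ios_nms` is a hypothetical helper rather than a degirum_tools function, and the selection of edge boxes is left out.

```python
def area(b):
    """Area of a box given as (x1, y1, x2, y2)."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection(a, b):
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    return iw * ih


def iou(a, b):
    """Intersection over union."""
    inter = intersection(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def ios(a, b):
    """Intersection over the smaller of the two box areas."""
    smaller = min(area(a), area(b))
    return intersection(a, b) / smaller if smaller > 0 else 0.0


def ios_nms(detections, ios_threshold=0.9):
    """Greedy suppression by IoS: the higher-scoring box wins."""
    kept = []
    for det in sorted(detections, key=lambda d: d["score"], reverse=True):
        if all(ios(det["bbox"], k["bbox"]) < ios_threshold for k in kept):
            kept.append(det)
    return kept


large_box = (0, 0, 200, 100)     # area 20000
small_box = (150, 80, 200, 100)  # area 1000, fully inside large_box

print(iou(large_box, small_box))  # 0.05, i.e. 1/20
print(ios(large_box, small_box))  # 1.0

detections = [
    {"bbox": large_box, "label": "car", "score": 0.9},
    {"bbox": small_box, "label": "car", "score": 0.3},
]
print(len(ios_nms(detections)))   # 1, the small inner box is suppressed
```

Note that the same pass would also remove the child in the adult/child example, which is exactly the trade-off described above.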

Another alternative is to use 1D IoS in the box fusion process, but that again has potential ramifications. For example, consider two boxes that are diagonal to one another and slightly overlap along an edge, where one of the boxes has a height or width close to the edge threshold. In this case the boxes could be fused accidentally.
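Here is one way to read that failure case, as a small sketch with made-up coordinates: a thin box that only clips the corner of a larger box. The helper names are hypothetical and this is not how degirum_tools computes fusion; it only compares the two 1-D measures on the same pair of boxes.

```python
def overlap_1d(a_min, a_max, b_min, b_max):
    """Length of the overlap of two 1-D intervals."""
    return max(0.0, min(a_max, b_max) - max(a_min, b_min))


def iou_1d(a_min, a_max, b_min, b_max):
    inter = overlap_1d(a_min, a_max, b_min, b_max)
    union = (a_max - a_min) + (b_max - b_min) - inter
    return inter / union if union > 0 else 0.0


def ios_1d(a_min, a_max, b_min, b_max):
    """Overlap divided by the length of the shorter interval."""
    shorter = min(a_max - a_min, b_max - b_min)
    return overlap_1d(a_min, a_max, b_min, b_max) / shorter if shorter > 0 else 0.0


# Boxes are (x1, y1, x2, y2).
box_a = (0, 0, 100, 100)    # a normal-sized object
box_b = (98, 90, 300, 100)  # a thin box (height 10) clipping box_a's corner

best_iou = max(
    iou_1d(box_a[0], box_a[2], box_b[0], box_b[2]),
    iou_1d(box_a[1], box_a[3], box_b[1], box_b[3]),
)
best_ios = max(
    ios_1d(box_a[0], box_a[2], box_b[0], box_b[2]),
    ios_1d(box_a[1], box_a[3], box_b[1], box_b[3]),
)

fusion_threshold = 0.95
print(best_iou >= fusion_threshold)  # False: 1-D IoU keeps the boxes apart
print(best_ios >= fusion_threshold)  # True: 1-D IoS would fuse them
```

With 1-D IoS the two boxes would be fused into one box spanning (0, 0, 300, 100), even though they only share a 2x10 pixel sliver.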

All of these heuristics have different trade-offs, and I am not sure what is acceptable in your use case. Is it better to have false positives (duplicates) or false negatives (collaterally suppressed boxes)?

Thank you for your detailed explanation. For now I’ll tweak my settings to get the best results. It’s nice to know how it works under the hood. Thanks!

Class-agnostic global-to-local fusion would be a nice thing to have. For now I managed to get it working by using IoS instead of IoU.

The latest degirum_tools now supports class-agnostic NMS, which is set via the class_agnostic attribute in NmsOptions.

Hi @Hobbes1987, were you able to try the latest version of degirum_tools that supports class-agnostic NMS?