Combined cropping and tiling

Hi all,

I am trying to build a three-stage model pipeline, but I’m running into errors and I’m no longer sure whether I’m wiring the models correctly.

My goal is this:

  1. Crop a single large ROI (shifted)
  2. Inside that ROI: use tiling
  3. Fuse detections with BoxFusionLocalGlobalTileModel

Image 1 shows what I want to achieve:

  • Blue = full frame
  • Green = one big ROI crop (under the hood: max extent of several polygons)
    Note: this can't be expressed as a crop/margin percentage, because the ROI is shifted in both x and y
  • Red = tile grid with edge-aware fusion

I managed to get everything working (Image 2) with a custom "manual tile generator"
that simply creates many sub-ROIs and runs the detection model on each.
However, with that approach I lose all the advanced features of the official tile system, such as edge-aware box fusion.
That's why I'm trying to adopt the built-in TileExtractorPseudoModel + BoxFusionLocalGlobalTileModel, but wiring them correctly has been difficult.
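For context, the manual generator simply sliced the ROI into an overlapping grid of boxes and ran the detector on each crop separately. A simplified sketch of the idea (not my exact code; make_tile_boxes is just an illustrative name):

def make_tile_boxes(x1, y1, x2, y2, cols=3, rows=2, overlap=0.15):
    # Pick a tile size such that cols x rows tiles with fractional
    # `overlap` exactly span the ROI (integer truncation can be a pixel
    # off at the edges).
    w, h = x2 - x1, y2 - y1
    tw = w / (cols - (cols - 1) * overlap)
    th = h / (rows - (rows - 1) * overlap)
    boxes = []
    for r in range(rows):
        for c in range(cols):
            bx = x1 + c * tw * (1 - overlap)  # step = tile size minus overlap
            by = y1 + r * th * (1 - overlap)
            boxes.append((int(bx), int(by), int(bx + tw), int(by + th)))
    return boxes

Each box was then cropped out of the frame and run through model.predict() individually, and the per-tile results were naively concatenated (no fusion of boxes at tile edges).

Here is my current attempt with the built-in models: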

# Base detection model
model = dg.load_model(...)

tile_extractor = TileExtractorPseudoModel(
    cols=3,
    rows=2,
    overlap_percent=0.15,
    model2=model, # not sure....
    global_tile=True,
)

tiled_detector = BoxFusionLocalGlobalTileModel(
    tile_extractor,
    model, # not sure....
    # ...
)

max_extent_box = [x1, y1, x2, y2]   # computed from ROIs
roi_pseudo = RegionExtractionPseudoModel(
    [max_extent_box],
    model, # not sure....
    # ...
)

tiled_model = CroppingAndDetectingCompoundModel(
    roi_pseudo,        # model1: produces one ROI bbox?
    tiled_detector,    # model2: tile model runs inside that ROI?
    crop_extent=0.0, # I can't use this I guess, because of my shifted x/y
)

Am I understanding the intended use of the CompoundModels correctly?
I want to confirm that I’m combining RegionExtractionPseudoModel, CroppingAndDetectingCompoundModel, and the tile-based models in the way the framework was designed.

Is this approach a good idea in general?
Conceptually it seems clean: crop to a large ROI first, then perform tiling and fusion inside that ROI. But I’m not sure whether this is the recommended or optimal pattern within the DeGirum Tools architecture.

Bonus question

I would also really like to access or use the cropped image itself (i.e., the image after the large ROI crop — the “green box” in the image) inside my processing pipeline.

Is there a way in DeGirum Tools to retrieve the cropped image produced by CroppingAndDetectingCompoundModel?

def retrieve_roi_images(results):
    roi_images = []
    for res in results.results:
        if res['label'].startswith('ROI'):
            # bbox coordinates may be floats; cast to int before slicing
            x1, y1, x2, y2 = map(int, res['bbox'])
            roi_images.append(results.image[y1:y2, x1:x2])

    return roi_images


model = dg.load_model(model_name, hw_location, zoo_url, dgt.get_token())

tile_extractor = TileExtractorPseudoModel(
    cols=3,
    rows=2,
    overlap_percent=0.15,
    model2=model,
    global_tile=True,
)

tiled_detector = BoxFusionLocalGlobalTileModel(
    tile_extractor,
    model,
)

max_extent_box = [x1, y1, x2, y2]   # computed from ROIs
roi_pseudo = RegionExtractionPseudoModel(
    [max_extent_box],
    tiled_detector,
)

tiled_model = CroppingAndDetectingCompoundModel(
    roi_pseudo,
    tiled_detector,
    crop_extent=0.0,
    add_model1_results=True
)

res = tiled_model(img_path)
roi_images = retrieve_roi_images(res)

The only thing wrong in the model chaining was the model wired into model1 of the CroppingAndDetectingCompoundModel: the RegionExtractionPseudoModel must receive the tiled_detector, not the base detection model. By adding add_model1_results=True to the CroppingAndDetectingCompoundModel we can retrieve the ROI image(s), as shown with the retrieve_roi_images function.
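To sanity-check that the crop really matches the green box, the returned ROI crops can be written to disk (assuming the default OpenCV/NumPy image backend, so each crop is a BGR array):

import cv2

for i, crop in enumerate(roi_images):
    cv2.imwrite(f"roi_{i}.png", crop)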

Do you expect the ROI to change over the course of your inferences? Do you need your coordinates relative to the original large image or only the ROI?

Thank you for your reply! I've changed my code to match yours (updated code below).

Do you expect the ROI to change over the course of your inferences?
No, it's a static setup.

Do you need your coordinates relative to the original large image or only the ROI?
Not relevant for now 🙂 first let's get it working properly.

model = dg.load_model(
    model_name,
    hw_location,
    model_zoo_url,
    "",
    overlay_show_labels=False,
    overlay_show_probabilities=False,
    overlay_line_width=1,
    output_confidence_threshold=0.05,
    output_class_set={"person", "car", "truck", "bus", "motorcycle", "bicycle"}
)

tile_extractor = TileExtractorPseudoModel(
    cols=3,
    rows=2,
    overlap_percent=0.15,
    model2=model,
    global_tile=True,
)

tiled_detector = BoxFusionLocalGlobalTileModel(
    tile_extractor,
    model,
    nms_options=NmsOptions(
        threshold=0.35,
        use_iou=True,
        box_select=NmsBoxSelectionPolicy.MOST_PROBABLE,
    ),
)
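
# bounding_extent is my small helper: the union ("max extent") of all ROI
# boxes. Simplified sketch; `rois` is a list of (x1, y1, x2, y2) boxes
# derived from my polygons.
def bounding_extent(rois):
    xs1, ys1, xs2, ys2 = zip(*rois)
    return (min(xs1), min(ys1), max(xs2), max(ys2))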

ext = [bounding_extent(rois)]
print("EXTENT", ext) # => [(250, 250, 1000, 1000)]

roi_pseudo = RegionExtractionPseudoModel(
    ext,
    tiled_detector,
)

tiled_model = CroppingAndDetectingCompoundModel(
    roi_pseudo,
    tiled_detector,
    crop_extent=0.0,
    add_model1_results=True,
)

tiled_ai_model = dgstreams.AiSimpleGizmo(tiled_model, allow_drop=True)

Composition(
    dgstreams.VideoSourceGizmo("rtsp://....", stop_composition_on_end=True)
    >> tiled_ai_model
    # >> analyzer
    # >> events
    # >> display
).start()


But when running this code I get this crash: Image type '<class 'NoneType'>' is not supported for 'opencv' image backend

  File "test/venv/lib/python3.10/site-packages/degirum_tools/streams/base.py", line 697, in start
    self.wait()
  File "test/venv/lib/python3.10/site-packages/degirum_tools/streams/base.py", line 773, in wait
    raise Exception(errors)
Exception: Error detected during execution of AiSimpleGizmo:
  <class 'degirum.exceptions.DegirumException'>: Image type '<class 'NoneType'>' is not supported for 'opencv' image backend

Traceback (most recent call last):
  File "test/venv/lib/python3.10/site-packages/degirum_tools/streams/base.py", line 664, in gizmo_run
    gizmo.run()
  File "test/venv/lib/python3.10/site-packages/degirum_tools/streams/gizmos.py", line 689, in run
    for result in self.model.predict_batch(source()):
  File "test/venv/lib/python3.10/site-packages/degirum_tools/compound_models.py", line 871, in predict_batch
    for result in super().predict_batch(data):
  File "test/venv/lib/python3.10/site-packages/degirum_tools/compound_models.py", line 352, in predict_batch
    while result2 := next(model2_iter):
  File "test/venv/lib/python3.10/site-packages/degirum_tools/compound_models.py", line 871, in predict_batch
    for result in super().predict_batch(data):
  File "test/venv/lib/python3.10/site-packages/degirum_tools/compound_models.py", line 345, in predict_batch
    for result1 in self.model1.predict_batch(data):
  File "test/venv/lib/python3.10/site-packages/degirum_tools/tile_compound_models.py", line 293, in predict_batch
    preprocessed_data = preprocessor.forward(frame)
  File "test/venv/lib/python3.10/site-packages/degirum/log.py", line 92, in sync_wrap
    return f(*args, **kwargs)
  File "test/venv/lib/python3.10/site-packages/degirum/_preprocessor.py", line 666, in forward
    raise DegirumException(
degirum.exceptions.DegirumException: Image type '<class 'NoneType'>' is not supported for 'opencv' image backend

If I instead use the code in this way:

tiled_model = tiled_detector

all works as expected (but then without the region extraction and cropping).

Things to take into account:

  • I use an RTSP stream as source
  • I allow frame dropping with allow_drop=True
  • [(250, 250, 1000, 1000)] is my roi_list
  • I use the Composition(...).start() api

Hi @Hobbes1987

Thanks for reporting this. We fixed the bug and released a new version of degirum_tools. Please uninstall the older version, install the latest version (0.22.6), and check whether the issue still persists.
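(For example: pip uninstall degirum_tools, then pip install degirum_tools==0.22.6, or simply pip install -U degirum_tools.)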

After the update I've got it working! Thanks! A nice follow-up for people interested:

@Hobbes1987 we’re so glad to hear this is working now! Would you mind marking a solution in the thread (the box with a checkmark) to help others quickly find the answer?