OCR and Face Recognition

Name-Aware Face Recognition on Raspberry Pi 5 + Hailo

Objective: Build a system that, when shown a selfie with the person's name captioned below it, waits five seconds, captures a frame, reads the name via OCR, stores the face template under that name, and subsequently recognizes the person in live video by drawing a bounding box annotated with the correct name (rather than a generic label).


References (primary sources)

  • Face Recognition Pipeline (Hailo) — detection → 5-point alignment → embeddings → vector search → labeling
    Face Recognition Guide

  • PaddleOCR Example (Hailo) — text detection and recognition using DeGirum PySDK
    Paddle OCR Example

Read both resources once end-to-end. Then implement the steps below by composing small fragments from those guides.


Target Outcome

  1. Present a selfie with a clearly printed name below the face to the Pi camera.

  2. The system initiates a five-second countdown, then captures a short window of frames.

  3. For each frame, the face is detected, aligned, and embedded; embeddings are averaged into a stable template.

  4. OCR detects caption region(s), recognizes text, selects and sanitizes a name string.

  5. The template and name are stored for subsequent lookup.

  6. In recognition mode, faces are detected live, embeddings computed and searched, and the matched name is overlaid (or “Unknown” when below threshold).


Recommended Models and Runtime (All On Device)

For Raspberry Pi 5 with the Hailo AI Hat, use Hailo-quantized models:

  • Face detection with 5-point landmarks:
    yolov8n_relu6_widerface_kpts--640x640_quant_hailort_hailo8_1

  • Face recognition (ArcFace-style compatible):
    arcface_mobilefacenet--112x112_quant_hailort_hailo8_1

  • OCR detection and recognition:
    paddle_ocr_detection--544x960_quant_hailort_hailo8_1
    paddle_ocr_recognition--48x320_quant_hailort_hailo8_1

Load models with:

  • inference_host_address="@local"

  • zoo_url="degirum/hailo"

  • device_type=['HAILORT/HAILO8']

as shown in the PaddleOCR example.
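
The loading parameters above are shared by all four models, so it is convenient to define them once. The sketch below assumes the PySDK `degirum.load_model()` call used in the PaddleOCR example; the `dg_module` parameter is an assumption of this sketch (the imported `degirum` package is passed in explicitly so the helper can be exercised without Hailo hardware present):

```python
# Shared loading parameters from the PaddleOCR example.
COMMON = dict(
    inference_host_address="@local",
    zoo_url="degirum/hailo",
    device_type=["HAILORT/HAILO8"],
)

# The four Hailo-quantized models recommended above.
MODELS = {
    "face_det": "yolov8n_relu6_widerface_kpts--640x640_quant_hailort_hailo8_1",
    "face_rec": "arcface_mobilefacenet--112x112_quant_hailort_hailo8_1",
    "ocr_det": "paddle_ocr_detection--544x960_quant_hailort_hailo8_1",
    "ocr_rec": "paddle_ocr_recognition--48x320_quant_hailort_hailo8_1",
}

def load_all(dg_module):
    """Load every model with the shared parameters.

    `dg_module` is the imported `degirum` package (injected so this
    sketch stays testable without the Hailo runtime installed).
    """
    return {
        key: dg_module.load_model(model_name=name, **COMMON)
        for key, name in MODELS.items()
    }
```

In normal use you would call `load_all(degirum)` after `import degirum`.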


Suggested System Modes

Enrollment (with OCR)

  • Perform a countdown of approximately five seconds, then capture frames for a short window (roughly three to five seconds).

  • Face pipeline: detect → 5-point align to the canonical size used by the face guide → compute embeddings per frame → average them to form a stable template.

  • OCR pipeline: detect text boxes → recognize text → select the most credible name string (for example, prefer high-confidence text located in the lower portion of the frame where captions typically appear). Sanitize the final name to letters, spaces, and hyphens.

  • Persist: save one representative original frame and one aligned crop for inspection; insert the (name, averaged template) into your vector database as demonstrated in the face guide.
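
The countdown-then-capture timing can be isolated from the camera itself. In this sketch `grab_frame` is a hypothetical zero-argument callable returning one frame (e.g. a thin wrapper over Picamera2 or OpenCV capture — not specified by the guides); `clock` and `sleep` are injectable so the logic is testable without real delays:

```python
import time

def capture_window(grab_frame, countdown_s=5.0, window_s=4.0, fps=10.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Wait out the countdown, then collect frames for `window_s` seconds.

    Returns the list of captured frames. A real UI would render the
    remaining countdown seconds instead of sleeping silently.
    """
    sleep(countdown_s)
    frames = []
    deadline = clock() + window_s
    while clock() < deadline:
        frames.append(grab_frame())
        sleep(1.0 / fps)  # pace capture at roughly `fps` frames per second
    return frames
```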

Live Recognition

  • For each incoming frame: detect faces, align, compute embeddings (batch where appropriate), perform nearest-neighbor search in your database, and render the bounding-box label. If the similarity score is below your threshold, show “Unknown.”

Implementation Steps

1) Verify the Official Demos Independently

  • From the Face Recognition guide, run detect → align → embed → search against a small database. Confirm you see sensible matches and similarity scores.

  • From the PaddleOCR example, run text detection and recognition on a test image. Confirm recognized strings and confidence scores.

Proceed once both demos behave as expected.

2) Build Enrollment with OCR

  • Implement a five-second countdown and a short capture window.

  • For each frame:

    • Align the face to the canonical input size used by the recognition model.

    • Compute an embedding and collect a small set across the window.

    • Run OCR; record predicted texts with confidence and bounding boxes.

  • Choose the enrollment name from the OCR results (prefer high-confidence, caption-like regions). Sanitize the final string.

  • Average the collected embeddings into a stable template (maintain the same normalization practice used in the face guide).

  • Save a representative original frame and an aligned crop for traceability.

  • Insert (name, template) into the vector database as shown in the face guide.
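
The name-selection and template-averaging steps above can be sketched as small helpers. The OCR result schema here (dicts with `text`, `confidence`, and `box` keys) is an assumption — the actual fields depend on how you post-process the PaddleOCR models — and the lower-third heuristic and confidence floor are illustrative defaults:

```python
import re
import numpy as np

def sanitize_name(text):
    """Restrict to letters, spaces, and hyphens; collapse whitespace; title-case."""
    cleaned = re.sub(r"[^A-Za-z \-]", "", text)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return cleaned.title()

def select_name(ocr_results, frame_height, min_conf=0.6):
    """Pick the most credible caption from OCR output.

    Prefers high-confidence text whose box center sits in the lower third
    of the frame, where captions typically appear. Each result is assumed
    to be {"text": str, "confidence": float, "box": (x1, y1, x2, y2)}.
    """
    candidates = []
    for r in ocr_results:
        if r["confidence"] < min_conf:
            continue
        _, y1, _, y2 = r["box"]
        in_lower_third = (y1 + y2) / 2 > 2 * frame_height / 3
        candidates.append((in_lower_third, r["confidence"], r["text"]))
    if not candidates:
        return None
    candidates.sort(reverse=True)  # lower-third hits first, then by confidence
    return sanitize_name(candidates[0][2]) or None

def average_template(embeddings):
    """Average per-frame embeddings, then L2-normalize the result,
    matching the normalization practice used for cosine search."""
    mean = np.mean(np.asarray(embeddings, dtype=np.float32), axis=0)
    return mean / np.linalg.norm(mean)
```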

3) Build Live Recognition

  • For each frame:

    • Detect and align faces; compute embeddings in batch to reduce latency.

    • Search the database for the nearest entry; convert distance to a similarity metric as demonstrated in the face guide.

    • If the score meets or exceeds your threshold, draw the name and score; otherwise draw “Unknown.”
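
With L2-normalized templates and queries, the dot product is the cosine similarity, so nearest-neighbor search plus thresholding reduces to a few lines. This is a minimal in-memory sketch (a dict of `{name: template}`); the face guide's vector database serves the same role at scale, and the threshold value here is a placeholder to tune on your own data:

```python
import numpy as np

def match_face(embedding, db, threshold=0.45):
    """Return (name, similarity) for the best match, or ("Unknown", score)
    when the best similarity falls below `threshold`.

    Assumes `db` maps name -> L2-normalized template and `embedding` is
    L2-normalized, so dot product == cosine similarity.
    """
    if not db:
        return "Unknown", 0.0
    names = list(db)
    templates = np.asarray([db[n] for n in names], dtype=np.float32)
    sims = templates @ np.asarray(embedding, dtype=np.float32)  # one dot per entry
    best = int(np.argmax(sims))
    if sims[best] >= threshold:
        return names[best], float(sims[best])
    return "Unknown", float(sims[best])
```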

4) Validate the End-to-End Workflow

  • Prepare a selfie with a clearly legible name below the face.

  • Run Enrollment mode: countdown → capture window → OCR name selection → template averaged and stored.

  • Run Recognition mode: present the face to the camera under normal viewing conditions; observe the bounding box annotated with the correct name.