OCR and Face Recognition

Hello everyone.

I am currently building a face recognition system for my school STEM project with a Raspberry Pi 5 + Hailo AI HAT. The image to be detected is a selfie of a person with a caption stating the person’s name. The program should store the face and name separately in a folder, and that data would later be used to recognize the person’s face and show his/her name in a rectangular box. The default face recognition (the basic hailo rpi-5-examples code) only labels the box “person” instead of the name.

The workflow:

  1. I take a selfie of myself with my phone
  2. I add my name into the selfie pic
  3. I show my selfie to the Raspberry Pi 5 camera
  4. The camera takes a picture after 5 seconds
  5. The Pi 5 records my face and recognizes the letters below it
  6. Every time my face passes by the camera, a rectangular box recognizes me and displays my name instead of “person”

Any help will be appreciated. Thank you in advance.

Hi @Audric.Tsai

Welcome to the DeGirum community. Thanks for sharing the details of the project. This is such a cool use case. What you outlined is easily doable with PySDK. We will share some guidelines shortly.

Thank you for your fast response. I really appreciate it. I am looking forward to your step by step guidance.

I may be slow to respond as I have semester exams. When I have time, I’ll take a look at your guidance. This small project needs to be finished within two weeks; the STEM fair is at the end of this month. I hope I can make it in time.

P.S. This project is meant to help ALS patients with recognizing faces and names.

Hello sir, have you put together the guidelines? I am looking forward to them since I will be continuing my project over the weekend. What should I prepare first? Please guide me through it. Thank you, and have an awesome day.

Name-Aware Face Recognition on Raspberry Pi 5 + Hailo

Objective: Build a system that, upon showing a selfie with a captioned name to the camera, waits five seconds, captures a frame, reads the name (OCR), stores the face template with that name, and subsequently recognizes the person in live video by drawing a bounding box annotated with the correct name (rather than a generic label).


References (primary sources)

  • Face Recognition Pipeline (Hailo) — detection → 5-point alignment → embeddings → vector search → labeling
    Face Recognition Guide

  • PaddleOCR Example (Hailo) — text detection and recognition using DeGirum PySDK
    Paddle OCR Example

Read both resources once end-to-end. Then implement the steps below by composing small fragments from those guides.


Target Outcome

  1. Present a selfie with a clearly printed name below the face to the Pi camera.

  2. The system initiates a five-second countdown, then captures a short window of frames.

  3. For each frame, the face is detected, aligned, and embedded; embeddings are averaged into a stable template.

  4. OCR detects caption region(s), recognizes text, selects and sanitizes a name string.

  5. The template and name are stored for subsequent lookup.

  6. In recognition mode, faces are detected live, embeddings computed and searched, and the matched name is overlaid (or “Unknown” when below threshold).


Recommended Models and Runtime (All On Device)

For Raspberry Pi 5 with the Hailo AI Hat, use Hailo-quantized models:

  • Face detection with 5-point landmarks:
    yolov8n_relu6_widerface_kpts--640x640_quant_hailort_hailo8_1

  • Face recognition (ArcFace-style compatible):
    arcface_mobilefacenet--112x112_quant_hailort_hailo8_1

  • OCR detection and recognition:
    paddle_ocr_detection--544x960_quant_hailort_hailo8_1
    paddle_ocr_recognition--48x320_quant_hailort_hailo8_1

Load models with:

  • inference_host_address="@local"

  • zoo_url="degirum/hailo"

  • device_type=['HAILORT/HAILO8']

as shown in the PaddleOCR example.
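
As a starting point, here is a minimal sketch of how the four models might be loaded under those settings with PySDK. The exact call pattern and token handling are assumptions; follow the PaddleOCR example for whatever your zoo access requires.

```python
# Minimal sketch of loading the four models with DeGirum PySDK (an assumption of
# the exact call pattern; follow the PaddleOCR example for token handling).
import degirum as dg

HOST = "@local"
ZOO = "degirum/hailo"
TOKEN = ""  # assumption: supply a DeGirum token here if your zoo access requires one
DEVICE = ["HAILORT/HAILO8"]

def load(name):
    return dg.load_model(
        model_name=name,
        inference_host_address=HOST,
        zoo_url=ZOO,
        token=TOKEN,
        device_type=DEVICE,
    )

face_det = load("yolov8n_relu6_widerface_kpts--640x640_quant_hailort_hailo8_1")
face_rec = load("arcface_mobilefacenet--112x112_quant_hailort_hailo8_1")
ocr_det = load("paddle_ocr_detection--544x960_quant_hailort_hailo8_1")
ocr_rec = load("paddle_ocr_recognition--48x320_quant_hailort_hailo8_1")
```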


Suggested System Modes

Enrollment (with OCR)

  • Perform a countdown (approximately five seconds), then capture frames for a short window (approximately 3–5 seconds).

  • Face pipeline: detect → 5-point align to the canonical size used by the face guide → compute embeddings per frame → average them to form a stable template.

  • OCR pipeline: detect text boxes → recognize text → select the most credible name string (for example, prefer high-confidence text located in the lower portion of the frame where captions typically appear). Sanitize the final name to letters, spaces, and hyphens (see the sketch after this list).

  • Persist: save one representative original frame and one aligned crop for inspection; insert the (name, averaged template) into your vector database as demonstrated in the face guide.
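
The name-selection and sanitization step could look roughly like the sketch below. The candidate format (a dict with 'label', 'score', and 'bbox') is an assumption to be adapted to whatever your OCR step actually returns.

```python
# Minimal sketch of picking and sanitizing the enrollment name from OCR output.
# Assumes each candidate is a dict with 'label' (text), 'score' (confidence), and
# 'bbox' (pixel box) -- adapt the field names to your actual OCR results.
import re

def pick_name(ocr_candidates, frame_height, min_score=0.6):
    """Prefer confident text in the lower part of the frame (caption region)."""
    best = None
    for cand in ocr_candidates:
        _, y1, _, y2 = cand["bbox"]
        in_lower_half = (y1 + y2) / 2 > frame_height * 0.5
        if cand["score"] >= min_score and in_lower_half:
            if best is None or cand["score"] > best["score"]:
                best = cand
    if best is None:
        return None
    # keep letters, spaces, and hyphens only; collapse whitespace
    name = re.sub(r"[^A-Za-z \-]", "", best["label"])
    return " ".join(name.split()) or None
```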

Live Recognition

  • For each incoming frame: detect faces, align, compute embeddings (batch where appropriate), perform nearest-neighbor search in your database, and render the bounding-box label. If the similarity score is below your threshold, show “Unknown.”

Implementation Steps

1) Verify the Official Demos Independently

  • From the Face Recognition guide, run detect → align → embed → search against a small database. Confirm you see sensible matches and similarity scores.

  • From the PaddleOCR example, run text detection and recognition on a test image. Confirm recognized strings and confidence scores (a minimal sketch follows below).

Proceed once both demos behave as expected.
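
As a quick sanity check for the OCR side, the sketch below runs the detection and recognition models on a test image, assuming the models were loaded as in the earlier snippet. The result field names ('bbox', 'label', 'score') and the test image path are assumptions; verify them against the actual output of the PaddleOCR example.

```python
# Quick OCR sanity check, patterned after the PaddleOCR example. The result field
# names ('bbox', 'label', 'score') and the test image path are assumptions.
import cv2

image = cv2.imread("test_caption.jpg")  # hypothetical test image with printed text

det_result = ocr_det(image)  # ocr_det / ocr_rec loaded as in the earlier sketch
for box in det_result.results:
    x1, y1, x2, y2 = map(int, box["bbox"])
    crop = image[y1:y2, x1:x2]
    rec_result = ocr_rec(crop)
    for text in rec_result.results:
        print(f"text={text['label']!r}  score={text['score']:.2f}")
```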

2) Build Enrollment with OCR

  • Implement a five-second countdown and a short capture window.

  • For each frame:

    • Align the face to the canonical input size used by the recognition model.

    • Compute an embedding and collect a small set across the window.

    • Run OCR; record predicted texts with confidence and bounding boxes.

  • Choose the enrollment name from the OCR results (prefer high-confidence, caption-like regions). Sanitize the final string.

  • Average the collected embeddings into a stable template (maintain the same normalization practice used in the face guide; see the sketch after this list).

  • Save a representative original frame and an aligned crop for traceability.

  • Insert (name, template) into the vector database as shown in the face guide.
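
A minimal sketch of the template-averaging step, assuming each embedding is a 1-D NumPy array (for example, 512-D from the ArcFace model) and that L2 normalization matches the practice in the face guide:

```python
# Minimal sketch of averaging per-frame embeddings into one enrollment template.
# Assumes each embedding is a 1-D numpy array; match the normalization used in
# the face guide before inserting the template into the vector database.
import numpy as np

def make_template(embeddings):
    """L2-normalize each embedding, average them, and re-normalize the result."""
    vecs = np.stack([e / np.linalg.norm(e) for e in embeddings])
    template = vecs.mean(axis=0)
    return template / np.linalg.norm(template)
```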

3) Build Live Recognition

  • For each frame:

    • Detect and align faces; compute embeddings in batch to reduce latency.

    • Search the database for the nearest entry; convert distance to a similarity metric as demonstrated in the face guide.

    • If the score meets or exceeds your threshold, draw the name and score; otherwise draw “Unknown.”
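
A minimal sketch of the lookup-and-threshold step, assuming L2-normalized templates stored as simple (name, vector) pairs. The 0.4 threshold is only a placeholder to be tuned on your own data, and a real deployment would use the vector database and similarity conversion from the face guide.

```python
# Minimal sketch of nearest-neighbor lookup with a similarity threshold.
# Assumes an in-memory list of (name, template) pairs with L2-normalized templates;
# swap in the vector database from the face guide as needed.
import numpy as np

def identify(embedding, database, threshold=0.4):
    """Return (name, similarity); name is 'Unknown' when below threshold."""
    query = embedding / np.linalg.norm(embedding)
    best_name, best_sim = "Unknown", -1.0
    for name, template in database:
        sim = float(np.dot(query, template))  # cosine similarity for normalized vectors
        if sim > best_sim:
            best_name, best_sim = name, sim
    return (best_name if best_sim >= threshold else "Unknown"), best_sim
```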

4) Validate the End-to-End Workflow

  • Prepare a selfie with a clearly legible name below the face.

  • Run Enrollment mode: countdown → capture window → OCR name selection → template averaged and stored.

  • Run Recognition mode: present the face in normal viewing; observe the bounding box annotated with the correct name.
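
Finally, a skeleton of how the two modes might be wired together in one script; enroll() and recognize() are hypothetical placeholders you would assemble from the enrollment and live-recognition steps above.

```python
# Skeleton of a two-mode entry point; enroll() and recognize() are hypothetical
# placeholders to be filled in from steps 2 and 3 above.
import argparse

def enroll():
    """Countdown -> capture window -> face template + OCR name -> store."""
    raise NotImplementedError  # assemble from step 2

def recognize():
    """Live detection -> align/embed -> database lookup -> labeled overlay."""
    raise NotImplementedError  # assemble from step 3

def main():
    parser = argparse.ArgumentParser(description="Name-aware face recognition")
    parser.add_argument("--mode", choices=["enroll", "recognize"], required=True)
    args = parser.parse_args()
    enroll() if args.mode == "enroll" else recognize()

if __name__ == "__main__":
    main()
```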

Hi @Audric.Tsai

One of our engineers, @c_franklin, put together the detailed guide above on how to build the system you outlined. Please take a look and let us know if you get stuck.