Object Detection

After the setup it is now about recognizing objects in images and in video streams for which The pre-trained model "yolo11n.pt is used". I am pretty sure that it was trained with the COCO data set, because the classifiable objects match exactly with the COCO data set.

First YOLO Test

This is the standard test for YOLO: You load the "bus image" from the Ultralytics website and let YOLO recognize the existing objects:

from ultralytics import YOLO
# Load the pretrained model
model = YOLO("yolo11n.pt")

# recognition of the objects
results = model("https://ultralytics.com/images/bus.jpg")

# Visualize the results
for result in results:
    result.show()

The result should look like this:

Conversion To NCNN

NCNN is a highly efficient inference framework optimized for mobile and embedded devices. When combined with YOLO (You Only Look Once), it offers several advantages: The combination with NCNN enables YOLO models to run in real-time on mobile devices without relying on high-performance GPUs. NCNN is platform-independent and does not require special runtime dependencies like CUDA or OpenCL. This makes deploying YOLO on devices without GPUs or specialized hardware more straightforward.

# Load a YOLO11n PyTorch model
model = YOLO("yolo11n.pt")

# Export the model to NCNN format
model.export(format="ncnn")  # creates 'yolo11n_ncnn_model'

# Load the exported NCNN model
ncnn_model = YOLO("yolo11n_ncnn_model")

Improved Test

We do the same test as above a second time, but this time we use the saved NCNN model. It makes no sense to convert the model again and again as long as the original YOLO model does not change.

from ultralytics import YOLO

# Load the exported NCNN model
ncnn_model = YOLO("yolo11n_ncnn_model")

# recognition of the objects
results = ncnn_model("https://ultralytics.com/images/bus.jpg")

# Visualize the results
for result in results:
    result.show()

Video Capture

It is also very easy to analyze the images in a video stream. however, the Raspberry Pi quickly reaches the limits of its performance with high frame rates and many objects:

import cv2
from ultralytics import YOLO

# Load the YOLO model
model = YOLO("yolo11n_ncnn_model")

# Open the video streamq
cam = cv2.VideoCapture(0)

# Loop through the video frames
while cam.isOpened():
    # Read a frame from the video
    success, frame = cam.read()

    if success:
        # Run YOLO inference on the frame
        results = model(frame)

        # Visualize the results on the frame
        annotated_frame = results[0].plot()

        # Display the annotated frame
        cv2.imshow("YOLO", annotated_frame)

        # Break the loop if 'q' is pressed
        if cv2.waitKey(1) == ord("q"):
            break
    else:
        # Break the loop if the end of the video is reached
        break

# Release the video capture object and close the display window
cam.release()
cv2.destroyAllWindows

Video Capture Test

My plan is to use YOLO for the search for missing persons with drones. that's why i made a test that corresponds approximately to the perspective of a drone at about 1o meters above ground. The main focus was on whether people can still be recognized correctly from this distance and perspective.