Setup
This tutorial is about recognizing objects in images or video streams. The YOLO framework is used for this purpose. YOLO offers a very simple and extremely high-performance option for object classification.
Attention: This tutorial is based on my tests with a Raspberry Pi model 4 and the Raspian OS version 11 (Bullseye). As AI framework I have preinstalled PyTorch 2.3 and use Python 3.9. I cannot guarantee that it will also work on older Raspberry Pi versions and earlier PyTorch versions etc.
Check Your Raspberry Pi Model
Assuming that your Raspberry Pi is powered on and you have command line access to it, then the easiest way to check the model number would be executing a simple Linux command. Running this command will check the contents of the /sys/firmware/devicetree/base/model file and output your Raspberry Pi’s model number:
$ cat /sys/firmware/devicetree/base/model Raspberry Pi 4 Model B Rev 1.5
Check Raspberry Pi’s OS Version
One of the simplest ways to see your Raspberry Pi OS version is with the hostnamectl command. This is provided by systemd, which is standard across all editions of Raspberry Pi OS. Depending on your release, either the version or the release code name is revealed in the output.
hostnamectl
Another way to see this same information is by viewing the /etc/os-release file.
sudo cat /etc/os-release
Check PyTorch Version
Just write this little Python program and run it:
import torch print(torch.__version__)
OpenCV
OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library. OpenCV was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in the commercial products. Being an Apache 2 licensed product, OpenCV makes it easy for businesses to utilize and modify the code. To install OpenCV, simply execute the following command:
Pip install opencv-python
Camera Test
Unfortunately, the “legacy camera support” is no longer supported in the 64 BIt Raspberry Pi OS versions, which means that you can no longer use the PiCam. On the other hand, you need the 64 Bit OS version to be able to use PyTorch. The only way out of this dilemma is to use a USB camera.
import cv2 from ultralytics import YOLO # Open the video streamq cam = cv2.VideoCapture(0) # Loop through the video frames while cam.isOpened(): # Read a frame from the video success, frame = cam.read() if success: # Display the annotated frame cv2.imshow("TEST", frame) # Break the loop if 'q' is pressed if cv2.waitKey(1) == ord("q"): break else: # Break the loop if the end of the video is reached break # Release the video capture object and close the display window cam.release() cv2.destroyAllWindows
YOLO
YOLO stands for You Only Look Once, a family of real-time object detection algorithms. It is widely used in computer vision tasks to identify and locate objects within images or video streams. YOLO is known for its speed and efficiency, making it suitable for applications requiring real-time processing, such as surveillance, autonomous vehicles, and robotics. As illustrated below, it has 24 convolutional layers, four max-pooling layers, and two fully connected layers.The first 20 convolution layers of the model are pre-trained using ImageNet by plugging in a temporary average pooling and fully connected layer. Then, this pre-trained model is converted to perform detection since previous research showcased that adding convolution and connected layers to a pre-trained network improves performance. YOLO’s final fully connected layer predicts both class probabilities and bounding box coordinates.
YOLO divides an input image into an S × S grid. If the center of an object falls into a grid cell, that grid cell is responsible for detecting that object. Each grid cell predicts B bounding boxes and confidence scores for those boxes. These confidence scores reflect how confident the model is that the box contains an object and how accurate it thinks the predicted box is.
YOLO models, including yolo11n.pt, are commonly trained on the COCO dataset. The COCO dataset (short for Common Objects in Context) is a widely used large-scale dataset for training and evaluating computer vision models. It was developed by researchers at Microsoft and is designed to advance the understanding of objects in complex, real-world scenes.
YOLO Installation
As always, you can easily install YOLO with pip:
pip install ultralytics