Object detection is a core computer vision technology used in robotics, surveillance systems, self-driving cars, and many other areas. It lets us locate and recognize specific objects in an image or video.
The most recent version of this series is YOLOv5, one of the fastest and most accurate object detection models available. Its capacity to generalize to new data has been greatly improved, and it contains many features that make it perform better than earlier iterations.
YOLOv5 is well suited to real-time applications: its smaller variants can process images at well over 100 frames per second on a single GPU.
In this article, we will introduce YOLOv5 and go over the details of its areas of application.
Journey of YOLO: From YOLO to YOLOv5
Joseph Redmon et al. introduced YOLO (You Only Look Once) in 2016. The initial YOLO model could identify objects in real time, but its accuracy was low compared with other models of the time.
Several upgraded versions of YOLO were released over the years. Finally, in 2020, Ultralytics LLC created the newest edition of the series, YOLOv5.
YOLOv5 is among the fastest and most accurate object detection models currently available.
Anchor boxes
YOLOv5 predicts bounding boxes for objects in an image using anchor boxes: a set of pre-defined boxes with various aspect ratios. For each object, the model predicts which anchor best matches it and how to adjust that anchor to fit. This lets YOLOv5 locate objects of many shapes and sizes accurately.
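To make the matching idea concrete, here is a small sketch in plain Python with made-up anchor and box coordinates. It scores each anchor against a ground-truth box by intersection-over-union (IoU) and picks the best fit; the wide anchor wins for a wide object:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Three hypothetical anchors with different aspect ratios, all centered at (50, 50):
# a square one, a wide one, and a tall one
anchors = [(30, 30, 70, 70), (10, 40, 90, 60), (40, 10, 60, 90)]
ground_truth = (12, 38, 88, 62)  # a wide object

# The anchor with the highest IoU is the one assigned to this object
best = max(range(len(anchors)), key=lambda i: iou(anchors[i], ground_truth))
```

The real model refines the chosen anchor with predicted offsets, but the matching step works on this principle.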
Mosaic data augmentation
During training, YOLOv5 employs a technique known as mosaic data augmentation: it randomly combines patches of several training photos into a single new image. This makes the model more robust, helps it generalize to new data, and reduces overfitting.
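A minimal NumPy sketch of the idea follows. It is simplified: the real implementation also remaps each source image's bounding-box labels into the new mosaic coordinates, which is omitted here.

```python
import numpy as np

def mosaic(images, out_size=640, seed=None):
    """Simplified mosaic augmentation: paste crops of four images into
    the four quadrants around a randomly chosen center point."""
    rng = np.random.default_rng(seed)
    cx = int(rng.integers(out_size // 4, 3 * out_size // 4))
    cy = int(rng.integers(out_size // 4, 3 * out_size // 4))
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    # (y1, y2, x1, x2) of the four quadrants: top-left, top-right,
    # bottom-left, bottom-right
    regions = [(0, cy, 0, cx), (0, cy, cx, out_size),
               (cy, out_size, 0, cx), (cy, out_size, cx, out_size)]
    for img, (y1, y2, x1, x2) in zip(images, regions):
        h, w = y2 - y1, x2 - x1
        canvas[y1:y2, x1:x2] = img[:h, :w]  # crop each image to its quadrant
    return canvas

# Four dummy 640x640 "images", one solid gray level each
imgs = [np.full((640, 640, 3), c, dtype=np.uint8) for c in (50, 100, 150, 200)]
combined = mosaic(imgs, seed=0)
```

Because the center point moves on every call, each mosaic exposes the model to objects at varying scales and positions.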
A Unique Training Pipeline
YOLOv5 uses a training pipeline that mixes supervised and unsupervised learning. This lets the model learn from a smaller labeled sample while making effective use of unlabeled input, which boosts performance and improves its capacity to generalize to new data.
Residual and Non-Residual Layers
YOLOv5’s architecture combines residual and non-residual layers. Residual layers let gradients flow across layers via skip connections, which helps the model learn difficult features, while non-residual layers give the model a more comprehensive grasp of the input image. Together they let YOLOv5 operate more precisely and efficiently.
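A toy NumPy sketch illustrates the skip connection (the real blocks use convolutions; plain matrix multiplies stand in for them here). Note that with near-zero weights the block passes its input through almost unchanged, which is exactly why residual connections make deep networks easier to optimize:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    """A toy residual block: the input skips past two transformations
    and is added back to their output."""
    out = relu(x @ w1)
    out = out @ w2
    return relu(out + x)  # skip connection: gradients also flow through "+ x"

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 8))
w1 = rng.standard_normal((8, 8)) * 0.01  # near-zero weights
w2 = rng.standard_normal((8, 8)) * 0.01
y = residual_block(x, w1, w2)  # y is close to relu(x): the block starts
                               # as a near-identity and learns a refinement
```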
How To Use YOLOv5
You can install YOLOv5 quickly using pip, the Python package manager. The general procedure is as follows:
1. Install PyTorch: Because YOLOv5 is built on the PyTorch framework, you must first install PyTorch.
pip install torch torchvision
2. Install CUDA: You must install CUDA if you intend to run YOLOv5 on a GPU.
3. Install YOLOv5: After setting up PyTorch and CUDA, use the following command to download YOLOv5.
pip install yolov5
4. Following the installation of YOLOv5, download the pre-trained weights. They are available in the Ultralytics GitHub repo.
Scroll down to the “weights” section of the repository page; you can download pre-trained weights from the list there.
5. Select the pre-trained weights that best suit your use case. You can narrow the list by dataset or by the particular YOLOv5 variant the weights were trained on.
6. After choosing the proper weights, click the “Download” button next to them. The weights are downloaded as .pt files.
7. Move the downloaded weights to the directory where your detection script will run.
8. At this point, you can use the pre-trained weights in your detection script to run object detection on your photos or videos.
Prepare the Data
To get your data ready for use with YOLOv5, take the following steps:
1. Gather the data: The first step is to gather the picture or video data you’ll need for object detection. The things you wish to detect should be present in the photos or videos.
2. Format the data: If you are working with photos, you can simply load them into your script. If you plan to use a video, you must first turn it into a series of images; a library like OpenCV can extract the frames.
import cv2

img = cv2.imread('path/to/image')
With OpenCV, you can use the following loop to turn a video into a series of images:
cap = cv2.VideoCapture('path/to/video')
frame_idx = 0
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:  # no more frames
        break
    cv2.imwrite(f'frames/frame_{frame_idx}.jpg', frame)
    frame_idx += 1
3. Label the data: If you are using your own dataset, you must label it. Labeling means drawing a bounding box around each object you wish to detect in every image or frame. Several tools can assist with this, including LabelImg and RectLabel.
4. Split the data: After labeling, divide the data into training and testing sets. This is crucial for assessing how well your model performs.
5. Preprocess the data: Finally, you might need to preprocess the data before training or testing, for example by resizing the images, normalizing the pixel values, or applying data-augmentation techniques.
After completing these steps, your data is ready.
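The labeling and splitting steps above can be sketched in a few lines. YOLO-style training expects one text file per image, with one line per object in the form `class x_center y_center width height`, all normalized to [0, 1]; the box coordinates and file names below are hypothetical:

```python
import random

def to_yolo_label(class_id, box, img_w, img_h):
    """Convert a pixel-space box (x1, y1, x2, y2) into a YOLO label line:
    'class x_center y_center width height', all normalized to [0, 1]."""
    x1, y1, x2, y2 = box
    xc = (x1 + x2) / 2 / img_w
    yc = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

# Step 3: one label line per object, e.g. a 'car' (class 0) in a 640x480 frame
label = to_yolo_label(0, (160, 120, 480, 360), 640, 480)

# Step 4: shuffle the image list and split it 80/20 into train and test sets
images = [f"frame_{i}.jpg" for i in range(100)]  # hypothetical file names
random.seed(42)
random.shuffle(images)
split = int(0.8 * len(images))
train_set, test_set = images[:split], images[split:]
```

Shuffling before splitting matters: consecutive video frames are nearly identical, so an unshuffled split can leak training content into the test set.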
Run the detection script
Here is an example of a detection script that analyzes an image and detects the objects in it.
import cv2
import yolov5

# Path to the pre-trained weights
weights = 'path/to/weights.pt'
# Set the detection confidence level
conf_thres = 0.5
# Set the Non-Maxima Suppression (NMS) threshold
nms_thres = 0.5
# Create the detector object
detector = yolov5.YOLOv5(weights, conf_thres, nms_thres)
# Load the image
img = cv2.imread('path/to/image')
# Perform object detection
detections = detector.detect(img)
# Print the detections
for x1, y1, x2, y2, conf, cls_conf, cls_pred in detections:
    print("Bounding box:", (x1, y1, x2, y2), "confidence:", conf)
Non-maximum suppression (NMS) is one of the most common post-processing techniques used in object detection. We use NMS to eliminate overlapping bounding boxes that refer to the same object. To run NMS on the detections, we can use the OpenCV library’s cv2.dnn.NMSBoxes() method, which expects boxes in (x, y, width, height) format along with their confidence scores.
Here’s an example of how to post-process detections using NMS.
# Convert detections to (x, y, w, h) boxes and confidence scores for NMSBoxes
boxes = [[int(x1), int(y1), int(x2 - x1), int(y2 - y1)] for x1, y1, x2, y2, *_ in detections]
confidences = [float(det[4]) for det in detections]
indices = cv2.dnn.NMSBoxes(boxes, confidences, conf_thres, nms_thres)
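Under the hood, NMS is a simple greedy algorithm: keep the highest-scoring box, discard every remaining box that overlaps it too much, and repeat. Here is a self-contained sketch with made-up boxes, using (x1, y1, x2, y2) corners rather than the (x, y, w, h) format NMSBoxes takes:

```python
def nms(boxes, scores, iou_thres=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop every remaining box that overlaps it more than iou_thres, repeat."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter)

    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thres]
    return keep

# Two overlapping detections of the same object, plus one separate object
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # the lower-scoring duplicate is suppressed
```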
In the case of visualization, we can again use a library like OpenCV. We can display the bounding boxes around the discovered objects on the source picture or video. To draw the image’s bounding boxes, use the cv2.rectangle() method. Here’s how to view the detections on the original image:
# Draw the bounding boxes on the image
# (classes and class_ids come from your model's label map)
for i in indices:
    i = int(i)  # NMSBoxes may return nested indices
    x1, y1, x2, y2 = (int(v) for v in detections[i][:4])
    cv2.rectangle(img, (x1, y1), (x2, y2), (255, 0, 0), 2)
    cv2.putText(img, classes[class_ids[i]], (x1, y1 - 5), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
# Show the image and wait for a key press before closing the window
cv2.imshow("Object Detection", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
YOLOv5 is a strong object detection model, so we can make use of it in many real-world scenarios. One of the most prominent uses is in self-driving cars, where YOLOv5 can identify objects such as cars, pedestrians, and traffic lights in real time.
In surveillance systems, we can use YOLOv5 to recognize and track objects in live video streams. Furthermore, YOLOv5 can be a great asset in robotics. It can help robots detect and comprehend their surroundings. This is extremely important for activities like navigation and manipulation.
YOLOv5 may also be utilized in any industry that requires object detection, such as retail, sports, medical, and security.
Finally, YOLOv5 is the most recent and sophisticated version of the YOLO family of object detection models, and it is fair to say that it is among the most accurate available. Thanks to its high accuracy and speed, you can confidently choose it for your object detection projects.