Object detection is a computer vision task in which a neural network locates objects in an image and draws bounding boxes around them. More precisely, object detection refers to detecting and localizing objects in an image that belong to a predefined set of classes.
Object detection (also known as object recognition) is a particularly significant subdomain of Computer Vision because tasks like detection, identification, and localization find broad application in real-world contexts.
The YOLO approach can help you solve these tasks. In this article, we'll take a closer look at YOLO, including what it is, how it works, its different versions, and more.
So, what is YOLO?
YOLO is a method for real-time object detection and recognition in images. It is an acronym for "You Only Look Once." Redmon et al. proposed the approach in a paper first released in 2015 and presented at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) in 2016.
The paper won the OpenCV People's Choice Award. Unlike previous object detection methods, which repurposed classifiers to perform detection, YOLO uses a single end-to-end neural network that predicts bounding boxes and class probabilities simultaneously.
By taking this fundamentally different approach to object detection, YOLO achieved state-of-the-art results, easily outperforming previous real-time object detection methods.
How YOLO works
The YOLO method divides the input image into an S×S grid of cells. Each grid cell is responsible for detecting and localizing any object whose center falls inside it.
Each cell, in turn, predicts B bounding boxes with coordinates relative to the cell, along with a confidence score for the presence of an object and a set of class probabilities. Because both detection and recognition are handled by the grid cells in a single forward pass over the image, this technique considerably reduces computation.
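For concreteness, each cell's prediction consists of B boxes (each described by x, y, w, h and a confidence score) plus C class probabilities, so the network's output tensor has S × S × (B·5 + C) values. A small sketch of that arithmetic, using the settings from the original YOLOv1 paper (S=7, B=2, C=20 for PASCAL VOC):

```python
# Size of a YOLOv1-style output tensor.
# Each of the S*S grid cells predicts B boxes (x, y, w, h, confidence)
# plus C class probabilities shared by the whole cell.
def yolo_output_size(S: int, B: int, C: int) -> int:
    per_cell = B * 5 + C  # 5 numbers per box + C class scores
    return S * S * per_cell

# YOLOv1's PASCAL VOC configuration: 7x7 grid, 2 boxes per cell, 20 classes.
print(yolo_output_size(S=7, B=2, C=20))  # 7 * 7 * 30 = 1470
```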
However, it also produces many duplicate predictions, since several cells may predict the same object with slightly different bounding boxes. To address this problem, YOLO employs Non-Maximum Suppression (NMS): it selects the bounding box with the highest confidence score, then suppresses all remaining boxes that have a large Intersection over Union (IoU) with it.
This process is repeated until no overlapping boxes remain.
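The suppression loop above can be sketched in a few lines of plain Python. This is a minimal illustration of the general NMS procedure, not the exact implementation used in any YOLO codebase; boxes here are (x1, y1, x2, y2) corner coordinates:

```python
# A minimal sketch of Non-Maximum Suppression (NMS), as used by YOLO
# to remove duplicate detections of the same object.

def iou(a, b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box, drop boxes overlapping it, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # the two overlapping boxes collapse to one: [0, 2]
```

The first two boxes overlap heavily (IoU ≈ 0.68), so the lower-scoring one is suppressed; the third box is far away and survives.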
Different variations of YOLO
We’ll look at some of the most common YOLO versions. Let’s get started.
1. YOLOv1
The initial YOLO version was announced in 2015 in the publication “You Only Look Once: Unified, Real-Time Object Detection” by Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi.
Because of its speed, accuracy, and learning ability, YOLO quickly dominated the field of object detection and became one of the most widely used algorithms. Rather than treating object detection as a classification problem, the authors framed it as a regression problem over spatially separated bounding boxes and associated class probabilities, solved by a single neural network.
YOLOv1 processed images at 45 frames per second in real time, while a smaller variant, Fast YOLO, reached 155 frames per second while still achieving double the mAP of other real-time detectors.
2. YOLOv2
A year later, in 2016, Joseph Redmon and Ali Farhadi released YOLOv2 (also known as YOLO9000) in the paper “YOLO9000: Better, Faster, Stronger.”
The model earned the designation 9000 from its ability to predict over 9,000 distinct object categories while still running in real time. The new version was not only trained jointly on object detection and classification datasets, but also adopted Darknet-19 as its new backbone.
Because YOLOv2 was also a big success and quickly became the next state-of-the-art object detection model, other engineers began to experiment with the algorithm and produce their own, unique YOLO versions. Some of them are discussed below.
3. YOLOv3
In the paper "YOLOv3: An Incremental Improvement," Joseph Redmon and Ali Farhadi published a new version of the algorithm in 2018. The backbone was upgraded from Darknet-19 to Darknet-53, a deeper network with 53 convolutional layers.
YOLOv3 also replaced the softmax with independent logistic classifiers, trained with a binary cross-entropy loss. In addition, predictions were made at three different scales, which improved YOLOv3's accuracy on small objects.
YOLOv3 was Joseph Redmon's final YOLO version, since he chose not to work on any further YOLO improvements (or in computer vision at all) to avoid his work having a detrimental influence on the world. YOLOv3 is now mostly used as a starting point for building custom object-detection architectures.
4. YOLOv4
Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao published “YOLOv4: Optimal Speed and Accuracy of Object Detection” in April 2020, which was the fourth iteration of the YOLO algorithm.
The paper introduced a combination of techniques, including Weighted Residual Connections, Cross-Stage-Partial connections, Cross mini-Batch Normalization, Self-Adversarial Training, Mish activation, DropBlock regularization, and the CIoU loss.
YOLOv4 is a descendant of the YOLO family, although it was developed by different researchers (not Joseph Redmon and Ali Farhadi). Its architecture consists of a CSPDarknet53 backbone, spatial pyramid pooling and a PANet path-aggregation neck, and a YOLOv3 head.
As a result, compared to its predecessor YOLOv3, YOLOv4 achieves 10% higher Average Precision and a 12% better Frames Per Second rate.
5. YOLOv5
YOLOv5 is an open-source project that includes a range of object detection models and algorithms based on the YOLO approach, pre-trained on the COCO dataset.
YOLOv5 is a family of compound-scaled object detection models trained on COCO, with convenient support for test-time augmentation (TTA), model ensembling, hyperparameter evolution, and export to ONNX, CoreML, and TFLite. Because YOLOv5 does not introduce any novel techniques, no formal paper was released for it; it is essentially a PyTorch reimplementation and extension of YOLOv3.
Ultralytics took advantage of this situation to publish the "new YOLO" version under its banner. Five pre-trained models of different sizes are available, and the YOLOv5 repository is well structured and professionally documented, with a number of tutorials and tips on training and using the models.
YOLO limitations
Although YOLO appears to be a strong technique for solving object detection problems, it has a number of drawbacks. Because each grid cell can only detect one object, YOLO has difficulty detecting and separating small objects that appear in groups. Small objects in swarms, such as a swarm of ants, are difficult for YOLO to identify and locate.
YOLO is also less accurate than significantly slower object detection methods such as Fast R-CNN.
Start using YOLOv5
If you're interested in seeing YOLOv5 in action, check out the official GitHub repository and YOLOv5 in PyTorch.
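As a rough quickstart, the setup follows the usual clone-and-install pattern from the Ultralytics repository; the image path below is a placeholder you would replace with your own file:

```shell
# Clone the official YOLOv5 repository and install its dependencies.
git clone https://github.com/ultralytics/yolov5
cd yolov5
pip install -r requirements.txt

# Run inference with the small pre-trained model on a local image;
# detect.py downloads the yolov5s.pt weights automatically on first use.
python detect.py --weights yolov5s.pt --source path/to/image.jpg
```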
Conclusion
YOLOv5 is extremely fast, performant, and simple to use. While it does not add a new model architecture to the YOLO family, it provides a new PyTorch training and deployment framework that advances the state of the art for object detectors.
Furthermore, YOLOv5 is extremely user-friendly and comes ready "out of the box" for training on custom objects.