YOLOv5 is a PyTorch-based object detection framework developed by Ultralytics that supports three computer vision tasks: object detection, instance segmentation, and image classification. The repository provides a complete pipeline for training, validating, exporting, and deploying models across multiple hardware platforms and inference backends.
This page provides a high-level overview of the YOLOv5 system architecture and its core components; detailed information about specific subsystems is covered on their dedicated pages.
YOLOv5 supports three primary computer vision tasks, each with dedicated implementations:
| Task | Primary Script | Model Types | Output Format |
|---|---|---|---|
| Object Detection | detect.py, train.py, val.py | DetectionModel | Bounding boxes (x, y, w, h) + class + confidence |
| Instance Segmentation | segment/predict.py, segment/train.py, segment/val.py | SegmentationModel | Bounding boxes + pixel masks |
| Image Classification | classify/predict.py, classify/train.py, classify/val.py | ClassificationModel | Class probabilities |
Each task uses a shared foundation of neural network modules and utilities but has specialized heads and training procedures defined in task-specific directories.
Sources: README.md:20, models/yolo.py:218-376
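The detection output format in the table above, boxes as (center-x, center-y, width, height), is typically converted to corner coordinates before drawing or non-maximum suppression. A minimal sketch of that conversion (YOLOv5 ships a tensor-based helper with a similar role, `xywh2xyxy` in `utils/general.py`; this plain-Python version is for illustration only):

```python
def xywh2xyxy(box):
    """Convert a (center-x, center-y, width, height) box to
    (x1, y1, x2, y2) corner format, as used before NMS or drawing."""
    cx, cy, w, h = box
    return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]

# Example: a 100x50 box centered at (200, 150)
print(xywh2xyxy([200, 150, 100, 50]))  # → [150.0, 125.0, 250.0, 175.0]
```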
Diagram: Core system components and their dependencies
The architecture follows a layered design with clear separation between entry-point scripts (train.py, detect.py, val.py, export.py), model definitions (models/), and shared utilities (utils/).
Sources: train.py:1-617, models/yolo.py:1-497, models/common.py:1-700, utils/dataloaders.py:1-1300
The repository is organized into the following key directories:
```
yolov5/
├── train.py              # Main training script (424.13 importance)
├── detect.py             # Object detection inference
├── val.py                # Validation and benchmarking
├── export.py             # Export to 11+ formats
├── models/
│   ├── yolo.py           # DetectionModel, SegmentationModel, ClassificationModel
│   ├── common.py         # Neural network modules (Conv, C3, SPPF, etc.)
│   ├── tf.py             # TensorFlow/Keras/TFLite versions
│   └── experimental.py   # Experimental features
├── utils/
│   ├── general.py        # General utilities (343.30 importance)
│   ├── torch_utils.py    # PyTorch utilities (ModelEMA, device selection)
│   ├── dataloaders.py    # Data loading and caching
│   ├── augmentations.py  # Image augmentation techniques
│   ├── loss.py           # Loss functions
│   ├── metrics.py        # Evaluation metrics
│   ├── plots.py          # Visualization
│   └── loggers/          # Multi-platform logging
├── data/
│   ├── coco128.yaml      # Dataset configurations
│   └── hyps/             # Hyperparameter configurations
├── segment/              # Instance segmentation task
│   ├── train.py
│   ├── predict.py
│   └── val.py
└── classify/             # Image classification task
    ├── train.py
    ├── predict.py
    └── val.py
```
The importance values indicate modification frequency, with train.py (424.13) and utils/general.py (343.30) being the most actively maintained components.
Sources: train.py:1-10, models/yolo.py:1-10, utils/general.py:1-10
Diagram: Model lifecycle from definition to deployment
The lifecycle consists of four stages:
Definition: Models are defined via YAML configuration files that specify architecture, depth, and width multipliers. The parse_model() function in models/yolo.py:378-461 constructs the neural network layers.
Training: The train() function in train.py:105-543 orchestrates the training loop, using create_dataloader() from utils/dataloaders.py:116-195 for data loading. Validation runs periodically via val.run(), and checkpoints are saved as best.pt and last.pt.
Export: The export.py script converts trained models to 11+ deployment formats including ONNX, TensorRT, CoreML, TensorFlow Lite, and OpenVINO.
Inference: Models are loaded via DetectMultiBackend in models/common.py:344-552, which provides a unified interface for all export formats. Inference can be performed through command-line scripts (detect.py, segment/predict.py, classify/predict.py) or PyTorch Hub.
Sources: models/yolo.py:378-461, train.py:105-543, models/common.py:344-552, export.py:1-100
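The best.pt / last.pt convention in the Training stage can be sketched as follows. This is a simplification: the real train.py serializes a dict containing model weights, EMA state, optimizer state, and epoch via torch.save; here a plain in-memory dict stands in for files on disk:

```python
import copy

def save_checkpoints(ckpt, fitness, best_fitness, store):
    """Save 'last' every epoch; copy to 'best' only when fitness improves.
    `store` is a dict standing in for checkpoint files on disk."""
    store["last.pt"] = copy.deepcopy(ckpt)
    if fitness > best_fitness:
        best_fitness = fitness
        store["best.pt"] = copy.deepcopy(ckpt)
    return best_fitness

store, best = {}, 0.0
for epoch, fit in enumerate([0.30, 0.42, 0.38]):   # per-epoch fitness scores
    best = save_checkpoints({"epoch": epoch, "weights": "..."}, fit, best, store)

# best.pt keeps the epoch-1 weights; last.pt always tracks the final epoch
print(store["best.pt"]["epoch"], store["last.pt"]["epoch"])  # → 1 2
```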
YOLOv5 provides five model sizes (n, s, m, l, x) that trade inference speed for accuracy as they scale up:
| Model | Parameters | FLOPs @640 | COCO mAP 50-95 (val) | Speed V100 b1 (ms) |
|---|---|---|---|---|
| YOLOv5n | 1.9M | 4.5B | 28.0 | 6.3 |
| YOLOv5s | 7.2M | 16.5B | 37.4 | 6.4 |
| YOLOv5m | 21.2M | 49.0B | 45.4 | 8.2 |
| YOLOv5l | 46.5M | 109.1B | 49.0 | 10.1 |
| YOLOv5x | 86.7M | 205.7B | 50.7 | 12.1 |
Model scaling is controlled by depth_multiple (gd) and width_multiple (gw) parameters in YAML configuration files. The parse_model() function applies these multipliers to create scaled versions of the base architecture.
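The effect of these multipliers can be sketched as follows, mirroring the rounding behavior of parse_model() and the make_divisible() helper in utils/general.py; the example values assume the standard yolov5s multipliers of gd=0.33 and gw=0.50:

```python
import math

def make_divisible(x, divisor=8):
    """Round a channel count up to the nearest multiple of `divisor`."""
    return math.ceil(x / divisor) * divisor

def scale(n_repeats, channels, gd, gw):
    """Apply depth (gd) and width (gw) multipliers as parse_model() does:
    repeats are rounded (never below 1), channels kept divisible by 8."""
    n = max(round(n_repeats * gd), 1) if n_repeats > 1 else n_repeats
    c = make_divisible(channels * gw, 8)
    return n, c

# A C3 block with 9 repeats and 512 channels under yolov5s multipliers:
print(scale(9, 512, gd=0.33, gw=0.50))  # → (3, 256)
```

With gd=0.33 the block depth shrinks from 9 to 3 repeats, and with gw=0.50 the channel width halves from 512 to 256; the larger l/x variants use multipliers above 1.0 to grow the same base layout.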
Additional P6 models (yolov5n6 through yolov5x6) process 1280×1280 images for higher accuracy at the cost of increased compute.
Sources: models/yolo.py:378-461, README.md:252-278
The train() function in train.py:105-543 handles dataloader creation, optimizer and learning-rate scheduler setup, the epoch loop with loss computation and backpropagation, EMA weight averaging, periodic validation, and checkpoint saving.
Sources: train.py:105-543, train.py:1-15
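Among these responsibilities, EMA weight averaging (the ModelEMA class in utils/torch_utils.py) keeps a smoothed copy of the model weights that is used for validation and saved in checkpoints. The core update can be sketched in plain Python; scalar weights stand in for tensors here, and the fixed decay is a simplification of YOLOv5's ramped decay schedule:

```python
def ema_update(ema, weights, decay=0.999):
    """One EMA step: ema ← d·ema + (1−d)·w for each weight."""
    return [decay * e + (1 - decay) * w for e, w in zip(ema, weights)]

ema = [0.0, 0.0]
for _ in range(3):                 # three optimizer steps at constant weights
    ema = ema_update(ema, [1.0, 2.0], decay=0.5)

print(ema)  # → [0.875, 1.75]
```

The smoothed copy converges toward the live weights but damps step-to-step noise, which typically yields slightly better validation metrics than the raw weights.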
Inference scripts use DetectMultiBackend from models/common.py:344-552 to load models in any supported format. The backend automatically detects the model format, loads the appropriate runtime, and normalizes inputs and outputs across backends.
Sources: models/common.py:344-552, detect.py:1-270
The hubconf.py file defines model loading functions that automatically download pretrained weights from GitHub releases. Models are loaded using the attempt_load() function from models/experimental.py:106-132.
Sources: README.md:88-107
Diagram: Multi-task architecture with shared components
All three tasks share the same building blocks from models/common.py, such as Conv, C3, and SPPF.
Task-specific differences are implemented in the head layers:
- The Detect class produces bounding box predictions at 3 scales (P3, P4, P5)
- The Segment class extends Detect with a Proto layer for mask coefficients
- The Classify class replaces the detection head with global pooling and a fully connected layer

Sources: models/yolo.py:72-376, models/common.py:1-700
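For the Detect head, the three scales P3, P4, and P5 correspond to strides 8, 16, and 32, and YOLOv5's default is 3 anchors per grid cell; the number of raw predictions for a given input size follows directly:

```python
def num_predictions(img_size=640, strides=(8, 16, 32), anchors_per_cell=3):
    """Raw prediction count: anchors_per_cell boxes for every cell of the
    P3/P4/P5 grids (img_size/8, img_size/16, img_size/32 cells per side)."""
    return sum(anchors_per_cell * (img_size // s) ** 2 for s in strides)

# 640x640 input → 80x80 + 40x40 + 20x20 grids, 3 anchors each
print(num_predictions(640))  # → 25200
```

This is why a detection pass on a 640×640 image yields a (25200, 85)-shaped output for COCO (85 = 4 box coordinates + 1 objectness + 80 classes) before non-maximum suppression.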
The export system supports 11+ formats for diverse deployment targets:
| Format | Script | Use Case | Speed |
|---|---|---|---|
| PyTorch | - | Native format | Baseline |
| TorchScript | export.py --include torchscript | C++ inference | Fast |
| ONNX | export.py --include onnx | Cross-platform | Fast |
| TensorRT | export.py --include engine | NVIDIA GPUs | Fastest |
| CoreML | export.py --include coreml | iOS/macOS | Optimized |
| TensorFlow | export.py --include saved_model (or pb) | TF ecosystem | Variable |
| TFLite | export.py --include tflite | Mobile/Edge | Optimized |
| OpenVINO | export.py --include openvino | Intel CPUs | Fast |
| EdgeTPU | export.py --include edgetpu | Google Coral | Optimized |
| TFJS | export.py --include tfjs | Web browsers | Web-optimized |
| PaddlePaddle | export.py --include paddle | Baidu ecosystem | Variable |
The DetectMultiBackend class in models/common.py:344-552 provides a unified interface for loading and running inference with any of these formats. It automatically detects the model format and applies appropriate preprocessing/postprocessing.
Sources: models/common.py:344-552, export.py:1-100, README.md:182
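The format-detection step can be sketched as suffix matching on the weights path. This is a simplified stand-in for DetectMultiBackend's internal model-type check (which derives its suffix list from export.py's format table); the mapping below covers only a subset of the formats above:

```python
from pathlib import Path

# Maps file suffix to backend name (illustrative subset of the export table)
SUFFIXES = {
    ".pt": "PyTorch",
    ".torchscript": "TorchScript",
    ".onnx": "ONNX",
    ".engine": "TensorRT",
    ".mlmodel": "CoreML",
    ".tflite": "TFLite",
    ".pb": "TensorFlow GraphDef",
}

def model_type(weights):
    """Guess the inference backend from the weights file suffix."""
    suffix = Path(weights).suffix.lower()
    return SUFFIXES.get(suffix, "unknown")

print(model_type("yolov5s.onnx"))    # → ONNX
print(model_type("yolov5s.engine"))  # → TensorRT
```

Once the backend is identified, the class dispatches to the matching runtime so that downstream code (NMS, plotting, metrics) never needs to know which format was loaded.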