YOLOv5 is a PyTorch-based object detection framework developed by Ultralytics that supports three computer vision tasks: object detection, instance segmentation, and image classification. The repository provides a complete pipeline for training, validating, exporting, and deploying models across multiple hardware platforms and inference backends.
This page provides a high-level overview of the YOLOv5 system architecture and its core components; detailed information about specific subsystems is covered on their dedicated pages.
YOLOv5 supports three primary computer vision tasks, each with dedicated implementations:
| Task | Primary Script | Model Types | Output Format |
|---|---|---|---|
| Object Detection | detect.py, train.py, val.py | DetectionModel | Bounding boxes (x, y, w, h) + class + confidence |
| Instance Segmentation | segment/predict.py, segment/train.py, segment/val.py | SegmentationModel | Bounding boxes + pixel masks |
| Image Classification | classify/predict.py, classify/train.py, classify/val.py | ClassificationModel | Class probabilities |
Each task uses a shared foundation of neural network modules and utilities but has specialized heads and training procedures defined in task-specific directories.
Sources: README.md:20, models/yolo.py:218-376
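The detection output format in the table above, boxes as (center-x, center-y, width, height), is typically converted to corner coordinates before drawing or non-maximum suppression. A minimal sketch of that conversion (YOLOv5 ships a tensor-based helper with a similar role, `xywh2xyxy` in `utils/general.py`; this plain-Python version is for illustration only):

```python
def xywh2xyxy(box):
    """Convert a (center-x, center-y, width, height) box to
    (x1, y1, x2, y2) corner format, as used before NMS or drawing."""
    cx, cy, w, h = box
    return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]

# Example: a 100x50 box centered at (200, 150)
print(xywh2xyxy([200, 150, 100, 50]))  # → [150.0, 125.0, 250.0, 175.0]
```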
Diagram: Core system components and their dependencies
The architecture follows a layered design with clear separation between entry-point scripts (train.py, detect.py, val.py, export.py), model definitions (models/), and shared utilities (utils/).
Sources: train.py:1-617, models/yolo.py:1-497, models/common.py:1-700, utils/dataloaders.py:1-1300
The repository is organized into the following key directories:
```
yolov5/
├── train.py              # Main training script (424.13 importance)
├── detect.py             # Object detection inference
├── val.py                # Validation and benchmarking
├── export.py             # Export to 11+ formats
├── models/
│   ├── yolo.py           # DetectionModel, SegmentationModel, ClassificationModel
│   ├── common.py         # Neural network modules (Conv, C3, SPPF, etc.)
│   ├── tf.py             # TensorFlow/Keras/TFLite versions
│   └── experimental.py   # Experimental features
├── utils/
│   ├── general.py        # General utilities (343.30 importance)
│   ├── torch_utils.py    # PyTorch utilities (ModelEMA, device selection)
│   ├── dataloaders.py    # Data loading and caching
│   ├── augmentations.py  # Image augmentation techniques
│   ├── loss.py           # Loss functions
│   ├── metrics.py        # Evaluation metrics
│   ├── plots.py          # Visualization
│   └── loggers/          # Multi-platform logging
├── data/
│   ├── coco128.yaml      # Dataset configurations
│   └── hyps/             # Hyperparameter configurations
├── segment/              # Instance segmentation task
│   ├── train.py
│   ├── predict.py
│   └── val.py
└── classify/             # Image classification task
    ├── train.py
    ├── predict.py
    └── val.py
```
The importance values indicate modification frequency, with train.py (424.13) and utils/general.py (343.30) being the most actively maintained components.
Sources: train.py:1-10, models/yolo.py:1-10, utils/general.py:1-10
Diagram: Model lifecycle from definition to deployment
The lifecycle consists of four stages:
Definition: Models are defined via YAML configuration files that specify architecture, depth, and width multipliers. The parse_model() function in models/yolo.py:378-461 constructs the neural network layers.
Training: The train() function in train.py:105-543 orchestrates the training loop, using create_dataloader() from utils/dataloaders.py:116-195 for data loading. Validation runs periodically via val.run(), and checkpoints are saved as best.pt and last.pt.
Export: The export.py script converts trained models to 11+ deployment formats including ONNX, TensorRT, CoreML, TensorFlow Lite, and OpenVINO.
Inference: Models are loaded via DetectMultiBackend in models/common.py:344-552, which provides a unified interface for all export formats. Inference can be performed through command-line scripts (detect.py, segment/predict.py, classify/predict.py) or PyTorch Hub.
Sources: models/yolo.py:378-461, train.py:105-543, models/common.py:344-552, export.py:1-100
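The best.pt / last.pt convention in the Training stage can be sketched as follows. This is a simplification: the real train.py serializes a dict containing model weights, EMA state, optimizer state, and epoch via torch.save; here a plain in-memory dict stands in for files on disk:

```python
import copy

def save_checkpoints(ckpt, fitness, best_fitness, store):
    """Save 'last' every epoch; copy to 'best' only when fitness improves.
    `store` is a dict standing in for checkpoint files on disk."""
    store["last.pt"] = copy.deepcopy(ckpt)
    if fitness > best_fitness:
        best_fitness = fitness
        store["best.pt"] = copy.deepcopy(ckpt)
    return best_fitness

store, best = {}, 0.0
for epoch, fit in enumerate([0.30, 0.42, 0.38]):   # per-epoch fitness scores
    best = save_checkpoints({"epoch": epoch, "weights": "..."}, fit, best, store)

# best.pt keeps the epoch-1 weights; last.pt always tracks the final epoch
print(store["best.pt"]["epoch"], store["last.pt"]["epoch"])  # → 1 2
```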
YOLOv5 provides five model sizes (n, s, m, l, x) that trade inference speed for accuracy as they scale up:
| Model | Parameters | FLOPs @640 | COCO mAP 50-95 (val) | Speed V100 b1 (ms) |
|---|---|---|---|---|
| YOLOv5n | 1.9M | 4.5B | 28.0 | 6.3 |
| YOLOv5s | 7.2M | 16.5B | 37.4 | 6.4 |
| YOLOv5m | 21.2M | 49.0B | 45.4 | 8.2 |
| YOLOv5l | 46.5M | 109.1B | 49.0 | 10.1 |
| YOLOv5x | 86.7M | 205.7B | 50.7 | 12.1 |
Model scaling is controlled by depth_multiple (gd) and width_multiple (gw) parameters in YAML configuration files. The parse_model() function applies these multipliers to create scaled versions of the base architecture.
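The effect of these multipliers can be sketched as follows, mirroring the rounding behavior of parse_model() and the make_divisible() helper in utils/general.py; the example values assume the standard yolov5s multipliers of gd=0.33 and gw=0.50:

```python
import math

def make_divisible(x, divisor=8):
    """Round a channel count up to the nearest multiple of `divisor`."""
    return math.ceil(x / divisor) * divisor

def scale(n_repeats, channels, gd, gw):
    """Apply depth (gd) and width (gw) multipliers as parse_model() does:
    repeats are rounded (never below 1), channels kept divisible by 8."""
    n = max(round(n_repeats * gd), 1) if n_repeats > 1 else n_repeats
    c = make_divisible(channels * gw, 8)
    return n, c

# A C3 block with 9 repeats and 512 channels under yolov5s multipliers:
print(scale(9, 512, gd=0.33, gw=0.50))  # → (3, 256)
```

With gd=0.33 the block depth shrinks from 9 to 3 repeats, and with gw=0.50 the channel width halves from 512 to 256; the larger l/x variants use multipliers above 1.0 to grow the same base layout.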
Additional P6 models (yolov5n6 through yolov5x6) process 1280×1280 images for higher accuracy at the cost of increased compute.
Sources: models/yolo.py:378-461, README.md:252-278
The train() function in train.py:105-543 handles dataloader creation, optimizer and learning-rate scheduler setup, the epoch loop with loss computation and backpropagation, EMA weight averaging, periodic validation, and checkpoint saving.
Sources: train.py:105-543, train.py:1-15
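Among these responsibilities, EMA weight averaging (the ModelEMA class in utils/torch_utils.py) keeps a smoothed copy of the model weights that is used for validation and saved in checkpoints. The core update can be sketched in plain Python; scalar weights stand in for tensors here, and the fixed decay is a simplification of YOLOv5's ramped decay schedule:

```python
def ema_update(ema, weights, decay=0.999):
    """One EMA step: ema ← d·ema + (1−d)·w for each weight."""
    return [decay * e + (1 - decay) * w for e, w in zip(ema, weights)]

ema = [0.0, 0.0]
for _ in range(3):                 # three optimizer steps at constant weights
    ema = ema_update(ema, [1.0, 2.0], decay=0.5)

print(ema)  # → [0.875, 1.75]
```

The smoothed copy converges toward the live weights but damps step-to-step noise, which typically yields slightly better validation metrics than the raw weights.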
Inference scripts use DetectMultiBackend from models/common.py:344-552 to load models in any supported format. The backend automatically detects the model format, loads the appropriate runtime, and normalizes inputs and outputs across backends.
Sources: models/common.py:344-552, detect.py:1-270
The hubconf.py file defines model loading functions that automatically download pretrained weights from GitHub releases. Models are loaded using the attempt_load() function from models/experimental.py:106-132.
Sources: README.md:88-107
Diagram: Multi-task architecture with shared components
All three tasks share the same building blocks from models/common.py, such as Conv, C3, and SPPF.
Task-specific differences are implemented in the head layers:
- The Detect class produces bounding box predictions at 3 scales (P3, P4, P5)
- The Segment class extends Detect with a Proto layer for mask coefficients
- The Classify class replaces the detection head with global pooling and a fully connected layer

Sources: models/yolo.py:72-376, models/common.py:1-700
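For the Detect head, the three scales P3, P4, and P5 correspond to strides 8, 16, and 32, and YOLOv5's default is 3 anchors per grid cell; the number of raw predictions for a given input size follows directly:

```python
def num_predictions(img_size=640, strides=(8, 16, 32), anchors_per_cell=3):
    """Raw prediction count: anchors_per_cell boxes for every cell of the
    P3/P4/P5 grids (img_size/8, img_size/16, img_size/32 cells per side)."""
    return sum(anchors_per_cell * (img_size // s) ** 2 for s in strides)

# 640x640 input → 80x80 + 40x40 + 20x20 grids, 3 anchors each
print(num_predictions(640))  # → 25200
```

This is why a detection pass on a 640×640 image yields a (25200, 85)-shaped output for COCO (85 = 4 box coordinates + 1 objectness + 80 classes) before non-maximum suppression.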
The export system supports 11+ formats for diverse deployment targets:
| Format | Script | Use Case | Speed |
|---|---|---|---|
| PyTorch | - | Native format | Baseline |
| TorchScript | export.py --include torchscript | C++ inference | Fast |
| ONNX | export.py --include onnx | Cross-platform | Fast |
| TensorRT | export.py --include engine | NVIDIA GPUs | Fastest |
| CoreML | export.py --include coreml | iOS/macOS | Optimized |
| TensorFlow | export.py --include saved_model (or pb) | TF ecosystem | Variable |
| TFLite | export.py --include tflite | Mobile/Edge | Optimized |
| OpenVINO | export.py --include openvino | Intel CPUs | Fast |
| EdgeTPU | export.py --include edgetpu | Google Coral | Optimized |
| TFJS | export.py --include tfjs | Web browsers | Web-optimized |
| PaddlePaddle | export.py --include paddle | Baidu ecosystem | Variable |
The DetectMultiBackend class in models/common.py:344-552 provides a unified interface for loading and running inference with any of these formats. It automatically detects the model format and applies appropriate preprocessing/postprocessing.
Sources: models/common.py:344-552, export.py:1-100, README.md:182
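The format-detection step can be sketched as suffix matching on the weights path. This is a simplified stand-in for DetectMultiBackend's internal model-type check (which derives its suffix list from export.py's format table); the mapping below covers only a subset of the formats above:

```python
from pathlib import Path

# Maps file suffix to backend name (illustrative subset of the export table)
SUFFIXES = {
    ".pt": "PyTorch",
    ".torchscript": "TorchScript",
    ".onnx": "ONNX",
    ".engine": "TensorRT",
    ".mlmodel": "CoreML",
    ".tflite": "TFLite",
    ".pb": "TensorFlow GraphDef",
}

def model_type(weights):
    """Guess the inference backend from the weights file suffix."""
    suffix = Path(weights).suffix.lower()
    return SUFFIXES.get(suffix, "unknown")

print(model_type("yolov5s.onnx"))    # → ONNX
print(model_type("yolov5s.engine"))  # → TensorRT
```

Once the backend is identified, the class dispatches to the matching runtime so that downstream code (NMS, plotting, metrics) never needs to know which format was loaded.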