ncnn is a high-performance neural network inference framework optimized for mobile and embedded platforms. It provides efficient CPU and GPU-accelerated neural network execution with minimal dependencies and cross-platform support. This framework is designed for deployment rather than training, focusing on running pre-trained models efficiently on resource-constrained devices.
ncnn supports deployment across mobile (Android, iOS), desktop (Linux, Windows, macOS), and embedded platforms (Raspberry Pi, NVIDIA Jetson, WebAssembly). The framework accepts models from multiple deep learning frameworks through conversion tools and executes them using optimized CPU kernels or Vulkan GPU compute shaders.
For build instructions and platform-specific details, see Platform Support and Build System. For quick start examples, see Getting Started Guide.
Sources: README.md9-21 README.md590-616
ncnn is built around several core design principles that distinguish it from other inference frameworks:
Zero External Dependencies: The framework has no mandatory third-party library dependencies (BLAS, NNPACK, etc.). Optional dependencies include the Vulkan SDK for GPU support, OpenMP for threading, and protobuf, which is needed only by the model conversion tools.
Mobile-First Optimization: Architecture-specific optimizations for ARM NEON, x86 AVX/AVX512, RISC-V RVV, and LoongArch LSX/LASX instruction sets. Runtime CPU feature detection enables optimal code path selection without requiring multiple binaries.
Lightweight Binary Size: Minimal runtime footprint through optional feature compilation (NCNN_STDIO, NCNN_STRING, NCNN_PIXEL), layer registry generation, and static linking support.
Memory Efficiency: Custom allocator system (PoolAllocator, UnlockedPoolAllocator) with budget-based memory pooling reduces allocation overhead. Supports in-place operations and blob memory recycling during inference.
Flexible Precision Support: Runtime switching between FP32, FP16 (ARM and Vulkan), BF16, and INT8 quantized inference without model recompilation. Quantization calibration tools (ncnn2table, ncnn2int8) enable post-training quantization.
Sources: README.md590-616 src/platform.h.in1-60 CMakeLists.txt61-93
ncnn is a pure inference framework; it does not train models. Pre-trained models from external frameworks are converted to ncnn's .param/.bin format and executed using the registered layer set. Any model expressible with the registered operations is supported.
| Category | Example Architectures |
|---|---|
| Image Classification | VGG, AlexNet, GoogLeNet, ResNet, DenseNet, SENet, MobileNetV1/V2/V3, SqueezeNet, ShuffleNetV1/V2, MNasNet, EfficientNet |
| Object Detection | YOLOv2–v8, YOLOX, NanoDet, MobileNet-SSD, VGG-SSD, Faster-RCNN, R-FCN |
| Face Detection | MTCNN, RetinaFace, SCRFD |
| Segmentation | FCN, PSPNet, UNet, YOLACT |
| Pose Estimation | SimplePose |
| Speech / Audio | Whisper (via examples/) |
| Transformer Inference | Models using MultiHeadAttention, SDPA, RMSNorm, RotaryEmbed layers |
The complete operation set is defined by ncnn_add_layer() calls in src/CMakeLists.txt, registering over 100 layer types including Convolution, ConvolutionDepthWise, Gemm, MultiHeadAttention, SDPA, RotaryEmbed, LSTM, GRU, Spectrogram, and InverseSpectrogram.
Sources: README.md498-555 src/CMakeLists.txt66-176
System Architecture: Model-to-Execution Pipeline
External models are converted to ncnn's native .param/.bin format, optionally optimized or quantized, then loaded by ncnn::Net. The ncnn::Extractor drives forward passes through the layer DAG, dispatching each ncnn::Layer to CPU or Vulkan GPU implementations.
Sources: src/net.h27-164 src/layer.h20-135 src/mat.h1-100 src/CMakeLists.txt17-45
ncnn uses a two-file format for model storage:
- Parameter file (.param or .param.bin): Network structure definition including layer types, dimensions, and connections. Plain-text .param files are human-readable; binary .param.bin files are more compact.
- Weight file (.bin): Raw weight data stored as floating-point or quantized values with 32-bit alignment.

The ncnn::Net class loads these files through multiple interfaces:
Model Loading Architecture
The DataReader abstraction enables unified loading from files, memory, or platform-specific sources. The Net::load_param() methods parse network structure, while Net::load_model() loads weights.
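The text .param header is simple enough to sanity-check by hand. A self-contained sketch (not ncnn code) that validates the format's magic number, 7767517, and reads the layer and blob counts from the first two lines:

```cpp
#include <sstream>
#include <string>

// Parse the two-line header of a text .param file:
//   line 1: magic number (7767517 for the current format)
//   line 2: layer_count blob_count
// Returns true when the magic matches and both counts parse.
bool parse_param_header(const std::string& text, int& layer_count, int& blob_count)
{
    std::istringstream is(text);
    int magic = 0;
    if (!(is >> magic) || magic != 7767517)
        return false;
    return static_cast<bool>(is >> layer_count >> blob_count);
}
```

Each subsequent line then declares one layer: its type, name, input/output counts, blob names, and key=value parameters.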
Sources: src/net.h58-125 docs/how-to-use-and-FAQ/ncnn-load-model.md1-9
The inference runtime consists of ncnn::Net (network container) and ncnn::Extractor (inference session):
Runtime Execution Flow
ncnn::Net stores the network structure and creates ncnn::Extractor instances for concurrent inference sessions. Each Extractor maintains its own blob storage and can recycle memory when lightmode is enabled. The ncnn::Option object controls threading, precision modes, and allocator selection.
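A typical load-and-extract session with the C++ API looks like the following sketch; the blob names "data" and "prob" are illustrative and depend on the converted model:

```cpp
#include "net.h"   // ncnn headers; link against libncnn

int run_inference()
{
    ncnn::Net net;
    net.opt.num_threads = 4;      // Option fields take effect before load
    net.opt.lightmode = true;     // recycle intermediate blobs during inference

    if (net.load_param("model.param") != 0 || net.load_model("model.bin") != 0)
        return -1;

    ncnn::Mat in(224, 224, 3);    // w, h, c input tensor
    in.fill(0.5f);

    // Each Extractor is an independent inference session
    ncnn::Extractor ex = net.create_extractor();
    ex.input("data", in);

    ncnn::Mat out;
    ex.extract("prob", out);      // runs only the layers "prob" depends on
    return 0;
}
```

Because extraction is lazy, asking for an intermediate blob executes only the sub-graph that blob depends on.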
Sources: src/net.h27-164 src/net.h166-233 src/option.h1-100
ncnn implements neural network operations through a layer abstraction with automatic registration:
Layer System Architecture
The CMake macro ncnn_add_layer() automatically generates layer registration code during build. For each layer, architecture-specific variants are compiled with appropriate ISA flags and dispatched at runtime based on CPU features.
Sources: src/layer.h20-155 src/layer.cpp1-200 cmake/ncnn_add_layer.cmake82-150 src/CMakeLists.txt62-176
ncnn provides custom allocators to minimize allocation overhead during inference:
Memory Allocator System
PoolAllocator maintains multiple size classes with free lists to avoid repeated malloc/free calls. VkBlobAllocator manages GPU device memory, while VkStagingAllocator handles CPU-visible staging buffers for data transfer. The ncnn::Option object allows per-network or per-extractor allocator configuration.
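The pooling idea can be illustrated with a hypothetical miniature (this is not the real PoolAllocator): freed buffers sit on a size-keyed free list and are reused for later requests they can satisfy, trading memory budget for fewer malloc/free calls.

```cpp
#include <cstdlib>
#include <map>

// Hypothetical miniature of budget-based pooling in the spirit of
// ncnn::PoolAllocator: released blocks are cached by size and handed
// back out when a request fits an existing block.
class TinyPool
{
public:
    ~TinyPool() { for (auto& kv : free_list_) std::free(kv.second); }

    void* acquire(size_t size)
    {
        // reuse the smallest cached block that is large enough
        auto it = free_list_.lower_bound(size);
        if (it != free_list_.end())
        {
            void* ptr = it->second;
            free_list_.erase(it);
            ++reuse_count_;
            return ptr;
        }
        return std::malloc(size);
    }

    void release(void* ptr, size_t size) { free_list_.emplace(size, ptr); }

    int reuse_count() const { return reuse_count_; }

private:
    std::multimap<size_t, void*> free_list_;
    int reuse_count_ = 0;
};
```

The real allocator adds thread safety (or omits it, in UnlockedPoolAllocator), a size-compare ratio for "close enough" matches, and a configurable budget.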
Sources: src/allocator.h1-80 src/option.h1-100 src/c_api.cpp47-139
ncnn uses ncnn::Mat for CPU tensors and ncnn::VkMat for GPU buffers:
Tensor Data Structures
ncnn::Mat stores CPU tensors with support for SIMD packing (1/4/8 elements per pack). The elemsize field includes packing: for elempack=4 FP32, elemsize=16 bytes. VkMat wraps Vulkan buffers for GPU computation, while VkImageMat uses image storage for texture sampling operations.
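The elemsize and per-channel stride arithmetic can be sketched as follows; this is a simplified standalone sketch mirroring the formulas in src/mat.h, not the real class:

```cpp
#include <cstddef>

// elemsize is the byte size of one packed element:
// scalar size times elempack (e.g. 4-byte FP32 x elempack 4 = 16 bytes).
size_t packed_elemsize(size_t scalar_size, int elempack)
{
    return scalar_size * elempack;
}

// Round sz up to a multiple of n (n must be a power of two).
size_t align_size(size_t sz, int n)
{
    return (sz + n - 1) & ~static_cast<size_t>(n - 1);
}

// cstep: elements per channel, with each channel padded to a
// 16-byte boundary so channel starts stay aligned for SIMD loads.
size_t channel_cstep(int w, int h, size_t elemsize)
{
    return align_size(static_cast<size_t>(w) * h * elemsize, 16) / elemsize;
}
```

The 16-byte channel padding is why a channel pointer obtained from a 3D Mat is always safe for aligned vector loads.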
Sources: src/mat.h1-200 src/c_api.h79-140 python/tests/test_mat.py10-50
The CPU backend provides architecture-specific optimized implementations:
CPU Backend Architecture
The cpu.cpp module detects available CPU features at initialization. Layer creators are generated for each ISA variant during build, and runtime dispatch selects the most optimized implementation. For convolution, this includes im2col+GEMM, Winograd transforms, and direct convolution strategies.
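The dispatch step can be pictured as a table of kernel variants probed in order of preference. This is a hypothetical miniature with fake feature probes; in ncnn the equivalent selection happens through per-layer creator functions generated at build time:

```cpp
#include <cstring>
#include <vector>

// A candidate kernel: a runtime CPU-feature probe plus the kernel itself.
struct KernelVariant
{
    const char* name;
    bool (*supported)();   // runtime CPU feature probe
    int (*convolve)(int);  // stand-in for the real kernel
};

// Pretend feature probes (assumptions for this sketch; real code
// would query cpuid / auxv as src/cpu.cpp does at initialization).
static bool has_baseline() { return true; }
static bool has_avx2() { return false; }

static int conv_baseline(int x) { return x * 2; }
static int conv_avx2(int x) { return x * 2; }

// Pick the first supported variant; the list is ordered from most
// to least specialized, so this returns the fastest usable kernel.
const KernelVariant* pick_kernel(const std::vector<KernelVariant>& variants)
{
    for (const auto& v : variants)
        if (v.supported())
            return &v;
    return nullptr;
}
```

Because the probes run once at startup, no per-inference branching on CPU features is needed.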
Sources: src/cpu.cpp1-300 src/cpu.h42-100 cmake/ncnn_add_layer.cmake1-100
The Vulkan backend uses compute shaders for GPU acceleration:
Vulkan GPU Backend Architecture
VulkanDevice encapsulates device initialization and queue management. VkCompute records command buffers that upload input data, bind pipelines, dispatch compute shaders, and download results. Shaders are compiled from GLSL to SPIR-V at build time and embedded in the binary. Each Vulkan layer implements create_pipeline() to set up its compute pipeline and forward() to record commands.
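From the application side, opting into this backend is a matter of flipping the option before model load; a sketch assuming a build configured with NCNN_VULKAN=ON:

```cpp
#include "net.h"
#include "gpu.h"   // ncnn Vulkan helpers; available when built with NCNN_VULKAN=ON

void enable_gpu(ncnn::Net& net)
{
    if (ncnn::get_gpu_count() > 0)      // at least one usable Vulkan device
    {
        net.opt.use_vulkan_compute = true;
        net.set_vulkan_device(0);       // select the first device
    }
    // This must happen before load_param()/load_model(), so the Vulkan
    // layer variants are created and weights are uploaded to the GPU.
}
```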
Sources: src/gpu.h1-100 src/command.h1-150 src/pipeline.h1-100 src/layer/vulkan/convolution_vulkan.cpp1-100
The primary C++ API provides the ncnn::Net, ncnn::Extractor, and ncnn::Mat classes documented in C++ API Reference.
Sources: src/net.h27-233
The C API wraps C++ classes behind opaque pointers for C compatibility:
C API Design
The C API defined in src/c_api.h provides a procedural interface to the C++ classes. All C++ objects are accessed through opaque typedef pointers (e.g., ncnn_net_t), and functions follow the naming pattern ncnn_<class>_<method>(). This enables integration with C codebases and foreign function interfaces.
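The opaque-pointer pattern itself can be shown with a hypothetical miniature; the demo_* names below are invented for illustration, while the real API uses ncnn_net_t and the ncnn_<class>_<method>() functions:

```cpp
#include <string>

// --- what a C header would expose: only a typedef'd opaque pointer ---
typedef void* demo_net_t;
demo_net_t demo_net_create();
int demo_net_load_param(demo_net_t net, const char* path);
void demo_net_destroy(demo_net_t net);

// --- what the C++ implementation hides behind that pointer ---
struct DemoNet
{
    std::string param_path;
};

demo_net_t demo_net_create() { return new DemoNet(); }

int demo_net_load_param(demo_net_t net, const char* path)
{
    static_cast<DemoNet*>(net)->param_path = path;
    return 0; // 0 on success, matching the C API convention
}

void demo_net_destroy(demo_net_t net) { delete static_cast<DemoNet*>(net); }
```

C callers never see the C++ type, so the library's ABI stays free of C++ name mangling and object layout.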
Sources: src/c_api.h1-250 src/c_api.cpp1-500 examples/squeezenet_c_api.cpp1-100
Python bindings use pybind11 to expose ncnn classes:
Python Bindings Architecture
The pybind11-based bindings provide Pythonic access to ncnn functionality with automatic type conversion between numpy.ndarray and ncnn::Mat. Memory is managed through Python reference counting with proper lifetime handling.
Sources: python/src/main.cpp1-200 python/src/pybind11_mat.h1-100 python/tests/test_mat.py1-50
The ncnn::Option class controls runtime behavior:
| Field | Type | Default | Description |
|---|---|---|---|
| num_threads | int | Physical CPU count | OpenMP thread count |
| blob_allocator | Allocator* | NULL | Intermediate blob allocator |
| workspace_allocator | Allocator* | NULL | Temporary workspace allocator |
| use_vulkan_compute | bool | false | Enable Vulkan GPU backend |
| use_packing_layout | bool | true | Enable SIMD packing (4/8-way) |
| use_fp16_packed | bool | true | Enable FP16 packed computation |
| use_fp16_storage | bool | true | Enable FP16 weight storage |
| use_fp16_arithmetic | bool | false | Enable FP16 arithmetic (ARM/Vulkan) |
| use_int8_storage | bool | true | Enable INT8 weight storage |
| use_int8_arithmetic | bool | false | Enable INT8 arithmetic |
| use_bf16_storage | bool | true | Enable BF16 weight storage |
| use_winograd_convolution | bool | true | Enable Winograd transform for 3x3 conv |
| use_sgemm_convolution | bool | true | Enable im2col+GEMM convolution |
The Option object is passed to Net::load_param(), Layer::create_pipeline(), and Layer::forward() to control behavior at load time, pipeline creation, and inference.
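A short, hedged sketch of tuning these fields before load (the chosen values are illustrative, not recommendations):

```cpp
#include "net.h"

void configure(ncnn::Net& net)
{
    net.opt.num_threads = 2;                  // cap OpenMP worker threads
    net.opt.use_fp16_arithmetic = true;       // opt in to FP16 math on capable hardware
    net.opt.use_winograd_convolution = false; // trade 3x3 conv speed for lower memory
}
```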
Sources: src/option.h1-100 src/c_api.cpp141-350 python/tests/test_option.py1-80
ncnn supports the following platform families:
Mobile: Android (ARMv7, ARM64, x86, x86_64), iOS (ARM64, Simulator), HarmonyOS
Desktop: Linux (x86_64, ARM64, RISC-V, LoongArch, PowerPC), Windows (x86, x86_64, ARM64, XP), macOS (x86_64, ARM64)
Embedded: Raspberry Pi, NVIDIA Jetson, AllWinner D1, Loongson 2K1000
Web: WebAssembly (Emscripten)
Platform-specific details including toolchain configuration, cross-compilation instructions, and build options are documented in Platform Support and Build System.
Sources: README.md65-490 docs/how-to-build/how-to-build.md1-800
ncnn uses CMake with custom macros for layer registration and shader compilation:
Key CMake Options:
- NCNN_VULKAN: Enable Vulkan GPU support
- NCNN_OPENMP: Enable OpenMP multi-threading
- NCNN_INT8: Enable INT8 quantization support
- NCNN_PIXEL: Enable pixel format conversion utilities
- NCNN_BUILD_TOOLS: Build model conversion tools
- NCNN_BUILD_EXAMPLES: Build example applications
- NCNN_PYTHON: Build Python bindings
- NCNN_SHARED_LIB: Build shared library instead of static

The build system automatically detects CPU features and generates architecture-specific layer variants. GLSL compute shaders are compiled to SPIR-V and embedded into the binary during build.
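A typical desktop configure-and-build using a few of these options (the option values shown are illustrative):

```shell
# Configure an out-of-tree build with Vulkan and the example apps enabled
cmake -B build -DNCNN_VULKAN=ON -DNCNN_BUILD_EXAMPLES=ON -DNCNN_SHARED_LIB=OFF
cmake --build build -j4
```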
Sources: CMakeLists.txt1-200 src/CMakeLists.txt1-200 cmake/ncnn_add_layer.cmake1-100
The table below maps major wiki sections to the systems and source files they document.
| Section | Title | Key Code Entities |
|---|---|---|
| 1.1 | Platform Support and Build System | CMakeLists.txt, toolchain files, CMake options |
| 1.2 | Getting Started Guide | Net::load_param, Net::load_model, Extractor::extract, ncnnConfig.cmake |
| 2 | Core Runtime Architecture | Net, Layer, Mat, Allocator, Option |
| 2.1 | Network Loading and Inference Pipeline | Net::load_param, Net::load_model, ModelBin, DataReader, forward_layer |
| 2.2 | Layer System and Registry | Layer, layer_registry, ncnn_add_layer macro, layer_registry.h |
| 2.3 | Mat and Tensor Data Structures | Mat, VkMat, VkImageMat, elempack, cstep, elemsize |
| 2.4 | Memory Allocators | PoolAllocator, UnlockedPoolAllocator, VkAllocator, VkBlobAllocator |
| 2.5 | Option and Configuration System | Option struct fields, precision flags, allocator hooks |
| 3 | GPU and Vulkan System | VulkanDevice, VkCompute, Pipeline, SPIR-V shaders |
| 3.1 | Vulkan Instance and Device Management | create_gpu_instance, GpuInfo, get_default_gpu_index |
| 3.2 | Command Recording and Pipeline Execution | VkCompute, record_pipeline, submit_and_wait, Pipeline |
| 3.3 | GPU Memory and Data Transfer | VkBufferMemory, VkStagingAllocator, VkTransfer, record_upload |
| 3.4 | Vulkan Layer Implementations | Convolution_vulkan, create_pipeline, upload_model, pack4/pack8 variants |
| 4 | CPU Layer Implementations | Per-operation source files in src/layer/ and arch subdirectories |
| 4.1 | Convolution and Deconvolution Layers | Convolution, Convolution_arm, Convolution_x86, Winograd, im2col-GEMM |
| 4.2 | Depthwise Convolution Layers | ConvolutionDepthWise, ConvolutionDepthWise_arm, ConvolutionDepthWise_x86 |
| 4.3 | InnerProduct and GEMM | InnerProduct, Gemm, Gemm_arm, Gemm_x86, tiled GEMM |
| 4.4 | Pooling, Normalization, and Activation | Pooling, BinaryOp, UnaryOp, GELU |
| 4.5 | Reshape, Crop, and Interpolation | Reshape, Crop, Interp, MemoryData, expression blobs |
| 4.6 | Image and Pixel Processing | Mat::from_pixels, Mat::to_pixels, Cast, YUV conversion, warpaffine |
| 5 | Platform-Specific Optimizations | SIMD packing, intrinsics, reduced-precision paths |
| 5.1 | Runtime CPU Detection and Dispatch | cpu.h, cpu_support_arm_neon, NCNN_AVX2, runtime dispatch |
| 5.2 | ARM NEON Optimizations | pack4/pack8, neon_mathfun.h, FP16/BF16 kernels |
| 5.3 | x86 SIMD Optimizations | x86_usability.h, avx_mathfun.h, AVX/AVX512 kernels |
| 5.4 | INT8 Quantization and Precision Modes | Quantize, Dequantize, Requantize, float2int8, scale_data |
| 5.5 | RISC-V and Other Architecture Backends | riscv_usability.h, MIPS MSA, LoongArch LSX |
| 6 | Model Conversion and Optimization | pnnx, caffe2ncnn, onnx2ncnn, .param format |
| 6.1 | Model Conversion Overview | Conversion paths, .param/.bin format, ncnnoptimize workflow |
| 6.2 | PNNX PyTorch Converter Architecture | PNNX build, main.cpp, pass_level0–pass_level5, Graph IR |
| 6.3 | PNNX Intermediate Representation | Graph, Operator, Operand, Parameter, Attribute, save_ncnn |
| 6.4 | ONNX, MXNet, and Caffe Converters | onnx2ncnn, mxnet2ncnn, caffe2ncnn, ModelWriter, ParamDict |
| 6.5 | Model Optimization Tools | ncnnoptimize fusion passes, FP16 conversion, ncnn2mem |
| 7 | APIs and Integration | C++, C, and Python API reference |
| 7.1 | C++ API Reference | Net, Extractor, Mat, find_package(ncnn), ncnnConfig.cmake |
| 7.2 | C API and Python Bindings | c_api.h, ncnn_net_t, ncnn_mat_t, pybind11 module |
| 8 | Tools and Utilities | benchncnn, quantization tools, example applications |
| 8.1 | Benchmarking System | benchncnn, ncnn_add_param, timing methodology |
| 8.2 | Post-Training Quantization Tools | ncnn2table, QuantNet, KL divergence calibration, ncnn2int8 |
| 8.3 | Example Applications | YOLOv8, Whisper, ArcFace, NMS, preprocessing patterns |
| 9 | Development and Testing | Testing approach, CI/CD, release procedures |
| 9.1 | Testing Framework | testutil.h, test_layer, RandomMat, CompareMat |
| 9.2 | CI/CD Pipeline | GitHub Actions workflows, SwiftShader Vulkan, cross-platform test matrix |
| 9.3 | Release Process | cibuildwheel, Python wheel builds, PyPI publish workflow |
Version information is accessible at compile-time and runtime:
- C++: NCNN_VERSION_STRING macro (format: "1.0.YYYYMMDD") and NCNN_VERSION_NUMBER macro
- C API: ncnn_version() returns the version string; ncnn_version_number() returns an integer
- Python: ncnn.__version__ attribute

The version format uses YYYYMMDD date stamps (e.g., "20260113") as the patch version.
Sources: src/platform.h.in62-63 src/c_api.cpp37-45