ncnn is a high-performance neural network inference framework optimized for mobile and embedded platforms. It provides efficient CPU and GPU-accelerated neural network execution with minimal dependencies and cross-platform support. This framework is designed for deployment rather than training, focusing on running pre-trained models efficiently on resource-constrained devices.
ncnn supports deployment across mobile (Android, iOS), desktop (Linux, Windows, macOS), and embedded platforms (Raspberry Pi, NVIDIA Jetson, WebAssembly). The framework accepts models from multiple deep learning frameworks through conversion tools and executes them using optimized CPU kernels or Vulkan GPU compute shaders.
For build instructions and platform-specific details, see Platform Support and Build System. For quick start examples, see Getting Started Guide.
Sources: README.md9-21 README.md590-616
ncnn is built around several core design principles that distinguish it from other inference frameworks:
Zero External Dependencies: The framework has no mandatory third-party library dependencies (BLAS, NNPACK, etc.). Optional dependencies include the Vulkan SDK for GPU support, OpenMP for threading, and protobuf, which is needed only by the model conversion tools.
Mobile-First Optimization: Architecture-specific optimizations for ARM NEON, x86 AVX/AVX512, RISC-V RVV, and LoongArch LSX/LASX instruction sets. Runtime CPU feature detection enables optimal code path selection without requiring multiple binaries.
Lightweight Binary Size: Minimal runtime footprint through optional feature compilation (NCNN_STDIO, NCNN_STRING, NCNN_PIXEL), layer registry generation, and static linking support.
Memory Efficiency: Custom allocator system (PoolAllocator, UnlockedPoolAllocator) with budget-based memory pooling reduces allocation overhead. Supports in-place operations and blob memory recycling during inference.
Flexible Precision Support: Runtime switching between FP32, FP16 (ARM and Vulkan), BF16, and INT8 quantized inference without model recompilation. Quantization calibration tools (ncnn2table, ncnn2int8) enable post-training quantization.
Sources: README.md590-616 src/platform.h.in1-60 CMakeLists.txt61-93
ncnn is a pure inference framework; it does not train models. Pre-trained models from external frameworks are converted to ncnn's .param/.bin format and executed using the registered layer set. Any model expressible with the registered operations is supported.
| Category | Example Architectures |
|---|---|
| Image Classification | VGG, AlexNet, GoogLeNet, ResNet, DenseNet, SENet, MobileNetV1/V2/V3, SqueezeNet, ShuffleNetV1/V2, MNasNet, EfficientNet |
| Object Detection | YOLOv2–v8, YOLOX, NanoDet, MobileNet-SSD, VGG-SSD, Faster-RCNN, R-FCN |
| Face Detection | MTCNN, RetinaFace, SCRFD |
| Segmentation | FCN, PSPNet, UNet, YOLACT |
| Pose Estimation | SimplePose |
| Speech / Audio | Whisper (via examples/) |
| Transformer Inference | Models using MultiHeadAttention, SDPA, RMSNorm, RotaryEmbed layers |
The complete operation set is defined by ncnn_add_layer() calls in src/CMakeLists.txt, registering over 100 layer types including Convolution, ConvolutionDepthWise, Gemm, MultiHeadAttention, SDPA, RotaryEmbed, LSTM, GRU, Spectrogram, and InverseSpectrogram.
Sources: README.md498-555 src/CMakeLists.txt66-176
System Architecture: Model-to-Execution Pipeline
External models are converted to ncnn's native .param/.bin format, optionally optimized or quantized, then loaded by ncnn::Net. The ncnn::Extractor drives forward passes through the layer DAG, dispatching each ncnn::Layer to CPU or Vulkan GPU implementations.
Sources: src/net.h27-164 src/layer.h20-135 src/mat.h1-100 src/CMakeLists.txt17-45
ncnn uses a two-file format for model storage:
- Parameter file (.param or .param.bin): Network structure definition including layer types, dimensions, and connections. Plain-text .param files are human-readable; binary .param.bin files are more compact.
- Weight file (.bin): Raw weight data stored as floating-point or quantized values with 32-bit alignment.

The ncnn::Net class loads these files through multiple interfaces:
Model Loading Architecture
The DataReader abstraction enables unified loading from files, memory, or platform-specific sources. The Net::load_param() methods parse network structure, while Net::load_model() loads weights.
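The text .param header is simple enough to sanity-check by hand. A self-contained sketch (not ncnn code) that validates the format's magic number, 7767517, and reads the layer and blob counts from the first two lines:

```cpp
#include <sstream>
#include <string>

// Parse the two-line header of a text .param file:
//   line 1: magic number (7767517 for the current format)
//   line 2: layer_count blob_count
// Returns true when the magic matches and both counts parse.
bool parse_param_header(const std::string& text, int& layer_count, int& blob_count)
{
    std::istringstream is(text);
    int magic = 0;
    if (!(is >> magic) || magic != 7767517)
        return false;
    return static_cast<bool>(is >> layer_count >> blob_count);
}
```

Each subsequent line then declares one layer: its type, name, input/output counts, blob names, and key=value parameters.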
Sources: src/net.h58-125 docs/how-to-use-and-FAQ/ncnn-load-model.md1-9
The inference runtime consists of ncnn::Net (network container) and ncnn::Extractor (inference session):
Runtime Execution Flow
ncnn::Net stores the network structure and creates ncnn::Extractor instances for concurrent inference sessions. Each Extractor maintains its own blob storage and can recycle memory when lightmode is enabled. The ncnn::Option object controls threading, precision modes, and allocator selection.
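A typical load-and-extract session with the C++ API looks like the following sketch; the blob names "data" and "prob" are illustrative and depend on the converted model:

```cpp
#include "net.h"   // ncnn headers; link against libncnn

int run_inference()
{
    ncnn::Net net;
    net.opt.num_threads = 4;      // Option fields take effect before load
    net.opt.lightmode = true;     // recycle intermediate blobs during inference

    if (net.load_param("model.param") != 0 || net.load_model("model.bin") != 0)
        return -1;

    ncnn::Mat in(224, 224, 3);    // w, h, c input tensor
    in.fill(0.5f);

    // Each Extractor is an independent inference session
    ncnn::Extractor ex = net.create_extractor();
    ex.input("data", in);

    ncnn::Mat out;
    ex.extract("prob", out);      // runs only the layers "prob" depends on
    return 0;
}
```

Because extraction is lazy, asking for an intermediate blob executes only the sub-graph that blob depends on.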
Sources: src/net.h27-164 src/net.h166-233 src/option.h1-100
ncnn implements neural network operations through a layer abstraction with automatic registration:
Layer System Architecture
The CMake macro ncnn_add_layer() automatically generates layer registration code during build. For each layer, architecture-specific variants are compiled with appropriate ISA flags and dispatched at runtime based on CPU features.
Sources: src/layer.h20-155 src/layer.cpp1-200 cmake/ncnn_add_layer.cmake82-150 src/CMakeLists.txt62-176
ncnn provides custom allocators to minimize allocation overhead during inference:
Memory Allocator System
PoolAllocator maintains multiple size classes with free lists to avoid repeated malloc/free calls. VkBlobAllocator manages GPU device memory, while VkStagingAllocator handles CPU-visible staging buffers for data transfer. The ncnn::Option object allows per-network or per-extractor allocator configuration.
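The pooling idea can be illustrated with a hypothetical miniature (this is not the real PoolAllocator): freed buffers sit on a size-keyed free list and are reused for later requests they can satisfy, trading memory budget for fewer malloc/free calls.

```cpp
#include <cstdlib>
#include <map>

// Hypothetical miniature of budget-based pooling in the spirit of
// ncnn::PoolAllocator: released blocks are cached by size and handed
// back out when a request fits an existing block.
class TinyPool
{
public:
    ~TinyPool() { for (auto& kv : free_list_) std::free(kv.second); }

    void* acquire(size_t size)
    {
        // reuse the smallest cached block that is large enough
        auto it = free_list_.lower_bound(size);
        if (it != free_list_.end())
        {
            void* ptr = it->second;
            free_list_.erase(it);
            ++reuse_count_;
            return ptr;
        }
        return std::malloc(size);
    }

    void release(void* ptr, size_t size) { free_list_.emplace(size, ptr); }

    int reuse_count() const { return reuse_count_; }

private:
    std::multimap<size_t, void*> free_list_;
    int reuse_count_ = 0;
};
```

The real allocator adds thread safety (or omits it, in UnlockedPoolAllocator), a size-compare ratio for "close enough" matches, and a configurable budget.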
Sources: src/allocator.h1-80 src/option.h1-100 src/c_api.cpp47-139
ncnn uses ncnn::Mat for CPU tensors and ncnn::VkMat for GPU buffers:
Tensor Data Structures
ncnn::Mat stores CPU tensors with support for SIMD packing (1/4/8 elements per pack). The elemsize field includes packing: for elempack=4 FP32, elemsize=16 bytes. VkMat wraps Vulkan buffers for GPU computation, while VkImageMat uses image storage for texture sampling operations.
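The elemsize and per-channel stride arithmetic can be sketched as follows; this is a simplified standalone sketch mirroring the formulas in src/mat.h, not the real class:

```cpp
#include <cstddef>

// elemsize is the byte size of one packed element:
// scalar size times elempack (e.g. 4-byte FP32 x elempack 4 = 16 bytes).
size_t packed_elemsize(size_t scalar_size, int elempack)
{
    return scalar_size * elempack;
}

// Round sz up to a multiple of n (n must be a power of two).
size_t align_size(size_t sz, int n)
{
    return (sz + n - 1) & ~static_cast<size_t>(n - 1);
}

// cstep: elements per channel, with each channel padded to a
// 16-byte boundary so channel starts stay aligned for SIMD loads.
size_t channel_cstep(int w, int h, size_t elemsize)
{
    return align_size(static_cast<size_t>(w) * h * elemsize, 16) / elemsize;
}
```

The 16-byte channel padding is why a channel pointer obtained from a 3D Mat is always safe for aligned vector loads.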
Sources: src/mat.h1-200 src/c_api.h79-140 python/tests/test_mat.py10-50
The CPU backend provides architecture-specific optimized implementations:
CPU Backend Architecture
The cpu.cpp module detects available CPU features at initialization. Layer creators are generated for each ISA variant during build, and runtime dispatch selects the most optimized implementation. For convolution, this includes im2col+GEMM, Winograd transforms, and direct convolution strategies.
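The dispatch step can be pictured as a table of kernel variants probed in order of preference. This is a hypothetical miniature with fake feature probes; in ncnn the equivalent selection happens through per-layer creator functions generated at build time:

```cpp
#include <cstring>
#include <vector>

// A candidate kernel: a runtime CPU-feature probe plus the kernel itself.
struct KernelVariant
{
    const char* name;
    bool (*supported)();   // runtime CPU feature probe
    int (*convolve)(int);  // stand-in for the real kernel
};

// Pretend feature probes (assumptions for this sketch; real code
// would query cpuid / auxv as src/cpu.cpp does at initialization).
static bool has_baseline() { return true; }
static bool has_avx2() { return false; }

static int conv_baseline(int x) { return x * 2; }
static int conv_avx2(int x) { return x * 2; }

// Pick the first supported variant; the list is ordered from most
// to least specialized, so this returns the fastest usable kernel.
const KernelVariant* pick_kernel(const std::vector<KernelVariant>& variants)
{
    for (const auto& v : variants)
        if (v.supported())
            return &v;
    return nullptr;
}
```

Because the probes run once at startup, no per-inference branching on CPU features is needed.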
Sources: src/cpu.cpp1-300 src/cpu.h42-100 cmake/ncnn_add_layer.cmake1-100
The Vulkan backend uses compute shaders for GPU acceleration:
Vulkan GPU Backend Architecture
VulkanDevice encapsulates device initialization and queue management. VkCompute records command buffers that upload input data, bind pipelines, dispatch compute shaders, and download results. Shaders are compiled from GLSL to SPIR-V at build time and embedded in the binary. Each Vulkan layer implements create_pipeline() to set up its compute pipeline and forward() to record commands.
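From the application side, opting into this backend is a matter of flipping the option before model load; a sketch assuming a build configured with NCNN_VULKAN=ON:

```cpp
#include "net.h"
#include "gpu.h"   // ncnn Vulkan helpers; available when built with NCNN_VULKAN=ON

void enable_gpu(ncnn::Net& net)
{
    if (ncnn::get_gpu_count() > 0)      // at least one usable Vulkan device
    {
        net.opt.use_vulkan_compute = true;
        net.set_vulkan_device(0);       // select the first device
    }
    // This must happen before load_param()/load_model(), so the Vulkan
    // layer variants are created and weights are uploaded to the GPU.
}
```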
Sources: src/gpu.h1-100 src/command.h1-150 src/pipeline.h1-100 src/layer/vulkan/convolution_vulkan.cpp1-100
The primary C++ API provides the ncnn::Net, ncnn::Extractor, and ncnn::Mat classes documented in C++ API Reference.
Sources: src/net.h27-233
The C API wraps C++ classes behind opaque pointers for C compatibility:
C API Design
The C API defined in src/c_api.h provides a procedural interface to the C++ classes. All C++ objects are accessed through opaque typedef pointers (e.g., ncnn_net_t), and functions follow the naming pattern ncnn_<class>_<method>(). This enables integration with C codebases and foreign function interfaces.
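The opaque-pointer pattern itself can be shown with a hypothetical miniature; the demo_* names below are invented for illustration, while the real API uses ncnn_net_t and the ncnn_<class>_<method>() functions:

```cpp
#include <string>

// --- what a C header would expose: only a typedef'd opaque pointer ---
typedef void* demo_net_t;
demo_net_t demo_net_create();
int demo_net_load_param(demo_net_t net, const char* path);
void demo_net_destroy(demo_net_t net);

// --- what the C++ implementation hides behind that pointer ---
struct DemoNet
{
    std::string param_path;
};

demo_net_t demo_net_create() { return new DemoNet(); }

int demo_net_load_param(demo_net_t net, const char* path)
{
    static_cast<DemoNet*>(net)->param_path = path;
    return 0; // 0 on success, matching the C API convention
}

void demo_net_destroy(demo_net_t net) { delete static_cast<DemoNet*>(net); }
```

C callers never see the C++ type, so the library's ABI stays free of C++ name mangling and object layout.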
Sources: src/c_api.h1-250 src/c_api.cpp1-500 examples/squeezenet_c_api.cpp1-100
Python bindings use pybind11 to expose ncnn classes:
Python Bindings Architecture
The pybind11-based bindings provide Pythonic access to ncnn functionality with automatic type conversion between numpy.ndarray and ncnn::Mat. Memory is managed through Python reference counting with proper lifetime handling.
Sources: python/src/main.cpp1-200 python/src/pybind11_mat.h1-100 python/tests/test_mat.py1-50
The ncnn::Option class controls runtime behavior:
| Field | Type | Default | Description |
|---|---|---|---|
| num_threads | int | Physical CPU count | OpenMP thread count |
| blob_allocator | Allocator* | NULL | Intermediate blob allocator |
| workspace_allocator | Allocator* | NULL | Temporary workspace allocator |
| use_vulkan_compute | bool | false | Enable Vulkan GPU backend |
| use_packing_layout | bool | true | Enable SIMD packing (4/8-way) |
| use_fp16_packed | bool | true | Enable FP16 packed computation |
| use_fp16_storage | bool | true | Enable FP16 weight storage |
| use_fp16_arithmetic | bool | false | Enable FP16 arithmetic (ARM/Vulkan) |
| use_int8_storage | bool | true | Enable INT8 weight storage |
| use_int8_arithmetic | bool | false | Enable INT8 arithmetic |
| use_bf16_storage | bool | true | Enable BF16 weight storage |
| use_winograd_convolution | bool | true | Enable Winograd transform for 3x3 conv |
| use_sgemm_convolution | bool | true | Enable im2col+GEMM convolution |
The Option object is passed to Net::load_param(), Layer::create_pipeline(), and Layer::forward() to control behavior at load time, pipeline creation, and inference.
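A short, hedged sketch of tuning these fields before load (the chosen values are illustrative, not recommendations):

```cpp
#include "net.h"

void configure(ncnn::Net& net)
{
    net.opt.num_threads = 2;                  // cap OpenMP worker threads
    net.opt.use_fp16_arithmetic = true;       // opt in to FP16 math on capable hardware
    net.opt.use_winograd_convolution = false; // trade 3x3 conv speed for lower memory
}
```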
Sources: src/option.h1-100 src/c_api.cpp141-350 python/tests/test_option.py1-80
ncnn supports the following platform families:
Mobile: Android (ARMv7, ARM64, x86, x86_64), iOS (ARM64, Simulator), HarmonyOS
Desktop: Linux (x86_64, ARM64, RISC-V, LoongArch, PowerPC), Windows (x86, x86_64, ARM64, XP), macOS (x86_64, ARM64)
Embedded: Raspberry Pi, NVIDIA Jetson, AllWinner D1, Loongson 2K1000
Web: WebAssembly (Emscripten)
Platform-specific details including toolchain configuration, cross-compilation instructions, and build options are documented in Platform Support and Build System.
Sources: README.md65-490 docs/how-to-build/how-to-build.md1-800
ncnn uses CMake with custom macros for layer registration and shader compilation:
Key CMake Options:
- NCNN_VULKAN: Enable Vulkan GPU support
- NCNN_OPENMP: Enable OpenMP multi-threading
- NCNN_INT8: Enable INT8 quantization support
- NCNN_PIXEL: Enable pixel format conversion utilities
- NCNN_BUILD_TOOLS: Build model conversion tools
- NCNN_BUILD_EXAMPLES: Build example applications
- NCNN_PYTHON: Build Python bindings
- NCNN_SHARED_LIB: Build shared library instead of static

The build system automatically detects CPU features and generates architecture-specific layer variants. GLSL compute shaders are compiled to SPIR-V and embedded into the binary during build.
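A typical desktop configure-and-build using a few of these options (the option values shown are illustrative):

```shell
# Configure an out-of-tree build with Vulkan and the example apps enabled
cmake -B build -DNCNN_VULKAN=ON -DNCNN_BUILD_EXAMPLES=ON -DNCNN_SHARED_LIB=OFF
cmake --build build -j4
```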
Sources: CMakeLists.txt1-200 src/CMakeLists.txt1-200 cmake/ncnn_add_layer.cmake1-100
The table below maps major wiki sections to the systems and source files they document.
| Section | Title | Key Code Entities |
|---|---|---|
| 1.1 | Platform Support and Build System | CMakeLists.txt, toolchain files, CMake options |
| 1.2 | Getting Started Guide | Net::load_param, Net::load_model, Extractor::extract, ncnnConfig.cmake |
| 2 | Core Runtime Architecture | Net, Layer, Mat, Allocator, Option |
| 2.1 | Network Loading and Inference Pipeline | Net::load_param, Net::load_model, ModelBin, DataReader, forward_layer |
| 2.2 | Layer System and Registry | Layer, layer_registry, ncnn_add_layer macro, layer_registry.h |
| 2.3 | Mat and Tensor Data Structures | Mat, VkMat, VkImageMat, elempack, cstep, elemsize |
| 2.4 | Memory Allocators | PoolAllocator, UnlockedPoolAllocator, VkAllocator, VkBlobAllocator |
| 2.5 | Option and Configuration System | Option struct fields, precision flags, allocator hooks |
| 3 | GPU and Vulkan System | VulkanDevice, VkCompute, Pipeline, SPIR-V shaders |
| 3.1 | Vulkan Instance and Device Management | create_gpu_instance, GpuInfo, get_default_gpu_index |
| 3.2 | Command Recording and Pipeline Execution | VkCompute, record_pipeline, submit_and_wait, Pipeline |
| 3.3 | GPU Memory and Data Transfer | VkBufferMemory, VkStagingAllocator, VkTransfer, record_upload |
| 3.4 | Vulkan Layer Implementations | Convolution_vulkan, create_pipeline, upload_model, pack4/pack8 variants |
| 4 | CPU Layer Implementations | Per-operation source files in src/layer/ and arch subdirectories |
| 4.1 | Convolution and Deconvolution Layers | Convolution, Convolution_arm, Convolution_x86, Winograd, im2col-GEMM |
| 4.2 | Depthwise Convolution Layers | ConvolutionDepthWise, ConvolutionDepthWise_arm, ConvolutionDepthWise_x86 |
| 4.3 | InnerProduct and GEMM | InnerProduct, Gemm, Gemm_arm, Gemm_x86, tiled GEMM |
| 4.4 | Pooling, Normalization, and Activation | Pooling, BinaryOp, UnaryOp, GELU |
| 4.5 | Reshape, Crop, and Interpolation | Reshape, Crop, Interp, MemoryData, expression blobs |
| 4.6 | Image and Pixel Processing | Mat::from_pixels, Mat::to_pixels, Cast, YUV conversion, warpaffine |
| 5 | Platform-Specific Optimizations | SIMD packing, intrinsics, reduced-precision paths |
| 5.1 | Runtime CPU Detection and Dispatch | cpu.h, cpu_support_arm_neon, NCNN_AVX2, runtime dispatch |
| 5.2 | ARM NEON Optimizations | pack4/pack8, neon_mathfun.h, FP16/BF16 kernels |
| 5.3 | x86 SIMD Optimizations | x86_usability.h, avx_mathfun.h, AVX/AVX512 kernels |
| 5.4 | INT8 Quantization and Precision Modes | Quantize, Dequantize, Requantize, float2int8, scale_data |
| 5.5 | RISC-V and Other Architecture Backends | riscv_usability.h, MIPS MSA, LoongArch LSX |
| 6 | Model Conversion and Optimization | pnnx, caffe2ncnn, onnx2ncnn, .param format |
| 6.1 | Model Conversion Overview | Conversion paths, .param/.bin format, ncnnoptimize workflow |
| 6.2 | PNNX PyTorch Converter Architecture | PNNX build, main.cpp, pass_level0–pass_level5, Graph IR |
| 6.3 | PNNX Intermediate Representation | Graph, Operator, Operand, Parameter, Attribute, save_ncnn |
| 6.4 | ONNX, MXNet, and Caffe Converters | onnx2ncnn, mxnet2ncnn, caffe2ncnn, ModelWriter, ParamDict |
| 6.5 | Model Optimization Tools | ncnnoptimize fusion passes, FP16 conversion, ncnn2mem |
| 7 | APIs and Integration | C++, C, and Python API reference |
| 7.1 | C++ API Reference | Net, Extractor, Mat, find_package(ncnn), ncnnConfig.cmake |
| 7.2 | C API and Python Bindings | c_api.h, ncnn_net_t, ncnn_mat_t, pybind11 module |
| 8 | Tools and Utilities | benchncnn, quantization tools, example applications |
| 8.1 | Benchmarking System | benchncnn, ncnn_add_param, timing methodology |
| 8.2 | Post-Training Quantization Tools | ncnn2table, QuantNet, KL divergence calibration, ncnn2int8 |
| 8.3 | Example Applications | YOLOv8, Whisper, ArcFace, NMS, preprocessing patterns |
| 9 | Development and Testing | Testing approach, CI/CD, release procedures |
| 9.1 | Testing Framework | testutil.h, test_layer, RandomMat, CompareMat |
| 9.2 | CI/CD Pipeline | GitHub Actions workflows, SwiftShader Vulkan, cross-platform test matrix |
| 9.3 | Release Process | cibuildwheel, Python wheel builds, PyPI publish workflow |
Version information is accessible at compile-time and runtime:
- C++: NCNN_VERSION_STRING macro (format: "1.0.YYYYMMDD") and NCNN_VERSION_NUMBER macro
- C API: ncnn_version() returns the version string; ncnn_version_number() returns an integer
- Python: ncnn.__version__ attribute

The version format uses YYYYMMDD date stamps (e.g., "20260113") as the patch version.
Sources: src/platform.h.in62-63 src/c_api.cpp37-45