This page provides quick-start instructions for building NCNN and running basic neural network inference. It covers the essential workflow from obtaining the library to executing your first model. For detailed platform-specific build configurations and toolchain options, see Platform Support and Build System. For in-depth information about the runtime architecture, see Core Runtime Architecture.
NCNN inference follows a simple three-phase pattern: load a model, create an inference session, and execute forward passes. The following diagram illustrates the fundamental data flow:
Sources: src/net.h27-164 src/net.cpp tests/test_squeezenet.cpp14-125
Clone the repository with submodules:
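The standard clone sequence (the upstream repository URL; submodules pull in glslang and other vendored dependencies):

```shell
git clone https://github.com/Tencent/ncnn.git
cd ncnn
git submodule update --init
```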
Required dependencies:
On Ubuntu/Debian, install dependencies:
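A sketch of the usual package set on Ubuntu/Debian (libvulkan-dev and libopencv-dev are only needed for Vulkan builds and some examples, respectively):

```shell
sudo apt install build-essential git cmake \
    libprotobuf-dev protobuf-compiler \
    libomp-dev libvulkan-dev libopencv-dev
```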
On Red Hat/CentOS:
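An equivalent sketch using yum (exact package names vary between releases; on newer Fedora/RHEL the Vulkan headers ship as vulkan-loader-devel):

```shell
sudo yum install gcc-c++ git cmake \
    protobuf-devel vulkan-devel opencv-devel
```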
Sources: docs/how-to-build/how-to-build.md1-8 docs/how-to-build/how-to-build.md42-62 README.md72
Common CMake options:
| Option | Default | Description |
|---|---|---|
| NCNN_VULKAN | OFF | Enable Vulkan GPU support |
| NCNN_OPENMP | ON | Enable OpenMP multi-threading |
| NCNN_BUILD_EXAMPLES | ON | Build example programs (squeezenet, yolov5, etc) |
| NCNN_BUILD_TOOLS | ON | Build conversion tools (caffe2ncnn, onnx2ncnn) |
| NCNN_BUILD_BENCHMARK | ON | Build benchncnn performance tool |
| NCNN_SHARED_LIB | OFF | Build shared library instead of static |
| NCNN_PIXEL | ON | Enable image preprocessing functions |
| NCNN_INT8 | ON | Enable INT8 quantization support |
| NCNN_SIMPLEOCV | OFF | Use built-in OpenCV replacement |
For GPU builds, add -DNCNN_VULKAN=ON. For Python bindings, add -DNCNN_PYTHON=ON.
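A typical configure-and-build sequence on Linux; the build directory name and the optional install step follow the upstream build guide:

```shell
cd ncnn
mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DNCNN_BUILD_EXAMPLES=ON ..
make -j$(nproc)
sudo make install   # optional: installs headers and libncnn
```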
Sources: docs/how-to-build/how-to-build.md40-89 CMakeLists.txt61-125
Install Vulkan SDK first:
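On Ubuntu, the distro Vulkan headers and loader can stand in for the LunarG SDK (package names are the ones current Ubuntu releases use; the full SDK from LunarG works as well):

```shell
sudo apt install libvulkan-dev vulkan-tools glslang-tools
```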
Build with Vulkan enabled:
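The same CMake flow as the CPU build, with the Vulkan option switched on:

```shell
cd ncnn
mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DNCNN_VULKAN=ON ..
make -j$(nproc)
```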
The Vulkan driver is required at runtime. For AMD/Intel GPUs, Mesa provides Vulkan drivers. For NVIDIA GPUs, install the proprietary driver. See GPU and Vulkan System for device selection and configuration.
Sources: docs/how-to-build/how-to-build.md51-65 docs/how-to-build/how-to-build.md40-89
Download and install Visual Studio Community 2017 or later. Start the x64 Native Tools Command Prompt:
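A typical NMake-based build from that prompt; the generator and install prefix follow the upstream build guide, so adjust paths to taste:

```shell
mkdir build
cd build
cmake -G "NMake Makefiles" -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=%cd%/install ..
nmake
nmake install
```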
For protobuf support (required for conversion tools), first build protobuf then link:
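A sketch of that two-step sequence. Here `<protobuf-install>` is a placeholder for wherever protobuf was installed; the `Protobuf_*` variable names come from CMake's FindProtobuf module:

```shell
:: build and install protobuf first
cd protobuf
mkdir build-vs && cd build-vs
cmake -G "NMake Makefiles" -DCMAKE_BUILD_TYPE=Release -Dprotobuf_BUILD_TESTS=OFF ..\cmake
nmake
nmake install

:: then configure ncnn against the installed protobuf
cmake -G "NMake Makefiles" -DCMAKE_BUILD_TYPE=Release ^
  -DProtobuf_INCLUDE_DIR=<protobuf-install>/include ^
  -DProtobuf_LIBRARIES=<protobuf-install>/lib/libprotobuf.lib ^
  -DProtobuf_PROTOC_EXECUTABLE=<protobuf-install>/bin/protoc.exe ..
```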
Download MinGW-w64 from winlibs or w64devkit and add its bin directory to PATH:
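With the toolchain on PATH, the build mirrors the NMake flow but uses the MinGW generator:

```shell
mkdir build
cd build
cmake -G "MinGW Makefiles" -DCMAKE_BUILD_TYPE=Release ..
mingw32-make -j4
```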
Sources: docs/how-to-build/how-to-build.md180-210 docs/how-to-build/how-to-build.md253-267
NCNN uses two files to represent a neural network:
- .param: Text file containing layer types, parameters, and topology (human-readable)
- .param.bin: Binary equivalent of .param (smaller, faster to load)
- .bin: Binary file containing all network weights (32-bit aligned memory)

Models from other frameworks must be converted to NCNN format using conversion tools. See Model Conversion and Optimization for details on converting PyTorch, ONNX, Caffe, or TensorFlow models.
Sources: src/net.h60-109 docs/how-to-use-and-FAQ/ncnn-load-model.md1-9
The following diagram shows the relationship between core runtime classes:
Sources: src/net.h27-164 src/net.h167-245 src/mat.h src/option.h
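A minimal end-to-end inference program, sketched here around the stock SqueezeNet example; the model file names and the data/prob blob names are the ones examples/squeezenet.cpp uses, so substitute your own:

```cpp
#include "net.h"
#include <cstdio>

int main()
{
    // phase 1: load model structure and weights
    ncnn::Net net;
    if (net.load_param("squeezenet_v1.1.param") != 0)
        return -1;
    if (net.load_model("squeezenet_v1.1.bin") != 0)
        return -1;

    // placeholder input; real code fills this from image pixels
    ncnn::Mat in(227, 227, 3);
    in.fill(0.f);

    // phase 2: create an inference session
    ncnn::Extractor ex = net.create_extractor();
    ex.input("data", in);

    // phase 3: run the forward pass and read the output blob
    ncnn::Mat out;
    ex.extract("prob", out);
    printf("output elements: %d\n", out.w);
    return 0;
}
```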
Compile and link:
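One way to compile against an installed NCNN (paths assume the default make install prefix; add -lvulkan for Vulkan-enabled builds):

```shell
g++ -std=c++11 demo.cpp -o demo \
    -I/usr/local/include/ncnn \
    -L/usr/local/lib -lncnn -fopenmp
```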
Key API methods:
| Method | Description | Header Location |
|---|---|---|
| Net::load_param(const char*) | Load network structure from .param file | src/net.h72-76 |
| Net::load_model(const char*) | Load weight data from .bin file | src/net.h92-96 |
| Net::create_extractor() | Create inference session (thread-safe) | src/net.h131-133 |
| Extractor::input(const char*, const Mat&) | Set input blob by name | src/net.h203-205 |
| Extractor::extract(const char*, Mat&) | Get output blob by name | src/net.h209-211 |
Sources: examples/squeezenet.cpp46-108 src/net.h27-164 src/net.h167-245
NCNN supports zero-copy loading from memory for embedded deployments:
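A sketch of in-memory loading: load_param_mem takes the textual .param content, and the weight buffer is referenced rather than copied, so it must outlive the Net:

```cpp
#include "net.h"

// param_text: null-terminated .param content
// weights: raw .bin bytes (32-bit aligned), kept alive for the Net's lifetime
bool load_from_memory(ncnn::Net& net, const char* param_text, const unsigned char* weights)
{
    if (net.load_param_mem(param_text) != 0)
        return false;
    // load_model returns the number of bytes consumed from the buffer
    return net.load_model(weights) > 0;
}
```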
The ncnn2mem tool converts models to C++ header files for static embedding:
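Its invocation takes the two model files and the two headers to generate (the SqueezeNet file names here are illustrative):

```shell
./tools/ncnn2mem squeezenet_v1.1.param squeezenet_v1.1.bin squeezenet.id.h squeezenet.mem.h
```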
This generates:
- squeezenet.id.h - Enum of blob names (e.g., squeezenet_v1_1_param_id::BLOB_data)
- squeezenet.mem.h - Byte arrays squeezenet_v1_1_param_bin[] and squeezenet_v1_1_bin[]

Include in your program:
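A sketch of consuming the generated headers; the array and enum names come from the generated files, and BLOB_prob is assumed to be the output blob id for this model:

```cpp
#include "net.h"
#include "squeezenet.id.h"   // blob/layer id enums
#include "squeezenet.mem.h"  // embedded param and weight byte arrays

int main()
{
    ncnn::Net net;
    net.load_param(squeezenet_v1_1_param_bin); // binary param from memory
    net.load_model(squeezenet_v1_1_bin);       // weights from memory

    ncnn::Mat in(227, 227, 3);
    in.fill(0.f);

    ncnn::Extractor ex = net.create_extractor();
    ex.input(squeezenet_v1_1_param_id::BLOB_data, in);   // input by id
    ncnn::Mat out;
    ex.extract(squeezenet_v1_1_param_id::BLOB_prob, out); // output by id
    return 0;
}
```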
See Model Optimization Tools for additional model manipulation utilities.
Sources: src/net.h78-86 src/net.h98-108 tools/ncnn2mem.cpp1-150
The ncnn::Option class controls runtime behavior:
Common configurations:
| Scenario | Settings |
|---|---|
| CPU inference | num_threads=4, use_vulkan_compute=false |
| GPU inference | use_vulkan_compute=true, lightmode=false |
| Mobile (power-efficient) | num_threads=2, lightmode=true, use_fp16_storage=true |
| INT8 quantization | use_int8_storage=true, use_int8_arithmetic=true |
| Memory-constrained | lightmode=true, custom allocators with pooling |
Option configuration in Net:
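For example (these fields exist on ncnn::Option; set them before load_param/load_model so they take effect during layer creation):

```cpp
ncnn::Net net;
net.opt.num_threads = 4;            // CPU worker threads
net.opt.use_vulkan_compute = false; // stay on the CPU path
net.opt.lightmode = true;           // recycle intermediate blobs
net.load_param("model.param");      // illustrative file names
net.load_model("model.bin");
```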
Option override in Extractor:
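Per-session overrides are applied on the Extractor itself; a small sketch using set_light_mode, the commonly used override:

```cpp
ncnn::Extractor ex = net.create_extractor();
ex.set_light_mode(true); // release intermediate blobs eagerly for this session only
```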
Sources: src/option.h src/net.h37 src/net.h184-198
NCNN provides a C89-compatible API for integration with non-C++ languages:
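A sketch of the same load/extract flow through the C API; the model file names and data/prob blob names are assumptions carried over from the SqueezeNet example:

```c
#include <stdio.h>
#include "c_api.h"

int main(void)
{
    ncnn_net_t net = ncnn_net_create();
    ncnn_net_load_param(net, "squeezenet_v1.1.param");
    ncnn_net_load_model(net, "squeezenet_v1.1.bin");

    /* 227x227x3 input with the default allocator */
    ncnn_mat_t in = ncnn_mat_create_3d(227, 227, 3, NULL);

    ncnn_extractor_t ex = ncnn_extractor_create(net);
    ncnn_extractor_input(ex, "data", in);

    ncnn_mat_t out;
    ncnn_extractor_extract(ex, "prob", &out);
    printf("output dims: %d\n", ncnn_mat_get_dims(out));

    /* destruction order: mats and extractor before the net */
    ncnn_mat_destroy(out);
    ncnn_mat_destroy(in);
    ncnn_extractor_destroy(ex);
    ncnn_net_destroy(net);
    return 0;
}
```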
Key C API types:
- ncnn_net_t - Opaque pointer to ncnn::Net (src/c_api.h166)
- ncnn_extractor_t - Opaque pointer to ncnn::Extractor (src/c_api.h199)
- ncnn_mat_t - Opaque pointer to ncnn::Mat (src/c_api.h80)
- ncnn_option_t - Opaque pointer to ncnn::Option (src/c_api.h36)

All C API functions follow the naming pattern ncnn_<class>_<method>. See C API and Python Bindings for complete API reference.
Sources: src/c_api.h1-600 src/c_api.cpp1-1500 examples/squeezenet_c_api.cpp1-100 tests/test_c_api.cpp1-200
NCNN provides Python bindings via pybind11 for seamless integration with Python ML pipelines.
Installation:
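The bindings are published on PyPI; building from a source checkout with NCNN_PYTHON enabled also works:

```shell
pip install ncnn
```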
Basic usage example:
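A sketch mirroring the C++ flow; file names and the data/prob blob names are again the SqueezeNet example's conventions:

```python
import ncnn

net = ncnn.Net()
net.opt.use_vulkan_compute = False
net.load_param("squeezenet_v1.1.param")
net.load_model("squeezenet_v1.1.bin")

ex = net.create_extractor()
in_mat = ncnn.Mat(227, 227, 3)  # placeholder input
in_mat.fill(0.0)
ex.input("data", in_mat)

ret, out = ex.extract("prob")   # returns (status, output Mat)
print(out.w)
```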
Image preprocessing utilities:
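For example, resizing and normalizing an OpenCV image in one step with from_pixels_resize (mean/norm values are model-specific; note ncnn's spelling of substract_mean_normalize):

```python
import cv2
import ncnn

img = cv2.imread("cat.jpg")  # BGR, HWC, uint8; illustrative file name
mat = ncnn.Mat.from_pixels_resize(
    img, ncnn.Mat.PixelType.PIXEL_BGR,
    img.shape[1], img.shape[0],  # source width, height
    227, 227                     # target width, height
)
mat.substract_mean_normalize([104.0, 117.0, 123.0], [1.0, 1.0, 1.0])
```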
The Python API mirrors the C++ interface, keeping the same snake_case member names (e.g., use_vulkan_compute). For complete API documentation, see C API and Python Bindings.
Sources: python/src/ncnn_export.cpp1-200 python/README.md1-50
After building NCNN, verify the installation by running included example applications. The examples demonstrate various network types and preprocessing techniques.
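For instance, the SqueezeNet classifier can be run from the build tree; the paths and sample image follow the upstream build layout, so adjust to your checkout:

```shell
cd build/examples
./squeezenet ../../images/256-ncnn.png
```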
Expected output:
```
[0 AMD RADV FIJI (LLVM 10.0.1)] queueC=1[4] queueG=0[1] queueT=0[1]
[0 AMD RADV FIJI (LLVM 10.0.1)] bugsbn1=0 buglbia=0 bugcopc=0 bugihfa=0
[0 AMD RADV FIJI (LLVM 10.0.1)] fp16p=1 fp16s=1 fp16a=0 int8s=1 int8a=1
532 = 0.163452 (n02123045 tabby, tabby cat)
920 = 0.093140 (n03584254 iPod)
716 = 0.061584 (n02971356 carton)
```
The output shows GPU device capabilities and top-3 ImageNet class predictions with confidence scores.
Other example applications demonstrate different use cases:
| Example | Purpose | Input | Output |
|---|---|---|---|
| squeezenet | Image classification | Image file | Top-K class predictions |
| yolov5 | Object detection | Image file | Bounding boxes with labels |
| arcface | Face recognition | Image file | Face embeddings (512-dim vector) |
| whisper | Speech recognition | Audio file | Transcribed text |
| ppocrv5 | OCR text recognition | Image file | Detected text regions and content |
Sources: examples/squeezenet.cpp1-150 examples/arcface.cpp1-50 examples/whisper.cpp1-100 examples/ppocrv5.cpp1-150 docs/how-to-build/how-to-build.md144-156
Use benchncnn to measure inference performance across different models:
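For example, a run with the argument values described below (the path assumes benchncnn was built in build/benchmark):

```shell
cd build/benchmark
./benchncnn 10 4 0 0
```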
Arguments:
- 10 - Number of loops (more loops = more accurate timing)
- 4 - Number of CPU threads
- 0 - Powersave mode (0=disabled, 1=little cores only, 2=big cores only)
- 0 - GPU device index (0=first GPU, -1=CPU only)

Sample output:
```
num_threads = 4
powersave = 0
gpu_device = 0
cooling_down = 1
          squeezenet  min =    4.68  max =    4.99  avg =    4.85
     squeezenet_int8  min =   38.52  max =   66.90  avg =   48.52
           mobilenet  min =    7.12  max =    7.45  avg =    7.23
      mobilenet_int8  min =   51.68  max =   84.12  avg =   62.15
```
Times are in milliseconds. See Benchmarking System for performance analysis and optimization techniques.
Sources: benchmark/benchncnn.cpp1-500 docs/how-to-build/how-to-build.md158-176
To learn more about specific topics:
Common workflows:
Sources: README.md560-576 docs/how-to-build/how-to-build.md1-1000