This page provides an overview of the standalone executable tools shipped alongside the ncnn library. These tools cover benchmarking, post-training quantization, model optimization, and application development. They are separate binaries that consume or produce ncnn model files (.param / .bin).
For detailed coverage of specific tool groups, see the dedicated pages referenced in each section below.
The following table summarizes all tools built from the ncnn repository, their source locations, and primary functions.
| Tool | Source Location | Purpose |
|---|---|---|
| benchncnn | benchmark/benchncnn.cpp | Measure inference latency across CPU and Vulkan GPU |
| ncnn2table | tools/quantize/ncnn2table.cpp | Generate an INT8 calibration table from images or NPY files |
| ncnn2int8 | tools/quantize/ncnn2int8.cpp | Apply a calibration table to produce an INT8 model |
| ncnnoptimize | tools/ncnnoptimize.cpp | Layer fusion, FP16 weight conversion, graph pruning |
| ncnn2mem | tools/ncnn2mem.cpp | Convert .param / .bin files into C byte arrays |
| ncnnmerge | tools/ncnnmerge.cpp | Merge multiple ncnn models into one |
| caffe2ncnn | tools/caffe/caffe2ncnn.cpp | Convert Caffe models to ncnn format |
| onnx2ncnn | tools/onnx/onnx2ncnn.cpp | Convert ONNX models to ncnn format |
| mxnet2ncnn | tools/mxnet/mxnet2ncnn.cpp | Convert MXNet models to ncnn format |
| RankCards | benchmark/RankCards/main.cpp | Parse benchmark/README.md and rank hardware by inference speed |
Diagram: Tools in the ncnn deployment workflow
Sources: tools/CMakeLists.txt:1-46, benchmark/CMakeLists.txt:1-69, tools/quantize/CMakeLists.txt:1-43, benchmark/README.md:1-30
benchncnn is a command-line tool for measuring end-to-end inference latency of ncnn models. It does not load binary weight files; instead, it uses a DataReaderFromEmpty that fills weight buffers with zeros, making it possible to measure compute throughput without distributing large model files.
Command-line signature:
```
benchncnn [loop_count] [num_threads] [powersave] [gpu_device] [cooling_down] [(key=value)...]
  param=model.param
  shape=[w,h,c],...
```
Parameter reference:
| Parameter | Values | Default |
|---|---|---|
| loop_count | 1–N | 4 |
| num_threads | 1–N | get_physical_big_cpu_count() |
| powersave | 0 = all cores, 1 = little cores, 2 = big cores | 2 |
| gpu_device | -1 = CPU, 0 = GPU 0, … | -1 |
| param | path to a .param file | (built-in models) |
| shape | [w,h,c] per input | — |
| cooling_down | 0 = off, 1 = on | 1 |
When no param argument is supplied, benchncnn runs a fixed set of built-in models whose .param files have been converted to C byte arrays at build time via the ncnn_add_param CMake macro.
Diagram: benchncnn internal execution flow (code entities)
Sources: benchmark/benchncnn.cpp:24-168, benchmark/benchncnn.cpp:246-467
The benchmark/CMakeLists.txt iterates over a list of .param filenames and calls ncnn_add_param() for each one. The ncnn_add_param macro (in cmake/ncnn_add_param.cmake) converts each .param file's text content to a hex byte array and writes a header file under benchmark/param/. The generated header benchncnn_param_data.h is then #include-d by benchncnn.cpp.
The built-in model set covers 34 networks including:
| Classification | Detection | INT8 variants |
|---|---|---|
| squeezenet, mobilenet, mobilenet_v2/v3 | squeezenet_ssd, mobilenet_ssd | squeezenet_int8, mobilenet_int8 |
| shufflenet, shufflenet_v2, mnasnet | mobilenet_yolo, mobilenetv2_yolov3 | googlenet_int8, resnet18_int8 |
| efficientnet_b0, efficientnetv2_b0 | yolov4-tiny, nanodet_m | resnet50_int8, vgg16_int8 |
| googlenet, resnet18, resnet50, vgg16 | yolo-fastest-1.1, yolo-fastestv2 | — |
| vision_transformer, FastestDet | blazeface, regnety_400m | — |
Sources: benchmark/CMakeLists.txt:10-51, cmake/ncnn_add_param.cmake:1-37, cmake/ncnn_generate_param_header.cmake:1-23, benchmark/benchncnn_param_data.h.in:1-6
benchmark/RankCards/ contains a companion utility that parses the timing results in benchmark/README.md and produces a sorted hardware ranking table. It reads boards identified by ### section headers, extracts avg = timing values for each model, selects the best run per board, and computes a ratio against a reference board (currently Raspberry Pi 5 Cortex-A76).
Key types: TBoard, TModelSet, TModel (defined in benchmark/RankCards/Rcards.h). The ranking is written to a new README.md.
Sources: benchmark/RankCards/README.md:1-93, benchmark/RankCards/main.cpp:1-173, benchmark/RankCards/Rcards.h:37-171
The INT8 quantization workflow requires two tools used sequentially: ncnn2table and ncnn2int8.
End-to-end workflow:
```
ncnnoptimize model.param model.bin model-opt.param model-opt.bin 0
ncnn2table model-opt.param model-opt.bin imagelist.txt model.table \
    mean=[104,117,123] norm=[0.017,0.017,0.017] shape=[224,224,3] \
    pixel=BGR thread=8 method=kl
ncnn2int8 model-opt.param model-opt.bin model-int8.param model-int8.bin model.table
```
Diagram: ncnn2table class structure and data flow (code entities)
Sources: tools/quantize/ncnn2table.cpp:42-139, tools/quantize/ncnn2table.cpp:197-298, tools/quantize/CMakeLists.txt:1-43
ncnn2table accepts two input modes controlled by the type parameter:
| type value | Input format | Reading function |
|---|---|---|
| 0 (default) | Image files listed in a text file | read_and_resize_image() |
| 1 | NPY files listed in a text file | read_npy() |
For image inputs, mean subtraction and normalization are applied via Mat::substract_mean_normalize() before inference. For models with multiple inputs, comma-separated list files and parameter sets are supported.
Three algorithms are available via the method= argument:
| Method | Function | Description |
|---|---|---|
kl | QuantNet::quantize_KL() | KL divergence over a 2048-bin histogram to find optimal clipping threshold |
aciq | QuantNet::quantize_ACIQ() | Analytically optimal Gaussian clipping (ACIQ paper) |
eq | QuantNet::quantize_EQ() | Equalization-based method |
The KL method performs two passes over the calibration dataset: one to compute the global absmax per blob, and a second to build and normalize the activation histogram. It then sweeps candidate thresholds to minimize KL divergence between the clipped and quantized distributions.
Sources: tools/quantize/ncnn2table.cpp:300-571, docs/how-to-use-and-FAQ/quantized-int8-inference.md:1-125
A layer can be excluded from INT8 quantization by commenting out its weight scale line in the .table file. The runtime then falls back to FP32 for that layer automatically.
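For example, in a table file shaped like the following, commenting out the conv2 lines keeps that layer in FP32 while conv1 stays INT8. The layer names and scale values here are illustrative, and this assumes # as the comment character; consult the table file emitted by ncnn2table for your model for the exact format.

```
conv1_param_0 156.639840
conv1 45.934145
#conv2_param_0 97.204340
#conv2 23.177847
```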
Sources: docs/how-to-use-and-FAQ/quantized-int8-inference.md:115-125
ncnnoptimize reads a .param / .bin pair, applies a series of graph-level passes, and writes an optimized output. It is a required preprocessing step before running ncnn2table for non-PNNX models.
Key passes include:
- Operator fusion, such as folding BatchNorm into a preceding Convolution (fuse_convolution_batchnorm)
- Removal of Dropout layers that are inactive at inference time

Source: tools/CMakeLists.txt:31-35
For full documentation of ncnnoptimize passes, see Model Optimization Tools.
ncnn2mem converts a .param + .bin model pair into C source files that embed the model as static byte arrays. This eliminates file system I/O at runtime, which is useful for bare-metal or embedded targets.
Usage:
```
ncnn2mem model.param model.bin model.id.h model.mem.cpp
```
The generated .id.h defines integer constants for blob and layer names. The .mem.cpp file contains const unsigned char arrays for the param and bin data. The model is then loaded via Net::load_param_mem() and a DataReaderFromMemory.
Source: tools/CMakeLists.txt:25-29
For more detail, see Model Optimization Tools.
ncnnmerge is a utility for combining multiple .param / .bin file pairs into a single merged model. It is built from tools/ncnnmerge.cpp and installed alongside the other tools.
Source: tools/CMakeLists.txt:37-44
The model conversion tools (caffe2ncnn, onnx2ncnn, mxnet2ncnn) are built under tools/caffe/, tools/onnx/, and tools/mxnet/ respectively. caffe2ncnn and onnx2ncnn depend on Protocol Buffers (detected via CMake's find_package(Protobuf)) and are skipped with a warning if Protobuf is not found; mxnet2ncnn parses MXNet's JSON model format directly and has no Protobuf dependency.
| Tool | Requires | CMake file |
|---|---|---|
caffe2ncnn | Protobuf | tools/caffe/CMakeLists.txt |
onnx2ncnn | Protobuf | tools/onnx/CMakeLists.txt |
mxnet2ncnn | none | tools/mxnet/CMakeLists.txt |
For detailed documentation of these converters, see ONNX, MXNet, and Caffe Converters.
The recommended conversion path for PyTorch and ONNX models is the PNNX tool; see PNNX PyTorch Converter Architecture.
Sources: tools/caffe/CMakeLists.txt:1-39, tools/onnx/CMakeLists.txt:1-39, tools/mxnet/CMakeLists.txt:1-6
All tools under tools/ are included by adding add_subdirectory(tools) to the root CMakeLists.txt. The quantization tools are gated on the NCNN_INT8 CMake option; if disabled, a warning is emitted and ncnn2table / ncnn2int8 are not built.
```cmake
if(NCNN_INT8)
    add_subdirectory(quantize)
else()
    message(WARNING "NCNN_INT8 disabled, quantize tools won't be built")
endif()
```
The benchmark/ directory is only added to the build when benchmarking is enabled in the root CMakeLists.txt, via the NCNN_BUILD_BENCHMARK option (-DNCNN_BUILD_BENCHMARK=ON).
Diagram: CMake target dependencies for tools
Sources: tools/CMakeLists.txt:1-46, benchmark/CMakeLists.txt:1-69, tools/quantize/CMakeLists.txt:1-43, tools/caffe/CMakeLists.txt:1-39, tools/onnx/CMakeLists.txt:1-39, tools/mxnet/CMakeLists.txt:1-6