This page provides an overview of the standalone executable tools shipped alongside the ncnn library. These tools cover benchmarking, post-training quantization, model optimization, and application development. They are separate binaries that consume or produce ncnn model files (.param / .bin).
For detailed coverage of specific tool groups, see the dedicated pages referenced in each section below.
The following table summarizes all tools built from the ncnn repository, their source locations, and primary functions.
| Tool | Source Location | Purpose |
|---|---|---|
| benchncnn | benchmark/benchncnn.cpp | Measure inference latency across CPU and Vulkan GPU |
| ncnn2table | tools/quantize/ncnn2table.cpp | Generate an INT8 calibration table from images or NPY files |
| ncnn2int8 | tools/quantize/ncnn2int8.cpp | Apply a calibration table to produce an INT8 model |
| ncnnoptimize | tools/ncnnoptimize.cpp | Layer fusion, FP16 weight conversion, graph pruning |
| ncnn2mem | tools/ncnn2mem.cpp | Convert .param / .bin files into C byte arrays |
| ncnnmerge | tools/ncnnmerge.cpp | Merge multiple ncnn models into one |
| caffe2ncnn | tools/caffe/caffe2ncnn.cpp | Convert Caffe models to ncnn format |
| onnx2ncnn | tools/onnx/onnx2ncnn.cpp | Convert ONNX models to ncnn format |
| mxnet2ncnn | tools/mxnet/mxnet2ncnn.cpp | Convert MXNet models to ncnn format |
| RankCards | benchmark/RankCards/main.cpp | Parse benchmark/README.md and rank hardware by inference speed |
Diagram: Tools in the ncnn deployment workflow
Sources: tools/CMakeLists.txt:1-46, benchmark/CMakeLists.txt:1-69, tools/quantize/CMakeLists.txt:1-43, benchmark/README.md:1-30
benchncnn is a command-line tool for measuring end-to-end inference latency of ncnn models. It does not load binary weight files; instead, it uses a DataReaderFromEmpty that fills weight buffers with zeros, making it possible to measure compute throughput without distributing large model files.
Command-line signature:
```
benchncnn [loop_count] [num_threads] [powersave] [gpu_device] [cooling_down] [(key=value)...]
  param=model.param
  shape=[w,h,c],...
```
Parameter reference:
| Parameter | Values | Default |
|---|---|---|
| loop_count | 1–N | 4 |
| num_threads | 1–N | get_physical_big_cpu_count() |
| powersave | 0 = all cores, 1 = little cores, 2 = big cores | 2 |
| gpu_device | -1 = CPU, 0 = GPU 0, … | -1 |
| param | path to a .param file | (built-in models) |
| shape | [w,h,c] per input | — |
| cooling_down | 0 = off, 1 = on | 1 |
When no param argument is supplied, benchncnn runs a fixed set of built-in models whose .param files have been converted to C byte arrays at build time via the ncnn_add_param CMake macro.
Diagram: benchncnn internal execution flow (code entities)
Sources: benchmark/benchncnn.cpp:24-168, benchmark/benchncnn.cpp:246-467
The benchmark/CMakeLists.txt iterates over a list of .param filenames and calls ncnn_add_param() for each one. The ncnn_add_param macro (in cmake/ncnn_add_param.cmake) converts each .param file's text content to a hex byte array and writes a header file under benchmark/param/. The generated header benchncnn_param_data.h is then #include-d by benchncnn.cpp.
The built-in model set covers 34 networks including:
| Classification | Detection | INT8 variants |
|---|---|---|
| squeezenet, mobilenet, mobilenet_v2/v3 | squeezenet_ssd, mobilenet_ssd | squeezenet_int8, mobilenet_int8 |
| shufflenet, shufflenet_v2, mnasnet | mobilenet_yolo, mobilenetv2_yolov3 | googlenet_int8, resnet18_int8 |
| efficientnet_b0, efficientnetv2_b0 | yolov4-tiny, nanodet_m | resnet50_int8, vgg16_int8 |
| googlenet, resnet18, resnet50, vgg16 | yolo-fastest-1.1, yolo-fastestv2 | — |
| vision_transformer, FastestDet | blazeface, regnety_400m | — |
Sources: benchmark/CMakeLists.txt:10-51, cmake/ncnn_add_param.cmake:1-37, cmake/ncnn_generate_param_header.cmake:1-23, benchmark/benchncnn_param_data.h.in:1-6
benchmark/RankCards/ contains a companion utility that parses the timing results in benchmark/README.md and produces a sorted hardware ranking table. It reads boards identified by ### section headers, extracts avg = timing values for each model, selects the best run per board, and computes a ratio against a reference board (currently Raspberry Pi 5 Cortex-A76).
Key types: TBoard, TModelSet, TModel (defined in benchmark/RankCards/Rcards.h). The ranking is written to a new README.md.
Sources: benchmark/RankCards/README.md:1-93, benchmark/RankCards/main.cpp:1-173, benchmark/RankCards/Rcards.h:37-171
The INT8 quantization workflow requires two tools used sequentially: ncnn2table and ncnn2int8.
End-to-end workflow:
```
ncnnoptimize model.param model.bin model-opt.param model-opt.bin 0
ncnn2table model-opt.param model-opt.bin imagelist.txt model.table \
    mean=[104,117,123] norm=[0.017,0.017,0.017] shape=[224,224,3] \
    pixel=BGR thread=8 method=kl
ncnn2int8 model-opt.param model-opt.bin model-int8.param model-int8.bin model.table
```
Diagram: ncnn2table class structure and data flow (code entities)
Sources: tools/quantize/ncnn2table.cpp:42-139, tools/quantize/ncnn2table.cpp:197-298, tools/quantize/CMakeLists.txt:1-43
ncnn2table accepts two input modes controlled by the type parameter:
| type value | Input format | Reading function |
|---|---|---|
| 0 (default) | Image files listed in a text file | read_and_resize_image() |
| 1 | NPY files listed in a text file | read_npy() |
For image inputs, mean subtraction and normalization are applied via Mat::substract_mean_normalize() before inference. For models with multiple inputs, comma-separated list files and parameter sets are supported.
Three algorithms are available via the method= argument:
| Method | Function | Description |
|---|---|---|
kl | QuantNet::quantize_KL() | KL divergence over a 2048-bin histogram to find optimal clipping threshold |
aciq | QuantNet::quantize_ACIQ() | Analytically optimal Gaussian clipping (ACIQ paper) |
eq | QuantNet::quantize_EQ() | Equalization-based method |
The KL method performs two passes over the calibration dataset: one to compute the global absmax per blob, and a second to build and normalize the activation histogram. It then sweeps candidate thresholds to minimize KL divergence between the clipped and quantized distributions.
Sources: tools/quantize/ncnn2table.cpp:300-571, docs/how-to-use-and-FAQ/quantized-int8-inference.md:1-125
A layer can be excluded from INT8 quantization by commenting out its weight scale line in the .table file. The runtime then falls back to FP32 for that layer automatically.
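For example, in a table file shaped like the following, commenting out the conv2 lines keeps that layer in FP32 while conv1 stays INT8. The layer names and scale values here are illustrative, and this assumes # as the comment character; consult the table file emitted by ncnn2table for your model for the exact format.

```
conv1_param_0 156.639840
conv1 45.934145
#conv2_param_0 97.204340
#conv2 23.177847
```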
Sources: docs/how-to-use-and-FAQ/quantized-int8-inference.md:115-125
ncnnoptimize reads a .param / .bin pair, applies a series of graph-level passes, and writes an optimized output. It is a required preprocessing step before running ncnn2table for non-PNNX models.
Key passes include:
- Operator fusion, such as folding BatchNorm into a preceding Convolution (fuse_convolution_batchnorm)
- Removal of Dropout layers that are inactive at inference time

Source: tools/CMakeLists.txt:31-35
For full documentation of ncnnoptimize passes, see Model Optimization Tools.
ncnn2mem converts a .param + .bin model pair into C source files that embed the model as static byte arrays. This eliminates file system I/O at runtime, which is useful for bare-metal or embedded targets.
Usage:
```
ncnn2mem model.param model.bin model.id.h model.mem.cpp
```
The generated .id.h defines integer constants for blob and layer names. The .mem.cpp file contains const unsigned char arrays for the param and bin data. The model is then loaded via Net::load_param_mem() and a DataReaderFromMemory.
Source: tools/CMakeLists.txt:25-29
For more detail, see Model Optimization Tools.
ncnnmerge is a utility for combining multiple .param / .bin file pairs into a single merged model. It is built from tools/ncnnmerge.cpp and installed alongside the other tools.
Source: tools/CMakeLists.txt:37-44
The model conversion tools (caffe2ncnn, onnx2ncnn, mxnet2ncnn) are built under tools/caffe/, tools/onnx/, and tools/mxnet/ respectively. caffe2ncnn and onnx2ncnn depend on Protocol Buffers (detected via CMake's find_package(Protobuf)) and are skipped with a warning if Protobuf is not found; mxnet2ncnn parses MXNet's JSON model format directly and has no Protobuf dependency.
| Tool | Requires | CMake file |
|---|---|---|
caffe2ncnn | Protobuf | tools/caffe/CMakeLists.txt |
onnx2ncnn | Protobuf | tools/onnx/CMakeLists.txt |
mxnet2ncnn | none | tools/mxnet/CMakeLists.txt |
For detailed documentation of these converters, see ONNX, MXNet, and Caffe Converters.
The recommended conversion path for PyTorch and ONNX models is the PNNX tool; see PNNX PyTorch Converter Architecture.
Sources: tools/caffe/CMakeLists.txt:1-39, tools/onnx/CMakeLists.txt:1-39, tools/mxnet/CMakeLists.txt:1-6
All tools under tools/ are included by adding add_subdirectory(tools) to the root CMakeLists.txt. The quantization tools are gated on the NCNN_INT8 CMake option; if disabled, a warning is emitted and ncnn2table / ncnn2int8 are not built.
```cmake
if(NCNN_INT8)
    add_subdirectory(quantize)
else()
    message(WARNING "NCNN_INT8 disabled, quantize tools won't be built")
endif()
```
The benchmark/ directory is only added to the build when benchmarking is enabled in the root CMakeLists.txt, via the NCNN_BUILD_BENCHMARK option (-DNCNN_BUILD_BENCHMARK=ON).
Diagram: CMake target dependencies for tools
Sources: tools/CMakeLists.txt:1-46, benchmark/CMakeLists.txt:1-69, tools/quantize/CMakeLists.txt:1-43, tools/caffe/CMakeLists.txt:1-39, tools/onnx/CMakeLists.txt:1-39, tools/mxnet/CMakeLists.txt:1-6