This page describes the toolchain for converting trained models from external frameworks into ncnn's native format and for subsequently optimizing those models before deployment. It covers the available converter tools, the ncnn model file format, optimization passes in ncnnoptimize, INT8 quantization via ncnn2int8, and binary embedding via ncnn2mem.
For runtime details on how the resulting .param and .bin files are loaded and executed, see Core Runtime Architecture and Network Loading and Inference Pipeline. For the internal PNNX IR and its detailed pass pipeline, see PNNX PyTorch Converter Architecture and PNNX Intermediate Representation. For the post-training quantization tools in depth, see Post-Training Quantization Tools.
ncnn models consist of two files:
| File | Content |
|---|---|
| .param | Text file describing the layer graph — layer types, names, input/output blob names, and per-layer parameters |
| .bin | Binary file containing all weight data in row-major order |
Every .param file begins with the magic number 7767517, followed by a line containing the total layer count and blob count. Each subsequent line describes one layer:
```
7767517
<layer_count> <blob_count>
<LayerType> <LayerName> <bottom_count> <top_count> <bottoms...> <tops...> [param_id=value ...]
```
This format is written by all converters and consumed by ncnn::Net::load_param at runtime.
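For concreteness, a minimal three-layer model in this format might look as follows (layer names, blob names, and parameter values are invented for illustration; Convolution params: 0=num_output, 1=kernel size, 5=bias_term, 6=weight_data_size):

```
7767517
3 3
Input            data     0 1 data 0=224 1=224 2=3
Convolution      conv1    1 1 data conv1 0=16 1=3 5=1 6=432
ReLU             relu1    1 1 conv1 relu1
```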
Sources: tools/caffe/caffe2ncnn.cpp:99-100 tools/mxnet/mxnet2ncnn.cpp:981-982 tools/onnx/onnx2ncnn.cpp:1-20 tools/modelwriter.h:208-244
Conversion pipeline diagram
Sources: tools/pnnx/src/main.cpp:226-400 tools/onnx/onnx2ncnn.cpp:1-43 tools/caffe/caffe2ncnn.cpp:65-100 tools/mxnet/mxnet2ncnn.cpp:959-984 tools/ncnnoptimize.cpp:36-78 tools/quantize/ncnn2int8.cpp:108-131 tools/ncnn2mem.cpp:153-230
PNNX is the primary recommended tool for converting PyTorch models. Its entry point is tools/pnnx/src/main.cpp. It accepts a TorchScript .pt file or an ONNX file and emits both PNNX-format intermediate files and ncnn-format output files.
```
pnnx model.pt inputshape=[1,3,224,224]
pnnx model.onnx inputshape=[1,3,224,224]
```
Key arguments parsed by main():
| Argument | Default | Purpose |
|---|---|---|
| pnnxparam | model.pnnx.param | PNNX IR parameter file |
| pnnxbin | model.pnnx.bin | PNNX IR weight file |
| ncnnparam | model.ncnn.param | ncnn parameter output |
| ncnnbin | model.ncnn.bin | ncnn weight output |
| fp16 | 1 | Save weights in FP16 |
| optlevel | 2 | Optimization pass depth (0–5) |
| inputshape | — | Required for shape inference, e.g. [1,3,224,224] |
| device | cpu | cpu or gpu for tracing |
| moduleop | — | Comma-separated module class names to keep opaque |
PNNX internal pipeline diagram
Sources: tools/pnnx/src/main.cpp:23-36 tools/pnnx/src/main.cpp:226-400 tools/pnnx/src/save_ncnn.cpp:74-90
The PNNX intermediate representation is defined in tools/pnnx/src/ir.h and implemented in tools/pnnx/src/ir.cpp.
PNNX IR class diagram
Sources: tools/pnnx/src/ir.h:1-200 tools/pnnx/src/ir.cpp:520-791
save_ncnn() in tools/pnnx/src/save_ncnn.cpp iterates the Graph::ops list after pass_ncnn has remapped PNNX operators to ncnn layer types. It writes the .param file in ncnn's text format and the .bin file with optionally FP16-converted weight data.
Sources: tools/pnnx/src/save_ncnn.cpp:74-120
Located at tools/caffe/caffe2ncnn.cpp. Reads a Caffe .prototxt (text proto, via read_proto_from_text) and .caffemodel (binary proto, via read_proto_from_binary). Maps Caffe layer types to ncnn layer types, writing the .param and .bin files directly.
Notable type mappings applied by main():
| Caffe type | ncnn type |
|---|---|
| Convolution (group != 1) | ConvolutionDepthWise |
| Deconvolution (group != 1) | DeconvolutionDepthWise |
| MemoryData | Input |
| ReLU6 | Clip |
| Silence | Noop |
| BN | Scale |
Where a blob is consumed by more than one layer, caffe2ncnn inserts synthetic Split layers and suffixed blob names (_splitncnn_N) to make the DAG explicit.
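For example, if the blob conv1 fed two downstream layers, the converter would emit a fragment like this (layer names and parameters hypothetical):

```
Split            splitncnn_0  1 2 conv1 conv1_splitncnn_0 conv1_splitncnn_1
Convolution      branch_a     1 1 conv1_splitncnn_0 branch_a 0=16 1=3 6=2304
Pooling          branch_b     1 1 conv1_splitncnn_1 branch_b 0=0 1=2 2=2
```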
Sources: tools/caffe/caffe2ncnn.cpp:65-270
Located at tools/onnx/onnx2ncnn.cpp. Parses an ONNX binary protobuf via read_proto_from_binary into an onnx::ModelProto. Before writing ncnn output it runs several graph-level fusion passes over the mutable ONNX graph:
| Pass function | What it fuses |
|---|---|
| fuse_weight_reshape | Absorbs Reshape of constant weights into weight tensors directly |
| fuse_weight_transpose | Absorbs Transpose(perm=[1,0]) of 2-D weight tensors |
| fuse_shufflechannel | Reshape – Transpose – Reshape → ShuffleChannel |
| fuse_shufflechannel_split | ShuffleChannel(reverse) – Gather – Gather → Split |
| fuse_hardswish | Add(+3) – Clip(0,6) – Mul – Div(/6) → HardSwish |
| fuse_hardsigmoid | Add(+3) – Clip(0,6) – Div(/6) → HardSigmoid |
| fuse_swish | Sigmoid – Mul → Swish |
| fuse_batchnorm1d_squeeze_unsqueeze | Unsqueeze – BN – Squeeze → BatchNormalization |
| fuse_rewrite_gather | Single-index Gather → Crop |
Nodes reduced by these passes are marked noop_reducedncnn and excluded from the final output.
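To see why a pattern rewrite like fuse_hardswish is exact, compare the subgraph's arithmetic with the single fused op. This is a standalone sketch (not ncnn code); the fused form shown is one algebraically equivalent parameterization:

```cpp
#include <algorithm>
#include <cmath>

// The subgraph matched by fuse_hardswish: Add(+3) - Clip(0,6) - Mul - Div(/6)
float hardswish_subgraph(float x)
{
    float t = x + 3.f;                   // Add(+3)
    t = std::min(std::max(t, 0.f), 6.f); // Clip(0,6)
    return x * t / 6.f;                  // Mul, Div(/6)
}

// One fused form: x * clamp(x/6 + 0.5, 0, 1).
// Since clip(x+3, 0, 6)/6 == clamp(x/6 + 0.5, 0, 1), a single op suffices.
float hardswish_fused(float x)
{
    float t = std::min(std::max(x / 6.f + 0.5f, 0.f), 1.f);
    return x * t;
}
```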
Sources: tools/onnx/onnx2ncnn.cpp:362-1200
Located at tools/mxnet/mxnet2ncnn.cpp. Reads an MXNet JSON symbol file via read_mxnet_json (custom line-by-line parser into MXNetNode objects) and an MXNet binary parameter file via read_mxnet_param. MXNet parameters are read as raw NDArray binary records identified by a magic number (0xF993FAC9 or 0xF993FAC8).
Similar pre-processing fusion passes exist:
| Pass | What it fuses |
|---|---|
| fuse_shufflechannel | Reshape – SwapAxis – Reshape → ShuffleChannel |
| fuse_hardsigmoid_hardswish | _plus_scalar(+3) – clip(0,6) – _div_scalar(/6) → HardSigmoid or HardSwish |
Sources: tools/mxnet/mxnet2ncnn.cpp:342-960
ncnnoptimize (implemented in tools/ncnnoptimize.cpp) loads an existing ncnn model using the NetOptimize class (which extends ModelWriter, itself extending ncnn::Net), applies a sequence of structural fusion and elimination passes, then saves the result to new .param and .bin files.
ModelWriter (defined in tools/modelwriter.h) is a subclass of ncnn::Net that exposes the internal layers and blobs vectors mutably, adds storage_type (FP16 vs FP32), cutstart/cutend for graph section extraction, and provides serialization via save(parampath, binpath).
Sources: tools/modelwriter.h:208-244
NetOptimize adds fusion and elimination methods on top of ModelWriter:
Fusion passes — merge consecutive layers into a single parameterized layer:
| Method | Effect |
|---|---|
| fuse_batchnorm_scale | BatchNorm – Scale → BatchNorm |
| fuse_convolution_batchnorm | Convolution – BatchNorm → Convolution (BN params folded into conv weights) |
| fuse_convolution_mul | Convolution – BinaryOp(Mul, MemoryData) → Convolution |
| fuse_convolution_add | Convolution – BinaryOp(Add, MemoryData) → Convolution |
| fuse_convolutiondepthwise_batchnorm | Same as above for depthwise conv |
| fuse_deconvolution_batchnorm | Same for transposed conv |
| fuse_innerproduct_batchnorm | InnerProduct – BatchNorm → InnerProduct |
| fuse_innerproduct_add | InnerProduct – BinaryOp(Add, MemoryData) → InnerProduct |
| fuse_innerproduct_dropout | InnerProduct – Dropout → InnerProduct |
| fuse_convolution_activation | Appends ReLU/Clip/Sigmoid activation into convolution activation_type |
| fuse_memorydata_binaryop | Folds constant MemoryData scalars into BinaryOp |
| fuse_binaryop_eltwise | BinaryOp(Add) + MemoryData → Eltwise |
Elimination passes — remove no-op layers:
| Method | Effect |
|---|---|
| eliminate_dropout | Removes all Dropout layers |
| eliminate_pooling1x1 | Removes 1×1 stride-1 average pools with no padding |
| eliminate_noop | Removes Noop layers |
| eliminate_split | Removes Split layers with a single consumer |
| eliminate_orphaned_memorydata | Removes MemoryData not consumed by any layer |
| eliminate_flatten_after_global_pooling | Removes redundant Flatten after global pool |
| eliminate_reshape_after_global_pooling | Same for Reshape |
| eliminate_flatten_after_innerproduct | Removes Flatten after InnerProduct |
| eliminate_reshape_before_binaryop | Removes rank-expanding Reshape before BinaryOp |
Replace passes — substitute one layer type for a more efficient equivalent:
| Method | Effect |
|---|---|
| replace_reduction_with_global_pooling | Reduction(mean, all axes) → Pooling(global_avg) |
| replace_prelu_with_leaky_relu | PReLU with single slope → ReLU(negative_slope) |
| replace_convolution_with_innerproduct_after_global_pooling | 1×1 Convolution after global pool → InnerProduct |
| replace_convolution_with_innerproduct_after_innerproduct | 1×1 Convolution after InnerProduct → InnerProduct |
Sources: tools/ncnnoptimize.cpp:36-78 tools/ncnnoptimize.cpp:85-2500
The fuse_convolution_batchnorm pass is representative of how weight-level arithmetic is used during fusion. Given BatchNorm parameters slope, mean, var, bias, and epsilon:
```
b[i] = slope[i] / sqrt(var[i] + eps)
a[i] = bias[i] - slope[i] * mean[i] / sqrt(var[i] + eps)
```

The convolution weights for output channel i are multiplied by b[i], and the convolution bias for channel i is updated to conv_bias[i] * b[i] + a[i]. The BatchNorm layer is then marked ncnnfused so it is excluded on save.
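The fold can be sketched as standalone arithmetic (this is not the actual NetOptimize code; layout and names are simplified, with weights stored contiguously per output channel):

```cpp
#include <cmath>
#include <vector>

// Fold BatchNorm (slope, mean, var, bn_bias, eps) into the preceding
// convolution's weights and bias, one output channel at a time:
//   b = slope / sqrt(var + eps)
//   a = bn_bias - slope * mean / sqrt(var + eps)
//   w_conv *= b;  conv_bias = conv_bias * b + a
void fold_bn_into_conv(std::vector<float>& weights,   // [out_ch * weights_per_channel]
                       std::vector<float>& conv_bias, // [out_ch]
                       const std::vector<float>& slope,
                       const std::vector<float>& mean,
                       const std::vector<float>& var,
                       const std::vector<float>& bn_bias,
                       float eps)
{
    const size_t out_ch = conv_bias.size();
    const size_t per_ch = weights.size() / out_ch;
    for (size_t i = 0; i < out_ch; i++)
    {
        float b = slope[i] / std::sqrt(var[i] + eps);
        float a = bn_bias[i] - slope[i] * mean[i] / std::sqrt(var[i] + eps);
        for (size_t k = 0; k < per_ch; k++)
            weights[i * per_ch + k] *= b;
        conv_bias[i] = conv_bias[i] * b + a;
    }
}
```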
Sources: tools/ncnnoptimize.cpp:146-227
ModelWriter::storage_type controls whether weights are written as float32 (type 0) or float16 (type 1) in fwrite_weight_tag_data. The quantize tag prepended to each weight blob in the .bin file encodes the storage type so that ModelBin on the runtime side knows how to interpret it.
Sources: tools/modelwriter.h:220-242 src/modelbin.cpp:1-80
```
ncnnoptimize model.param model.bin model_opt.param model_opt.bin [storage_type]
```
storage_type: 0 = FP32, 1 = FP16.
INT8 quantization is a two-step process:
Step 1: Run ncnn2table (see Post-Training Quantization Tools) against a calibration dataset to produce a .table file containing per-layer activation scales and per-channel weight scales.
Step 2: Run ncnn2int8 with the original model and the .table file to produce an INT8 model.
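A typical invocation of the two steps looks like the following (preprocessing values are illustrative; argument names follow the upstream quantization guide and may differ between ncnn versions):

```
./ncnn2table model.param model.bin imagelist.txt model.table mean=[104,117,123] norm=[0.017,0.017,0.017] shape=[224,224,3] pixel=BGR method=kl
./ncnn2int8 model.param model.bin model-int8.param model-int8.bin model.table
```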
The NetQuantize class in tools/quantize/ncnn2int8.cpp extends ModelWriter and holds two maps read from the scale table file:
- blob_int8scale_table — maps layer name → ncnn::Mat of activation scales
- weight_int8scale_table — maps <layername>_param_0 → ncnn::Mat of per-output-channel weight scales

Quantization methods:
| Method | Target layer |
|---|---|
| quantize_convolution | Convolution — weights quantized with ncnn::quantize_to_int8 |
| quantize_convolutiondepthwise | ConvolutionDepthWise — per-group quantization |
| quantize_innerproduct | InnerProduct |
| quantize_rnn | RNN — per-output-row scale derived from weight abs-max |
| quantize_lstm | LSTM |
| quantize_gru | GRU |
| quantize_embed | Embed |
| quantize_gemm | Gemm |
| quantize_multiheadattention | MultiHeadAttention |
| quantize_sdpa | SDPA |
| fuse_requantize | Fuses Dequantize – Quantize pairs into Requantize |
After quantization, the layer's int8_scale_term is set (e.g. 2 for per-channel), and weight_data_int8_scales / bottom_blob_int8_scales are stored in the layer parameters. These are serialized into the .bin file so the runtime's Quantize and Requantize layers can apply them.
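The core weight quantization can be sketched as symmetric round-to-nearest with saturation (a simplified standalone model of what ncnn::quantize_to_int8 does, not the actual implementation):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Quantize one value: q = clamp(round(v * scale), -127, 127).
int8_t quantize_to_int8_scalar(float v, float scale)
{
    int q = (int)std::lroundf(v * scale);
    if (q > 127) q = 127;
    if (q < -127) q = -127;
    return (int8_t)q;
}

// A typical per-channel weight scale is 127 / absmax over the channel's weights,
// so the largest-magnitude weight maps to +/-127.
float weight_scale(const float* w, int n)
{
    float absmax = 0.f;
    for (int i = 0; i < n; i++)
        absmax = std::max(absmax, std::fabs(w[i]));
    return absmax > 0.f ? 127.f / absmax : 1.f;
}
```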
Quantization workflow diagram
Sources: tools/quantize/ncnn2int8.cpp:37-106 tools/quantize/ncnn2int8.cpp:108-131 tools/quantize/ncnn2int8.cpp:138-194
ncnn2mem (in tools/ncnn2mem.cpp) converts a model's .param and .bin files into formats that can be compiled directly into a binary, eliminating filesystem access at runtime.
It produces:
| Output file | Content |
|---|---|
| model.param.bin | Binary-encoded version of the text .param file |
| model.id.h | C++ header with a namespace of integer constants for each layer and blob name |
The text .param is re-parsed by dump_param() and re-serialized as binary data. sanitize_name() converts blob/layer names into valid C++ identifiers. The generated model.param.bin is loaded at runtime by passing its address to Net::load_param_bin via a DataReaderFromMemory.
The .bin file itself is used directly as a byte array — the application either mmaps it or stores it in a const unsigned char[].
ncnn2mem output diagram
The model.id.h header lets the application refer to blobs by name at compile time rather than via string lookup. For example:
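A hypothetical sketch (assuming input/output blobs named data and output, and that the application has compiled model.param.bin and model.bin into byte arrays, e.g. via xxd -i; the array names below are invented):

```cpp
#include "net.h"        // ncnn
#include "model.id.h"   // generated ids, e.g. model_param_id::BLOB_data

extern const unsigned char model_param_bin[]; // contents of model.param.bin
extern const unsigned char model_bin[];       // contents of model.bin

int run(const ncnn::Mat& in, ncnn::Mat& out)
{
    ncnn::Net net;
    net.load_param(model_param_bin); // binary param parsed from memory
    net.load_model(model_bin);       // weights read in place

    ncnn::Extractor ex = net.create_extractor();
    ex.input(model_param_id::BLOB_data, in); // integer id, no string lookup
    return ex.extract(model_param_id::BLOB_output, out);
}
```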
Sources: tools/ncnn2mem.cpp:153-230 tools/ncnn2mem.cpp:31-51
Both converters and the runtime use the ModelBin / DataReader abstraction for reading weights. ModelBin (in src/modelbin.h and src/modelbin.cpp) provides load(w, type) methods that interpret the quantization tag at the start of each weight blob. DataReader (in src/datareader.h, src/datareader.cpp) abstracts over file I/O (DataReaderFromStdio), memory pointer (DataReaderFromMemory), and other sources.
ModelWriter in tools/modelwriter.h writes weights with fwrite_weight_tag_data, which prepends the storage-type tag before the raw float data.
Sources: src/modelbin.h:1-50 src/modelbin.cpp:1-80 src/datareader.h:1-40 tools/modelwriter.h:238-244
| Use case | Recommended path |
|---|---|
| PyTorch model | pnnx → ncnnoptimize |
| ONNX model | pnnx (ONNX input) or onnx2ncnn → ncnnoptimize |
| Caffe model | caffe2ncnn → ncnnoptimize |
| MXNet model | mxnet2ncnn → ncnnoptimize |
| Mobile deployment, size matters | add ncnnoptimize with storage_type=1 (FP16) |
| Highest throughput on ARM/x86 | add ncnn2table + ncnn2int8 for INT8 |
| No filesystem on device | add ncnn2mem to embed the model |
Sources: tools/pnnx/src/main.cpp:201-224 tools/ncnnoptimize.cpp:36-78 tools/quantize/ncnn2int8.cpp:108-131 tools/ncnn2mem.cpp:153-170