This document provides comprehensive reference documentation for the ncnn C++ API. It covers the primary classes and methods that application developers use to load models, configure inference, and execute neural networks.
For information about the C API wrapper, see C API and Python Bindings. For details on how ncnn loads and optimizes models internally, see Network Loading and Inference Pipeline. For platform-specific build configuration, see Platform Support and Build System.
The ncnn C++ API consists of several key classes that work together to provide neural network inference capabilities:
Sources: src/net.h27-164 src/net.h167-244 src/mat.h50-336 src/mat.h341-462 src/layer.h20-141
The ncnn::Net class is the primary container for neural network models. It manages the network structure (layers and blobs), provides methods for loading model files, and creates Extractor objects for inference.
The Net class provides multiple methods for loading model parameters and weights. Models in ncnn consist of two files:
- .param or .param.bin: Network structure (text or binary format)
- .bin: Layer weights

From File Paths:
| Method | Purpose | Source |
|---|---|---|
load_param(const char* protopath) | Load text parameter file | src/net.h72 |
load_param_bin(const char* protopath) | Load binary parameter file | src/net.h84 |
load_model(const char* modelpath) | Load weight binary file | src/net.h92 |
From Memory:
| Method | Purpose | Source |
|---|---|---|
load_param(const unsigned char* mem) | Load from 32-bit aligned memory, returns bytes consumed | src/net.h101 |
load_model(const unsigned char* mem) | Reference weights from memory (zero-copy), returns bytes consumed | src/net.h108 |
From DataReader:
| Method | Purpose | Source |
|---|---|---|
load_param(const DataReader& dr) | Load text parameters via DataReader | src/net.h60 |
load_param_bin(const DataReader& dr) | Load binary parameters via DataReader | src/net.h63 |
load_model(const DataReader& dr) | Load weights via DataReader | src/net.h65 |
Android Asset Manager (when NCNN_PLATFORM_API=ON):
| Method | Purpose | Source |
|---|---|---|
load_param(AAssetManager* mgr, const char* assetpath) | Load from Android asset | src/net.h115 |
load_param_bin(AAssetManager* mgr, const char* assetpath) | Load binary from Android asset | src/net.h119 |
load_model(AAssetManager* mgr, const char* assetpath) | Load weights from Android asset | src/net.h123 |
The Net class has a public opt member of type Option that should be configured before loading models:
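A typical setup configures opt before loading (a minimal sketch; model.param and model.bin are placeholder file names):

```cpp
#include "net.h"

int main()
{
    ncnn::Net net;

    // Configure options before loading the model
    net.opt.num_threads = 4;
    net.opt.use_packing_layout = true;

    // Load network structure, then weights (both return 0 on success)
    if (net.load_param("model.param") != 0)
        return -1;
    if (net.load_model("model.bin") != 0)
        return -1;

    return 0;
}
```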
When NCNN_VULKAN is enabled:
| Method | Purpose | Source |
|---|---|---|
set_vulkan_device(int device_index) | Set GPU by index | src/net.h41 |
set_vulkan_device(const VulkanDevice* vkdev) | Set GPU by device handle (no ownership transfer) | src/net.h44 |
vulkan_device() | Get current VulkanDevice | src/net.h46 |
| Method | Purpose | Source |
|---|---|---|
register_custom_layer(const char* type, layer_creator_func, layer_destroyer_func, userdata) | Register or override layer by name | src/net.h52 |
register_custom_layer(int index, layer_creator_func, layer_destroyer_func, userdata) | Register or override layer by type index | src/net.h57 |
| Method | Purpose | Source |
|---|---|---|
create_extractor() | Construct an Extractor for inference | src/net.h131 |
Each call to create_extractor() returns a new, independent inference session that can run in parallel with other extractors.
| Method | Purpose | Source |
|---|---|---|
input_indexes() | Get vector of input blob indices | src/net.h134 |
output_indexes() | Get vector of output blob indices | src/net.h135 |
input_names() | Get vector of input blob names (when NCNN_STRING=ON) | src/net.h137 |
output_names() | Get vector of output blob names (when NCNN_STRING=ON) | src/net.h138 |
blobs() | Get vector of Blob objects | src/net.h141 |
layers() | Get vector of Layer pointers | src/net.h142 |
| Method | Purpose | Source |
|---|---|---|
clear() | Unload network structure and weight data | src/net.h128 |
The destructor automatically calls clear(), so explicit cleanup is optional.
Sources: src/net.h27-164 src/c_api.cpp1-1095
The ncnn::Extractor class represents an inference session. It holds temporary blob storage, allocators, and provides methods to set inputs and extract outputs. Multiple extractors can be created from a single Net and used concurrently (each in its own thread).
Constructor and Copy:
| Method | Purpose | Source |
|---|---|---|
Extractor(const Extractor&) | Copy constructor | src/net.h173 |
operator=(const Extractor&) | Assignment operator | src/net.h176 |
~Extractor() | Destructor | src/net.h170 |
Extractors are copyable, but copying is not recommended because copies share internal allocator state.
| Method | Purpose | Source |
|---|---|---|
clear() | Clear blob mats and allocators | src/net.h179 |
set_light_mode(bool enable) | Enable light mode (intermediate blob recycling), enabled by default | src/net.h184 |
set_blob_allocator(Allocator*) | Set blob memory allocator | src/net.h187 |
set_workspace_allocator(Allocator*) | Set workspace memory allocator | src/net.h190 |
Vulkan-specific configuration (when NCNN_VULKAN=ON):
| Method | Purpose | Source |
|---|---|---|
set_blob_vkallocator(VkAllocator*) | Set GPU blob allocator | src/net.h193 |
set_workspace_vkallocator(VkAllocator*) | Set GPU workspace allocator | src/net.h195 |
set_staging_vkallocator(VkAllocator*) | Set GPU staging allocator for CPU-GPU transfers | src/net.h197 |
CPU Tensors:
| Method | Purpose | Source |
|---|---|---|
input(const char* blob_name, const Mat& in) | Set input by blob name (when NCNN_STRING=ON) | src/net.h203 |
input(int blob_index, const Mat& in) | Set input by blob index | src/net.h214 |
extract(const char* blob_name, Mat& feat, int type=0) | Get output by blob name (when NCNN_STRING=ON) | src/net.h209 |
extract(int blob_index, Mat& feat, int type=0) | Get output by blob index | src/net.h220 |
The type parameter controls conversion behavior:
- type = 0: Default; convert fp16/bf16 to fp32 and unpack if needed
- type = 1: No conversion; return the blob in its native format

GPU Tensors (when NCNN_VULKAN=ON):
| Method | Purpose | Source |
|---|---|---|
input(const char* blob_name, const VkMat& in) | Set GPU input by name | src/net.h224 |
input(int blob_index, const VkMat& in) | Set GPU input by index | src/net.h231 |
extract(const char* blob_name, VkMat& feat, VkCompute&, int type=0) | Get GPU output by name | src/net.h227 |
extract(int blob_index, VkMat& feat, VkCompute&, int type=0) | Get GPU output by index | src/net.h234 |
input(const char* blob_name, const VkImageMat& in) | Set GPU image input by name | src/net.h238 |
input(int blob_index, const VkImageMat& in) | Set GPU image input by index | src/net.h241 |
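Putting the CPU input/extract methods together, a single inference pass might look like this (a sketch; the blob names "data" and "prob" are placeholders that depend on the model's .param file):

```cpp
#include "net.h"

// Run one inference pass on the CPU path.
int run_inference(const ncnn::Net& net, const ncnn::Mat& in, ncnn::Mat& out)
{
    ncnn::Extractor ex = net.create_extractor();

    if (ex.input("data", in) != 0)
        return -1;

    // type=0 (default): output is converted to fp32 and unpacked
    return ex.extract("prob", out);
}
```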
Sources: src/net.h167-244 src/c_api.cpp1-1095
The ncnn::Mat class is the primary CPU tensor data structure in ncnn. It supports 1D, 2D, 3D, and 4D tensors with flexible element packing for SIMD optimization.
Key Fields:
| Field | Type | Description | Source |
|---|---|---|---|
data | void* | Pointer to tensor data | src/mat.h305 |
refcount | int* | Reference counter, NULL for external data | src/mat.h309 |
elemsize | size_t | Element size: 4 (fp32/int32), 2 (fp16), 1 (int8), 0 (empty) | src/mat.h316 |
elempack | int | Packed count: 1 (scalar), 4 (SSE/NEON), 8 (AVX/fp16), 16 (AVX512) | src/mat.h322 |
dims | int | Dimension rank: 1 (vec), 2 (image), 3 (feature), 4 (volume) | src/mat.h328 |
w, h, d, c | int | Width, height, depth, channels | src/mat.h330-333 |
cstep | size_t | Channel step size, 16-byte aligned | src/mat.h335 |
allocator | Allocator* | Memory allocator | src/mat.h325 |
Empty and Dimension Constructors:
| Constructor | Purpose | Source |
|---|---|---|
Mat() | Empty constructor | src/mat.h54 |
Mat(int w, size_t elemsize, Allocator*) | 1D vector | src/mat.h56 |
Mat(int w, int h, size_t elemsize, Allocator*) | 2D image | src/mat.h58 |
Mat(int w, int h, int c, size_t elemsize, Allocator*) | 3D feature map | src/mat.h60 |
Mat(int w, int h, int d, int c, size_t elemsize, Allocator*) | 4D volume | src/mat.h62 |
Packed Constructors:
| Constructor | Purpose | Source |
|---|---|---|
Mat(int w, size_t elemsize, int elempack, Allocator*) | Packed 1D | src/mat.h64 |
Mat(int w, int h, size_t elemsize, int elempack, Allocator*) | Packed 2D | src/mat.h66 |
Mat(int w, int h, int c, size_t elemsize, int elempack, Allocator*) | Packed 3D | src/mat.h68 |
Mat(int w, int h, int d, int c, size_t elemsize, int elempack, Allocator*) | Packed 4D | src/mat.h70 |
External Data Constructors:
External constructors wrap existing memory without copying. Useful for zero-copy integration.
| Constructor | Purpose | Source |
|---|---|---|
Mat(int w, void* data, size_t elemsize, Allocator*) | External 1D | src/mat.h74 |
Mat(int w, int h, void* data, size_t elemsize, Allocator*) | External 2D | src/mat.h76 |
Mat(int w, int h, int c, void* data, size_t elemsize, Allocator*) | External 3D | src/mat.h78 |
Mat(int w, int h, int d, int c, void* data, size_t elemsize, Allocator*) | External 4D | src/mat.h80 |
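For example, wrapping an existing buffer (a sketch; the buffer and shape are illustrative):

```cpp
#include "mat.h"

// Wrap an existing float buffer as a 2D Mat without copying.
// The caller keeps ownership: refcount is NULL for external data,
// so the Mat will not free this memory on destruction.
void wrap_buffer(float* buf, int w, int h)
{
    ncnn::Mat m(w, h, (void*)buf, sizeof(float));
    // m.data == buf; mutating m mutates the original buffer
}
```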
| Method | Purpose | Source |
|---|---|---|
create(int w, size_t elemsize, Allocator*) | Allocate 1D | src/mat.h149 |
create(int w, int h, size_t elemsize, Allocator*) | Allocate 2D | src/mat.h151 |
create(int w, int h, int c, size_t elemsize, Allocator*) | Allocate 3D | src/mat.h153 |
create(int w, int h, int d, int c, size_t elemsize, Allocator*) | Allocate 4D | src/mat.h155 |
create_like(const Mat& m, Allocator*) | Allocate with same shape | src/mat.h165 |
clone(Allocator*) | Deep copy | src/mat.h137 |
clone_from(const Mat& mat, Allocator*) | Deep copy from another Mat, inplace | src/mat.h139 |
addref() | Increment reference count | src/mat.h173 |
release() | Decrement reference count, free if zero | src/mat.h175 |
| Method | Purpose | Source |
|---|---|---|
empty() | Check if Mat is empty | src/mat.h177 |
total() | Total element count | src/mat.h178 |
elembits() | Bits per element | src/mat.h181 |
shape() | Get shape-only Mat | src/mat.h184 |
channel(int c) | Get channel reference | src/mat.h187-188 |
depth(int z) | Get depth slice reference | src/mat.h189-190 |
row(int y) | Get row pointer | src/mat.h191-196 |
channel_range(int c, int channels) | Get channel range reference | src/mat.h199-200 |
depth_range(int z, int depths) | Get depth range reference | src/mat.h201-202 |
row_range(int y, int rows) | Get row range reference | src/mat.h203-204 |
range(int x, int n) | Get element range reference | src/mat.h205-206 |
operator T*() | Cast to typed pointer | src/mat.h210-212 |
operator[](size_t i) | Access element at index | src/mat.h215-216 |
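These accessors compose naturally; for example, a channel-by-channel traversal (a sketch assuming an fp32, elempack=1 Mat):

```cpp
#include "mat.h"

// Zero a 3D fp32 Mat channel by channel, row by row.
// channel() returns a 2D view; row() returns a typed row pointer.
void zero_fill(ncnn::Mat& m)
{
    for (int q = 0; q < m.c; q++)
    {
        ncnn::Mat ch = m.channel(q);
        for (int y = 0; y < ch.h; y++)
        {
            float* ptr = ch.row(y);
            for (int x = 0; x < ch.w; x++)
                ptr[x] = 0.f;
        }
    }
}
```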
| Method | Purpose | Source |
|---|---|---|
reshape(int w, Allocator*) | Reshape to 1D | src/mat.h141 |
reshape(int w, int h, Allocator*) | Reshape to 2D | src/mat.h143 |
reshape(int w, int h, int c, Allocator*) | Reshape to 3D | src/mat.h145 |
reshape(int w, int h, int d, int c, Allocator*) | Reshape to 4D | src/mat.h147 |
| Method | Purpose | Source |
|---|---|---|
fill(float v) | Fill with float scalar | src/mat.h94 |
fill(int v) | Fill with int scalar | src/mat.h95 |
fill<T>(T v) | Template fill | src/mat.h135 |
Platform-specific SIMD fill methods are also available for NEON, SSE, AVX, etc.
Pixel Conversion Methods (when NCNN_PIXEL=ON):
The Mat class provides convenient methods for converting between image pixel formats and the neural network input format.
Pixel Format Enum:
| Format | Value | Description | Source |
|---|---|---|---|
PIXEL_RGB | 1 | RGB color | src/mat.h225 |
PIXEL_BGR | 2 | BGR color | src/mat.h226 |
PIXEL_GRAY | 3 | Grayscale | src/mat.h227 |
PIXEL_RGBA | 4 | RGB with alpha | src/mat.h228 |
PIXEL_BGRA | 5 | BGR with alpha | src/mat.h229 |
Static Factory Methods:
| Method | Purpose | Source |
|---|---|---|
from_pixels(const unsigned char* pixels, int type, int w, int h, Allocator*) | Create from pixel data | src/mat.h257 |
from_pixels(const unsigned char* pixels, int type, int w, int h, int stride, Allocator*) | Create with stride parameter | src/mat.h259 |
from_pixels_resize(const unsigned char* pixels, int type, int w, int h, int target_width, int target_height, Allocator*) | Create and resize | src/mat.h261 |
from_pixels_roi(const unsigned char* pixels, int type, int w, int h, int roix, int roiy, int roiw, int roih, Allocator*) | Create from ROI | src/mat.h265 |
from_pixels_roi_resize(...) | Create from ROI and resize | src/mat.h269 |
Export Methods:
| Method | Purpose | Source |
|---|---|---|
to_pixels(unsigned char* pixels, int type) | Export to pixel data | src/mat.h274 |
to_pixels(unsigned char* pixels, int type, int stride) | Export with stride | src/mat.h276 |
to_pixels_resize(unsigned char* pixels, int type, int target_width, int target_height) | Export and resize | src/mat.h278 |
Android Integration (when NCNN_PLATFORM_API=ON and __ANDROID_API__ >= 9):
| Method | Purpose | Source |
|---|---|---|
from_android_bitmap(JNIEnv* env, jobject bitmap, int type_to, Allocator*) | Create from Android Bitmap | src/mat.h285 |
from_android_bitmap_resize(JNIEnv* env, jobject bitmap, int type_to, int target_width, int target_height, Allocator*) | Create from Bitmap and resize | src/mat.h287 |
to_android_bitmap(JNIEnv* env, jobject bitmap, int type_from) | Export to Android Bitmap | src/mat.h293 |
Preprocessing:
| Method | Purpose | Source |
|---|---|---|
substract_mean_normalize(const float* mean_vals, const float* norm_vals) | Subtract mean and normalize, pass 0 to skip | src/mat.h299 |
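A typical preprocessing sequence combines the pixel factory methods with normalization (a sketch; the 224×224 target size and the mean/norm values are illustrative, not from any particular model):

```cpp
#include "mat.h"

// Convert a BGR image buffer to a normalized network input.
ncnn::Mat preprocess(const unsigned char* bgr, int w, int h)
{
    // Decode pixels and resize to the network input size in one step
    ncnn::Mat in = ncnn::Mat::from_pixels_resize(
        bgr, ncnn::Mat::PIXEL_BGR, w, h, 224, 224);

    // Per-channel mean subtraction and scaling (illustrative values)
    const float mean_vals[3] = {123.675f, 116.28f, 103.53f};
    const float norm_vals[3] = {1 / 58.395f, 1 / 57.12f, 1 / 57.375f};
    in.substract_mean_normalize(mean_vals, norm_vals);
    return in;
}
```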
Sources: src/mat.h50-336 src/mat.cpp19-820
When NCNN_VULKAN is enabled, ncnn provides GPU tensor classes for Vulkan compute.
VkMat is a GPU buffer-based tensor similar to Mat but allocated in GPU device memory.
Key Fields:
| Field | Type | Description | Source |
|---|---|---|---|
data | VkBufferMemory* | Device buffer memory | src/mat.h431 |
refcount | int* | Reference counter | src/mat.h435 |
elemsize | size_t | Element size (4, 2, 1, 0) | src/mat.h442 |
elempack | int | Packed element count | src/mat.h448 |
allocator | VkAllocator* | Vulkan allocator | src/mat.h451 |
dims, w, h, d, c | int | Dimensions | src/mat.h454-459 |
cstep | size_t | Channel step | src/mat.h461 |
Constructors:
| Constructor | Purpose | Source |
|---|---|---|
VkMat() | Empty | src/mat.h345 |
VkMat(int w, size_t elemsize, VkAllocator*) | 1D | src/mat.h347 |
VkMat(int w, int h, size_t elemsize, VkAllocator*) | 2D | src/mat.h349 |
VkMat(int w, int h, int c, size_t elemsize, VkAllocator*) | 3D | src/mat.h351 |
VkMat(int w, int h, int d, int c, size_t elemsize, VkAllocator*) | 4D | src/mat.h353 |
Methods:
| Method | Purpose | Source |
|---|---|---|
create(int w, size_t elemsize, VkAllocator*) | Allocate 1D | src/mat.h385 |
create(int w, int h, size_t elemsize, VkAllocator*) | Allocate 2D | src/mat.h387 |
create(int w, int h, int c, size_t elemsize, VkAllocator*) | Allocate 3D | src/mat.h389 |
create_like(const Mat& m, VkAllocator*) | Allocate like Mat | src/mat.h401 |
create_like(const VkMat& m, VkAllocator*) | Allocate like VkMat | src/mat.h403 |
mapped() | Get mapped CPU Mat | src/mat.h408 |
mapped_ptr() | Get mapped pointer | src/mat.h409 |
buffer() | Get VkBuffer handle | src/mat.h426 |
buffer_offset() | Get buffer offset | src/mat.h427 |
buffer_capacity() | Get buffer capacity | src/mat.h428 |
VkImageMat is a GPU image-based tensor optimized for texture sampling operations.
Key Fields:
| Field | Type | Description | Source |
|---|---|---|---|
data | VkImageMemory* | Device image memory | src/mat.h548 |
refcount | int* | Reference counter | src/mat.h552 |
elemsize | size_t | Element size | src/mat.h559 |
elempack | int | Packed element count | src/mat.h565 |
allocator | VkAllocator* | Vulkan allocator | src/mat.h568 |
dims, w, h, d, c | int | Dimensions | src/mat.h571-576 |
Constructors and Methods:
Similar to VkMat, with image-specific allocation.
| Method | Purpose | Source |
|---|---|---|
create(int w, size_t elemsize, VkAllocator*) | Allocate 1D image | src/mat.h508 |
create(int w, int h, size_t elemsize, VkAllocator*) | Allocate 2D image | src/mat.h510 |
create(int w, int h, int c, size_t elemsize, VkAllocator*) | Allocate 3D image | src/mat.h512 |
mapped() | Get mapped CPU Mat | src/mat.h524 |
mapped_ptr() | Get mapped pointer | src/mat.h525 |
image() | Get VkImage handle | src/mat.h534 |
imageview() | Get VkImageView handle | src/mat.h535 |
Sources: src/mat.h341-578 src/mat.cpp822-2500
The Option class controls runtime configuration for network inference. It affects threading, memory allocation, precision modes, and optimization strategies.
Threading Configuration:
| Field | Type | Default | Description | Source |
|---|---|---|---|---|
num_threads | int | CPU count | Number of threads for inference | References in src/c_api.cpp152-160 |
use_local_pool_allocator | bool | true | Use thread-local memory pool | References in src/c_api.cpp182-185 |
Precision Modes:
| Field | Type | Default | Description | Source |
|---|---|---|---|---|
use_fp16_packed | bool | false | Enable fp16 packing | References in src/c_api.cpp202-205 |
use_fp16_storage | bool | false | Store weights in fp16 | References in src/c_api.cpp207-210 |
use_fp16_arithmetic | bool | false | Use fp16 arithmetic | References in src/c_api.cpp212-215 |
use_int8_packed | bool | false | Enable int8 packing | References in src/c_api.cpp217-220 |
use_int8_storage | bool | false | Store weights in int8 | References in src/c_api.cpp222-225 |
use_int8_arithmetic | bool | false | Use int8 arithmetic | References in src/c_api.cpp227-230 |
use_bf16_packed | bool | false | Enable bf16 packing | References in src/c_api.cpp232-235 |
use_bf16_storage | bool | false | Store weights in bf16 | References in src/c_api.cpp237-240 |
Optimization Strategies:
| Field | Type | Default | Description | Source |
|---|---|---|---|---|
use_packing_layout | bool | true | Use SIMD-packed layouts | References in src/c_api.cpp197-200 |
use_winograd_convolution | bool | true | Use Winograd fast convolution | References in src/c_api.cpp187-190 |
use_sgemm_convolution | bool | true | Use GEMM-based convolution | References in src/c_api.cpp192-195 |
Vulkan GPU Options (when NCNN_VULKAN=ON):
| Field | Type | Default | Description | Source |
|---|---|---|---|---|
use_vulkan_compute | bool | false | Enable Vulkan GPU compute | References in src/c_api.cpp172-180 |
use_shader_local_memory | bool | true | Use shader local memory | References in src/c_api.cpp242-250 |
use_cooperative_matrix | bool | false | Use cooperative matrix operations | References in src/c_api.cpp252-260 |
Memory Allocators:
| Field | Type | Description | Source |
|---|---|---|---|
blob_allocator | Allocator* | Allocator for blob storage | References in src/c_api.cpp162-165 |
workspace_allocator | Allocator* | Allocator for temporary workspace | References in src/c_api.cpp167-170 |
blob_vkallocator | VkAllocator* | GPU blob allocator | Vulkan-specific |
workspace_vkallocator | VkAllocator* | GPU workspace allocator | Vulkan-specific |
staging_vkallocator | VkAllocator* | GPU staging allocator for transfers | Vulkan-specific |
Light Mode:
| Field | Type | Default | Description |
|---|---|---|---|
lightmode | bool | true | Recycle intermediate blobs to reduce memory usage |
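Option fields are typically set on Net::opt before load_param() (a sketch; which flags actually help depends on the target hardware and build configuration):

```cpp
#include "net.h"

// Enable fp16 paths and, if built with NCNN_VULKAN, GPU compute.
void configure(ncnn::Net& net)
{
    net.opt.lightmode = true;          // recycle intermediate blobs
    net.opt.num_threads = 4;
    net.opt.use_fp16_packed = true;
    net.opt.use_fp16_storage = true;
#if NCNN_VULKAN
    net.opt.use_vulkan_compute = true; // must be set before load_param()
#endif
}
```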
Sources: src/c_api.cpp141-350 src/option.h
The Layer class is the base class for all neural network layer implementations in ncnn. Custom layers can inherit from this class.
Initialization and Cleanup:
| Method | Purpose | Return | Source |
|---|---|---|---|
load_param(const ParamDict& pd) | Load layer-specific parameters from parsed dict | 0 on success | src/layer.h30 |
load_model(const ModelBin& mb) | Load layer-specific weight data | 0 on success | src/layer.h34 |
create_pipeline(const Option& opt) | Layer implementation setup | 0 on success | src/layer.h38 |
destroy_pipeline(const Option& opt) | Layer implementation cleanup | 0 on success | src/layer.h42 |
CPU Forward Methods:
| Method | Purpose | Source |
|---|---|---|
forward(const std::vector<Mat>&, std::vector<Mat>&, const Option&) | Forward pass with multiple blobs | src/layer.h64 |
forward(const Mat&, Mat&, const Option&) | Forward pass with single blob | src/layer.h80 |
forward_inplace(std::vector<Mat>&, const Option&) | Inplace forward with multiple blobs | src/layer.h92 |
forward_inplace(Mat&, const Option&) | Inplace forward with single blob | src/layer.h97 |
Vulkan Forward Methods (when NCNN_VULKAN=ON):
| Method | Purpose | Source |
|---|---|---|
upload_model(VkTransfer&, const Option&) | Upload weights to GPU | src/layer.h103 |
forward(const std::vector<VkMat>&, std::vector<VkMat>&, VkCompute&, const Option&) | GPU forward with multiple blobs | src/layer.h108 |
forward(const VkMat&, VkMat&, VkCompute&, const Option&) | GPU forward with single blob | src/layer.h122 |
forward_inplace(std::vector<VkMat>&, VkCompute&, const Option&) | GPU inplace forward multiple | src/layer.h132 |
forward_inplace(VkMat&, VkCompute&, const Option&) | GPU inplace forward single | src/layer.h137 |
| Flag | Type | Description | Source |
|---|---|---|---|
one_blob_only | bool | Layer has exactly one input and one output | src/layer.h46 |
support_inplace | bool | Supports inplace operation (input = output) | src/layer.h49 |
support_vulkan | bool | Has Vulkan GPU implementation | src/layer.h52 |
support_packing | bool | Accepts packed storage (elempack > 1) | src/layer.h55 |
support_bf16_storage | bool | Accepts bf16 weights | src/layer.h58 |
support_fp16_storage | bool | Accepts fp16 weights | src/layer.h61 |
support_int8_storage | bool | Accepts int8 weights | src/layer.h64 |
support_tensor_storage | bool | Uses shader tensor storage | src/layer.h67 |
support_vulkan_packing | bool | Vulkan implementation supports packing | Referenced in context |
support_any_packing | bool | CPU implementation supports any packing | Referenced in context |
support_vulkan_any_packing | bool | Vulkan implementation supports any packing | Referenced in context |
Layers are identified by both string names and integer type indices.
Layer Registry Functions:
| Function | Purpose | Source |
|---|---|---|
layer_to_index(const char* type) | Convert layer name to type index | src/layer.cpp148-155 |
index_to_layer(int index) | Convert type index to layer name | src/layer.cpp157-163 |
create_layer(const char* type) | Create layer by name | src/layer.cpp165-182 |
create_layer(int index) | Create layer by type index | src/layer.cpp184-296 |
Custom layers can be registered with the Net class:
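A minimal custom layer and its registration might look like this (a sketch; the layer name MySwish and its behavior are invented for illustration, and DEFINE_LAYER_CREATOR generates the MySwish_layer_creator function):

```cpp
#include <math.h>
#include "layer.h"
#include "net.h"

// A toy inplace activation layer: y = x / (1 + exp(-x))
class MySwish : public ncnn::Layer
{
public:
    MySwish()
    {
        one_blob_only = true;   // exactly one input, one output
        support_inplace = true; // input and output may share storage
    }

    virtual int forward_inplace(ncnn::Mat& bottom_top_blob, const ncnn::Option& /*opt*/) const
    {
        const int size = bottom_top_blob.w * bottom_top_blob.h * bottom_top_blob.d;
        for (int q = 0; q < bottom_top_blob.c; q++)
        {
            float* ptr = bottom_top_blob.channel(q);
            for (int i = 0; i < size; i++)
                ptr[i] = ptr[i] / (1.f + expf(-ptr[i]));
        }
        return 0;
    }
};

DEFINE_LAYER_CREATOR(MySwish)

// Registration must happen before load_param():
//   net.register_custom_layer("MySwish", MySwish_layer_creator);
```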
Sources: src/layer.h20-141 src/layer.cpp14-296
ncnn uses an abstract allocator interface to manage memory allocation for tensors and temporary workspace.
The base allocator interface.
| Method | Purpose | Source |
|---|---|---|
fastMalloc(size_t size) | Allocate memory | src/allocator.h |
fastFree(void* ptr) | Free memory | src/allocator.h |
Thread-safe pooled allocator with budget-based memory recycling.
Methods:
| Method | Purpose | Source |
|---|---|---|
PoolAllocator() | Constructor | References in src/c_api.cpp114-121 |
set_size_compare_ratio(float ratio) | Set size matching tolerance for reuse | References in allocator context |
clear() | Clear all pooled memory | References in allocator context |
Non-thread-safe pooled allocator for single-threaded or thread-local usage.
Methods:
| Method | Purpose | Source |
|---|---|---|
UnlockedPoolAllocator() | Constructor | References in src/c_api.cpp123-130 |
set_size_compare_ratio(float ratio) | Set size matching tolerance | References in allocator context |
clear() | Clear all pooled memory | References in allocator context |
Vulkan Allocators (when NCNN_VULKAN=ON):
ncnn::VkAllocator: Base class for Vulkan memory allocators.
ncnn::VkBlobAllocator: Allocates device-local memory for blob storage.
ncnn::VkStagingAllocator: Allocates host-visible memory for CPU-GPU data transfers.
ncnn::VkWeightAllocator: Specialized allocator for layer weights with staging support.
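Pooled allocators are typically created once and reused across many extractor runs (a sketch; the blob indices 0 and 1 are placeholders that depend on the model):

```cpp
#include "net.h"
#include "allocator.h"

// Reuse pooled memory between inference runs instead of
// malloc/free on every pass.
static ncnn::UnlockedPoolAllocator g_blob_pool;   // single-thread use only
static ncnn::PoolAllocator g_workspace_pool;      // thread-safe

void run(const ncnn::Net& net, const ncnn::Mat& in, ncnn::Mat& out)
{
    ncnn::Extractor ex = net.create_extractor();
    ex.set_blob_allocator(&g_blob_pool);
    ex.set_workspace_allocator(&g_workspace_pool);
    ex.input(0, in);    // placeholder input blob index
    ex.extract(1, out); // placeholder output blob index
}
```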
Sources: src/c_api.cpp47-139 src/allocator.h src/allocator.cpp
| elemsize | Type | Description |
|---|---|---|
| 4 | fp32 / int32 | 32-bit floating point or integer |
| 2 | fp16 | 16-bit floating point (IEEE 754 half precision) |
| 1 | int8 / uint8 | 8-bit integer or unsigned integer |
| 0 | empty | Empty tensor |
The elempack field indicates how many elements are packed together for SIMD processing:
| elempack | Target ISA | Description |
|---|---|---|
| 1 | Scalar | No packing, scalar processing |
| 4 | SSE2, NEON | 4-element vectors (128-bit) |
| 8 | AVX, FP16 | 8-element vectors (256-bit) or 8×fp16 |
| 16 | AVX512 | 16-element vectors (512-bit) |
Example Layouts:
For a 3D tensor with shape (c, h, w) and elempack=4:
- The channel dimension becomes c/4 packed groups
- Each group holds 4 × h × w elements
- Values are interleaved as [c0,c1,c2,c3], [c4,c5,c6,c7], ... for each spatial position

The cstep field represents the step size between channels, always 16-byte aligned.
This alignment ensures efficient SIMD access patterns.
Sources: src/mat.h312-322 src/mat.cpp222-825
This completes the C++ API Reference. For implementation details of specific layer types, see the layer source files in src/layer/ and src/layer/vulkan/. For advanced topics like custom layer development, refer to the ncnn wiki and examples.