This document provides comprehensive reference documentation for the ncnn C++ API. It covers the primary classes and methods that application developers use to load models, configure inference, and execute neural networks.
For information about the C API wrapper, see C API and Python Bindings. For details on how ncnn loads and optimizes models internally, see Network Loading and Inference Pipeline. For platform-specific build configuration, see Platform Support and Build System.
The ncnn C++ API consists of several key classes that work together to provide neural network inference capabilities:
Sources: src/net.h27-164 src/net.h167-244 src/mat.h50-336 src/mat.h341-462 src/layer.h20-141
The ncnn::Net class is the primary container for neural network models. It manages the network structure (layers and blobs), provides methods for loading model files, and creates Extractor objects for inference.
The Net class provides multiple methods for loading model parameters and weights. Models in ncnn consist of two files:
- .param or .param.bin: Network structure (text or binary format)
- .bin: Layer weights

From File Paths:
| Method | Purpose | Source |
|---|---|---|
load_param(const char* protopath) | Load text parameter file | src/net.h72 |
load_param_bin(const char* protopath) | Load binary parameter file | src/net.h84 |
load_model(const char* modelpath) | Load weight binary file | src/net.h92 |
From Memory:
| Method | Purpose | Source |
|---|---|---|
load_param(const unsigned char* mem) | Load from 32-bit aligned memory, returns bytes consumed | src/net.h101 |
load_model(const unsigned char* mem) | Reference weights from memory (zero-copy), returns bytes consumed | src/net.h108 |
From DataReader:
| Method | Purpose | Source |
|---|---|---|
load_param(const DataReader& dr) | Load text parameters via DataReader | src/net.h60 |
load_param_bin(const DataReader& dr) | Load binary parameters via DataReader | src/net.h63 |
load_model(const DataReader& dr) | Load weights via DataReader | src/net.h65 |
Android Asset Manager (when NCNN_PLATFORM_API=ON):
| Method | Purpose | Source |
|---|---|---|
load_param(AAssetManager* mgr, const char* assetpath) | Load from Android asset | src/net.h115 |
load_param_bin(AAssetManager* mgr, const char* assetpath) | Load binary from Android asset | src/net.h119 |
load_model(AAssetManager* mgr, const char* assetpath) | Load weights from Android asset | src/net.h123 |
The Net class has a public opt member of type Option that should be configured before loading models:
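A typical setup configures opt before loading (a minimal sketch; model.param and model.bin are placeholder file names):

```cpp
#include "net.h"

int main()
{
    ncnn::Net net;

    // Configure options before loading the model
    net.opt.num_threads = 4;
    net.opt.use_packing_layout = true;

    // Load network structure, then weights (both return 0 on success)
    if (net.load_param("model.param") != 0)
        return -1;
    if (net.load_model("model.bin") != 0)
        return -1;

    return 0;
}
```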
When NCNN_VULKAN is enabled:
| Method | Purpose | Source |
|---|---|---|
set_vulkan_device(int device_index) | Set GPU by index | src/net.h41 |
set_vulkan_device(const VulkanDevice* vkdev) | Set GPU by device handle (no ownership transfer) | src/net.h44 |
vulkan_device() | Get current VulkanDevice | src/net.h46 |
| Method | Purpose | Source |
|---|---|---|
register_custom_layer(const char* type, layer_creator_func, layer_destroyer_func, userdata) | Register or override layer by name | src/net.h52 |
register_custom_layer(int index, layer_creator_func, layer_destroyer_func, userdata) | Register or override layer by type index | src/net.h57 |
| Method | Purpose | Source |
|---|---|---|
create_extractor() | Construct an Extractor for inference | src/net.h131 |
Each call to create_extractor() returns a new, independent inference session that can run in parallel with other extractors.
| Method | Purpose | Source |
|---|---|---|
input_indexes() | Get vector of input blob indices | src/net.h134 |
output_indexes() | Get vector of output blob indices | src/net.h135 |
input_names() | Get vector of input blob names (when NCNN_STRING=ON) | src/net.h137 |
output_names() | Get vector of output blob names (when NCNN_STRING=ON) | src/net.h138 |
blobs() | Get vector of Blob objects | src/net.h141 |
layers() | Get vector of Layer pointers | src/net.h142 |
| Method | Purpose | Source |
|---|---|---|
clear() | Unload network structure and weight data | src/net.h128 |
The destructor automatically calls clear(), so explicit cleanup is optional.
Sources: src/net.h27-164 src/c_api.cpp1-1095
The ncnn::Extractor class represents an inference session. It holds temporary blob storage, allocators, and provides methods to set inputs and extract outputs. Multiple extractors can be created from a single Net and used concurrently (each in its own thread).
Constructor and Copy:
| Method | Purpose | Source |
|---|---|---|
Extractor(const Extractor&) | Copy constructor | src/net.h173 |
operator=(const Extractor&) | Assignment operator | src/net.h176 |
~Extractor() | Destructor | src/net.h170 |
Extractors are copyable, but copying is not recommended because copies share internal allocator state.
| Method | Purpose | Source |
|---|---|---|
clear() | Clear blob mats and allocators | src/net.h179 |
set_light_mode(bool enable) | Enable light mode (intermediate blob recycling), enabled by default | src/net.h184 |
set_blob_allocator(Allocator*) | Set blob memory allocator | src/net.h187 |
set_workspace_allocator(Allocator*) | Set workspace memory allocator | src/net.h190 |
Vulkan-specific configuration (when NCNN_VULKAN=ON):
| Method | Purpose | Source |
|---|---|---|
set_blob_vkallocator(VkAllocator*) | Set GPU blob allocator | src/net.h193 |
set_workspace_vkallocator(VkAllocator*) | Set GPU workspace allocator | src/net.h195 |
set_staging_vkallocator(VkAllocator*) | Set GPU staging allocator for CPU-GPU transfers | src/net.h197 |
CPU Tensors:
| Method | Purpose | Source |
|---|---|---|
input(const char* blob_name, const Mat& in) | Set input by blob name (when NCNN_STRING=ON) | src/net.h203 |
input(int blob_index, const Mat& in) | Set input by blob index | src/net.h214 |
extract(const char* blob_name, Mat& feat, int type=0) | Get output by blob name (when NCNN_STRING=ON) | src/net.h209 |
extract(int blob_index, Mat& feat, int type=0) | Get output by blob index | src/net.h220 |
The type parameter controls conversion behavior:
- type = 0: Default; convert fp16/bf16 to fp32 and unpack if needed
- type = 1: No conversion; return the blob in its native format

GPU Tensors (when NCNN_VULKAN=ON):
| Method | Purpose | Source |
|---|---|---|
input(const char* blob_name, const VkMat& in) | Set GPU input by name | src/net.h224 |
input(int blob_index, const VkMat& in) | Set GPU input by index | src/net.h231 |
extract(const char* blob_name, VkMat& feat, VkCompute&, int type=0) | Get GPU output by name | src/net.h227 |
extract(int blob_index, VkMat& feat, VkCompute&, int type=0) | Get GPU output by index | src/net.h234 |
input(const char* blob_name, const VkImageMat& in) | Set GPU image input by name | src/net.h238 |
input(int blob_index, const VkImageMat& in) | Set GPU image input by index | src/net.h241 |
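Putting the CPU input/extract methods together, a single inference pass might look like this (a sketch; the blob names "data" and "prob" are placeholders that depend on the model's .param file):

```cpp
#include "net.h"

// Run one inference pass on the CPU path.
int run_inference(const ncnn::Net& net, const ncnn::Mat& in, ncnn::Mat& out)
{
    ncnn::Extractor ex = net.create_extractor();

    if (ex.input("data", in) != 0)
        return -1;

    // type=0 (default): output is converted to fp32 and unpacked
    return ex.extract("prob", out);
}
```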
Sources: src/net.h167-244 src/c_api.cpp1-1095
The ncnn::Mat class is the primary CPU tensor data structure in ncnn. It supports 1D, 2D, 3D, and 4D tensors with flexible element packing for SIMD optimization.
Key Fields:
| Field | Type | Description | Source |
|---|---|---|---|
data | void* | Pointer to tensor data | src/mat.h305 |
refcount | int* | Reference counter, NULL for external data | src/mat.h309 |
elemsize | size_t | Element size: 4 (fp32/int32), 2 (fp16), 1 (int8), 0 (empty) | src/mat.h316 |
elempack | int | Packed count: 1 (scalar), 4 (SSE/NEON), 8 (AVX/fp16), 16 (AVX512) | src/mat.h322 |
dims | int | Dimension rank: 1 (vec), 2 (image), 3 (feature), 4 (volume) | src/mat.h328 |
w, h, d, c | int | Width, height, depth, channels | src/mat.h330-333 |
cstep | size_t | Channel step size, 16-byte aligned | src/mat.h335 |
allocator | Allocator* | Memory allocator | src/mat.h325 |
Empty and Dimension Constructors:
| Constructor | Purpose | Source |
|---|---|---|
Mat() | Empty constructor | src/mat.h54 |
Mat(int w, size_t elemsize, Allocator*) | 1D vector | src/mat.h56 |
Mat(int w, int h, size_t elemsize, Allocator*) | 2D image | src/mat.h58 |
Mat(int w, int h, int c, size_t elemsize, Allocator*) | 3D feature map | src/mat.h60 |
Mat(int w, int h, int d, int c, size_t elemsize, Allocator*) | 4D volume | src/mat.h62 |
Packed Constructors:
| Constructor | Purpose | Source |
|---|---|---|
Mat(int w, size_t elemsize, int elempack, Allocator*) | Packed 1D | src/mat.h64 |
Mat(int w, int h, size_t elemsize, int elempack, Allocator*) | Packed 2D | src/mat.h66 |
Mat(int w, int h, int c, size_t elemsize, int elempack, Allocator*) | Packed 3D | src/mat.h68 |
Mat(int w, int h, int d, int c, size_t elemsize, int elempack, Allocator*) | Packed 4D | src/mat.h70 |
External Data Constructors:
External constructors wrap existing memory without copying. Useful for zero-copy integration.
| Constructor | Purpose | Source |
|---|---|---|
Mat(int w, void* data, size_t elemsize, Allocator*) | External 1D | src/mat.h74 |
Mat(int w, int h, void* data, size_t elemsize, Allocator*) | External 2D | src/mat.h76 |
Mat(int w, int h, int c, void* data, size_t elemsize, Allocator*) | External 3D | src/mat.h78 |
Mat(int w, int h, int d, int c, void* data, size_t elemsize, Allocator*) | External 4D | src/mat.h80 |
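For example, wrapping an existing buffer (a sketch; the buffer and shape are illustrative):

```cpp
#include "mat.h"

// Wrap an existing float buffer as a 2D Mat without copying.
// The caller keeps ownership: refcount is NULL for external data,
// so the Mat will not free this memory on destruction.
void wrap_buffer(float* buf, int w, int h)
{
    ncnn::Mat m(w, h, (void*)buf, sizeof(float));
    // m.data == buf; mutating m mutates the original buffer
}
```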
| Method | Purpose | Source |
|---|---|---|
create(int w, size_t elemsize, Allocator*) | Allocate 1D | src/mat.h149 |
create(int w, int h, size_t elemsize, Allocator*) | Allocate 2D | src/mat.h151 |
create(int w, int h, int c, size_t elemsize, Allocator*) | Allocate 3D | src/mat.h153 |
create(int w, int h, int d, int c, size_t elemsize, Allocator*) | Allocate 4D | src/mat.h155 |
create_like(const Mat& m, Allocator*) | Allocate with same shape | src/mat.h165 |
clone(Allocator*) | Deep copy | src/mat.h137 |
clone_from(const Mat& mat, Allocator*) | Deep copy from another Mat, inplace | src/mat.h139 |
addref() | Increment reference count | src/mat.h173 |
release() | Decrement reference count, free if zero | src/mat.h175 |
| Method | Purpose | Source |
|---|---|---|
empty() | Check if Mat is empty | src/mat.h177 |
total() | Total element count | src/mat.h178 |
elembits() | Bits per element | src/mat.h181 |
shape() | Get shape-only Mat | src/mat.h184 |
channel(int c) | Get channel reference | src/mat.h187-188 |
depth(int z) | Get depth slice reference | src/mat.h189-190 |
row(int y) | Get row pointer | src/mat.h191-196 |
channel_range(int c, int channels) | Get channel range reference | src/mat.h199-200 |
depth_range(int z, int depths) | Get depth range reference | src/mat.h201-202 |
row_range(int y, int rows) | Get row range reference | src/mat.h203-204 |
range(int x, int n) | Get element range reference | src/mat.h205-206 |
operator T*() | Cast to typed pointer | src/mat.h210-212 |
operator[](size_t i) | Access element at index | src/mat.h215-216 |
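These accessors compose naturally; for example, a channel-by-channel traversal (a sketch assuming an fp32, elempack=1 Mat):

```cpp
#include "mat.h"

// Zero a 3D fp32 Mat channel by channel, row by row.
// channel() returns a 2D view; row() returns a typed row pointer.
void zero_fill(ncnn::Mat& m)
{
    for (int q = 0; q < m.c; q++)
    {
        ncnn::Mat ch = m.channel(q);
        for (int y = 0; y < ch.h; y++)
        {
            float* ptr = ch.row(y);
            for (int x = 0; x < ch.w; x++)
                ptr[x] = 0.f;
        }
    }
}
```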
| Method | Purpose | Source |
|---|---|---|
reshape(int w, Allocator*) | Reshape to 1D | src/mat.h141 |
reshape(int w, int h, Allocator*) | Reshape to 2D | src/mat.h143 |
reshape(int w, int h, int c, Allocator*) | Reshape to 3D | src/mat.h145 |
reshape(int w, int h, int d, int c, Allocator*) | Reshape to 4D | src/mat.h147 |
| Method | Purpose | Source |
|---|---|---|
fill(float v) | Fill with float scalar | src/mat.h94 |
fill(int v) | Fill with int scalar | src/mat.h95 |
fill<T>(T v) | Template fill | src/mat.h135 |
Platform-specific SIMD fill methods are also available for NEON, SSE, AVX, etc.
Pixel Conversion Methods (when NCNN_PIXEL=ON):
The Mat class provides convenient methods for converting between image pixel formats and the neural network input format.
Pixel Format Enum:
| Format | Value | Description | Source |
|---|---|---|---|
PIXEL_RGB | 1 | RGB color | src/mat.h225 |
PIXEL_BGR | 2 | BGR color | src/mat.h226 |
PIXEL_GRAY | 3 | Grayscale | src/mat.h227 |
PIXEL_RGBA | 4 | RGB with alpha | src/mat.h228 |
PIXEL_BGRA | 5 | BGR with alpha | src/mat.h229 |
Static Factory Methods:
| Method | Purpose | Source |
|---|---|---|
from_pixels(const unsigned char* pixels, int type, int w, int h, Allocator*) | Create from pixel data | src/mat.h257 |
from_pixels(const unsigned char* pixels, int type, int w, int h, int stride, Allocator*) | Create with stride parameter | src/mat.h259 |
from_pixels_resize(const unsigned char* pixels, int type, int w, int h, int target_width, int target_height, Allocator*) | Create and resize | src/mat.h261 |
from_pixels_roi(const unsigned char* pixels, int type, int w, int h, int roix, int roiy, int roiw, int roih, Allocator*) | Create from ROI | src/mat.h265 |
from_pixels_roi_resize(...) | Create from ROI and resize | src/mat.h269 |
Export Methods:
| Method | Purpose | Source |
|---|---|---|
to_pixels(unsigned char* pixels, int type) | Export to pixel data | src/mat.h274 |
to_pixels(unsigned char* pixels, int type, int stride) | Export with stride | src/mat.h276 |
to_pixels_resize(unsigned char* pixels, int type, int target_width, int target_height) | Export and resize | src/mat.h278 |
Android Integration (when NCNN_PLATFORM_API=ON and __ANDROID_API__ >= 9):
| Method | Purpose | Source |
|---|---|---|
from_android_bitmap(JNIEnv* env, jobject bitmap, int type_to, Allocator*) | Create from Android Bitmap | src/mat.h285 |
from_android_bitmap_resize(JNIEnv* env, jobject bitmap, int type_to, int target_width, int target_height, Allocator*) | Create from Bitmap and resize | src/mat.h287 |
to_android_bitmap(JNIEnv* env, jobject bitmap, int type_from) | Export to Android Bitmap | src/mat.h293 |
Preprocessing:
| Method | Purpose | Source |
|---|---|---|
substract_mean_normalize(const float* mean_vals, const float* norm_vals) | Subtract mean and normalize, pass 0 to skip | src/mat.h299 |
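A typical preprocessing sequence combines the pixel factory methods with normalization (a sketch; the 224×224 target size and the mean/norm values are illustrative, not from any particular model):

```cpp
#include "mat.h"

// Convert a BGR image buffer to a normalized network input.
ncnn::Mat preprocess(const unsigned char* bgr, int w, int h)
{
    // Decode pixels and resize to the network input size in one step
    ncnn::Mat in = ncnn::Mat::from_pixels_resize(
        bgr, ncnn::Mat::PIXEL_BGR, w, h, 224, 224);

    // Per-channel mean subtraction and scaling (illustrative values)
    const float mean_vals[3] = {123.675f, 116.28f, 103.53f};
    const float norm_vals[3] = {1 / 58.395f, 1 / 57.12f, 1 / 57.375f};
    in.substract_mean_normalize(mean_vals, norm_vals);
    return in;
}
```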
Sources: src/mat.h50-336 src/mat.cpp19-820
When NCNN_VULKAN is enabled, ncnn provides GPU tensor classes for Vulkan compute.
VkMat is a GPU buffer-based tensor similar to Mat but allocated in GPU device memory.
Key Fields:
| Field | Type | Description | Source |
|---|---|---|---|
data | VkBufferMemory* | Device buffer memory | src/mat.h431 |
refcount | int* | Reference counter | src/mat.h435 |
elemsize | size_t | Element size (4, 2, 1, 0) | src/mat.h442 |
elempack | int | Packed element count | src/mat.h448 |
allocator | VkAllocator* | Vulkan allocator | src/mat.h451 |
dims, w, h, d, c | int | Dimensions | src/mat.h454-459 |
cstep | size_t | Channel step | src/mat.h461 |
Constructors:
| Constructor | Purpose | Source |
|---|---|---|
VkMat() | Empty | src/mat.h345 |
VkMat(int w, size_t elemsize, VkAllocator*) | 1D | src/mat.h347 |
VkMat(int w, int h, size_t elemsize, VkAllocator*) | 2D | src/mat.h349 |
VkMat(int w, int h, int c, size_t elemsize, VkAllocator*) | 3D | src/mat.h351 |
VkMat(int w, int h, int d, int c, size_t elemsize, VkAllocator*) | 4D | src/mat.h353 |
Methods:
| Method | Purpose | Source |
|---|---|---|
create(int w, size_t elemsize, VkAllocator*) | Allocate 1D | src/mat.h385 |
create(int w, int h, size_t elemsize, VkAllocator*) | Allocate 2D | src/mat.h387 |
create(int w, int h, int c, size_t elemsize, VkAllocator*) | Allocate 3D | src/mat.h389 |
create_like(const Mat& m, VkAllocator*) | Allocate like Mat | src/mat.h401 |
create_like(const VkMat& m, VkAllocator*) | Allocate like VkMat | src/mat.h403 |
mapped() | Get mapped CPU Mat | src/mat.h408 |
mapped_ptr() | Get mapped pointer | src/mat.h409 |
buffer() | Get VkBuffer handle | src/mat.h426 |
buffer_offset() | Get buffer offset | src/mat.h427 |
buffer_capacity() | Get buffer capacity | src/mat.h428 |
VkImageMat is a GPU image-based tensor optimized for texture sampling operations.
Key Fields:
| Field | Type | Description | Source |
|---|---|---|---|
data | VkImageMemory* | Device image memory | src/mat.h548 |
refcount | int* | Reference counter | src/mat.h552 |
elemsize | size_t | Element size | src/mat.h559 |
elempack | int | Packed element count | src/mat.h565 |
allocator | VkAllocator* | Vulkan allocator | src/mat.h568 |
dims, w, h, d, c | int | Dimensions | src/mat.h571-576 |
Constructors and Methods:
Similar to VkMat, with image-specific allocation.
| Method | Purpose | Source |
|---|---|---|
create(int w, size_t elemsize, VkAllocator*) | Allocate 1D image | src/mat.h508 |
create(int w, int h, size_t elemsize, VkAllocator*) | Allocate 2D image | src/mat.h510 |
create(int w, int h, int c, size_t elemsize, VkAllocator*) | Allocate 3D image | src/mat.h512 |
mapped() | Get mapped CPU Mat | src/mat.h524 |
mapped_ptr() | Get mapped pointer | src/mat.h525 |
image() | Get VkImage handle | src/mat.h534 |
imageview() | Get VkImageView handle | src/mat.h535 |
Sources: src/mat.h341-578 src/mat.cpp822-2500
The Option class controls runtime configuration for network inference. It affects threading, memory allocation, precision modes, and optimization strategies.
Threading Configuration:
| Field | Type | Default | Description | Source |
|---|---|---|---|---|
num_threads | int | CPU count | Number of threads for inference | References in src/c_api.cpp152-160 |
use_local_pool_allocator | bool | true | Use thread-local memory pool | References in src/c_api.cpp182-185 |
Precision Modes:
| Field | Type | Default | Description | Source |
|---|---|---|---|---|
use_fp16_packed | bool | false | Enable fp16 packing | References in src/c_api.cpp202-205 |
use_fp16_storage | bool | false | Store weights in fp16 | References in src/c_api.cpp207-210 |
use_fp16_arithmetic | bool | false | Use fp16 arithmetic | References in src/c_api.cpp212-215 |
use_int8_packed | bool | false | Enable int8 packing | References in src/c_api.cpp217-220 |
use_int8_storage | bool | false | Store weights in int8 | References in src/c_api.cpp222-225 |
use_int8_arithmetic | bool | false | Use int8 arithmetic | References in src/c_api.cpp227-230 |
use_bf16_packed | bool | false | Enable bf16 packing | References in src/c_api.cpp232-235 |
use_bf16_storage | bool | false | Store weights in bf16 | References in src/c_api.cpp237-240 |
Optimization Strategies:
| Field | Type | Default | Description | Source |
|---|---|---|---|---|
use_packing_layout | bool | true | Use SIMD-packed layouts | References in src/c_api.cpp197-200 |
use_winograd_convolution | bool | true | Use Winograd fast convolution | References in src/c_api.cpp187-190 |
use_sgemm_convolution | bool | true | Use GEMM-based convolution | References in src/c_api.cpp192-195 |
Vulkan GPU Options (when NCNN_VULKAN=ON):
| Field | Type | Default | Description | Source |
|---|---|---|---|---|
use_vulkan_compute | bool | false | Enable Vulkan GPU compute | References in src/c_api.cpp172-180 |
use_shader_local_memory | bool | true | Use shader local memory | References in src/c_api.cpp242-250 |
use_cooperative_matrix | bool | false | Use cooperative matrix operations | References in src/c_api.cpp252-260 |
Memory Allocators:
| Field | Type | Description | Source |
|---|---|---|---|
blob_allocator | Allocator* | Allocator for blob storage | References in src/c_api.cpp162-165 |
workspace_allocator | Allocator* | Allocator for temporary workspace | References in src/c_api.cpp167-170 |
blob_vkallocator | VkAllocator* | GPU blob allocator | Vulkan-specific |
workspace_vkallocator | VkAllocator* | GPU workspace allocator | Vulkan-specific |
staging_vkallocator | VkAllocator* | GPU staging allocator for transfers | Vulkan-specific |
Light Mode:
| Field | Type | Default | Description |
|---|---|---|---|
lightmode | bool | true | Recycle intermediate blobs to reduce memory usage |
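Option fields are typically set on Net::opt before load_param() (a sketch; which flags actually help depends on the target hardware and build configuration):

```cpp
#include "net.h"

// Enable fp16 paths and, if built with NCNN_VULKAN, GPU compute.
void configure(ncnn::Net& net)
{
    net.opt.lightmode = true;          // recycle intermediate blobs
    net.opt.num_threads = 4;
    net.opt.use_fp16_packed = true;
    net.opt.use_fp16_storage = true;
#if NCNN_VULKAN
    net.opt.use_vulkan_compute = true; // must be set before load_param()
#endif
}
```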
Sources: src/c_api.cpp141-350 src/option.h
The Layer class is the base class for all neural network layer implementations in ncnn. Custom layers can inherit from this class.
Initialization and Cleanup:
| Method | Purpose | Return | Source |
|---|---|---|---|
load_param(const ParamDict& pd) | Load layer-specific parameters from parsed dict | 0 on success | src/layer.h30 |
load_model(const ModelBin& mb) | Load layer-specific weight data | 0 on success | src/layer.h34 |
create_pipeline(const Option& opt) | Layer implementation setup | 0 on success | src/layer.h38 |
destroy_pipeline(const Option& opt) | Layer implementation cleanup | 0 on success | src/layer.h42 |
CPU Forward Methods:
| Method | Purpose | Source |
|---|---|---|
forward(const std::vector<Mat>&, std::vector<Mat>&, const Option&) | Forward pass with multiple blobs | src/layer.h64 |
forward(const Mat&, Mat&, const Option&) | Forward pass with single blob | src/layer.h80 |
forward_inplace(std::vector<Mat>&, const Option&) | Inplace forward with multiple blobs | src/layer.h92 |
forward_inplace(Mat&, const Option&) | Inplace forward with single blob | src/layer.h97 |
Vulkan Forward Methods (when NCNN_VULKAN=ON):
| Method | Purpose | Source |
|---|---|---|
upload_model(VkTransfer&, const Option&) | Upload weights to GPU | src/layer.h103 |
forward(const std::vector<VkMat>&, std::vector<VkMat>&, VkCompute&, const Option&) | GPU forward with multiple blobs | src/layer.h108 |
forward(const VkMat&, VkMat&, VkCompute&, const Option&) | GPU forward with single blob | src/layer.h122 |
forward_inplace(std::vector<VkMat>&, VkCompute&, const Option&) | GPU inplace forward multiple | src/layer.h132 |
forward_inplace(VkMat&, VkCompute&, const Option&) | GPU inplace forward single | src/layer.h137 |
| Flag | Type | Description | Source |
|---|---|---|---|
one_blob_only | bool | Layer has exactly one input and one output | src/layer.h46 |
support_inplace | bool | Supports inplace operation (input = output) | src/layer.h49 |
support_vulkan | bool | Has Vulkan GPU implementation | src/layer.h52 |
support_packing | bool | Accepts packed storage (elempack > 1) | src/layer.h55 |
support_bf16_storage | bool | Accepts bf16 weights | src/layer.h58 |
support_fp16_storage | bool | Accepts fp16 weights | src/layer.h61 |
support_int8_storage | bool | Accepts int8 weights | src/layer.h64 |
support_tensor_storage | bool | Uses shader tensor storage | src/layer.h67 |
support_vulkan_packing | bool | Vulkan implementation supports packing | Referenced in context |
support_any_packing | bool | CPU implementation supports any packing | Referenced in context |
support_vulkan_any_packing | bool | Vulkan implementation supports any packing | Referenced in context |
Layers are identified by both string names and integer type indices.
Layer Registry Functions:
| Function | Purpose | Source |
|---|---|---|
layer_to_index(const char* type) | Convert layer name to type index | src/layer.cpp148-155 |
index_to_layer(int index) | Convert type index to layer name | src/layer.cpp157-163 |
create_layer(const char* type) | Create layer by name | src/layer.cpp165-182 |
create_layer(int index) | Create layer by type index | src/layer.cpp184-296 |
Custom layers can be registered with the Net class:
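A minimal custom layer and its registration might look like this (a sketch; the layer name MySwish and its behavior are invented for illustration, and DEFINE_LAYER_CREATOR generates the MySwish_layer_creator function):

```cpp
#include <math.h>
#include "layer.h"
#include "net.h"

// A toy inplace activation layer: y = x / (1 + exp(-x))
class MySwish : public ncnn::Layer
{
public:
    MySwish()
    {
        one_blob_only = true;   // exactly one input, one output
        support_inplace = true; // input and output may share storage
    }

    virtual int forward_inplace(ncnn::Mat& bottom_top_blob, const ncnn::Option& /*opt*/) const
    {
        const int size = bottom_top_blob.w * bottom_top_blob.h * bottom_top_blob.d;
        for (int q = 0; q < bottom_top_blob.c; q++)
        {
            float* ptr = bottom_top_blob.channel(q);
            for (int i = 0; i < size; i++)
                ptr[i] = ptr[i] / (1.f + expf(-ptr[i]));
        }
        return 0;
    }
};

DEFINE_LAYER_CREATOR(MySwish)

// Registration must happen before load_param():
//   net.register_custom_layer("MySwish", MySwish_layer_creator);
```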
Sources: src/layer.h20-141 src/layer.cpp14-296
ncnn uses an abstract allocator interface to manage memory allocation for tensors and temporary workspace.
The base allocator interface.
| Method | Purpose | Source |
|---|---|---|
fastMalloc(size_t size) | Allocate memory | src/allocator.h |
fastFree(void* ptr) | Free memory | src/allocator.h |
Thread-safe pooled allocator with budget-based memory recycling.
Methods:
| Method | Purpose | Source |
|---|---|---|
PoolAllocator() | Constructor | References in src/c_api.cpp114-121 |
set_size_compare_ratio(float ratio) | Set size matching tolerance for reuse | References in allocator context |
clear() | Clear all pooled memory | References in allocator context |
Non-thread-safe pooled allocator for single-threaded or thread-local usage.
Methods:
| Method | Purpose | Source |
|---|---|---|
UnlockedPoolAllocator() | Constructor | References in src/c_api.cpp123-130 |
set_size_compare_ratio(float ratio) | Set size matching tolerance | References in allocator context |
clear() | Clear all pooled memory | References in allocator context |
Vulkan Allocators (when NCNN_VULKAN=ON):
ncnn::VkAllocator: Base class for Vulkan memory allocators.
ncnn::VkBlobAllocator: Allocates device-local memory for blob storage.
ncnn::VkStagingAllocator: Allocates host-visible memory for CPU-GPU data transfers.
ncnn::VkWeightAllocator: Specialized allocator for layer weights with staging support.
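Pooled allocators are typically created once and reused across many extractor runs (a sketch; the blob indices 0 and 1 are placeholders that depend on the model):

```cpp
#include "net.h"
#include "allocator.h"

// Reuse pooled memory between inference runs instead of
// malloc/free on every pass.
static ncnn::UnlockedPoolAllocator g_blob_pool;   // single-thread use only
static ncnn::PoolAllocator g_workspace_pool;      // thread-safe

void run(const ncnn::Net& net, const ncnn::Mat& in, ncnn::Mat& out)
{
    ncnn::Extractor ex = net.create_extractor();
    ex.set_blob_allocator(&g_blob_pool);
    ex.set_workspace_allocator(&g_workspace_pool);
    ex.input(0, in);    // placeholder input blob index
    ex.extract(1, out); // placeholder output blob index
}
```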
Sources: src/c_api.cpp47-139 src/allocator.h src/allocator.cpp
| elemsize | Type | Description |
|---|---|---|
| 4 | fp32 / int32 | 32-bit floating point or integer |
| 2 | fp16 | 16-bit floating point (IEEE 754 half precision) |
| 1 | int8 / uint8 | 8-bit integer or unsigned integer |
| 0 | empty | Empty tensor |
The elempack field indicates how many elements are packed together for SIMD processing:
| elempack | Target ISA | Description |
|---|---|---|
| 1 | Scalar | No packing, scalar processing |
| 4 | SSE2, NEON | 4-element vectors (128-bit) |
| 8 | AVX, FP16 | 8-element vectors (256-bit) or 8×fp16 |
| 16 | AVX512 | 16-element vectors (512-bit) |
Example Layouts:
For a 3D tensor with shape (c, h, w) and elempack=4:
- The channel dimension becomes c/4 packed groups
- Each group holds 4 × h × w elements
- Values are interleaved as [c0,c1,c2,c3], [c4,c5,c6,c7], ... for each spatial position

The cstep field represents the step size between channels, always 16-byte aligned.
This alignment ensures efficient SIMD access patterns.
Sources: src/mat.h312-322 src/mat.cpp222-825
This completes the C++ API Reference. For implementation details of specific layer types, see the layer source files in src/layer/ and src/layer/vulkan/. For advanced topics like custom layer development, refer to the ncnn wiki and examples.