This document describes the tensor data structures used in ncnn for storing and manipulating multidimensional arrays during neural network inference. The primary data structures are Mat (CPU tensors), VkMat (GPU buffer tensors), and VkImageMat (GPU image tensors).
For information about memory allocators used with these structures, see page 2.4. For details on network loading and blob management during inference, see page 2.1. For GPU compute and how VkMat is used in pipeline dispatch, see page 3.
ncnn provides three main tensor representations optimized for different execution contexts:
| Class | Purpose | Memory Location | Use Case |
|---|---|---|---|
| Mat | CPU tensor | System RAM | CPU layer execution, data preparation |
| VkMat | GPU buffer tensor | GPU device memory | GPU computation via Vulkan compute shaders |
| VkImageMat | GPU image tensor | GPU image memory | GPU computation with image samplers |
Sources: src/mat.h50-336 src/mat.h341-462 src/mat.h464-589
Mat supports four dimensional ranks:
Diagram: Mat Dimensional Structure
The dimension fields determine the tensor shape:
- dims: Dimension rank (1, 2, 3, or 4)
- w: Width (x-axis)
- h: Height (y-axis)
- d: Depth (z-axis, for 4D tensors)
- c: Channels (number of feature channels)

Sources: src/mat.h327-334
Mat uses a Channel-Height-Width (CHW) memory layout where channels are stored sequentially, with each channel containing a contiguous HW plane.
Diagram: Mat Memory Layout with Channel Stride
The cstep field specifies the stride in elements between consecutive channels, accounting for 16-byte alignment:
cstep = alignSize(w × h × d × elemsize, 16) / elemsize
Element Packing: Mat supports packing multiple scalar values into one memory slot for SIMD operations. When elempack > 1, the outermost dimension (c for 3D/4D tensors, h for 2D, w for 1D) is divided by elempack, and elemsize is multiplied by elempack. Each slot then holds elempack consecutive scalar values, matching a SIMD register width.
The elempack comment in src/mat.h documents the layout conventions:
c/1-d-h-w-1 → scalar (elempack=1)
c/4-d-h-w-4 → SSE/NEON pack4 (elempack=4, elemsize=4×scalar_bytes)
c/8-d-h-w-8 → AVX/FP16 pack8 (elempack=8, elemsize=8×scalar_bytes)
| elempack | Architecture | SIMD width (float32) |
|---|---|---|
| 1 | Scalar | 32-bit |
| 4 | SSE/NEON | 128-bit |
| 8 | AVX | 256-bit |
| 8 | ARM FP16 | 128-bit (8×16-bit) |
| 16 | AVX-512 | 512-bit |
Because elemsize encodes the packing (elemsize = scalar_bytes × elempack), the total allocated bytes per channel plane is simply cstep × elemsize, where cstep already accounts for alignment.
Sources: src/mat.h311-336 src/mat.cpp222-254
Diagram: Mat Class Structure
Sources: src/mat.h304-336
Mat objects are created through various constructors:
Diagram: Mat Object Lifecycle
Reference Counting: Mat uses automatic reference counting for memory management. Multiple Mat objects can share the same underlying data:
- refcount: reference counter shared by all Mat objects viewing the same data
- release(): decrements refcount and frees memory when it reaches zero
- clone(): creates a deep copy with independent memory

Sources: src/mat.h52-92 src/mat.cpp19-53 src/mat.cpp222-255
Mat provides multiple access patterns:
Diagram: Mat Data Access Patterns
Key access methods:
- channel(int c): Returns a Mat view of a single channel
- depth(int z): Returns a Mat view of a single depth slice (for 4D)
- row(int y): Returns a pointer to a row
- operator[](size_t i): Direct element access for 1D vectors
- channel_range(), depth_range(), row_range(), range(): Return views of subsets

All range methods return Mat objects that share the underlying data (shallow views).
Sources: src/mat.h187-206
Mat provides extensive support for image pixel data conversion:
Diagram: Mat Pixel Format Conversion Pipeline
Pixel Type Enumeration: The PixelType enum in Mat defines conversion operations:
PIXEL_RGB, PIXEL_BGR, PIXEL_GRAY, PIXEL_RGBA, PIXEL_BGRA
PIXEL_RGB2BGR, PIXEL_RGB2GRAY, PIXEL_BGR2RGB, etc.
Format encoding: type = source_format | (dest_format << 16)
Key pixel methods:
- from_pixels(): Convert packed pixel data to CHW float format
- from_pixels_resize(): Convert and resize in one operation
- from_pixels_roi(): Extract region of interest during conversion
- to_pixels(): Convert CHW float format back to packed pixels
- substract_mean_normalize(): Apply mean subtraction and normalization

Sources: src/mat.h218-299 src/mat_pixel.cpp16-127 src/mat_pixel_resize.cpp190-374
Diagram: Mat Reshape and Copy Operations
The reshape() method attempts to return a view with different dimensions when possible, but may allocate new memory if channel stride alignment requirements differ between the source and target shapes.
Sources: src/mat.h140-147 src/mat.cpp60-220
VkMat represents tensor data in GPU device memory as Vulkan buffer objects, suitable for compute shader operations.
Diagram: VkMat Class Structure and Relationships
| Aspect | Mat | VkMat |
|---|---|---|
| Memory | System RAM (void*) | GPU device memory (VkBufferMemory*) |
| Allocator | Allocator* | VkAllocator* |
| Access | Direct CPU pointer | Via mapped() or buffer handles |
| Refcount location | After data | In VkBufferMemory struct |
| Creation | Allocates CPU memory | Allocates VkBuffer and VkDeviceMemory |
Diagram: VkMat CPU-GPU Data Transfer Flow
Data transfer methods:
- mapped(): Returns a CPU-accessible Mat view of the buffer (requires host-visible memory)
- mapped_ptr(): Returns the raw CPU-mapped pointer
- VkStagingAllocator: Allocates temporary host-visible staging buffers for transfers
- VkCompute: Records the upload/download transfer commands into command buffers

Sources: src/mat.h341-462 src/mat.cpp544-806
VkImageMat stores tensor data as Vulkan image objects, optimized for texture sampling and image operations.
Diagram: VkImageMat Class Structure
| Property | Details |
|---|---|
| Memory Layout | 2D/3D image layout (GPU-optimized tiling) |
| Element Access | Via image samplers in shaders |
| Channel Packing | Stored in image RGBA components |
| Dimensions | 4D tensors stored as 3D images (h×d as height) |
The image format is determined by elemsize and elempack:
- elempack=1, elemsize=4: R32F (single channel float32)
- elempack=4, elemsize=16: RGBA32F (4-channel float32)
- elempack=1, elemsize=2: R16F (half precision)
- elempack=4, elemsize=8: RGBA16F (4-channel half)

For 4D tensors, the depth dimension is folded into the height: image_height = h * d
Sources: src/mat.h464-589 src/mat.cpp847-1118
Mat, VkMat, and VkImageMat support multiple numeric precisions. The elemsize field holds the total bytes per packed slot — it scales with elempack.
Diagram: Mat elemsize Values by Precision and Packing
Sources: src/mat.h311-322
The elembits() method returns the scalar element precision in bits, independent of packing. Because elemsize already encodes the packing factor (elemsize = scalar_bytes × elempack), dividing by elempack recovers the per-scalar width:
elembits() = (elemsize × 8) / elempack
elembits() is used in NetPrivate::convert_layout (see src/net.cpp) to determine which type-conversion path to apply — for example, elembits() == 32 triggers an fp16/bf16 cast, and elembits() == 16 triggers a cast back to fp32. It always returns 32, 16, or 8 regardless of packing.
| Precision | elempack | elemsize | elembits() |
|---|---|---|---|
| float32, scalar | 1 | 4 | 32 |
| float32, pack4 (SSE/NEON) | 4 | 16 | 32 |
| float32, pack8 (AVX) | 8 | 32 | 32 |
| float16/bf16, scalar | 1 | 2 | 16 |
| float16/bf16, pack8 | 8 | 16 | 16 |
| int8, scalar | 1 | 1 | 8 |
| int8, pack8 | 8 | 8 | 8 |
Sources: src/mat.h180-181 src/mat.h311-322
ncnn provides element-wise and tensor-level functions for converting between numeric formats. These are invoked automatically by NetPrivate::convert_layout when a blob's elembits() does not match what the next layer requires.
Diagram: Type Conversion Functions (mat.h scalar helpers and tensor-level casts)
| Conversion | Notes |
|---|---|
| float32_to_float16() | Full IEEE 754 half-precision |
| float32_to_bfloat16() | Truncates mantissa; keeps exponent range |
| float8 (E4M3) | 4-bit exponent, 3-bit mantissa |
| bfloat8 (E5M2) | 5-bit exponent, 2-bit mantissa |
Tensor-level cast_* functions allocate an output Mat and process all elements, with SIMD optimizations per platform.
Sources: src/mat.h726-771 src/mat.h790-796 src/net.cpp358-549
All three tensor classes integrate with ncnn's allocator system:
Diagram: Tensor Memory Management Integration
Key behaviors:
- create() methods check whether the existing allocation already matches the requested shape, element size, and packing before reallocating
- When the allocator is null, Mat falls back to default system allocation via fastMalloc() (Mat only; VkMat and VkImageMat always require a VkAllocator)

Sources: src/mat.h324-325 src/mat.cpp222-500 src/mat.cpp544-806 src/mat.cpp847-1079
| Feature | Mat | VkMat | VkImageMat |
|---|---|---|---|
| Memory | System RAM | GPU buffer memory | GPU image memory |
| Layout | Linear CHW | Linear CHW | Tiled 2D/3D |
| CPU Access | Direct pointer | Via mapped() | Via mapped() |
| GPU Access | N/A | Storage buffer | Sampled image |
| Allocator Type | Allocator* | VkAllocator* | VkAllocator* |
| Primary Use | CPU layers, I/O | GPU compute | GPU compute (texture ops) |
| Channel Packing | elempack channels per element | elempack channels per element | elempack in RGBA components |
| Precision Support | FP32, FP16, INT8, BF16 | FP32, FP16, INT8, BF16 | FP32, FP16 |
Sources: src/mat.h50-589