This page documents the GPU-side layer classes located in src/layer/vulkan/. Each class wraps a CPU layer with a Vulkan compute pipeline that operates on VkMat tensors. The page covers the shared lifecycle pattern, packing layout selection, and the key per-layer implementation details including weight packing strategies, Winograd convolution shaders, and cooperative-matrix GEMM.
For the Vulkan pipeline and command recording infrastructure (Pipeline, VkCompute, record_pipeline), see 3.2. For GPU memory management (VkAllocator, VkTransfer), see 3.3. For the CPU-side counterparts of these layers, see 4.
Every Vulkan layer class follows a three-phase lifecycle. The methods are defined on the base Layer class and overridden in each _vulkan subclass.
Lifecycle Phases and Method Calls
Sources: src/layer/vulkan/convolution_vulkan.cpp58-176 src/layer/vulkan/convolutiondepthwise_vulkan.cpp316-348 src/layer/vulkan/innerproduct_vulkan.cpp26-225
| Phase | Method | Key Actions |
|---|---|---|
| Parameter load | load_param | Reads layer params; may set support_vulkan = false for dynamic weights |
| Pipeline setup | create_pipeline | Packs weights on CPU; allocates Pipeline objects with specialization constants |
| Weight upload | upload_model | Transfers packed weights to device-local VkMat via VkTransfer::record_upload |
| Inference | forward(VkMat, ...) | Calls VkCompute::record_pipeline for each dispatch; no CPU/GPU sync here |
| Cleanup | destroy_pipeline | Deletes Pipeline objects and any sub-layers |
The Vulkan layers use a pack4 layout when channel counts are divisible by 4. Each shader invocation processes 4 channels at once using GLSL vec4. Two independent packing factors are computed: one for the input tensor and one for the output tensor.
elempack = (num_input % 4 == 0) ? 4 : 1
out_elempack = (num_output % 4 == 0) ? 4 : 1
This yields four possible shader variants selected at create_pipeline time:
| elempack | out_elempack | Shader suffix example |
|---|---|---|
| 1 | 1 | convolution |
| 4 | 4 | convolution_pack4 |
| 1 | 4 | convolution_pack1to4 |
| 4 | 1 | convolution_pack4to1 |
The correct LayerShaderType enum value is chosen and passed to Pipeline::create. The pipeline object is stored as a class member (e.g., pipeline_convolution_pack4).
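The selection logic above can be sketched as a small helper. This is an illustrative reconstruction, not the actual ncnn code; the real implementation switches on LayerShaderType enum values rather than returning strings.

```cpp
#include <string>

// Hypothetical helper mirroring the packing-variant selection described
// above: two independent packing factors pick one of four shader variants.
std::string pick_convolution_shader(int num_input, int num_output)
{
    const int elempack = (num_input % 4 == 0) ? 4 : 1;
    const int out_elempack = (num_output % 4 == 0) ? 4 : 1;

    if (elempack == 4 && out_elempack == 4) return "convolution_pack4";
    if (elempack == 1 && out_elempack == 4) return "convolution_pack1to4";
    if (elempack == 4 && out_elempack == 1) return "convolution_pack4to1";
    return "convolution";
}
```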
Sources: src/layer/vulkan/convolution_vulkan.cpp97-111 src/layer/vulkan/convolutiondepthwise_vulkan.cpp74-76 src/layer/vulkan/innerproduct_vulkan.cpp33-34
Packing variant selection in create_pipeline
Sources: src/layer/vulkan/convolution_vulkan.cpp97-111 src/layer/vulkan/deconvolution_vulkan.cpp69-70
Class: Convolution_vulkan in src/layer/vulkan/convolution_vulkan.h
Convolution_vulkan inherits from Convolution and selects one of several compute strategies at create_pipeline time, depending on kernel geometry, channel counts, and Option flags.
Sources: src/layer/vulkan/convolution_vulkan.cpp58-180 src/layer/vulkan/convolution_vulkan.h36-50
For 3×3 stride-1 dilation-1 convolutions with at least 16 input and output channels, two Winograd variants are supported:
| Variant | Tile size | Filter domain size | Pipeline members |
|---|---|---|---|
| Winograd F(2,3) | 2×2 output | 4×4 = 16 transforms | pipeline_convolution_3x3s1d1_winograd23_* |
| Winograd F(4,3) | 4×4 output | 6×6 = 36 transforms | pipeline_convolution_3x3s1d1_winograd43_* |
Each variant is a three-stage pipeline:
Kernel weights are transformed on the CPU during create_pipeline and stored in weight_winograd23_data_packed or weight_winograd43_data_packed, then uploaded via upload_model.
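The tile bookkeeping implied by the table above can be sketched as follows for a 3×3 stride-1 dilation-1 convolution. The helper name and struct are illustrative, not the ncnn API; the tile and transform-domain counts follow the standard Winograd F(m,3) formulas.

```cpp
// For Winograd F(m,3): output is tiled into m x m blocks, and each tile
// is processed in an a x a transform domain where a = m + 3 - 1.
struct WinogradTiles { int tiles_x, tiles_y, transforms; };

WinogradTiles winograd_tiles(int outw, int outh, int m /* 2 or 4 */)
{
    WinogradTiles t;
    t.tiles_x = (outw + m - 1) / m;  // ceil-divide output width into tiles
    t.tiles_y = (outh + m - 1) / m;
    const int r = 3;                 // 3x3 kernel
    const int a = m + r - 1;         // transform edge: 4 for F(2,3), 6 for F(4,3)
    t.transforms = a * a;            // 16 or 36, matching the table above
    return t;
}
```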
Sources: src/layer/vulkan/convolution_vulkan.cpp207-533 src/layer/vulkan/convolution_vulkan.h43-50
When vkdev->info.support_cooperative_matrix() returns true and FP16 storage is active, the Winograd GEMM stage uses the convolution_winograd_gemm_cm shader (src/layer/vulkan/shader/convolution_winograd_gemm_cm.comp) instead of the standard GEMM. This shader uses GL_KHR_cooperative_matrix (or GL_NV_cooperative_matrix as fallback).
The parameters coopmat_M, coopmat_N, coopmat_K and the unroll factors UNROLL_SG_M/N/K, UNROLL_WG_M/N are computed at pipeline creation time by calling vkdev->info.get_optimal_cooperative_matrix_mnk(...). These are passed as Vulkan specialization constants.
Weight data is re-packed in a cooperative-matrix-friendly layout:
- inch/pa × outch/pb × 36 (for F(4,3))
- blocks_n × (coopmat_N×coopmat_K×UNROLL_SG_N×UNROLL_WG_N×kk) per frequency position

Sources: src/layer/vulkan/convolution_vulkan.cpp180-210 src/layer/vulkan/convolution_vulkan.cpp258-409 src/layer/vulkan/shader/convolution_winograd_gemm_cm.comp1-30
| Strategy | CPU Mat layout (weight_data_packed) |
|---|---|
| Direct / GEMM | maxk × (num_input/ep) × (num_output/oep) with elempack ep*oep |
| Winograd standard | (num_input/ep) × (num_output/oep) × 36 (or 16) with elempack ep*oep |
| Winograd coopmat | (coopmat_N*coopmat_K*...*kk) × blocks_n × 36 (or 16) with elempack 1 |
Sources: src/layer/vulkan/convolution_vulkan.cpp382-409 src/layer/vulkan/convolution_vulkan.cpp710-740
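The Direct/GEMM row of the table above implies simple index math into the packed weight Mat. The sketch below is an assumed flattened-offset reconstruction for a dense row-major layout, not the actual ncnn addressing code.

```cpp
// Direct/GEMM layout: maxk x (num_input/ep) x (num_output/oep) with
// elempack ep*oep, so each packed "element" holds an ep*oep block of
// scalars. Returns the scalar offset of a packed element.
long packed_weight_offset(int maxk, int num_input,
                          int ep, int oep,
                          int k, int iq, int oq) // kernel pos, in/out channel blocks
{
    const long w = maxk;                  // innermost dim: kernel positions
    const long h = num_input / ep;        // then input channel blocks
    const long elemsize = (long)ep * oep; // scalars per packed element
    return ((long)oq * h * w + (long)iq * w + k) * elemsize;
}
```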
File: src/layer/vulkan/convolutiondepthwise_vulkan.cpp
ConvolutionDepthWise_vulkan handles both true depthwise convolution (channels == group == num_output) and grouped convolution with smaller per-group elempack_g.
The class creates a Padding_vulkan sub-layer for input padding, stored in the member padding.
True depthwise path:
- Weights are repacked with convert_packing(weight_data_r2, weight_data_packed, elempack, opt).
- pipeline_convolutiondepthwise (pack1) or pipeline_convolutiondepthwise_pack4 (pack4) is created.

Group convolution path:
- Per-group packing factors elempack_g and out_elempack_g are computed.
- Shader variants: pipeline_convolutiondepthwise_group, ..._pack4, ..._pack1to4, ..._pack4to1.
- Weights are packed per group into weight_data_packed_groups.

Sources: src/layer/vulkan/convolutiondepthwise_vulkan.cpp39-284 src/layer/vulkan/convolutiondepthwise_vulkan.cpp316-348
File: src/layer/vulkan/deconvolution_vulkan.cpp
Deconvolution_vulkan implements transposed convolution. It creates two Crop_vulkan sub-layers (crop and output_crop) for post-convolution output trimming.
GEMM + col2im path (when opt.use_sgemm_convolution && num_input >= 8 && maxk*num_output >= 8):
- Weights are packed as (num_input/ep) × (maxk*num_output/oep) — the kernel and output dimensions are fused.
- Two pipelines run: pipeline_deconvolution_gemm (one invocation per input spatial position) and pipeline_deconvolution_col2im (scatter results to output).
- When cooperative matrices are available, the GEMM stage uses the deconvolution_gemm_cm shader.

Direct path: Weights are transposed (kernel order reversed) and packed as maxk × (num_input/ep) × (num_output/oep). A single pipeline_deconvolution handles the full computation.
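The output geometry that the col2im scatter targets follows the standard transposed-convolution size formula; the sketch below assumes that formula and is not taken from the ncnn source.

```cpp
// Transposed-convolution output extent before the Crop sub-layers trim
// any padding: each input position scatters a dilated kernel footprint,
// strided apart, and the pads are subtracted at the end.
int deconv_out_size(int in, int kernel, int stride, int dilation,
                    int pad_left, int pad_right, int output_pad)
{
    const int kernel_extent = dilation * (kernel - 1) + 1;
    return (in - 1) * stride + kernel_extent + output_pad - pad_left - pad_right;
}
```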
Sources: src/layer/vulkan/deconvolution_vulkan.cpp48-501 src/layer/vulkan/deconvolution_vulkan.h1-55
File: src/layer/vulkan/innerproduct_vulkan.cpp
InnerProduct_vulkan creates a Flatten_vulkan sub-layer to convert multi-dimensional input into a 1D vector before the weight multiply.
The class chooses among three approaches depending on input shape and num_input:
1. GEMM path (2D input — batch matrix multiply)
Triggered when shape.dims == 2 && shape.w == num_input. Uses pipeline_innerproduct_gemm with shader variants: innerproduct_gemm, innerproduct_gemm_wp4, innerproduct_gemm_wp1to4, innerproduct_gemm_wp4to1.
2. Two-pass sum8 path (large flat input)
Triggered when num_input / in_elempack >= 32. Splits work across two pipelines:
- pipeline_innerproduct_sum8 — partial dot products grouped into blocks of 8
- pipeline_innerproduct_reduce_sum8 — reduces partial sums and applies bias/activation

3. Direct path (small flat input)
A single pipeline_innerproduct computes the full dot product in one dispatch.
The weight packing layout is always (num_input/ep) × (num_output/oep) with elempack ep*oep.
Sources: src/layer/vulkan/innerproduct_vulkan.cpp26-392
File: src/layer/vulkan/pooling_vulkan.cpp
Pooling_vulkan creates a Padding_vulkan sub-layer for pre-pooling padding and selects among three compute strategies:
| Mode | Pipelines Created |
|---|---|
| Standard (non-global, non-adaptive) | pipeline_pooling, pipeline_pooling_pack4 |
| Adaptive | pipeline_pooling_adaptive, pipeline_pooling_adaptive_pack4 |
| Global pooling | pipeline_pooling_global_reduce_first[_pack4], pipeline_pooling_global_reduce[_pack4], pipeline_pooling_global_reduce_last[_pack4] |
Global pooling uses a three-stage tree reduction to avoid serializing all spatial positions through a single shader invocation.
Specialization constants encode pooling_type, kernel_w/h, stride_w/h, pad_mode, and avgpool_count_include_pad.
Sources: src/layer/vulkan/pooling_vulkan.cpp33-420
File: src/layer/vulkan/padding_vulkan.cpp
Padding_vulkan is a utility layer used as a sub-layer by Convolution_vulkan, ConvolutionDepthWise_vulkan, and Pooling_vulkan. It handles the GPU-side border extension before convolution or pooling.
Packing awareness:
- offset_elempack is computed from the amount to be padded on the leading edge.
- When the input elempack does not match the required offset packing (e.g., padding 2 elements into a pack4 tensor), an unpacking pass is inserted using pipeline_padding_pack4to1.

Pipeline members:
| Member | Condition |
|---|---|
| pipeline_padding | pack1 input/output |
| pipeline_padding_pack4 | pack4 input/output |
| pipeline_padding_pack1to4 | pack1 in → pack4 out |
| pipeline_padding_pack4to1 | pack4 in → pack1 out |
| pipeline_padding_3d / pipeline_padding_3d_pack4 | 4D (depth) input |
Sources: src/layer/vulkan/padding_vulkan.cpp10-80
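The offset-packing check described above can be sketched as a predicate. The modulo-4 rule is an assumption inferred from the description (a leading-edge pad that is a multiple of 4 keeps the pack4 layout aligned); the helper names are hypothetical.

```cpp
// Assumed rule: a pack4 tensor can only be padded in-place when the
// leading-edge pad amount keeps 4-element alignment; otherwise the
// pack4to1 unpacking pass is needed first.
int padding_offset_elempack(int leading_pad)
{
    return (leading_pad % 4 == 0) ? 4 : 1;
}

bool needs_unpack(int elempack, int leading_pad)
{
    return elempack == 4 && padding_offset_elempack(leading_pad) == 1;
}
```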
File: src/layer/vulkan/interp_vulkan.cpp
Interp_vulkan implements nearest/bilinear/bicubic resize on the GPU.
- resize_type == 1 or 2 (nearest/bilinear): A single shader interp / interp_pack4 handles both modes, with resize_type and align_corner passed as specialization constants.
- resize_type == 3 (bicubic): Requires three pipelines:
  - pipeline_interp_bicubic_coeffs_x — pre-computes horizontal cubic coefficients
  - pipeline_interp_bicubic_coeffs_y — pre-computes vertical cubic coefficients
  - pipeline_interp_bicubic / pipeline_interp_bicubic_pack4 — applies the coefficients

Sources: src/layer/vulkan/interp_vulkan.cpp24-174
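The per-coordinate coefficient computation that the coeffs_x/coeffs_y passes precompute can be sketched with the Keys cubic kernel. The A = -0.75 choice matches the common OpenCV-style convention and is an assumption here, not verified against the shader source.

```cpp
// Keys cubic interpolation weights for a fractional offset fx in [0, 1),
// covering the four source taps around the destination coordinate.
void cubic_coeffs(float fx, float coeffs[4])
{
    const float A = -0.75f; // assumed kernel parameter (OpenCV-style)
    coeffs[0] = ((A * (fx + 1) - 5 * A) * (fx + 1) + 8 * A) * (fx + 1) - 4 * A;
    coeffs[1] = ((A + 2) * fx - (A + 3)) * fx * fx + 1;
    coeffs[2] = ((A + 2) * (1 - fx) - (A + 3)) * (1 - fx) * (1 - fx) + 1;
    coeffs[3] = 1.f - coeffs[0] - coeffs[1] - coeffs[2]; // weights sum to 1
}
```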
| Layer class | File | Strategy |
|---|---|---|
| InstanceNorm_vulkan | src/layer/vulkan/instancenorm_vulkan.cpp | Multi-pass: reduce→mean→sub_mean_square→coeffs→norm; pack1 and pack4 variants |
| PReLU_vulkan | src/layer/vulkan/prelu_vulkan.cpp | Single pipeline; num_slope and optional single-slope constant as specialization |
| Scale_vulkan | src/layer/vulkan/scale_vulkan.cpp | Single pipeline; pipeline_scale / pipeline_scale_pack4 |
| Normalize_vulkan | src/layer/vulkan/normalize_vulkan.cpp | Multi-pass reduce and normalize; pack1 and pack4 variants |
| LRN_vulkan | src/layer/vulkan/lrn_vulkan.cpp | Two-pass: square-sum then normalize |
| DeconvolutionDepthWise_vulkan | src/layer/vulkan/deconvolutiondepthwise_vulkan.cpp | True depthwise and group variants; creates Crop sub-layers |
Sources: src/layer/vulkan/instancenorm_vulkan.cpp10-30 src/layer/vulkan/prelu_vulkan.cpp10-17 src/layer/vulkan/scale_vulkan.cpp10-17
The diagram below maps layer names to their class names, source files, and the pipeline member names that identify which shaders are loaded.
Key Vulkan Layer Classes and Their Pipeline Members
Sources: src/layer/vulkan/convolution_vulkan.h11-67 src/layer/vulkan/convolutiondepthwise_vulkan.cpp11-25 src/layer/vulkan/innerproduct_vulkan.cpp11-24 src/layer/vulkan/pooling_vulkan.cpp13-31 src/layer/vulkan/padding_vulkan.cpp10-22 src/layer/vulkan/deconvolution_vulkan.h10-55
Every pipeline creation follows the same pattern of encoding both static parameters and optional shape hints as specialization constants:
specializations[0..N-1] = layer parameters (kernel size, stride, bias_term, activation_type, ...)
specializations[N+0..N+4] = input shape (dims, w, h, c, cstep)
specializations[N+5..N+9] = output shape (dims, w, h, c, cstep)
Shape-based specializations are zero when shapes are not known at create_pipeline time (i.e., bottom_shapes is empty). In that case, the shader uses runtime push_constant values provided via vk_constant_type in forward.
This two-level configuration allows the shader compiler to constant-fold dimension checks when shapes are known statically, while remaining functional when shapes are dynamic.
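The layout above can be sketched as a small builder. The struct and function names are hypothetical; ncnn's actual code fills a vk_specialization_type array, but the ordering shown here follows the description.

```cpp
#include <vector>

// Shape hints default to zero, matching the "shapes unknown at
// create_pipeline time" case described above.
struct ShapeHint { int dims = 0, w = 0, h = 0, c = 0, cstep = 0; };

std::vector<int> build_specializations(const std::vector<int>& layer_params,
                                       const ShapeHint& in, const ShapeHint& out)
{
    std::vector<int> s = layer_params;    // [0..N-1]: static layer parameters
    const ShapeHint* hints[2] = { &in, &out };
    for (const ShapeHint* sh : hints)     // [N..N+4] input, [N+5..N+9] output
    {
        s.push_back(sh->dims);
        s.push_back(sh->w);
        s.push_back(sh->h);
        s.push_back(sh->c);
        s.push_back(sh->cstep);
    }
    return s;
}
```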
Sources: src/layer/vulkan/convolution_vulkan.cpp435-452 src/layer/vulkan/pooling_vulkan.cpp120-165 src/layer/vulkan/interp_vulkan.cpp31-44