This page documents ncnn's runtime CPU feature detection and dynamic dispatch system. At startup, ncnn detects which instruction set architecture (ISA) extensions the host processor supports and selects the most optimized layer implementations accordingly. This allows a single ncnn binary to run efficiently across different CPU architectures and feature levels without requiring separate builds for each ISA variant.
For details on specific optimizations for ARM and x86 platforms, see ARM NEON Optimizations and x86 SIMD Optimizations. For quantized inference modes, see INT8 Quantization and Precision Modes.
ncnn's runtime dispatch system operates through three main components:
Diagram: Runtime CPU Dispatch Flow
Sources: src/cpu.cpp:182-250, src/layer.cpp:145-230
Runtime CPU dispatch is controlled by the NCNN_RUNTIME_CPU CMake option, which is enabled by default:
option(NCNN_RUNTIME_CPU "runtime dispatch cpu routines" ON)
When NCNN_RUNTIME_CPU=ON, ncnn compiles multiple ISA-specific versions of performance-critical layers. Each variant is compiled with appropriate compiler flags to enable specific instruction sets.
The build system generates multiple layer source files with different ISA flags. The ncnn_add_arch_opt_layer macro creates ISA-specific variants:
cmake/ncnn_add_layer.cmake:2-46
For x86 platforms, variants are generated for:

- `layer_registry_avx[]`
- `layer_registry_fma[]`
- `layer_registry_avx512[]`
- `layer_registry_avx512_vnni[]`

For ARM platforms:

- `layer_registry_arch[]` (NEON, dotprod, etc.)

The layer source generation is automated through custom CMake commands that duplicate and rename layer implementations:
cmake/ncnn_add_layer.cmake:8-27
Sources: CMakeLists.txt:84, cmake/ncnn_add_layer.cmake:2-80, src/CMakeLists.txt:405-720
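As a rough illustration of the duplication idea, the snippet below sketches a macro in the spirit of ncnn_add_arch_opt_layer. The macro name, arguments, and wrapper contents are simplified stand-ins, not ncnn's exact code: it writes a thin wrapper source that re-includes the base implementation under a renamed class, then compiles only that wrapper with the ISA-specific flags.

```cmake
# Hedged sketch (illustrative names, not ncnn's exact macro):
# a wrapper source renames the class via a #define, re-includes the
# base implementation, and is compiled with ISA-specific flags.
macro(sketch_add_arch_opt_layer class srcfile arch flags)
    set(wrapper ${CMAKE_CURRENT_BINARY_DIR}/${class}_${arch}.cpp)
    file(WRITE ${wrapper}
        "#define ${class} ${class}_${arch}\n"   # e.g. Convolution_x86 -> Convolution_x86_avx
        "#include \"${srcfile}\"\n")            # re-include the base implementation
    set_source_files_properties(${wrapper} PROPERTIES COMPILE_FLAGS "${flags}")
    list(APPEND ncnn_SRCS ${wrapper})
endmacro()

# hypothetical usage:
# sketch_add_arch_opt_layer(Convolution_x86 convolution_x86.cpp avx "-mavx")
```

Because only the wrapper is compiled with `-mavx` (or similar), the generic build of the same translation unit stays free of AVX instructions, which is what makes one binary safe across feature levels.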
ncnn uses different methods to detect CPU capabilities depending on the platform:
| Platform | Method | Key Functions |
|---|---|---|
| Linux/Android | getauxval() + /proc/self/auxv | get_elf_hwcap() |
| Windows (ARM) | ruapu library | ruapu_init(), ruapu_supports() |
| macOS/iOS | sysctlbyname() | get_hw_cpufamily(), get_hw_capability() |
| x86/x86_64 | CPUID instruction | x86_cpuid(), x86_get_xcr0() |
Diagram: Platform-Specific CPU Detection Methods
Sources: src/cpu.cpp:312-519, src/cpu.cpp:521-838
On Linux and Android, ncnn reads hardware capabilities from the ELF auxiliary vector:
The get_elf_hwcap_from_getauxval() function first attempts to use getauxval() (available on Android API level 18+), then falls back to reading /proc/self/auxv:
Hardware capability flags are defined for ARM64:

- `HWCAP_ASIMD`: NEON/ASIMD support
- `HWCAP_ASIMDHP`: Half-precision floating point
- `HWCAP_ASIMDDP`: Dot product instructions
- `HWCAP2_I8MM`: 8-bit integer matrix multiply
- `HWCAP2_BF16`: BFloat16 support
- `HWCAP_SVE`, `HWCAP2_SVE2`: Scalable Vector Extensions

On Windows ARM, ncnn uses the ruapu library for runtime feature detection:
The ruapu library performs safe feature detection by testing instructions and catching illegal instruction exceptions.
On macOS and iOS, ncnn queries CPU features through sysctlbyname():
Key queries include:

- `hw.cpufamily`: CPU family identifier (A-series chip generation)
- `hw.optional.arm.FEAT_FP16`: FP16 arithmetic support
- `hw.optional.arm.FEAT_DotProd`: Dot product support
- `hw.optional.arm.FEAT_BF16`: BFloat16 support

CPU family identifiers distinguish between chip generations:

- `CPUFAMILY_ARM_FIRESTORM_ICESTORM`: A14/M1
- `CPUFAMILY_ARM_AVALANCHE_BLIZZARD`: A15/M2
- `CPUFAMILY_ARM_EVEREST_SAWTOOTH`: A16
- `CPUFAMILY_ARM_DONAN`: M4

On x86 platforms, ncnn uses the CPUID instruction to query processor features:
The detection process combines CPUID feature queries with a read of the XCR0 register (via xgetbv) to confirm that the operating system saves the extended register state. Example AVX2 detection logic:
1. Check that CPUID leaf 0 reports a maximum supported leaf >= 7
2. Check that CPUID leaf 1 reports the AVX, XSAVE, and OSXSAVE bits
3. Check that the XCR0 register has the SSE and AVX states enabled (bits 1-2)
4. Check that CPUID leaf 7, subleaf 0 reports the AVX2 bit
Sources: src/cpu.cpp:312-838, src/cpu.h:44-101
CPU detection results are cached in global static variables, one set of flags per architecture (ARM, x86, and the other supported ISAs). These flags are initialized once at startup and queried during layer creation.
Sources: src/cpu.cpp:182-250
ncnn maintains separate layer registry arrays for different ISA levels. Each array contains function pointers to layer creator functions:
The template generates:

- `layer_registry[]`: Generic implementations
- `layer_registry_arch[]`: Architecture-specific (ARM NEON, etc.)
- `layer_registry_avx[]`: AVX-optimized (x86)
- `layer_registry_fma[]`: FMA-optimized (x86)
- `layer_registry_avx512[]`: AVX512-optimized (x86)
- `layer_registry_avx512_vnni[]`: AVX512-VNNI-optimized (x86)

Each registry entry maps a layer type to its creator function:
For each layer that benefits from ISA-specific optimization, multiple source files are generated:
Example for the Convolution layer:

- `convolution.cpp` - Generic implementation
- `convolution_x86.cpp` - x86 base implementation
- `convolution_x86_avx.cpp` - Generated with AVX flags
- `convolution_x86_avx512.cpp` - Generated with AVX512 flags
- `convolution_arm.cpp` - ARM base implementation
- `convolution_arm_asimdhp.cpp` - Generated with FP16 flags

The generation is performed by ncnn_generate_*_source.cmake scripts that duplicate each base implementation and rename its classes for the target ISA (e.g. Convolution_x86 → Convolution_x86_avx):

cmake/ncnn_add_layer.cmake:2-46
Sources: src/layer_registry.h.in:1-34, cmake/ncnn_add_layer.cmake:2-80
When a layer is created, ncnn selects the most optimized implementation through a multi-level dispatch:
Diagram: Layer Creation and ISA Dispatch
The runtime dispatch logic is implemented in create_layer():
Selection priority (highest to lowest):
For x86 platforms:

1. `NCNN_RUNTIME_CPU && NCNN_AVX512VNNI && cpu_support_x86_avx512_vnni()`
2. `NCNN_RUNTIME_CPU && NCNN_AVX512FP16 && cpu_support_x86_avx512_fp16()`
3. `NCNN_RUNTIME_CPU && NCNN_AVX512BF16 && cpu_support_x86_avx512_bf16()`
4. `NCNN_RUNTIME_CPU && NCNN_AVX512 && cpu_support_x86_avx512()`
5. `NCNN_RUNTIME_CPU && NCNN_AVXVNNI && cpu_support_x86_avx_vnni()`
6. `NCNN_RUNTIME_CPU && NCNN_FMA && cpu_support_x86_fma()`
7. `NCNN_RUNTIME_CPU && NCNN_AVX && cpu_support_x86_avx()`

For ARM platforms:
A variant is selected only if it was compiled into the binary (e.g. NCNN_AVX512 is enabled) and the CPU reports the corresponding feature at runtime (e.g. cpu_support_x86_avx512() returns true).

For the Convolution layer on an AVX512-capable CPU:
1. Build time: multiple variants are compiled
   - `convolution.cpp` → `Convolution_layer_creator` → `layer_registry[typeindex]`
   - `convolution_x86_avx512.cpp` → `Convolution_x86_avx512_layer_creator` → `layer_registry_avx512[typeindex]`
2. Runtime: feature detection
   - `cpu_support_x86_avx512()` returns 1
3. Layer creation: dispatch selects AVX512
   - `create_layer()` finds a non-NULL creator in `layer_registry_avx512[]`
   - calls `Convolution_x86_avx512_layer_creator()`
   - returns a `Convolution_x86_avx512*` instance

If AVX512 is not available, the same lookup falls through the priority chain to the next populated registry (e.g. `layer_registry_fma[]`, then `layer_registry_avx[]`, and finally the generic `layer_registry[]`).
Sources: src/layer.cpp:161-230, src/layer_registry.h.in:1-34
| Feature | Detection Function | Description |
|---|---|---|
| NEON | cpu_support_arm_neon() | Basic SIMD (armv7) or ASIMD (armv8) |
| VFPv4 | cpu_support_arm_vfpv4() | FP16 conversion + FMA |
| ASIMDHP | cpu_support_arm_asimdhp() | FP16 arithmetic (ARMv8.2) |
| ASIMDDP | cpu_support_arm_asimddp() | Dot product (ARMv8.2) |
| ASIMDFHM | cpu_support_arm_asimdfhm() | FP16 FML (ARMv8.2) |
| BF16 | cpu_support_arm_bf16() | BFloat16 (ARMv8.6) |
| I8MM | cpu_support_arm_i8mm() | INT8 matrix multiply (ARMv8.6) |
| SVE | cpu_support_arm_sve() | Scalable Vector Extension (ARMv8.2) |
| SVE2 | cpu_support_arm_sve2() | SVE version 2 |
| Feature | Detection Function | Description |
|---|---|---|
| AVX | cpu_support_x86_avx() | 256-bit vectors |
| FMA | cpu_support_x86_fma() | Fused multiply-add |
| XOP | cpu_support_x86_xop() | AMD extended operations |
| F16C | cpu_support_x86_f16c() | FP16 conversion |
| AVX2 | cpu_support_x86_avx2() | AVX2 + FMA + F16C |
| AVX-VNNI | cpu_support_x86_avx_vnni() | AVX VNNI |
| AVX512 | cpu_support_x86_avx512() | AVX512F/BW/CD/DQ/VL |
| AVX512-VNNI | cpu_support_x86_avx512_vnni() | AVX512 VNNI |
| AVX512-BF16 | cpu_support_x86_avx512_bf16() | AVX512 BFloat16 |
| AVX512-FP16 | cpu_support_x86_avx512_fp16() | AVX512 FP16 |
| Architecture | Features | Detection |
|---|---|---|
| MIPS | MSA, MMI | cpu_support_mips_msa(), cpu_support_loongson_mmi() |
| LoongArch | LSX, LASX | cpu_support_loongarch_lsx(), cpu_support_loongarch_lasx() |
| RISC-V | RVV, ZFH | cpu_support_riscv_v(), cpu_support_riscv_zfh() |
Sources: src/cpu.h:43-115, src/cpu.cpp:839-1419
To build ncnn without runtime dispatch (smaller binary, single ISA):
This compiles only the enabled ISA variants and removes the dispatch overhead. The resulting binary requires a CPU that supports all compiled-in features; on older CPUs it will crash with an illegal-instruction error.
For maximum compatibility, enable NCNN_RUNTIME_CPU=ON (default) and let ncnn automatically select the best implementation at runtime.
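For illustration, the two configurations might look like this (the NCNN_* option names come from ncnn's CMakeLists; exact availability depends on the ncnn version and target toolchain):

```shell
# Default: runtime dispatch ON, one binary serves all ISA levels
cmake -DNCNN_RUNTIME_CPU=ON ..

# Smaller single-ISA build: disable dispatch and pin the baseline,
# e.g. an AVX2-only desktop build
cmake -DNCNN_RUNTIME_CPU=OFF -DNCNN_AVX2=ON -DNCNN_AVX512=OFF ..
```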