This page documents the GGML backend abstraction layer: the set of types, interface structs, and registration mechanisms that allow GGML to execute tensor operations on different hardware targets through a uniform API. It covers the type hierarchy, the vtable pattern used by each backend, and the registry that discovers and loads backends at startup.
For details on specific backend implementations, see:
For how the backend scheduler distributes computation graph nodes across multiple backends during inference, see pages 3.4 and 3.7.
The backend system is organized around five distinct opaque types, each defined alongside a corresponding interface struct (vtable):
| Opaque Handle | Concrete Struct | Interface Struct | Role |
|---|---|---|---|
ggml_backend_reg_t | ggml_backend_reg | ggml_backend_reg_i | A compiled-in or dynamically loaded backend plugin |
ggml_backend_dev_t | ggml_backend_dev | ggml_backend_dev_i | A specific physical device (e.g., GPU 0, GPU 1) |
ggml_backend_buffer_type_t | ggml_backend_buffer_type | ggml_backend_buffer_type_i | Describes how memory is allocated for a device |
ggml_backend_buffer_t | ggml_backend_buffer | ggml_backend_buffer_i | An allocated region of device memory |
ggml_backend_t | ggml_backend | ggml_backend_i | A running compute context bound to a device |
Each opaque handle is a pointer to a concrete struct that contains the vtable (iface) and a backend-specific context pointer. Backends populate these structs at construction time.
Type Hierarchy and Derivation
Sources: ggml/src/ggml-backend-impl.h ggml/include/ggml-backend.h
Each type's behavior is defined by a C struct of function pointers. Backends implement these structs. The GGML core never casts to a concrete backend type; it only calls through the vtable.
ggml_backend_reg_i

| Function Pointer | Signature | Purpose |
|---|---|---|
get_name | (reg) → const char * | Human-readable backend name (e.g., "CUDA", "Metal") |
get_dev_count | (reg) → size_t | Number of physical devices this backend manages |
get_dev | (reg, index) → ggml_backend_dev_t | Access a specific device by index |
get_proc_address | (reg, name) → void * | Extension point: look up optional function by name |
ggml_backend_dev_i

| Function Pointer | Purpose |
|---|---|
get_name | Device name string |
get_description | Detailed description (vendor, driver version, etc.) |
get_memory | Query free and total device memory |
get_type | Returns GGML_BACKEND_DEVICE_TYPE_CPU, _GPU, or _ACCEL |
get_props | Populate a ggml_backend_dev_props struct with capabilities |
init_backend | Instantiate a ggml_backend_t on this device |
get_buffer_type | Return the default buffer type for this device |
get_host_buffer_type | Return a pinned/mapped host buffer type for async transfers |
buffer_from_host_ptr | Wrap an existing host pointer as a device-accessible buffer |
supports_op | Check if a ggml_tensor op can execute on this device |
supports_buft | Check if a buffer type is accessible by this device |
offload_op | Hint to the scheduler whether an op should be offloaded |
event_new / event_free / event_synchronize | Synchronization event management |
ggml_backend_buffer_type_i

| Function Pointer | Purpose |
|---|---|
get_name | Buffer type name string |
alloc_buffer | Allocate a ggml_backend_buffer_t of the given byte size |
get_alignment | Required alignment for allocations |
get_max_size | Maximum single allocation size (SIZE_MAX if unbounded) |
get_alloc_size | Bytes needed for a specific tensor (may exceed ggml_nbytes) |
is_host | Whether the buffer is CPU-accessible without an explicit copy |
ggml_backend_buffer_i

| Function Pointer | Purpose |
|---|---|
free_buffer | Release the underlying allocation |
get_base | Return the base pointer of the allocation |
init_tensor | Bind a ggml_tensor to this buffer (sets tensor->data) |
cpy_tensor | Backend-accelerated tensor copy (e.g., DMA, peer transfer) |
clear | Fill the entire buffer with a byte value |
reset | Reset any per-buffer state (e.g., memory pools) |
ggml_backend_i

| Function Pointer | Purpose |
|---|---|
get_name | Instance name string |
free | Destroy the backend instance and release its resources |
graph_plan_create / graph_plan_free | Pre-compile a ggml_cgraph into a reusable execution plan |
graph_plan_compute | Execute a pre-compiled plan |
graph_compute | Execute a ggml_cgraph directly |
event_record / event_wait | Cross-backend asynchronous dependency tracking |
Sources: ggml/src/ggml-backend-impl.h ggml/src/ggml-backend.cpp:31-254
The registry is a singleton ggml_backend_registry struct defined in ggml/src/ggml-backend-reg.cpp:107-200 that owns two collections:
- std::vector<ggml_backend_reg_entry> backends — one entry per plugin, plus an optional dl_handle_ptr for dynamically loaded backends.
- std::vector<ggml_backend_dev_t> devices — a flat, ordered list of all devices across all backends.

When backends are statically linked, the ggml_backend_registry constructor registers them in a fixed priority order. This order determines which backend the scheduler prefers when multiple backends support the same operation:
CUDA → Metal → SYCL → Vulkan → WebGPU → zDNN → VirtGPU → OpenCL → ZenDNN → Hexagon → CANN → BLAS → RPC → CPU
Each backend provides a function of the form ggml_backend_<name>_reg() returning a ggml_backend_reg_t. Examples: ggml_backend_cuda_reg(), ggml_backend_metal_reg(), ggml_backend_cpu_reg().
After register_backend() stores the entry, it enumerates the backend's devices using ggml_backend_reg_dev_count() and ggml_backend_reg_dev_get(), calling register_device() for each one.
Registration Flow
When GGML_BACKEND_DL=ON, each backend is compiled as a loadable MODULE library rather than being statically linked. The registry can load them at runtime:
- ggml_backend_load(const char * path) — load a single .so / .dll.
- ggml_backend_load_all(const char * dir) — scan a directory and load all files matching the pattern ggml-*.so / ggml-*.dll.

The dynamic loader (in ggml-backend-dl.cpp) resolves the symbol ggml_backend_<name>_reg from the shared object and calls it to obtain the registry entry, then proceeds identically to static registration.
The directory to scan can be set at build time via the GGML_BACKEND_DIR CMake variable, which bakes the search path into the binary.
Sources: ggml/src/ggml-backend-reg.cpp:102-200
ggml_add_backend_library(backend, sources...) — ggml/src/CMakeLists.txt:247-292
Creates a CMake library target for a backend. The output type depends on GGML_BACKEND_DL:
GGML_BACKEND_DL | CMake Library Type | How ggml Links It |
|---|---|---|
OFF (default) | STATIC or SHARED | target_link_libraries(ggml PUBLIC ${backend}) |
ON | MODULE (dlopen-able) | add_dependencies(ggml ${backend}) only (not linked) |
All backend targets link privately to ggml-base for core tensor types and utilities.
ggml_add_backend(backend) — ggml/src/CMakeLists.txt:294-305
Checks the GGML_<BACKEND> CMake option, includes the backend's subdirectory, and (in static mode) adds GGML_USE_<BACKEND> as a compile definition on the ggml target so ggml-backend-reg.cpp knows which #ifdef branches to compile.
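The branching these two functions perform can be sketched in CMake (a simplified, hedged sketch — the function name here is hypothetical and the real macros in ggml/src/CMakeLists.txt handle more details, such as deriving the GGML_USE_<BACKEND> define from the option name):

```cmake
# Sketch of the GGML_BACKEND_DL branch, not the actual macro.
function(sketch_add_backend_library backend)
    if (GGML_BACKEND_DL)
        # dlopen-able plugin: built alongside ggml but never linked into it
        add_library(${backend} MODULE ${ARGN})
        add_dependencies(ggml ${backend})
    else()
        # static mode: linked directly into ggml; the registry then selects
        # the matching #ifdef branch via a GGML_USE_<BACKEND> definition
        add_library(${backend} ${ARGN})
        target_link_libraries(ggml PUBLIC ${backend})
    endif()
    # every backend needs the core tensor types and utilities
    target_link_libraries(${backend} PRIVATE ggml-base)
endfunction()
```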
| CMake Option | Library Target | ggml-backend-reg.cpp Guard | Registration Function |
|---|---|---|---|
GGML_CPU | ggml-cpu | GGML_USE_CPU | ggml_backend_cpu_reg() |
GGML_CUDA | ggml-cuda | GGML_USE_CUDA | ggml_backend_cuda_reg() |
GGML_METAL | ggml-metal | GGML_USE_METAL | ggml_backend_metal_reg() |
GGML_VULKAN | ggml-vulkan | GGML_USE_VULKAN | ggml_backend_vk_reg() |
GGML_HIP | ggml-hip | GGML_USE_HIP | ggml_backend_hip_reg() |
GGML_SYCL | ggml-sycl | GGML_USE_SYCL | ggml_backend_sycl_reg() |
GGML_OPENCL | ggml-opencl | GGML_USE_OPENCL | ggml_backend_opencl_reg() |
GGML_CANN | ggml-cann | GGML_USE_CANN | ggml_backend_cann_reg() |
GGML_BLAS | ggml-blas | GGML_USE_BLAS | ggml_backend_blas_reg() |
GGML_RPC | ggml-rpc | GGML_USE_RPC | ggml_backend_rpc_reg() |
Sources: ggml/src/CMakeLists.txt:192-292 ggml/src/CMakeLists.txt:448-462
The public API is declared in ggml/include/ggml-backend.h and implemented across ggml/src/ggml-backend.cpp and ggml/src/ggml-backend-reg.cpp.
Asynchronous variants (_async suffix) defer the transfer and require a subsequent ggml_backend_synchronize() call.
Sources: ggml/include/ggml-backend.h ggml/src/ggml-backend.cpp:213-254
For inference across multiple backends (e.g., some layers on GPU, some on CPU), ggml_backend_sched_t distributes nodes of a ggml_cgraph to the appropriate backends. It calls supports_op() and offload_op() on each device to assign each node, then manages inter-backend tensor copies automatically.
The backends and bufts arrays passed to ggml_backend_sched_new() are in priority order. The first backend in the list whose device returns true from offload_op() for a given node claims that node.
Sources: ggml/include/ggml-backend.h ggml/src/ggml-backend.cpp
Each ggml_backend_buffer_t carries a ggml_backend_buffer_usage tag:
| Value | Meaning |
|---|---|
GGML_BACKEND_BUFFER_USAGE_ANY | General purpose (default) |
GGML_BACKEND_BUFFER_USAGE_WEIGHTS | Model weight storage; may be read-only or mmap-backed |
GGML_BACKEND_BUFFER_USAGE_COMPUTE | Intermediate/scratch tensors |
Set via ggml_backend_buffer_set_usage(). Backends may use this hint to choose between memory-mapped and regular allocations, or to enable GPU read-only caches for weight tensors.
Sources: ggml/src/ggml-backend.cpp:178-196
get_proc_address

Backends can expose optional named capabilities through ggml_backend_reg_get_proc_address. This is used in testing and scheduler logic to query features not present in the standard vtable.
Each ggml_backend_feature has a name and value string. For example, a CUDA backend might advertise { "flash_attn", "1" } when Flash Attention kernels are compiled in. The CPU backend uses this mechanism to advertise SIMD capability sets.
Sources: tests/test-backend-ops.cpp:459-479