This page documents the GGML backend abstraction layer: the set of types, interface structs, and registration mechanisms that allow GGML to execute tensor operations on different hardware targets through a uniform API. It covers the type hierarchy, the vtable pattern used by each backend, and the registry that discovers and loads backends at startup.
For details on specific backend implementations, see:
For how the backend scheduler distributes computation graph nodes across multiple backends during inference, see pages 3.4 and 3.7.
The backend system is organized around five distinct opaque types, each defined alongside a corresponding interface struct (vtable):
| Opaque Handle | Concrete Struct | Interface Struct | Role |
|---|---|---|---|
ggml_backend_reg_t | ggml_backend_reg | ggml_backend_reg_i | A compiled-in or dynamically loaded backend plugin |
ggml_backend_dev_t | ggml_backend_dev | ggml_backend_dev_i | A specific physical device (e.g., GPU 0, GPU 1) |
ggml_backend_buffer_type_t | ggml_backend_buffer_type | ggml_backend_buffer_type_i | Describes how memory is allocated for a device |
ggml_backend_buffer_t | ggml_backend_buffer | ggml_backend_buffer_i | An allocated region of device memory |
ggml_backend_t | ggml_backend | ggml_backend_i | A running compute context bound to a device |
Each opaque handle is a pointer to a concrete struct that contains the vtable (iface) and a backend-specific context pointer. Backends populate these structs at construction time.
Type Hierarchy and Derivation
Sources: ggml/src/ggml-backend-impl.h ggml/include/ggml-backend.h
Each type's behavior is defined by a C struct of function pointers. Backends implement these structs. The GGML core never casts to a concrete backend type; it only calls through the vtable.
ggml_backend_reg_i

| Function Pointer | Signature | Purpose |
|---|---|---|
get_name | (reg) → const char * | Human-readable backend name (e.g., "CUDA", "Metal") |
get_dev_count | (reg) → size_t | Number of physical devices this backend manages |
get_dev | (reg, index) → ggml_backend_dev_t | Access a specific device by index |
get_proc_address | (reg, name) → void * | Extension point: look up optional function by name |
ggml_backend_dev_i

| Function Pointer | Purpose |
|---|---|
get_name | Device name string |
get_description | Detailed description (vendor, driver version, etc.) |
get_memory | Query free and total device memory |
get_type | Returns GGML_BACKEND_DEVICE_TYPE_CPU, _GPU, or _ACCEL |
get_props | Populate a ggml_backend_dev_props struct with capabilities |
init_backend | Instantiate a ggml_backend_t on this device |
get_buffer_type | Return the default buffer type for this device |
get_host_buffer_type | Return a pinned/mapped host buffer type for async transfers |
buffer_from_host_ptr | Wrap an existing host pointer as a device-accessible buffer |
supports_op | Check if a ggml_tensor op can execute on this device |
supports_buft | Check if a buffer type is accessible by this device |
offload_op | Hint to the scheduler whether an op should be offloaded |
event_new / event_free / event_synchronize | Synchronization event management |
ggml_backend_buffer_type_i

| Function Pointer | Purpose |
|---|---|
get_name | Buffer type name string |
alloc_buffer | Allocate a ggml_backend_buffer_t of the given byte size |
get_alignment | Required alignment for allocations |
get_max_size | Maximum single allocation size (SIZE_MAX if unbounded) |
get_alloc_size | Bytes needed for a specific tensor (may exceed ggml_nbytes) |
is_host | Whether the buffer is CPU-accessible without an explicit copy |
ggml_backend_buffer_i

| Function Pointer | Purpose |
|---|---|
free_buffer | Release the underlying allocation |
get_base | Return the base pointer of the allocation |
init_tensor | Bind a ggml_tensor to this buffer (sets tensor->data) |
cpy_tensor | Backend-accelerated tensor copy (e.g., DMA, peer transfer) |
clear | Fill the entire buffer with a byte value |
reset | Reset any per-buffer state (e.g., memory pools) |
ggml_backend_i

| Function Pointer | Purpose |
|---|---|
get_name | Instance name string |
free | Destroy the backend instance and release its resources |
graph_plan_create / graph_plan_free | Pre-compile a ggml_cgraph into a reusable execution plan |
graph_plan_compute | Execute a pre-compiled plan |
graph_compute | Execute a ggml_cgraph directly |
event_record / event_wait | Cross-backend asynchronous dependency tracking |
Sources: ggml/src/ggml-backend-impl.h ggml/src/ggml-backend.cpp:31-254
The registry is a singleton ggml_backend_registry struct defined in ggml/src/ggml-backend-reg.cpp:107-200 that owns two collections:
- std::vector<ggml_backend_reg_entry> backends — one entry per plugin, plus an optional dl_handle_ptr for dynamically loaded backends.
- std::vector<ggml_backend_dev_t> devices — a flat, ordered list of all devices across all backends.

When backends are statically linked, the ggml_backend_registry constructor registers them in a fixed priority order. This order determines which backend the scheduler prefers when multiple backends support the same operation:
CUDA → Metal → SYCL → Vulkan → WebGPU → zDNN → VirtGPU → OpenCL → ZenDNN → Hexagon → CANN → BLAS → RPC → CPU
Each backend provides a function of the form ggml_backend_<name>_reg() returning a ggml_backend_reg_t. Examples: ggml_backend_cuda_reg(), ggml_backend_metal_reg(), ggml_backend_cpu_reg().
After register_backend() stores the entry, it enumerates the backend's devices using ggml_backend_reg_dev_count() and ggml_backend_reg_dev_get(), calling register_device() for each one.
Registration Flow
When GGML_BACKEND_DL=ON, each backend is compiled as a loadable MODULE library rather than being statically linked. The registry can load them at runtime:
- ggml_backend_load(const char * path) — load a single .so / .dll.
- ggml_backend_load_all(const char * dir) — scan a directory and load all files matching the pattern ggml-*.so / ggml-*.dll.

The dynamic loader (in ggml-backend-dl.cpp) resolves the symbol ggml_backend_<name>_reg from the shared object and calls it to obtain the registry entry, then proceeds identically to static registration.
The directory to scan can be set at build time via the GGML_BACKEND_DIR CMake variable, which bakes the search path into the binary.
Sources: ggml/src/ggml-backend-reg.cpp:102-200
ggml_add_backend_library(backend, sources...) — ggml/src/CMakeLists.txt:247-292
Creates a CMake library target for a backend. The output type depends on GGML_BACKEND_DL:
GGML_BACKEND_DL | CMake Library Type | How ggml Links It |
|---|---|---|
OFF (default) | STATIC or SHARED | target_link_libraries(ggml PUBLIC ${backend}) |
ON | MODULE (dlopen-able) | add_dependencies(ggml ${backend}) only (not linked) |
All backend targets link privately to ggml-base for core tensor types and utilities.
ggml_add_backend(backend) — ggml/src/CMakeLists.txt:294-305
Checks the GGML_<BACKEND> CMake option, includes the backend's subdirectory, and (in static mode) adds GGML_USE_<BACKEND> as a compile definition on the ggml target so ggml-backend-reg.cpp knows which #ifdef branches to compile.
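The branching these two functions perform can be sketched in CMake (a simplified, hedged sketch — the function name here is hypothetical and the real macros in ggml/src/CMakeLists.txt handle more details, such as deriving the GGML_USE_<BACKEND> define from the option name):

```cmake
# Sketch of the GGML_BACKEND_DL branch, not the actual macro.
function(sketch_add_backend_library backend)
    if (GGML_BACKEND_DL)
        # dlopen-able plugin: built alongside ggml but never linked into it
        add_library(${backend} MODULE ${ARGN})
        add_dependencies(ggml ${backend})
    else()
        # static mode: linked directly into ggml; the registry then selects
        # the matching #ifdef branch via a GGML_USE_<BACKEND> definition
        add_library(${backend} ${ARGN})
        target_link_libraries(ggml PUBLIC ${backend})
    endif()
    # every backend needs the core tensor types and utilities
    target_link_libraries(${backend} PRIVATE ggml-base)
endfunction()
```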
| CMake Option | Library Target | ggml-backend-reg.cpp Guard | Registration Function |
|---|---|---|---|
GGML_CPU | ggml-cpu | GGML_USE_CPU | ggml_backend_cpu_reg() |
GGML_CUDA | ggml-cuda | GGML_USE_CUDA | ggml_backend_cuda_reg() |
GGML_METAL | ggml-metal | GGML_USE_METAL | ggml_backend_metal_reg() |
GGML_VULKAN | ggml-vulkan | GGML_USE_VULKAN | ggml_backend_vk_reg() |
GGML_HIP | ggml-hip | GGML_USE_HIP | ggml_backend_hip_reg() |
GGML_SYCL | ggml-sycl | GGML_USE_SYCL | ggml_backend_sycl_reg() |
GGML_OPENCL | ggml-opencl | GGML_USE_OPENCL | ggml_backend_opencl_reg() |
GGML_CANN | ggml-cann | GGML_USE_CANN | ggml_backend_cann_reg() |
GGML_BLAS | ggml-blas | GGML_USE_BLAS | ggml_backend_blas_reg() |
GGML_RPC | ggml-rpc | GGML_USE_RPC | ggml_backend_rpc_reg() |
Sources: ggml/src/CMakeLists.txt:192-292 ggml/src/CMakeLists.txt:448-462
The public API is declared in ggml/include/ggml-backend.h and implemented across ggml/src/ggml-backend.cpp and ggml/src/ggml-backend-reg.cpp.
Asynchronous variants (_async suffix) defer the transfer and require a subsequent ggml_backend_synchronize() call.
Sources: ggml/include/ggml-backend.h ggml/src/ggml-backend.cpp:213-254
For inference across multiple backends (e.g., some layers on GPU, some on CPU), ggml_backend_sched_t distributes nodes of a ggml_cgraph to the appropriate backends. It calls supports_op() and offload_op() on each device to assign each node, then manages inter-backend tensor copies automatically.
The backends and bufts arrays passed to ggml_backend_sched_new() are in priority order. The first backend in the list whose device returns true from offload_op() for a given node claims that node.
Sources: ggml/include/ggml-backend.h ggml/src/ggml-backend.cpp
Each ggml_backend_buffer_t carries a ggml_backend_buffer_usage tag:
| Value | Meaning |
|---|---|
GGML_BACKEND_BUFFER_USAGE_ANY | General purpose (default) |
GGML_BACKEND_BUFFER_USAGE_WEIGHTS | Model weight storage; may be read-only or mmap-backed |
GGML_BACKEND_BUFFER_USAGE_COMPUTE | Intermediate/scratch tensors |
Set via ggml_backend_buffer_set_usage(). Backends may use this hint to choose between memory-mapped and regular allocations, or to enable GPU read-only caches for weight tensors.
Sources: ggml/src/ggml-backend.cpp:178-196
get_proc_address

Backends can expose optional named capabilities through ggml_backend_reg_get_proc_address. This is used in testing and scheduler logic to query features not present in the standard vtable.
Each ggml_backend_feature has a name and value string. For example, a CUDA backend might advertise { "flash_attn", "1" } when Flash Attention kernels are compiled in. The CPU backend uses this mechanism to advertise SIMD capability sets.
Sources: tests/test-backend-ops.cpp:459-479