This document describes the Continuous Integration and Continuous Deployment (CI/CD) infrastructure for llama.cpp, including GitHub Actions workflows, build matrices, testing procedures, and release automation. For information about the build system itself, see page 9.1. For testing infrastructure details, see page 9.2.
The CI/CD pipeline ensures code quality and cross-platform compatibility through automated building and testing across 30+ platform/backend configurations. The system uses GitHub Actions for cloud-based CI and supports self-hosted runners for specialized hardware testing.
Sources: .github/workflows/build.yml1-60 ci/README.md1-34
Workflow Trigger Configuration:
The main build workflow triggers on:
- Pushes and pull requests targeting the `master` branch, path-filtered to code files (.github/workflows/build.yml6-25)
- Concurrency control prevents redundant runs: .github/workflows/build.yml48-50
Sources: .github/workflows/build.yml3-58 .github/workflows/release.yml3-22
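The trigger and concurrency settings take roughly the following shape. This is a hedged sketch: the real path filters in build.yml are longer, and the concurrency group expression may differ in detail.

```yaml
on:
  push:
    branches: [master]
    paths: ['**/*.h', '**/*.c', '**/*.cpp', '**/*.cu', '**/CMakeLists.txt']  # illustrative subset
  pull_request:
    paths: ['**/*.h', '**/*.c', '**/*.cpp', '**/*.cu', '**/CMakeLists.txt']  # illustrative subset

# Cancel superseded runs of the same workflow on the same ref
concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
  cancel-in-progress: true
```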
CI Build Job Map (build.yml)
The build matrix spans multiple dimensions:
| Category | Variants | Key Job(s) |
|---|---|---|
| Architectures | x86_64, ARM64, s390x (big-endian), ppc64le | ubuntu-cpu-cmake .github/workflows/build.yml176-238 |
| GPU Backends | CUDA, Metal, Vulkan, HIP, SYCL, MUSA, WebGPU, OpenCL | Multiple backend-specific jobs |
| Compilers | GCC, Clang (LLVM), MSVC, NVCC, HIPCC, icpx (oneAPI) | Platform-dependent |
| Build Types | Debug, Release, Sanitizers (ADDRESS/THREAD/UNDEFINED) | ubuntu-latest-cmake-sanitizer .github/workflows/build.yml261-318 |
| Mobile Platforms | iOS, tvOS, visionOS | macOS-latest-cmake-ios .github/workflows/build.yml752-782 |
| WebAssembly | WASM + WebGPU via Emscripten | ubuntu-24-wasm-webgpu .github/workflows/build.yml546-585 |
Sources: .github/workflows/build.yml61-1337
macOS ARM64 with Metal (representative job):
The `macOS-latest-cmake-arm64` job uses `actions/checkout@v6` and the `ggml-org/ccache-action` cache action, and runs CMake with Metal and RPC enabled. After building, it executes `leaks -atExit` for memory-leak detection and then runs `ctest -L main`. .github/workflows/build.yml62-96
Key CMake flags used across jobs:
| Flag | Purpose | Where Used |
|---|---|---|
| `LLAMA_FATAL_WARNINGS=ON` | Treat compiler warnings as errors | Most jobs |
| `GGML_METAL_USE_BF16=ON` | BF16 support in Metal shaders | macOS ARM64 |
| `GGML_METAL_EMBED_LIBRARY=OFF` | Do not embed .metallib | macOS CI (OFF), release (ON) |
| `GGML_RPC=ON` | Build the RPC backend | Most CPU jobs |
| `GGML_BACKEND_DL=ON` | Dynamic backend loading as shared libs | Vulkan, CPU variant jobs |
| `GGML_CPU_ALL_VARIANTS=ON` | Build SSE/AVX/AVX2/AVX512/AMX CPU variants | Vulkan + release jobs |
| `GGML_SCHED_NO_REALLOC=ON` | Disable scheduler realloc (used in ci/run.sh) | Local CI script |
| `LLAMA_BUILD_BORINGSSL=ON` | Link against BoringSSL for TLS | macOS, Windows jobs |
Sources: .github/workflows/build.yml61-96 .github/workflows/build.yml383-415 ci/run.sh48-49
CI Test Organization (ctest labels → executables)
Test Labels and Organization:
Tests are organized using CMake labels defined in tests/CMakeLists.txt:
- `main` - Core tests that run on every build (tests/CMakeLists.txt22-46)
- `model` - Tests requiring downloaded model files (tests/CMakeLists.txt238-244)
- `python` - Python script tests (tests/CMakeLists.txt194)

Test Execution Examples:
Standard test run on all CI jobs:
```shell
cd build && ctest -L main --verbose --timeout 900
```
.github/workflows/build.yml94-96
With model files (local ci/run.sh):
```shell
LLAMACPP_TEST_MODELFILE="$model" ctest --output-on-failure -L model
```
Model Download Fixture:
The test-download-model test is declared as a FIXTURES_SETUP target in CMake. Tests that need a model file declare FIXTURES_REQUIRED test-download-model. The model is tinyllamas/stories15M-q4_0.gguf (s390x uses a big-endian variant).
Sources: .github/workflows/build.yml90-96 tests/CMakeLists.txt119-267 ci/run.sh212-340
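In CTest terms, the wiring looks roughly like this. This is a sketch: the fixture and test names mirror the description above, `test-model-load` is a hypothetical consumer, and the actual property calls in tests/CMakeLists.txt may differ.

```cmake
# The download test provides the fixture...
set_tests_properties(test-download-model PROPERTIES
    FIXTURES_SETUP test-download-model)

# ...and model-dependent tests require it, so ctest runs the download first.
set_tests_properties(test-model-load PROPERTIES   # hypothetical test name
    FIXTURES_REQUIRED test-download-model
    LABELS model)
```

With this arrangement, `ctest -L model` automatically schedules the download before any test that requires the fixture.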
The ci/run.sh script allows developers to run the full CI pipeline locally before pushing changes:
ci/run.sh Function and Variable Map
Build Configuration:
The script constructs CMAKE_EXTRA based on environment variables. For CUDA, it auto-detects the GPU architecture via nvidia-smi and falls back to a multi-arch list if detection fails. For SYCL, it requires ONEAPI_ROOT to be set. For ROCm, GG_BUILD_AMDGPU_TARGETS must specify the GPU target (e.g. gfx1100). ci/run.sh48-166
Test Execution Functions:
Each CI stage is implemented as a function pair:
- `gg_run_<stage>` — executes the stage, writes to `$OUT/<ci>.log` and `$OUT/<ci>.exit`
- `gg_sum_<stage>` — appends a Markdown summary to `$OUT/README.md` via `gg_printf`

The `gg_run` helper calls these functions and OR-accumulates exit codes into `$ret`.
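The helper pattern can be sketched as follows. This is a simplified stand-in: the real `gg_run` in ci/run.sh adds timing, set -x tracing, and richer logging, and the `example` stage here is a toy.

```shell
OUT=./tmp-ci-out
mkdir -p "$OUT"
ret=0

# Toy stage following the gg_run_<stage> / gg_sum_<stage> convention
gg_run_example() { echo "stage ran"; }
gg_sum_example() { printf '## example\nstage summary\n' >> "$OUT/README.md"; }

gg_run() {
    ci=$1
    gg_run_$ci > "$OUT/$ci.log" 2>&1    # capture the stage's output
    cur=$?
    echo "$cur" > "$OUT/$ci.exit"       # record the stage's exit code
    gg_sum_$ci                          # append its Markdown summary
    ret=$((ret | cur))                  # OR-accumulate exit codes
}

gg_run example
echo "ret=$ret"
```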
Full Model Pipeline (gg_run_qwen3_0_6b):
The qwen3_0_6b stage performs a complete model workflow on Qwen3-0.6B-Base:
1. `convert_hf_to_gguf.py` converts the HF checkpoint to GGUF (ci/run.sh387-389)
2. `llama-quantize` produces the quantized variants (ci/run.sh405-414)
3. `llama-completion` runs for each quantization (ci/run.sh418-429)
4. `llama-perplexity` runs against Wikitext-2 (ci/run.sh431-444)
5. The `check_ppl` function validates the perplexity results (ci/run.sh453-479)
6. `llama-save-load-state` runs with FA on/off and GPU/CPU offload combinations (ci/run.sh448-451)

Sources: ci/run.sh1-709 ci/README.md1-34
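The perplexity gate can be sketched like this. It is a simplified stand-in for `check_ppl`: the real function in ci/run.sh parses llama-perplexity log output, and the threshold used here is a hypothetical placeholder.

```shell
# Hypothetical pass/fail threshold; ci/run.sh derives its own per-model bound.
check_ppl() {
    qnt="$1"
    ppl="$2"
    if awk -v p="$ppl" 'BEGIN { exit !(p > 20.0) }'; then
        printf 'ppl of %s is %s, FAIL\n' "$qnt" "$ppl"
        return 1
    fi
    printf 'ppl of %s is %s, OK\n' "$qnt" "$ppl"
}

check_ppl "q4_0" "11.23"
```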
The release workflow builds production binaries for distribution across platforms:
Release Artifact Map (release.yml)
Release Trigger:
The release workflow triggers on push to master (same path filters as CI) and can be manually dispatched with an optional create_release boolean. .github/workflows/release.yml3-18
Tag Naming:
The .github/actions/get-tag-name composite action determines the tag:
- `b${BUILD_NUMBER}` for builds on `master`, where `BUILD_NUMBER = git rev-list --count HEAD`
- `${SAFE_NAME}-b${BUILD_NUMBER}-${SHORT_HASH}` for builds on other branches

.github/actions/get-tag-name/action.yml10-22
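Concretely, with hypothetical values standing in for the git queries:

```shell
# In the real composite action these come from git:
#   BUILD_NUMBER=$(git rev-list --count HEAD)
#   SHORT_HASH=$(git rev-parse --short HEAD)   # hash length is an assumption
BUILD_NUMBER=6432        # hypothetical
SHORT_HASH=a1b2c3d       # hypothetical
SAFE_NAME=my-feature     # hypothetical sanitized branch name

echo "b${BUILD_NUMBER}"                              # tag on master
echo "${SAFE_NAME}-b${BUILD_NUMBER}-${SHORT_HASH}"   # tag on other branches
```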
Artifact Structure:
Unix archives (tar.gz) contain all binaries from build/bin/ plus LICENSE. Windows archives (zip) for GPU backends contain only the backend DLL (e.g. ggml-cuda.dll, ggml-vulkan.dll) plus a separate cudart-*.zip for CUDA runtime DLLs. The SYCL archive includes ggml-sycl.dll and required oneAPI runtime DLLs copied from $ONEAPI_ROOT.
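A minimal sketch of the Unix packaging step follows. The staging tree, archive name, and stub binary are assumptions for illustration; the actual release.yml steps and archive names vary per job.

```shell
# Work in a scratch directory standing in for a real checkout + build
cd "$(mktemp -d)"
mkdir -p build/bin
echo 'stub binary' > build/bin/llama-cli     # stand-in for the real executables
echo 'license text' > LICENSE

# Bundle all binaries plus LICENSE, as described for the Unix archives
cp LICENSE build/bin/
tar -czf llama-bXXXX-bin-ubuntu-x64.tar.gz -C build/bin .   # bXXXX: placeholder tag

tar -tzf llama-bXXXX-bin-ubuntu-x64.tar.gz
```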
Windows Backend DLLs:
Windows GPU release jobs use -DGGML_BACKEND_DL=ON -DGGML_CPU=OFF and build only the specific backend target (e.g. --target ggml-cuda). This separates GPU backends from the CPU-only executable package. .github/workflows/release.yml352-366
ROCm Linux Release:
The ubuntu-22-rocm job builds against ROCm 7.2 with gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1151;gfx1150;gfx1200;gfx1201 targets and GGML_HIP_ROCWMMA_FATTN=ON. .github/workflows/release.yml519-613
Sources: .github/workflows/release.yml1-700 .github/actions/get-tag-name/action.yml1-23
All CI and release jobs use the `ggml-org/ccache-action` action:

```yaml
- name: ccache
  uses: ggml-org/ccache-action    # version pin elided in the source
  with:
    key: <unique-per-job-key>
    evict-old-files: 1d
    save: ${{ github.event_name == 'push' && github.ref == 'refs/heads/master' }}
```

.github/workflows/build.yml68-75
Cache Strategy:
- Each job supplies a unique `key` string (e.g. `macOS-latest-cmake-arm64`, `ubuntu-22-cmake-hip`).
- Stale cache entries are evicted after one day via `evict-old-files: 1d`.
- The cache is saved only on pushes to `master`; PRs read from cache but do not write back.
- Windows jobs set `variant: ccache` to select the Windows ccache variant.

Sources: .github/workflows/build.yml68-75 .github/workflows/release.yml34-38
Builds with -DGGML_BACKEND_DL=ON produce backends as separate shared libraries loaded at runtime. Combined with -DGGML_CPU_ALL_VARIANTS=ON, the CPU backend is built in multiple ISA variants (SSE4.2, AVX, AVX2, AVX512, AMX) and the appropriate one is selected at startup.
This is enabled in:
- `ubuntu-24-cmake-vulkan-deb` and `ubuntu-24-cmake-vulkan` (CI) .github/workflows/build.yml407-410

Output structure when `GGML_BACKEND_DL=ON`:
- Executables: `llama-cli`, `llama-server`
- CPU variants: `ggml-cpu-avx.so`, `ggml-cpu-avx2.so`, etc.
- GPU backends: `ggml-cuda.so` / `ggml-vulkan.so` / `ggml-opencl.so`, etc.

Sources: .github/workflows/build.yml405-415 .github/workflows/release.yml155-163
Self-hosted runners provide hardware not available in GitHub's infrastructure:
- Jobs are added to `.github/workflows/build.yml` with the appropriate runner label
- Runners are provisioned and managed by `ggml-org` maintainers
- New hardware is gated in `ci/run.sh` behind a new `GG_BUILD_*` environment variable

Example (MUSA container job):
The ubuntu-22-cmake-musa job uses a Docker container mthreads/musa:rc4.3.0-devel-ubuntu22.04-amd64 provided directly via container:. .github/workflows/build.yml618-645
MUSA CI Example:
The MUSA backend can be tested locally using Docker per ci/README-MUSA.md19-33:
```shell
docker run --privileged -it \
    -v $HOME/llama.cpp/ci-cache:/ci-cache \
    -v $HOME/llama.cpp/ci-results:/ci-results \
    -v $PWD:/ws -w /ws \
    mthreads/musa:rc4.3.0-devel-ubuntu22.04-amd64

# Inside container:
apt update -y && apt install -y bc cmake ccache git python3.10-venv time unzip wget
GG_BUILD_MUSA=1 bash ./ci/run.sh /ci-results /ci-cache
```
Sources: ci/README.md1-34 ci/README-MUSA.md1-36 .github/workflows/build.yml618-645
Memory safety and thread safety are validated using compiler sanitizers:
The ubuntu-latest-cmake-sanitizer job uses a matrix of sanitizer: [ADDRESS, THREAD, UNDEFINED] with build_type: [Debug] and continue-on-error: true. .github/workflows/build.yml261-270
| Sanitizer | CMake Flags | Special Notes |
|---|---|---|
| `ADDRESS` | `-DLLAMA_SANITIZE_ADDRESS=ON -DGGML_SANITIZE_ADDRESS=ON` | Standard build |
| `THREAD` | `-DLLAMA_SANITIZE_THREAD=ON -DGGML_SANITIZE_THREAD=ON -DGGML_OPENMP=OFF` | OpenMP disabled (false positives) |
| `UNDEFINED` | `-DLLAMA_SANITIZE_UNDEFINED=ON -DGGML_SANITIZE_UNDEFINED=ON` | Standard build |
.github/workflows/build.yml287-312
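The job's matrix is roughly the following. This is a sketch of the strategy block; see build.yml261-270 for the exact form and surrounding steps.

```yaml
ubuntu-latest-cmake-sanitizer:
  runs-on: ubuntu-latest
  continue-on-error: true        # sanitizer failures do not block the workflow
  strategy:
    matrix:
      sanitizer: [ADDRESS, THREAD, UNDEFINED]
      build_type: [Debug]
```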
Known Exclusions:
test-opt is excluded from sanitizer-enabled builds because of known memory leaks in the optimizer. This is conditioned on NOT LLAMA_SANITIZE_ADDRESS AND NOT GGML_SCHED_NO_REALLOC in CMake. tests/CMakeLists.txt231-234
Sources: .github/workflows/build.yml261-318 tests/CMakeLists.txt231-234
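As a sketch of that guard (the helper name `llama_build_and_test` is an assumption; the actual code in tests/CMakeLists.txt may be structured differently):

```cmake
# test-opt has known leaks in the optimizer, so it is only built
# when ASAN and the scheduler no-realloc mode are both off.
if (NOT LLAMA_SANITIZE_ADDRESS AND NOT GGML_SCHED_NO_REALLOC)
    llama_build_and_test(test-opt.cpp)   # helper name is an assumption
endif()
```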
Key environment variables control CI behavior:
Workflow-level environment variables (set in env: block of build.yml):
| Variable | Value | Purpose |
|---|---|---|
| `GGML_NLOOP` | 3 | Test iteration count for backend ops |
| `GGML_N_THREADS` | 1 | Thread count for tests |
| `LLAMA_LOG_COLORS` | 1 | Enable colored log output |
| `LLAMA_LOG_PREFIX` | 1 | Enable log prefix |
| `LLAMA_LOG_TIMESTAMPS` | 1 | Enable log timestamps |
.github/workflows/build.yml54-59
ci/run.sh control variables (set by caller before invoking the script):
| Variable | Purpose |
|---|---|
| `GG_BUILD_CUDA` | Enable CUDA backend |
| `GG_BUILD_SYCL` | Enable SYCL backend (requires `ONEAPI_ROOT`) |
| `GG_BUILD_VULKAN` | Enable Vulkan backend |
| `GG_BUILD_METAL` | Enable Metal backend (macOS) |
| `GG_BUILD_ROCM` | Enable HIP/ROCm backend (requires `GG_BUILD_AMDGPU_TARGETS`) |
| `GG_BUILD_MUSA` | Enable MUSA backend |
| `GG_BUILD_WEBGPU` | Enable WebGPU backend |
| `GG_BUILD_KLEIDIAI` | Enable KleidiAI ARM optimizations |
| `GG_BUILD_NO_SVE` | Disable SVE for older ARM targets |
| `GG_BUILD_LOW_PERF` | Skip model download and heavy tests |
| `GG_BUILD_HIGH_PERF` | Enable `test_backend_ops_cpu` stage |
| `GG_BUILD_CLOUD` | Skip `test_scripts` stage |
| `LLAMACPP_TEST_MODELFILE` | Path to model for `ctest -L model` |
ci/run.sh48-166 ci/run.sh663-705
Sources: .github/workflows/build.yml52-57 ci/run.sh48-166
The complete CI/CD flow chains the pieces above together: trigger filtering, the build matrix, labeled ctest execution, and release packaging.
Sources: .github/workflows/build.yml1-1337 .github/workflows/release.yml1-800 ci/run.sh1-683