This document describes the Continuous Integration and Continuous Deployment (CI/CD) infrastructure for llama.cpp, including GitHub Actions workflows, build matrices, testing procedures, and release automation. For information about the build system itself, see page 9.1. For testing infrastructure details, see page 9.2.
The CI/CD pipeline ensures code quality and cross-platform compatibility through automated building and testing across 30+ platform/backend configurations. The system uses GitHub Actions for cloud-based CI and supports self-hosted runners for specialized hardware testing.
Sources: .github/workflows/build.yml1-60 ci/README.md1-34
Workflow Trigger Configuration:
The main build workflow triggers on:
- Pushes and pull requests targeting the `master` branch, path-filtered to code files (.github/workflows/build.yml6-25)
- Concurrency control prevents redundant runs: .github/workflows/build.yml48-50
Sources: .github/workflows/build.yml3-58 .github/workflows/release.yml3-22
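The trigger and concurrency settings take roughly the following shape. This is a hedged sketch: the real path filters in build.yml are longer, and the concurrency group expression may differ in detail.

```yaml
on:
  push:
    branches: [master]
    paths: ['**/*.h', '**/*.c', '**/*.cpp', '**/*.cu', '**/CMakeLists.txt']  # illustrative subset
  pull_request:
    paths: ['**/*.h', '**/*.c', '**/*.cpp', '**/*.cu', '**/CMakeLists.txt']  # illustrative subset

# Cancel superseded runs of the same workflow on the same ref
concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
  cancel-in-progress: true
```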
CI Build Job Map (build.yml)
The build matrix spans multiple dimensions:
| Category | Variants | Key Job(s) |
|---|---|---|
| Architectures | x86_64, ARM64, s390x (big-endian), ppc64le | ubuntu-cpu-cmake .github/workflows/build.yml176-238 |
| GPU Backends | CUDA, Metal, Vulkan, HIP, SYCL, MUSA, WebGPU, OpenCL | Multiple backend-specific jobs |
| Compilers | GCC, Clang (LLVM), MSVC, NVCC, HIPCC, icpx (oneAPI) | Platform-dependent |
| Build Types | Debug, Release, Sanitizers (ADDRESS/THREAD/UNDEFINED) | ubuntu-latest-cmake-sanitizer .github/workflows/build.yml261-318 |
| Mobile Platforms | iOS, tvOS, visionOS | macOS-latest-cmake-ios .github/workflows/build.yml752-782 |
| WebAssembly | WASM + WebGPU via Emscripten | ubuntu-24-wasm-webgpu .github/workflows/build.yml546-585 |
Sources: .github/workflows/build.yml61-1337
macOS ARM64 with Metal (representative job):
The `macOS-latest-cmake-arm64` job uses `actions/checkout@v6` and the `ggml-org/ccache-action` cache action, and runs CMake with Metal and RPC enabled. After building, it executes `leaks -atExit` for memory-leak detection and then runs `ctest -L main`. .github/workflows/build.yml62-96
Key CMake flags used across jobs:
| Flag | Purpose | Where Used |
|---|---|---|
| `LLAMA_FATAL_WARNINGS=ON` | Treat compiler warnings as errors | Most jobs |
| `GGML_METAL_USE_BF16=ON` | BF16 support in Metal shaders | macOS ARM64 |
| `GGML_METAL_EMBED_LIBRARY=OFF` | Do not embed .metallib | macOS CI (OFF), release (ON) |
| `GGML_RPC=ON` | Build the RPC backend | Most CPU jobs |
| `GGML_BACKEND_DL=ON` | Dynamic backend loading as shared libs | Vulkan, CPU variant jobs |
| `GGML_CPU_ALL_VARIANTS=ON` | Build SSE/AVX/AVX2/AVX512/AMX CPU variants | Vulkan + release jobs |
| `GGML_SCHED_NO_REALLOC=ON` | Disable scheduler realloc (used in ci/run.sh) | Local CI script |
| `LLAMA_BUILD_BORINGSSL=ON` | Link against BoringSSL for TLS | macOS, Windows jobs |
Sources: .github/workflows/build.yml61-96 .github/workflows/build.yml383-415 ci/run.sh48-49
CI Test Organization (ctest labels → executables)
Test Labels and Organization:
Tests are organized using CMake labels defined in tests/CMakeLists.txt:
- `main` - Core tests that run on every build (tests/CMakeLists.txt22-46)
- `model` - Tests requiring downloaded model files (tests/CMakeLists.txt238-244)
- `python` - Python script tests (tests/CMakeLists.txt194)

Test Execution Examples:
Standard test run on all CI jobs:
```shell
cd build && ctest -L main --verbose --timeout 900
```
.github/workflows/build.yml94-96
With model files (local ci/run.sh):
```shell
LLAMACPP_TEST_MODELFILE="$model" ctest --output-on-failure -L model
```
Model Download Fixture:
The test-download-model test is declared as a FIXTURES_SETUP target in CMake. Tests that need a model file declare FIXTURES_REQUIRED test-download-model. The model is tinyllamas/stories15M-q4_0.gguf (s390x uses a big-endian variant).
Sources: .github/workflows/build.yml90-96 tests/CMakeLists.txt119-267 ci/run.sh212-340
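In CTest terms, the wiring looks roughly like this. This is a sketch: the fixture and test names mirror the description above, `test-model-load` is a hypothetical consumer, and the actual property calls in tests/CMakeLists.txt may differ.

```cmake
# The download test provides the fixture...
set_tests_properties(test-download-model PROPERTIES
    FIXTURES_SETUP test-download-model)

# ...and model-dependent tests require it, so ctest runs the download first.
set_tests_properties(test-model-load PROPERTIES   # hypothetical test name
    FIXTURES_REQUIRED test-download-model
    LABELS model)
```

With this arrangement, `ctest -L model` automatically schedules the download before any test that requires the fixture.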
The ci/run.sh script allows developers to run the full CI pipeline locally before pushing changes:
ci/run.sh Function and Variable Map
Build Configuration:
The script constructs CMAKE_EXTRA based on environment variables. For CUDA, it auto-detects the GPU architecture via nvidia-smi and falls back to a multi-arch list if detection fails. For SYCL, it requires ONEAPI_ROOT to be set. For ROCm, GG_BUILD_AMDGPU_TARGETS must specify the GPU target (e.g. gfx1100). ci/run.sh48-166
Test Execution Functions:
Each CI stage is implemented as a function pair:
- `gg_run_<stage>` — executes the stage, writes to `$OUT/<ci>.log` and `$OUT/<ci>.exit`
- `gg_sum_<stage>` — appends a Markdown summary to `$OUT/README.md` via `gg_printf`

The `gg_run` helper calls these functions and OR-accumulates exit codes into `$ret`.
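The helper pattern can be sketched as follows. This is a simplified stand-in: the real `gg_run` in ci/run.sh adds timing, set -x tracing, and richer logging, and the `example` stage here is a toy.

```shell
OUT=./tmp-ci-out
mkdir -p "$OUT"
ret=0

# Toy stage following the gg_run_<stage> / gg_sum_<stage> convention
gg_run_example() { echo "stage ran"; }
gg_sum_example() { printf '## example\nstage summary\n' >> "$OUT/README.md"; }

gg_run() {
    ci=$1
    gg_run_$ci > "$OUT/$ci.log" 2>&1    # capture the stage's output
    cur=$?
    echo "$cur" > "$OUT/$ci.exit"       # record the stage's exit code
    gg_sum_$ci                          # append its Markdown summary
    ret=$((ret | cur))                  # OR-accumulate exit codes
}

gg_run example
echo "ret=$ret"
```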
Full Model Pipeline (gg_run_qwen3_0_6b):
The qwen3_0_6b stage performs a complete model workflow on Qwen3-0.6B-Base:
1. `convert_hf_to_gguf.py` converts the HF checkpoint to GGUF (ci/run.sh387-389)
2. `llama-quantize` produces the quantized variants (ci/run.sh405-414)
3. `llama-completion` runs for each quantization (ci/run.sh418-429)
4. `llama-perplexity` runs against Wikitext-2 (ci/run.sh431-444)
5. The `check_ppl` function validates the perplexity results (ci/run.sh453-479)
6. `llama-save-load-state` runs with FA on/off and GPU/CPU offload combinations (ci/run.sh448-451)

Sources: ci/run.sh1-709 ci/README.md1-34
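The perplexity gate can be sketched like this. It is a simplified stand-in for `check_ppl`: the real function in ci/run.sh parses llama-perplexity log output, and the threshold used here is a hypothetical placeholder.

```shell
# Hypothetical pass/fail threshold; ci/run.sh derives its own per-model bound.
check_ppl() {
    qnt="$1"
    ppl="$2"
    if awk -v p="$ppl" 'BEGIN { exit !(p > 20.0) }'; then
        printf 'ppl of %s is %s, FAIL\n' "$qnt" "$ppl"
        return 1
    fi
    printf 'ppl of %s is %s, OK\n' "$qnt" "$ppl"
}

check_ppl "q4_0" "11.23"
```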
The release workflow builds production binaries for distribution across platforms:
Release Artifact Map (release.yml)
Release Trigger:
The release workflow triggers on push to master (same path filters as CI) and can be manually dispatched with an optional create_release boolean. .github/workflows/release.yml3-18
Tag Naming:
The .github/actions/get-tag-name composite action determines the tag:
- `b${BUILD_NUMBER}` for builds on `master`, where `BUILD_NUMBER = git rev-list --count HEAD`
- `${SAFE_NAME}-b${BUILD_NUMBER}-${SHORT_HASH}` for builds on other branches

.github/actions/get-tag-name/action.yml10-22
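Concretely, with hypothetical values standing in for the git queries:

```shell
# In the real composite action these come from git:
#   BUILD_NUMBER=$(git rev-list --count HEAD)
#   SHORT_HASH=$(git rev-parse --short HEAD)   # hash length is an assumption
BUILD_NUMBER=6432        # hypothetical
SHORT_HASH=a1b2c3d       # hypothetical
SAFE_NAME=my-feature     # hypothetical sanitized branch name

echo "b${BUILD_NUMBER}"                              # tag on master
echo "${SAFE_NAME}-b${BUILD_NUMBER}-${SHORT_HASH}"   # tag on other branches
```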
Artifact Structure:
Unix archives (tar.gz) contain all binaries from build/bin/ plus LICENSE. Windows archives (zip) for GPU backends contain only the backend DLL (e.g. ggml-cuda.dll, ggml-vulkan.dll) plus a separate cudart-*.zip for CUDA runtime DLLs. The SYCL archive includes ggml-sycl.dll and required oneAPI runtime DLLs copied from $ONEAPI_ROOT.
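A minimal sketch of the Unix packaging step follows. The staging tree, archive name, and stub binary are assumptions for illustration; the actual release.yml steps and archive names vary per job.

```shell
# Work in a scratch directory standing in for a real checkout + build
cd "$(mktemp -d)"
mkdir -p build/bin
echo 'stub binary' > build/bin/llama-cli     # stand-in for the real executables
echo 'license text' > LICENSE

# Bundle all binaries plus LICENSE, as described for the Unix archives
cp LICENSE build/bin/
tar -czf llama-bXXXX-bin-ubuntu-x64.tar.gz -C build/bin .   # bXXXX: placeholder tag

tar -tzf llama-bXXXX-bin-ubuntu-x64.tar.gz
```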
Windows Backend DLLs:
Windows GPU release jobs use -DGGML_BACKEND_DL=ON -DGGML_CPU=OFF and build only the specific backend target (e.g. --target ggml-cuda). This separates GPU backends from the CPU-only executable package. .github/workflows/release.yml352-366
ROCm Linux Release:
The ubuntu-22-rocm job builds against ROCm 7.2 with gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1151;gfx1150;gfx1200;gfx1201 targets and GGML_HIP_ROCWMMA_FATTN=ON. .github/workflows/release.yml519-613
Sources: .github/workflows/release.yml1-700 .github/actions/get-tag-name/action.yml1-23
All CI and release jobs use the `ggml-org/ccache-action` action:

```yaml
- name: ccache
  uses: ggml-org/ccache-action    # version pin elided in the source
  with:
    key: <unique-per-job-key>
    evict-old-files: 1d
    save: ${{ github.event_name == 'push' && github.ref == 'refs/heads/master' }}
```

.github/workflows/build.yml68-75
Cache Strategy:
- Each job supplies a unique `key` string (e.g. `macOS-latest-cmake-arm64`, `ubuntu-22-cmake-hip`).
- Stale cache entries are evicted after one day via `evict-old-files: 1d`.
- The cache is saved only on pushes to `master`; PRs read from cache but do not write back.
- Windows jobs set `variant: ccache` to select the Windows ccache variant.

Sources: .github/workflows/build.yml68-75 .github/workflows/release.yml34-38
Builds with -DGGML_BACKEND_DL=ON produce backends as separate shared libraries loaded at runtime. Combined with -DGGML_CPU_ALL_VARIANTS=ON, the CPU backend is built in multiple ISA variants (SSE4.2, AVX, AVX2, AVX512, AMX) and the appropriate one is selected at startup.
This is enabled in:
- `ubuntu-24-cmake-vulkan-deb` and `ubuntu-24-cmake-vulkan` (CI) .github/workflows/build.yml407-410

Output structure when `GGML_BACKEND_DL=ON`:
- Executables: `llama-cli`, `llama-server`
- CPU variants: `ggml-cpu-avx.so`, `ggml-cpu-avx2.so`, etc.
- GPU backends: `ggml-cuda.so` / `ggml-vulkan.so` / `ggml-opencl.so`, etc.

Sources: .github/workflows/build.yml405-415 .github/workflows/release.yml155-163
Self-hosted runners provide hardware not available in GitHub's infrastructure:
- Jobs are added to `.github/workflows/build.yml` with the appropriate runner label
- Runners are provisioned and managed by `ggml-org` maintainers
- New hardware is gated in `ci/run.sh` behind a new `GG_BUILD_*` environment variable

Example (MUSA container job):
The ubuntu-22-cmake-musa job uses a Docker container mthreads/musa:rc4.3.0-devel-ubuntu22.04-amd64 provided directly via container:. .github/workflows/build.yml618-645
MUSA CI Example:
The MUSA backend can be tested locally using Docker per ci/README-MUSA.md19-33:
```shell
docker run --privileged -it \
    -v $HOME/llama.cpp/ci-cache:/ci-cache \
    -v $HOME/llama.cpp/ci-results:/ci-results \
    -v $PWD:/ws -w /ws \
    mthreads/musa:rc4.3.0-devel-ubuntu22.04-amd64

# Inside container:
apt update -y && apt install -y bc cmake ccache git python3.10-venv time unzip wget
GG_BUILD_MUSA=1 bash ./ci/run.sh /ci-results /ci-cache
```
Sources: ci/README.md1-34 ci/README-MUSA.md1-36 .github/workflows/build.yml618-645
Memory safety and thread safety are validated using compiler sanitizers:
The ubuntu-latest-cmake-sanitizer job uses a matrix of sanitizer: [ADDRESS, THREAD, UNDEFINED] with build_type: [Debug] and continue-on-error: true. .github/workflows/build.yml261-270
| Sanitizer | CMake Flags | Special Notes |
|---|---|---|
| `ADDRESS` | `-DLLAMA_SANITIZE_ADDRESS=ON -DGGML_SANITIZE_ADDRESS=ON` | Standard build |
| `THREAD` | `-DLLAMA_SANITIZE_THREAD=ON -DGGML_SANITIZE_THREAD=ON -DGGML_OPENMP=OFF` | OpenMP disabled (false positives) |
| `UNDEFINED` | `-DLLAMA_SANITIZE_UNDEFINED=ON -DGGML_SANITIZE_UNDEFINED=ON` | Standard build |
.github/workflows/build.yml287-312
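The job's matrix is roughly the following. This is a sketch of the strategy block; see build.yml261-270 for the exact form and surrounding steps.

```yaml
ubuntu-latest-cmake-sanitizer:
  runs-on: ubuntu-latest
  continue-on-error: true        # sanitizer failures do not block the workflow
  strategy:
    matrix:
      sanitizer: [ADDRESS, THREAD, UNDEFINED]
      build_type: [Debug]
```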
Known Exclusions:
test-opt is excluded from sanitizer-enabled builds because of known memory leaks in the optimizer. This is conditioned on NOT LLAMA_SANITIZE_ADDRESS AND NOT GGML_SCHED_NO_REALLOC in CMake. tests/CMakeLists.txt231-234
Sources: .github/workflows/build.yml261-318 tests/CMakeLists.txt231-234
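As a sketch of that guard (the helper name `llama_build_and_test` is an assumption; the actual code in tests/CMakeLists.txt may be structured differently):

```cmake
# test-opt has known leaks in the optimizer, so it is only built
# when ASAN and the scheduler no-realloc mode are both off.
if (NOT LLAMA_SANITIZE_ADDRESS AND NOT GGML_SCHED_NO_REALLOC)
    llama_build_and_test(test-opt.cpp)   # helper name is an assumption
endif()
```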
Key environment variables control CI behavior:
Workflow-level environment variables (set in env: block of build.yml):
| Variable | Value | Purpose |
|---|---|---|
| `GGML_NLOOP` | 3 | Test iteration count for backend ops |
| `GGML_N_THREADS` | 1 | Thread count for tests |
| `LLAMA_LOG_COLORS` | 1 | Enable colored log output |
| `LLAMA_LOG_PREFIX` | 1 | Enable log prefix |
| `LLAMA_LOG_TIMESTAMPS` | 1 | Enable log timestamps |
.github/workflows/build.yml54-59
ci/run.sh control variables (set by caller before invoking the script):
| Variable | Purpose |
|---|---|
| `GG_BUILD_CUDA` | Enable CUDA backend |
| `GG_BUILD_SYCL` | Enable SYCL backend (requires `ONEAPI_ROOT`) |
| `GG_BUILD_VULKAN` | Enable Vulkan backend |
| `GG_BUILD_METAL` | Enable Metal backend (macOS) |
| `GG_BUILD_ROCM` | Enable HIP/ROCm backend (requires `GG_BUILD_AMDGPU_TARGETS`) |
| `GG_BUILD_MUSA` | Enable MUSA backend |
| `GG_BUILD_WEBGPU` | Enable WebGPU backend |
| `GG_BUILD_KLEIDIAI` | Enable KleidiAI ARM optimizations |
| `GG_BUILD_NO_SVE` | Disable SVE for older ARM targets |
| `GG_BUILD_LOW_PERF` | Skip model download and heavy tests |
| `GG_BUILD_HIGH_PERF` | Enable `test_backend_ops_cpu` stage |
| `GG_BUILD_CLOUD` | Skip `test_scripts` stage |
| `LLAMACPP_TEST_MODELFILE` | Path to model for `ctest -L model` |
ci/run.sh48-166 ci/run.sh663-705
Sources: .github/workflows/build.yml52-57 ci/run.sh48-166
The complete CI/CD flow chains the pieces above together: trigger filtering, the build matrix, labeled ctest execution, and release packaging.
Sources: .github/workflows/build.yml1-1337 .github/workflows/release.yml1-800 ci/run.sh1-683