This page documents the RenderingDevice abstraction layer, the RenderingDeviceGraph command recording system, and the concrete GPU API driver implementations (Vulkan, D3D12, Metal). It covers the internal architecture and data flow from high-level rendering calls down to native API commands.
For the higher-level scene rendering pipelines that use RenderingDevice as a backend, see Scene Rendering Pipelines. For the RenderingServer API that sits above RenderingDevice, see RenderingServer. For the GLES3 path (which does not use RenderingDevice), see GLES3 Backend.
The rendering subsystem is split into three layers:
Architecture: Class Hierarchy
Sources: servers/rendering/rendering_device.h61-82 servers/rendering/rendering_device_driver.h91-152 drivers/vulkan/rendering_device_driver_vulkan.h52-53 drivers/d3d12/rendering_device_driver_d3d12.h83-84 drivers/metal/rendering_device_driver_metal.h58-59
| Class | Role | Location |
|---|---|---|
| RenderingDevice | Public API, resource ownership, draw/compute lists | servers/rendering/rendering_device.h |
| RenderingDeviceCommons | Shared enums, types, DataFormat | servers/rendering/rendering_device_commons.h |
| RenderingDeviceDriver (RDD) | Abstract driver interface, opaque ID types | servers/rendering/rendering_device_driver.h |
| RenderingContextDriver | Swapchain, surface, device enumeration | servers/rendering/rendering_context_driver.h |
| RenderingDeviceGraph (RDG) | Command recording, reordering, barrier insertion | servers/rendering/rendering_device_graph.h |
| RenderingDeviceDriverVulkan | Vulkan implementation | drivers/vulkan/ |
| RenderingDeviceDriverD3D12 | D3D12 implementation | drivers/d3d12/ |
| RenderingDeviceDriverMetal | Metal implementation | drivers/metal/ |
RenderingDevice is the sole public entry point for low-level GPU access. It is a singleton (RenderingDevice::singleton) and is exposed to GDScript/C# via the class database. A second "local" device can be created via create_local_device() for off-screen compute.
It holds two internal pointers set at initialization:
- RenderingContextDriver *context: manages the OS/windowing surface and device enumeration
- RenderingDeviceDriver *driver: the active GPU API backend

All methods that record GPU work are restricted to the render thread and are guarded by ERR_RENDER_THREAD_GUARD().
Sources: servers/rendering/rendering_device.h61-85 servers/rendering/rendering_device.cpp142-146
All GPU resources (textures, buffers, shaders, pipelines, framebuffers, uniform sets) are referenced by RID. RenderingDevice manages these through typed RID_Owner<T> members. A dependency graph (dependency_map, reverse_dependency_map) tracks which resources depend on which, so that freeing a parent resource also frees its children.
_add_dependency(RID p_id, RID p_depends_on)
_free_dependencies(RID p_id)
Sources: servers/rendering/rendering_device.h116-121 servers/rendering/rendering_device.cpp152-194
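The dependency bookkeeping described above can be illustrated with a small standalone sketch. It is not the engine's code: `ResourceId`, `DependencyTracker`, and `free_with_dependencies` are hypothetical names, and a plain integer stands in for `RID`. Only the two-map layout (`dependency_map`, `reverse_dependency_map`) is taken from the text.

```cpp
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for Godot's RID: a plain numeric handle.
using ResourceId = std::uint64_t;

class DependencyTracker {
public:
    // Records that p_id depends on p_depends_on (e.g. a uniform set on a texture).
    void add_dependency(ResourceId p_id, ResourceId p_depends_on) {
        dependency_map[p_depends_on].insert(p_id);
        reverse_dependency_map[p_id].insert(p_depends_on);
    }

    // Frees p_id and everything that depends on it.
    // Returns the ids in free order: dependents first, p_id last.
    std::vector<ResourceId> free_with_dependencies(ResourceId p_id) {
        std::vector<ResourceId> freed;
        std::unordered_set<ResourceId> visited;
        free_recursive(p_id, visited, freed);
        return freed;
    }

private:
    void free_recursive(ResourceId p_id, std::unordered_set<ResourceId> &r_visited,
            std::vector<ResourceId> &r_freed) {
        if (!r_visited.insert(p_id).second) {
            return; // Already freed through another dependency path.
        }
        // Copy the dependents first: the maps are mutated during recursion.
        std::vector<ResourceId> dependents;
        auto it = dependency_map.find(p_id);
        if (it != dependency_map.end()) {
            dependents.assign(it->second.begin(), it->second.end());
            dependency_map.erase(it);
        }
        for (ResourceId dependent : dependents) {
            free_recursive(dependent, r_visited, r_freed);
        }
        // Unlink p_id from every resource it depended on.
        auto rit = reverse_dependency_map.find(p_id);
        if (rit != reverse_dependency_map.end()) {
            for (ResourceId parent : rit->second) {
                auto pit = dependency_map.find(parent);
                if (pit != dependency_map.end()) {
                    pit->second.erase(p_id);
                }
            }
            reverse_dependency_map.erase(rit);
        }
        r_freed.push_back(p_id);
    }

    // Resource -> resources that depend on it.
    std::unordered_map<ResourceId, std::unordered_set<ResourceId>> dependency_map;
    // Resource -> resources it depends on.
    std::unordered_map<ResourceId, std::unordered_set<ResourceId>> reverse_dependency_map;
};
```

In this sketch, freeing a texture that a uniform set depends on also frees the uniform set, and anything built on the uniform set goes first.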
RenderingDevice owns these buffer categories, each backed by a Buffer struct:
| RID Owner | Purpose |
|---|---|
| uniform_buffer_owner | Uniform data read in shaders |
| storage_buffer_owner | Shader read/write storage |
| texture_buffer_owner | Texel-formatted buffer views |
| vertex_buffer_owner | Vertex attribute data |
| index_buffer_owner | Index data |
The internal Buffer struct stores:
- RDD::BufferID driver_id: opaque native handle
- uint32_t size
- BitField<RDD::BufferUsageBits> usage
- RDG::ResourceTracker *draw_tracker: tracks current usage in the command graph

Sources: servers/rendering/rendering_device.h177-184
CPU-to-GPU data transfers use a pool of CPU-mapped staging buffers. StagingBuffers contains a list of StagingBufferBlock entries, each with a fixed block size and a frame-use timestamp.
Diagram: Staging Buffer Allocation Flow
Sources: servers/rendering/rendering_device.h148-174 servers/rendering/rendering_device.cpp509-663
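As a rough illustration of how a pool of frame-stamped blocks can decide when a stall is needed, consider the sketch below. It is a simplified model, not the engine's allocator: the struct fields, `max_blocks`, `staging_allocate`, and the `p_frames_in_flight` parameter are assumptions made for the example. Only the `StagingBuffers` / `StagingBufferBlock` names and the three `STAGING_REQUIRED_ACTION_*` values come from the source.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum StagingRequiredAction {
    STAGING_REQUIRED_ACTION_NONE,
    STAGING_REQUIRED_ACTION_STALL_PREVIOUS,
    STAGING_REQUIRED_ACTION_FLUSH_AND_STALL_ALL,
};

struct StagingBufferBlock {
    std::uint64_t frame_used = 0;  // Frame stamp of the last write into this block.
    std::uint32_t fill_amount = 0; // Bytes already handed out from this block.
};

struct StagingBuffers {
    std::vector<StagingBufferBlock> blocks;
    std::uint32_t block_size = 256; // Hypothetical fixed block size in bytes.
    std::size_t max_blocks = 2;     // Hypothetical cap on pool growth (must be >= 1).
    std::size_t current = 0;
};

// Reserves p_size bytes (assumed <= block_size) and reports what the caller
// must do before writing. p_frames_in_flight is a hypothetical parameter: how
// many recent frames the GPU may still be reading from.
inline StagingRequiredAction staging_allocate(StagingBuffers &sb, std::uint32_t p_size,
        std::uint64_t p_frame, std::uint64_t p_frames_in_flight, std::uint32_t &r_offset) {
    StagingRequiredAction action = STAGING_REQUIRED_ACTION_NONE;
    const bool current_fits = !sb.blocks.empty() &&
            sb.blocks[sb.current].frame_used == p_frame &&
            sb.blocks[sb.current].fill_amount + p_size <= sb.block_size;
    if (!current_fits) {
        if (sb.blocks.size() < sb.max_blocks) {
            // Room to grow: a fresh block never needs a stall.
            sb.blocks.push_back(StagingBufferBlock());
            sb.current = sb.blocks.size() - 1;
        } else {
            // Pool is full: recycle the next block in ring order.
            sb.current = (sb.current + 1) % sb.blocks.size();
            const StagingBufferBlock &next = sb.blocks[sb.current];
            if (next.frame_used == p_frame) {
                // The block holds data recorded in this very frame.
                action = STAGING_REQUIRED_ACTION_FLUSH_AND_STALL_ALL;
            } else if (p_frame - next.frame_used < p_frames_in_flight) {
                // An earlier frame that may still be executing owns the block.
                action = STAGING_REQUIRED_ACTION_STALL_PREVIOUS;
            }
        }
        sb.blocks[sb.current].frame_used = p_frame;
        sb.blocks[sb.current].fill_amount = 0;
    }
    r_offset = sb.blocks[sb.current].fill_amount;
    sb.blocks[sb.current].fill_amount += p_size;
    return action;
}
```

The point of the model is the ordering of the checks: growing the pool is preferred, waiting on an older frame is the next option, and flushing the current frame is the last resort.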
The StagingRequiredAction enum controls what happens when no block is available:
- STAGING_REQUIRED_ACTION_NONE: proceed normally
- STAGING_REQUIRED_ACTION_STALL_PREVIOUS: wait for previous frames to finish
- STAGING_REQUIRED_ACTION_FLUSH_AND_STALL_ALL: stall the entire GPU pipeline

Recording is done through list objects. Only one draw list or compute list may be open at a time.
| List Type | Begin Method | End Method | Submit Operation |
|---|---|---|---|
| Draw | draw_list_begin() | draw_list_end() | draw_list_draw() |
| Compute | compute_list_begin() | compute_list_end() | compute_list_dispatch() |
| Raytracing | raytracing_list_begin() | raytracing_list_end() | raytracing_list_trace_rays() |
List instructions are appended to the RenderingDeviceGraph as DrawListInstruction, ComputeListInstruction, or RaytracingListInstruction entries. Execution is deferred to graph submission.
Sources: servers/rendering/rendering_device_graph.h51-101
GLSL source is compiled to SPIR-V via the glslang module, then passed to the driver for final translation.
shader_compile_spirv_from_source() --> compile_glslang_shader() (modules/glslang)
shader_create_from_spirv() --> shader_compile_binary_from_spirv() --> shader_create_from_bytecode()
The driver's get_shader_container_format() controls which SPIR-V version and shading language version to target.
Sources: servers/rendering/rendering_device.cpp200-218
RenderingDeviceGraph (aliased as RDG throughout the codebase) is a deferred command recorder. Instead of immediately calling driver functions, RenderingDevice appends commands to the graph. At frame end (_end_frame()), the graph is compiled and submitted to the driver.
Defined in rendering_device.cpp:
| Macro | Default | Effect |
|---|---|---|
| RENDER_GRAPH_REORDER | 1 | Topologically sorts commands by dependency |
| RENDER_GRAPH_FULL_BARRIERS | 0 | Debug: emits full barriers between every level |
| SECONDARY_COMMAND_BUFFERS_PER_FRAME | 0 | Experimental background command buffer recording |
Sources: servers/rendering/rendering_device.cpp127-140
Every tracked GPU resource has a ResourceTracker object:
ResourceUsage values include: RESOURCE_USAGE_COPY_FROM, RESOURCE_USAGE_TEXTURE_SAMPLE, RESOURCE_USAGE_STORAGE_BUFFER_READ_WRITE, RESOURCE_USAGE_ATTACHMENT_COLOR_READ_WRITE, and so on. The graph maps each usage to the appropriate pipeline stage bits and access bits for barrier insertion.
Sources: servers/rendering/rendering_device_graph.h154-220 servers/rendering/rendering_device_graph.cpp49-214
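The usage-to-barrier mapping can be pictured as a lookup from a usage value to a pair of bit masks. The sketch below is illustrative only: the four `RESOURCE_USAGE_*` names come from the text, while the `STAGE_*` and `ACCESS_*` bit values, the `UsageBarrierInfo` struct, and both functions are hypothetical. It also ignores texture layout changes, which in a real graph can force a barrier even between two reads.

```cpp
#include <cstdint>

enum ResourceUsage {
    RESOURCE_USAGE_COPY_FROM,
    RESOURCE_USAGE_TEXTURE_SAMPLE,
    RESOURCE_USAGE_STORAGE_BUFFER_READ_WRITE,
    RESOURCE_USAGE_ATTACHMENT_COLOR_READ_WRITE,
};

// Hypothetical bit values, chosen only for this example.
enum StageBits : std::uint32_t {
    STAGE_COPY = 1u << 0,
    STAGE_FRAGMENT_SHADER = 1u << 1,
    STAGE_COMPUTE_SHADER = 1u << 2,
    STAGE_COLOR_ATTACHMENT_OUTPUT = 1u << 3,
};

enum AccessBits : std::uint32_t {
    ACCESS_COPY_READ = 1u << 0,
    ACCESS_SHADER_READ = 1u << 1,
    ACCESS_SHADER_WRITE = 1u << 2,
    ACCESS_COLOR_READ = 1u << 3,
    ACCESS_COLOR_WRITE = 1u << 4,
};

struct UsageBarrierInfo {
    std::uint32_t stages; // Pipeline stages that touch the resource.
    std::uint32_t access; // Kinds of memory access performed.
    bool writes;          // True if the usage can modify the resource.
};

// Maps a usage to the stage and access masks a barrier would need.
// The stage chosen for each usage is an assumption for illustration.
inline UsageBarrierInfo usage_to_barrier_info(ResourceUsage p_usage) {
    switch (p_usage) {
        case RESOURCE_USAGE_COPY_FROM:
            return { STAGE_COPY, ACCESS_COPY_READ, false };
        case RESOURCE_USAGE_TEXTURE_SAMPLE:
            return { STAGE_FRAGMENT_SHADER, ACCESS_SHADER_READ, false };
        case RESOURCE_USAGE_STORAGE_BUFFER_READ_WRITE:
            return { STAGE_COMPUTE_SHADER, ACCESS_SHADER_READ | ACCESS_SHADER_WRITE, true };
        case RESOURCE_USAGE_ATTACHMENT_COLOR_READ_WRITE:
            return { STAGE_COLOR_ATTACHMENT_OUTPUT, ACCESS_COLOR_READ | ACCESS_COLOR_WRITE, true };
    }
    return { 0, 0, false };
}

// Simplified hazard rule: a barrier is needed whenever either side writes.
inline bool needs_barrier(ResourceUsage p_previous, ResourceUsage p_next) {
    return usage_to_barrier_info(p_previous).writes || usage_to_barrier_info(p_next).writes;
}
```

Under this rule, sampling a texture after copying from it needs no barrier, while sampling after a storage write does.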
The graph records commands as tagged binary blobs. The RecordedCommand::Type enum lists all supported operations:
TYPE_DRAW_LIST, TYPE_COMPUTE_LIST, TYPE_RAYTRACING_LIST,
TYPE_BUFFER_UPDATE, TYPE_BUFFER_COPY, TYPE_BUFFER_CLEAR,
TYPE_BUFFER_GET_DATA, TYPE_TEXTURE_UPDATE, TYPE_TEXTURE_COPY,
TYPE_TEXTURE_CLEAR_COLOR, TYPE_TEXTURE_RESOLVE, TYPE_TEXTURE_GET_DATA,
TYPE_ACCELERATION_STRUCTURE_BUILD, TYPE_CAPTURE_TIMESTAMP,
TYPE_DRIVER_CALLBACK
Each RecordedCommand stores adjacency information (adjacent_command_list_index) used to build the dependency DAG, along with accumulated stage bits (self_stages, previous_stages, next_stages).
Sources: servers/rendering/rendering_device_graph.h103-142
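To make the idea of a dependency DAG over recorded commands concrete, here is a minimal sketch of level assignment and reordering. It assumes a much simpler representation than the real graph: `RecordedCommandSketch`, its `depends_on` index list, `compute_levels`, and `execution_order` are hypothetical, and the actual adjacency storage (`adjacent_command_list_index`) is not modeled.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

struct RecordedCommandSketch {
    enum Type { TYPE_BUFFER_UPDATE, TYPE_TEXTURE_UPDATE, TYPE_DRAW_LIST, TYPE_COMPUTE_LIST };
    Type type;
    // Indices of earlier commands whose results this command needs.
    std::vector<std::size_t> depends_on;
};

// Commands are recorded in order, so every dependency index is smaller than
// the command's own index and a single forward pass yields each level
// (the length of the longest dependency chain leading to the command).
inline std::vector<int> compute_levels(const std::vector<RecordedCommandSketch> &p_commands) {
    std::vector<int> levels(p_commands.size(), 0);
    for (std::size_t i = 0; i < p_commands.size(); i++) {
        for (std::size_t dep : p_commands[i].depends_on) {
            levels[i] = std::max(levels[i], levels[dep] + 1);
        }
    }
    return levels;
}

// Returns command indices grouped by level. Commands on the same level do not
// depend on each other, so barriers are only needed between levels.
inline std::vector<std::size_t> execution_order(const std::vector<RecordedCommandSketch> &p_commands) {
    const std::vector<int> levels = compute_levels(p_commands);
    std::vector<std::size_t> order(p_commands.size());
    std::iota(order.begin(), order.end(), std::size_t(0));
    std::stable_sort(order.begin(), order.end(),
            [&levels](std::size_t a, std::size_t b) { return levels[a] < levels[b]; });
    return order;
}
```

With a buffer update, a draw that reads it, an unrelated texture update, and a compute pass that needs both, the two updates land on level 0 and can be grouped ahead of the draw.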
Diagram: RenderingDeviceGraph Submission
Sources: servers/rendering/rendering_device_graph.cpp217-270 servers/rendering/rendering_device_graph.cpp160-214
RenderingDeviceDriver (abbreviated RDD) defines the contract every backend must implement. It contains no state of its own and performs minimal validation. Its design principles (from the file header) are:
- uint64_t
- DEV_ENABLED/DEBUG_ENABLED builds
- alloca() to avoid heap allocation in hot paths
- VectorView<T> is used for array arguments to avoid copies

Every resource category has a strongly-typed ID struct generated by the DEFINE_ID macro:
BufferID, TextureID, SamplerID, VertexFormatID, CommandQueueID,
CommandQueueFamilyID, CommandPoolID, CommandBufferID, SwapChainID,
FramebufferID, ShaderID, UniformSetID, PipelineID, RenderPassID,
QueryPoolID, FenceID, SemaphoreID, AccelerationStructureID,
RaytracingPipelineID
IDs evaluate to false when zero (uninitialized). This is used for null-checks throughout RenderingDevice.
Sources: servers/rendering/rendering_device_driver.h95-146
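A macro of this kind can be approximated as follows. This is a simplified sketch rather than the header's actual definition: the macro is renamed `DEFINE_ID_SKETCH` to make that clear, and the exact set of constructors and operators is an assumption. What it aims to show is the behavior described above: a 64-bit opaque value, distinct types per resource category, and a boolean test that is false for zero.

```cpp
#include <cstdint>
#include <type_traits>

// Common base holding the opaque 64-bit value; zero means "no resource".
struct ID {
    std::uint64_t id = 0;
    ID() = default;
    explicit ID(std::uint64_t p_id) : id(p_id) {}
};

// Generates one distinct wrapper type per resource category.
#define DEFINE_ID_SKETCH(m_name)                                           \
    struct m_name##ID : public ID {                                        \
        m_name##ID() = default;                                            \
        explicit m_name##ID(std::uint64_t p_id) : ID(p_id) {}              \
        explicit operator bool() const { return id != 0; }                 \
        bool operator==(const m_name##ID &p_other) const { return id == p_other.id; } \
        bool operator!=(const m_name##ID &p_other) const { return id != p_other.id; } \
    }

DEFINE_ID_SKETCH(Buffer);
DEFINE_ID_SKETCH(Texture);

// The generated types are unrelated to each other, so passing a BufferID
// where a TextureID is expected fails to compile.
static_assert(!std::is_convertible<BufferID, TextureID>::value,
        "IDs of different categories must not convert into each other");
```

A default-constructed `BufferID` tests false, so `if (!buffer_id)` works as a null check without exposing what the 64 bits mean to the backend.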
| Group | Representative Methods |
|---|---|
| Initialization | initialize(device_index, frame_count) |
| Buffers | buffer_create(), buffer_free(), buffer_map(), buffer_unmap() |
| Textures | texture_create(), texture_free(), texture_copy_from_buffer() |
| Shaders | shader_create_from_container(), shader_free() |
| Pipelines | render_pipeline_create(), compute_pipeline_create() |
| Uniform sets | uniform_set_create(), uniform_set_free() |
| Command buffers | command_buffer_begin(), command_buffer_end(), command_buffer_submit() |
| Barriers | command_pipeline_barrier() |
| Draw | command_render_pass_begin(), command_draw(), command_draw_indexed() |
| Compute | command_dispatch() |
| Synchronization | fence_create(), fence_wait(), semaphore_create() |
| Swap chain | swap_chain_create(), swap_chain_acquire_image(), swap_chain_present() |
| Capabilities | get_capabilities(), get_shader_container_format() |
Sources: servers/rendering/rendering_device_driver.h152
RenderingDeviceDriverVulkan in drivers/vulkan/ implements RenderingDeviceDriver for Vulkan 1.0+ with optional extensions.
Uses Vulkan Memory Allocator (vk_mem_alloc.h) via a VmaAllocator allocator member. Small allocations use per-memory-type pools (small_allocs_pools).
SPIR-V binaries are stored in the RenderingShaderContainerVulkan format (via rendering_shader_container_vulkan.h). When specialization constants are used, the driver optionally runs re-spirv (at thirdparty/re-spirv/) to optimize the SPIR-V before creating the VkPipeline. This is controlled by RESPV_ENABLED 1.
Sources: drivers/vulkan/rendering_device_driver_vulkan.cpp61-75
| Struct | Purpose |
|---|---|
| SubgroupCapabilities | size, supported_stages, supported_operations |
| ShaderCapabilities | shader_float16_is_supported, shader_int8_is_supported |
| StorageBufferCapabilities | 16-bit storage access flags |
| AccelerationStructureCapabilities | RT geometry support |
| RaytracingCapabilities | shader_group_handle_size, alignment values |
Sources: drivers/vulkan/rendering_device_driver_vulkan.h69-108
- CmdDebugMarkerBeginEXT, CmdDebugMarkerEndEXT (requires VK_EXT_DEBUG_UTILS_EXTENSION_NAME)
- Breadcrumb buffer (breadcrumb_buffer, 512 entries in debug builds) to identify the last command before a GPU crash
- GetDeviceFaultInfoEXT when device_fault_support is set

Sources: drivers/vulkan/rendering_device_driver_vulkan.cpp86-87 drivers/vulkan/rendering_device_driver_vulkan.h217-222
RD_TO_VK_FORMAT[] is a compile-time array mapping RDD::DataFormat → VkFormat (all 218 entries). Several RDD enums are asserted to be numerically identical to Vulkan enums via static_assert + ENUM_MEMBERS_EQUAL.
Sources: drivers/vulkan/rendering_device_driver_vulkan.cpp92-325 drivers/vulkan/rendering_device_driver_vulkan.cpp386-396
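The two techniques mentioned here, a compile-time lookup table and statically asserted enum equality, can be sketched without the Vulkan headers. Everything below uses stand-in names: the `Native*` enums, the two-entry table, and `to_native()` are hypothetical, and `ENUM_MEMBERS_EQUAL` is redefined locally in a form assumed to resemble the engine's macro.

```cpp
#include <cstddef>
#include <cstdint>

// Engine-side enums (a tiny subset, for illustration).
enum DataFormat { DATA_FORMAT_R8_UNORM, DATA_FORMAT_R8G8B8A8_UNORM, DATA_FORMAT_MAX };
enum CompareOperator { COMPARE_OP_NEVER, COMPARE_OP_LESS };

// Stand-ins for the native API's enums; the numeric values are assumptions.
enum NativeFormat { NATIVE_FORMAT_UNDEFINED = 0, NATIVE_FORMAT_R8_UNORM = 9, NATIVE_FORMAT_R8G8B8A8_UNORM = 37 };
enum NativeCompareOp { NATIVE_COMPARE_OP_NEVER = 0, NATIVE_COMPARE_OP_LESS = 1 };

// Technique 1: formats do not share numeric values, so a table indexed by the
// engine enum translates them.
static constexpr NativeFormat RD_TO_NATIVE_FORMAT[] = {
    NATIVE_FORMAT_R8_UNORM,
    NATIVE_FORMAT_R8G8B8A8_UNORM,
};
static_assert(sizeof(RD_TO_NATIVE_FORMAT) / sizeof(RD_TO_NATIVE_FORMAT[0]) == DATA_FORMAT_MAX,
        "The table needs one entry per DataFormat value");

// Technique 2: when two enums are meant to be numerically identical, assert
// it at compile time and convert with a plain cast.
#define ENUM_MEMBERS_EQUAL(m_a, m_b) ((std::int64_t)(m_a) == (std::int64_t)(m_b))
static_assert(ENUM_MEMBERS_EQUAL(COMPARE_OP_NEVER, NATIVE_COMPARE_OP_NEVER), "Enum mismatch");
static_assert(ENUM_MEMBERS_EQUAL(COMPARE_OP_LESS, NATIVE_COMPARE_OP_LESS), "Enum mismatch");

inline NativeCompareOp to_native(CompareOperator p_op) {
    return static_cast<NativeCompareOp>(p_op); // Safe because of the asserts above.
}
```

If either enum is later reordered, the build breaks at the `static_assert` instead of silently producing wrong native values at run time.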
RenderingDeviceDriverD3D12 in drivers/d3d12/ implements RenderingDeviceDriver for Direct3D 12.
D3D12 shaders require DXIL bytecode. The driver uses Mesa's NIR pipeline:
- spirv_to_dxil() / nir_to_dxil() from mesa/nir_to_dxil.h
- dxil_spirv_nir.h bridging library
- d3d12_godot_nir_bridge.h for Godot-specific glue

Sources: drivers/d3d12/rendering_device_driver_d3d12.cpp64-69
Uses D3D12 Memory Allocator (thirdparty/d3d12ma/D3D12MemAlloc.h) via ComPtr<D3D12MA::Allocator> allocator.
D3D12 requires explicit descriptor heap management. The driver uses two systems:
| Type | Class | Purpose |
|---|---|---|
| Shader-visible GPU heap | DescriptorHeap | CBV/SRV/UAV, Samplers — for binding to shaders |
| CPU-only pool | CPUDescriptorHeapPool | RTV, DSV, and staging descriptors |
DescriptorHeap internally uses a D3D12MA::VirtualBlock for sub-allocation within a fixed ID3D12DescriptorHeap. CPUDescriptorHeapPool maintains a list of DescriptorHeap instances, growing on demand up to a maximum.
Sources: drivers/d3d12/rendering_device_driver_d3d12.h154-196 drivers/d3d12/rendering_device_driver_d3d12.cpp368-475
D3D12 requires explicit resource state transitions (D3D12_RESOURCE_STATES). The driver tracks per-subresource states in ResourceInfo::States:
_resource_transition_batch() batches pending transitions and _resource_transitions_flush() emits them as D3D12_RESOURCE_BARRIER arrays before each draw/dispatch.
Sources: drivers/d3d12/rendering_device_driver_d3d12.h229-262 drivers/d3d12/rendering_device_driver_d3d12.cpp587-625
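The batch-then-flush pattern can be modeled without the D3D12 headers. In the sketch below, `ResourceState` is a plain integer standing in for `D3D12_RESOURCE_STATES`, and `ResourceInfoSketch`, `PendingBarrier`, and `TransitionBatcher` are hypothetical types. The real driver's state tracking is more involved than this.

```cpp
#include <cstdint>
#include <vector>

using ResourceState = std::uint32_t; // Stand-in for D3D12_RESOURCE_STATES.

struct ResourceInfoSketch {
    // One tracked state per subresource (mip level / array slice).
    std::vector<ResourceState> subresource_states;
};

struct PendingBarrier {
    ResourceInfoSketch *resource;
    std::uint32_t subresource;
    ResourceState before;
    ResourceState after;
};

class TransitionBatcher {
public:
    // Queues a transition only if the tracked state actually changes.
    void transition_batch(ResourceInfoSketch &r_resource, std::uint32_t p_subresource, ResourceState p_new_state) {
        ResourceState &current = r_resource.subresource_states[p_subresource];
        if (current == p_new_state) {
            return; // Already in the requested state: no barrier needed.
        }
        pending.push_back({ &r_resource, p_subresource, current, p_new_state });
        current = p_new_state;
    }

    // Hands back everything queued so far, to be issued in a single call,
    // and leaves the batch empty for the next draw or dispatch.
    std::vector<PendingBarrier> flush() {
        std::vector<PendingBarrier> out;
        out.swap(pending);
        return out;
    }

private:
    std::vector<PendingBarrier> pending;
};
```

Collecting transitions and emitting them together keeps the number of barrier calls low, and redundant requests are dropped before they reach the command list.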
RD_TO_D3D12_FORMAT[] maps each RDD::DataFormat to a D3D12Format struct containing:
- DXGI_FORMAT family: typeless base format
- DXGI_FORMAT general_format: typed SRV/UAV format
- UINT swizzle: component mapping for channels not matching RD's convention
- DXGI_FORMAT dsv_format: depth/stencil view format (when applicable)

Sources: drivers/d3d12/rendering_device_driver_d3d12.cpp118-351
RenderingDeviceDriverMetal in drivers/metal/ implements RenderingDeviceDriver for Apple Metal (macOS 11+, iOS 14+). It uses the metal-cpp C++ bindings from thirdparty/metal-cpp/.
The driver queries MetalDeviceProperties (in metal_device_properties.h) which mirrors the logic from MoltenVK for deriving GPU capabilities from MTL::Device. A MetalDeviceProfile selects known per-device workarounds.
Sources: drivers/metal/rendering_device_driver_metal.h69-89 drivers/metal/metal_device_properties.h1-60
Compiled pipelines are stored in an MTL::BinaryArchive (archive member). On subsequent launches the archive is reloaded to avoid recompilation. An archive_fail_on_miss debug flag forces a failure whenever a pipeline is missing from the archive.
Sources: drivers/metal/rendering_device_driver_metal.h91-95
An internal copy_queue (MTL::CommandQueue) handles CPU-GPU transfers independently from the main render queue. Access is protected by copy_queue_mutex.
Sources: drivers/metal/rendering_device_driver_metal.h101-106
Metal 4 (macOS 26 / iOS 26) introduced significant API changes. The implementation is split:
- rendering_device_driver_metal.cpp: base class and common Metal 3 functionality
- rendering_device_driver_metal3.cpp: Metal 3 specific command buffer recording via MTL3::MDCommandBuffer
- MTL4::MDCommandBuffer

Sources: drivers/metal/rendering_device_driver_metal.h56-60 drivers/metal/rendering_device_driver_metal3.cpp1-30
PixelFormats (in pixel_formats.h) handles the mapping between RDD::DataFormat and MTL::PixelFormat. This is a separate class (not a static table) because Metal pixel format availability varies per GPU family.
Sources: drivers/metal/pixel_formats.h1-50 drivers/metal/rendering_device_driver_metal.h83
Diagram: Buffer Update Path (CPU to GPU)
Sources: servers/rendering/rendering_device.cpp708-798 servers/rendering/rendering_device_graph.cpp217-270
At startup, RenderingDevice scores available devices by type (from RenderingContextDriver::Device) via _get_device_type_score(). Discrete GPUs score 5, integrated 4 (unless the user prefers integrated via OS::get_singleton()->get_user_prefers_integrated_gpu()), virtual 3, CPU 2, other 1.
Sources: servers/rendering/rendering_device.cpp100-115
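The scoring rule can be sketched as a small function plus a selection loop. The score values are the ones listed above. The `DeviceType` enum, `pick_device()`, and in particular the behavior when integrated GPUs are preferred (here the discrete and integrated scores simply swap) are assumptions for illustration, since the text does not spell that case out.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical enum standing in for the device type reported by the context driver.
enum DeviceType {
    DEVICE_TYPE_OTHER,
    DEVICE_TYPE_INTEGRATED_GPU,
    DEVICE_TYPE_DISCRETE_GPU,
    DEVICE_TYPE_VIRTUAL_GPU,
    DEVICE_TYPE_CPU,
};

inline int get_device_type_score(DeviceType p_type, bool p_prefer_integrated) {
    switch (p_type) {
        case DEVICE_TYPE_DISCRETE_GPU:
            return p_prefer_integrated ? 4 : 5; // Assumption: scores swap when integrated is preferred.
        case DEVICE_TYPE_INTEGRATED_GPU:
            return p_prefer_integrated ? 5 : 4;
        case DEVICE_TYPE_VIRTUAL_GPU:
            return 3;
        case DEVICE_TYPE_CPU:
            return 2;
        case DEVICE_TYPE_OTHER:
            break;
    }
    return 1;
}

// Returns the index of the best-scoring device; ties go to the first one listed.
inline std::size_t pick_device(const std::vector<DeviceType> &p_devices, bool p_prefer_integrated) {
    std::size_t best = 0;
    int best_score = -1;
    for (std::size_t i = 0; i < p_devices.size(); i++) {
        const int score = get_device_type_score(p_devices[i], p_prefer_integrated);
        if (score > best_score) {
            best_score = score;
            best = i;
        }
    }
    return best;
}
```

On a laptop exposing a software rasterizer, an integrated GPU, and a discrete GPU, this picks the discrete GPU by default and the integrated one when the preference flag is set.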
RenderingDevice uses _THREAD_SAFE_CLASS_ (a per-instance mutex) for operations that touch shared data structures from multiple threads (e.g., dependency_map, buffer_memory counter). The render_thread_id is stored at initialization; all command-recording methods verify the calling thread via ERR_RENDER_THREAD_GUARD().
Sources: servers/rendering/rendering_device.h64-67 servers/rendering/rendering_device.cpp53-55
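The split between mutex-protected bookkeeping and render-thread-only recording can be modeled with the standard library. This is a sketch under stated assumptions, not the engine's macros: `DeviceSketch` and its methods are hypothetical, `std::mutex` stands in for `_THREAD_SAFE_CLASS_`, and the early `return false` stands in for what `ERR_RENDER_THREAD_GUARD()` would report.

```cpp
#include <cstdint>
#include <mutex>
#include <thread>

class DeviceSketch {
public:
    // Remembers which thread is allowed to record commands. The parameter
    // exists only so the example can simulate a foreign thread.
    explicit DeviceSketch(std::thread::id p_render_thread = std::this_thread::get_id()) :
            render_thread_id(p_render_thread) {}

    // Shared bookkeeping: callable from any thread, serialized by the mutex.
    void add_buffer_memory(std::uint64_t p_bytes) {
        std::lock_guard<std::mutex> lock(mutex);
        buffer_memory += p_bytes;
    }

    std::uint64_t get_buffer_memory() const {
        std::lock_guard<std::mutex> lock(mutex);
        return buffer_memory;
    }

    // Command recording: rejected unless called from the render thread.
    bool record_command() {
        if (std::this_thread::get_id() != render_thread_id) {
            return false; // The real guard reports an error and returns early.
        }
        recorded_commands++;
        return true;
    }

    int get_recorded_commands() const { return recorded_commands; }

private:
    std::thread::id render_thread_id;
    mutable std::mutex mutex;
    std::uint64_t buffer_memory = 0;
    int recorded_commands = 0; // Only touched on the render thread, so no lock.
};
```

Because recording is confined to one thread, the command state itself needs no lock. Only the data that other threads may legitimately touch pays for the mutex.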