LibMedia is the media playback framework used by Ladybird to decode and render audio and video content embedded in web pages. It covers container parsing (Matroska/WebM and any format supported by FFmpeg), codec decoding via FFmpeg, synchronized audio and video output, and a state-machine-driven playback controller.
This page documents the internal architecture of LibMedia. For how <video> and <audio> HTML elements interact with this library through the DOM, see the HTML subsystem documentation (5).
Component Relationships
Sources: Libraries/LibMedia/PlaybackManager.h1-228 Libraries/LibMedia/Demuxer.h1-64 Libraries/LibMedia/CMakeLists.txt1-60
IncrementallyPopulatedStream
Sources: Libraries/LibMedia/IncrementallyPopulatedStream.h1-124 Libraries/LibMedia/IncrementallyPopulatedStream.cpp1-293
IncrementallyPopulatedStream implements the MediaStream interface and serves as the bridge between network-delivered byte data and the demuxer layer. It stores data in non-contiguous DataChunk objects keyed by byte offset in an AK::RedBlackTree.
Multiple independent Cursor objects may read from the same stream simultaneously — one per track context — each maintaining its own position.
Key behaviors:
| Method | Description |
|---|---|
| create_empty() | Creates an empty stream, populated later via add_chunk_at() |
| create_from_data(ReadonlyBytes) | Creates a fully-populated, closed stream for static data |
| add_chunk_at(offset, data) | Adds a data chunk; merges adjacent or overlapping chunks |
| set_data_request_callback(cb) | Registers the callback invoked when a cursor is blocked waiting for data at an offset |
| close() | Signals end-of-stream and sets m_expected_size |
| create_cursor() | Returns a new Cursor that can seek and read independently |
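The chunk merging performed by add_chunk_at() can be illustrated with a minimal sketch. It assumes std::map as a stand-in for AK::RedBlackTree, and the class ChunkStoreSketch and its helpers chunk_count() and has_range() are hypothetical names, not LibMedia API. Letting new data win where ranges overlap is also an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the stream's chunk storage. std::map (an ordered
// tree) plays the role of the red-black tree keyed by byte offset.
class ChunkStoreSketch {
public:
    void add_chunk_at(std::size_t offset, std::vector<std::uint8_t> const& data)
    {
        if (data.empty())
            return;
        std::size_t const new_end = offset + data.size();
        std::size_t merged_start = offset;
        std::size_t merged_end = new_end;

        // Start from the preceding chunk if it overlaps or touches the new range.
        auto first = m_chunks.lower_bound(offset);
        if (first != m_chunks.begin()) {
            auto previous = std::prev(first);
            if (previous->first + previous->second.size() >= offset)
                first = previous;
        }

        // Collect every stored chunk that overlaps or touches the new range.
        auto last = first;
        while (last != m_chunks.end() && last->first <= new_end) {
            merged_start = std::min(merged_start, last->first);
            merged_end = std::max(merged_end, last->first + last->second.size());
            ++last;
        }

        // Copy old chunks first, then the new data (assumption: new data wins).
        std::vector<std::uint8_t> merged(merged_end - merged_start);
        for (auto it = first; it != last; ++it)
            std::copy(it->second.begin(), it->second.end(),
                merged.begin() + static_cast<std::ptrdiff_t>(it->first - merged_start));
        std::copy(data.begin(), data.end(),
            merged.begin() + static_cast<std::ptrdiff_t>(offset - merged_start));

        m_chunks.erase(first, last);
        m_chunks.emplace(merged_start, std::move(merged));
    }

    std::size_t chunk_count() const { return m_chunks.size(); }

    // True if [offset, offset + length) lies inside a single stored chunk.
    bool has_range(std::size_t offset, std::size_t length) const
    {
        auto it = m_chunks.upper_bound(offset);
        if (it == m_chunks.begin())
            return false;
        --it;
        return offset + length <= it->first + it->second.size();
    }

private:
    std::map<std::size_t, std::vector<std::uint8_t>> m_chunks;
};
```

Because every insertion merges with its neighbours, stored chunks never overlap or touch, so a cursor can answer "is this range available?" with one tree lookup.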
When Cursor::read_into() is called and the required data is not yet available, the cursor sets m_blocked = true and waits on m_state_changed. The stream triggers the DataRequestCallback to request more data from the network. The constants PRECEDING_DATA_SIZE (1 KiB) and FORWARD_REQUEST_THRESHOLD (1 MiB) govern prefetch heuristics.
Cursor::abort() is used to interrupt a blocking read (e.g. when a seek is initiated): it sets m_aborted = true and broadcasts to unblock the thread.
Sources: Libraries/LibMedia/IncrementallyPopulatedStream.cpp13-16 Libraries/LibMedia/IncrementallyPopulatedStream.cpp160-190
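The blocking read and abort behavior described above follows a common condition-variable pattern. The sketch below is a simplification under stated assumptions: BlockingReadSketch is a hypothetical class that folds the stream and cursor into one object and tracks only how many contiguous bytes have arrived. It reuses the member names m_blocked, m_aborted and m_state_changed from the text but uses the C++ standard library rather than LibMedia's threading primitives.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

enum class ReadStatus { Ready, Aborted };

// Hypothetical simplification: one class stands in for both the stream and
// its cursor, and only counts contiguous bytes available from offset 0.
class BlockingReadSketch {
public:
    // Called by the network side when more bytes arrive.
    void add_bytes(std::size_t count)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_available += count;
        }
        m_state_changed.notify_all();
    }

    // Interrupts a pending or future wait, e.g. when a seek starts.
    void abort()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_aborted = true;
        }
        m_state_changed.notify_all();
    }

    // Blocks until bytes [0, end_offset) are available or abort() is called.
    ReadStatus wait_for_data(std::size_t end_offset)
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        while (!m_aborted && m_available < end_offset) {
            m_blocked = true;
            m_state_changed.wait(lock);
        }
        m_blocked = false;
        return m_aborted ? ReadStatus::Aborted : ReadStatus::Ready;
    }

    bool is_blocked()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_blocked;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_state_changed;
    std::size_t m_available { 0 };
    bool m_blocked { false };
    bool m_aborted { false };
};
```

In a real setup wait_for_data() would run on a decoder thread while add_bytes() and abort() are called from another thread; the broadcast in abort() is what lets a blocked reader return promptly.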
Demuxer Interface
Sources: Libraries/LibMedia/Demuxer.h1-64
Demuxer is the abstract base class for all container parsers. It is reference-counted via AtomicRefCounted.
| Virtual Method | Purpose |
|---|---|
| create_context_for_track(Track) | Allocates per-track state (e.g. a seek iterator) |
| get_tracks_for_type(TrackType) | Lists all tracks of a given type |
| get_preferred_track_for_type(TrackType) | Returns the container's recommended track |
| get_next_sample_for_track(Track) | Returns the next CodedFrame (blocks until data available) |
| seek_to_most_recent_keyframe(Track, Duration, Options) | Seeks the read position to the last keyframe at or before the timestamp |
| get_codec_id_for_track(Track) | Returns the CodecID for decoder initialization |
| get_codec_initialization_data_for_track(Track) | Returns codec-private data (e.g. SPS/PPS for H.264) |
| set_blocking_reads_aborted_for_track(Track) | Aborts any blocking read on a track's cursor |
| is_read_blocked_for_track(Track) | Returns whether a track's cursor is currently waiting for data |
DemuxerSeekResult is either MovedPosition (decoder needs a flush) or KeptCurrentPosition (already positioned close enough to avoid a flush).
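To make the two DemuxerSeekResult outcomes concrete, here is a rough sketch of one way a demuxer could decide between them. The rule used (keep the current position when it already lies between the chosen keyframe and the target) is an assumption for illustration, as are the SeekOutcome struct, the function name, and the millisecond timestamps; the actual criteria in LibMedia may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

enum class DemuxerSeekResult { MovedPosition, KeptCurrentPosition };

// Hypothetical result type: the outcome plus the resulting read position.
struct SeekOutcome {
    DemuxerSeekResult result;
    std::int64_t position_ms;
};

// keyframes_ms must be sorted ascending and non-empty.
SeekOutcome seek_to_most_recent_keyframe_sketch(
    std::vector<std::int64_t> const& keyframes_ms,
    std::int64_t current_ms,
    std::int64_t target_ms)
{
    assert(!keyframes_ms.empty());
    assert(std::is_sorted(keyframes_ms.begin(), keyframes_ms.end()));

    // Last keyframe at or before the target (or the first one if none qualifies).
    auto after = std::upper_bound(keyframes_ms.begin(), keyframes_ms.end(), target_ms);
    std::int64_t const keyframe = (after == keyframes_ms.begin())
        ? keyframes_ms.front()
        : *std::prev(after);

    // Assumed rule: decoding forward from the current position reaches the
    // target without crossing a keyframe, so no flush is needed.
    if (current_ms >= keyframe && current_ms <= target_ms)
        return { DemuxerSeekResult::KeptCurrentPosition, current_ms };
    return { DemuxerSeekResult::MovedPosition, keyframe };
}
```

A caller would flush its decoder only when the result is MovedPosition.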
MatroskaDemuxer and the Matroska Reader
Sources: Libraries/LibMedia/Containers/Matroska/MatroskaDemuxer.h1-69 Libraries/LibMedia/Containers/Matroska/MatroskaDemuxer.cpp1-226
MatroskaDemuxer wraps the Matroska Reader and adds per-track TrackStatus (containing a SampleIterator, current Block, decoded frames vector, and frame_index).
Reader (Libraries/LibMedia/Containers/Matroska/Reader.h1-152 Libraries/LibMedia/Containers/Matroska/Reader.cpp1-873) is the EBML/Matroska parser. Parsing proceeds as follows:
from_stream()
→ parse_initial_data()
→ parse_ebml_header() (DocType, DocTypeVersion)
→ parse_segment_information() (TimestampScale, Duration)
→ parse_tracks() (TrackEntry list)
→ find_first_top_level_element_with_id(CLUSTER)
→ parse_cues() (seek index, if present)
parse_master_element() is the core recursive-descent function, reading EBML variable-length integers and dispatching to element consumers by ID.
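The EBML variable-length integers mentioned above encode their own length in the first byte: the number of leading zero bits before the first set bit gives the count of additional bytes. The following is a minimal sketch of that decoding, not the Reader's actual code; VInt and read_vint are hypothetical names. Element IDs conventionally keep the length-marker bit, while data sizes strip it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical result type: decoded value and number of bytes consumed.
struct VInt {
    std::uint64_t value;
    std::size_t length;
};

// keep_marker = true for element IDs, false for element data sizes.
std::optional<VInt> read_vint(std::vector<std::uint8_t> const& bytes, std::size_t position, bool keep_marker)
{
    if (position >= bytes.size())
        return std::nullopt;
    std::uint8_t const first = bytes[position];
    if (first == 0)
        return std::nullopt; // Lengths above 8 bytes are not handled here.

    // Count leading zero bits to find the total length in bytes.
    std::size_t length = 1;
    std::uint8_t marker = 0x80;
    while ((first & marker) == 0) {
        marker = static_cast<std::uint8_t>(marker >> 1);
        ++length;
    }
    if (position + length > bytes.size())
        return std::nullopt; // Truncated input.

    // Either keep the first byte whole (IDs) or mask off the marker bit (sizes).
    std::uint64_t value = keep_marker ? first : static_cast<std::uint64_t>(first & (marker - 1));
    for (std::size_t i = 1; i < length; ++i)
        value = (value << 8) | bytes[position + i];
    return VInt { value, length };
}
```

For example, the four bytes 1A 45 DF A3 decode as the EBML header element ID 0x1A45DFA3 when the marker is kept.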
Reader::fix_ffmpeg_webm_quirk() corrects timestamp offsets in files muxed by libavformat ≤ 59.30.100, which incorrectly applied codec delay scaling.
SampleIterator maintains a position in the stream for one track and provides next_block() / get_frames(Block). Block lacing modes supported: None, XIPH, FixedSize, EBML.
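As an illustration of one of the lacing modes listed above, the sketch below decodes the frame sizes of a Xiph-laced block. It assumes the input begins at the lace header (the byte holding the frame count minus one) and runs to the end of the block payload; parse_xiph_lace_sizes is a hypothetical helper, not the function used by SampleIterator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Returns the size of each laced frame, or nullopt if the lace header is malformed.
std::optional<std::vector<std::size_t>> parse_xiph_lace_sizes(std::vector<std::uint8_t> const& payload)
{
    if (payload.empty())
        return std::nullopt;

    // The first byte stores the number of frames minus one.
    std::size_t const frame_count = static_cast<std::size_t>(payload[0]) + 1;
    std::size_t position = 1;
    std::size_t total = 0;
    std::vector<std::size_t> sizes;

    // Every frame except the last has an explicit size: a run of bytes that
    // are summed, where 255 means "keep adding" and any other value ends it.
    for (std::size_t i = 0; i + 1 < frame_count; ++i) {
        std::size_t size = 0;
        while (true) {
            if (position >= payload.size())
                return std::nullopt;
            std::uint8_t const byte = payload[position++];
            size += byte;
            if (byte != 255)
                break;
        }
        sizes.push_back(size);
        total += size;
    }

    // The last frame takes whatever data remains.
    std::size_t const remaining = payload.size() - position;
    if (total > remaining)
        return std::nullopt;
    sizes.push_back(remaining - total);
    return sizes;
}
```

FixedSize lacing is simpler (the remaining data is divided evenly by the frame count), and EBML lacing replaces the 255-run encoding with variable-length integers and signed deltas.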
Reader::seek_to_random_access_point() uses the Cues index when available; otherwise performs a linear scan forward from the current position.
Matroska track data structures:
| Type | Description |
|---|---|
| EBMLHeader | doc_type string, doc_type_version |
| SegmentInformation | timestamp_scale, duration_unscaled, muxing_app |
| TrackEntry | Track number, UID, type, codec ID, codec private data, optional VideoTrack / AudioTrack sub-structs |
| Block | Timestamp, duration, lacing type, keyframe flag, data position/size |
| Cluster | Cluster-relative timestamp base |
| CuePoint / CueTrackPosition | Seek index entries mapping timestamps to cluster+block positions |
Sources: Libraries/LibMedia/Containers/Matroska/Document.h1-247 Libraries/LibMedia/Containers/Matroska/Reader.cpp99-250
FFmpegDemuxer
Sources: Libraries/LibMedia/FFmpeg/FFmpegDemuxer.h1-88 Libraries/LibMedia/FFmpeg/FFmpegDemuxer.cpp1-333
FFmpegDemuxer wraps libavformat. It creates a separate AVFormatContext per track (in create_context_for_track()) so each track can seek independently without interfering with others.
FFmpegIOContext (Libraries/LibMedia/FFmpeg/FFmpegIOContext.cpp1-100) bridges a MediaStreamCursor to FFmpeg's AVIOContext, providing read and seek callbacks that delegate to MediaStreamCursor::read_into() and seek().
At construction time (from_stream()), FFmpegDemuxer reads all stream metadata from one shared AVFormatContext, closes it, then creates per-track contexts on demand. av_find_best_stream() determines the preferred track for each TrackType.
get_next_sample_for_track() calls av_read_frame() in a loop, discarding packets for other streams, and returns a CodedFrame with packet data copied out.
Both data providers, VideoDataProvider and AudioDataProvider, follow the same structural pattern: each spawns a dedicated background thread that loops between decoding and waiting, using a ThreadData inner class (which is separately ref-counted so it can outlive the outer provider object).
Thread lifecycle (shared pattern):
Sources: Libraries/LibMedia/Providers/VideoDataProvider.h100-107 Libraries/LibMedia/Providers/AudioDataProvider.h98-105
VideoDataProvider
Sources: Libraries/LibMedia/Providers/VideoDataProvider.h1-139 Libraries/LibMedia/Providers/VideoDataProvider.cpp1-556
Spawns a "Video Decoder" thread. Decodes coded frames from the Demuxer using FFmpegVideoDecoder and queues decoded TimedImage objects (up to m_queue_max_size, default 4).
Thread main loop (ThreadData):
wait_for_start()
loop:
    handle_suspension() → clears queue, drops decoder, waits
    handle_seek() → seeks demuxer, decodes to target frame
    push_data_and_decode_some_frames()
        → get_next_sample_for_track()
        → decoder->receive_coded_data()
        → decoder->get_decoded_frame()
        → queue frame if queue not full
        → if queue full: block until frame consumed or seek arrives
Seek modes are handled in handle_seek():
Accurate: decodes all frames until past the target timestamp, returns the frame just before it.
FastBefore: seeks to nearest keyframe, decodes one frame.
FastAfter: finds the first keyframe after the target.
frames_queue_is_full_handler is called on the main thread when the queue fills; it signals the playback state handler that buffering can stop.
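The difference between the three seek modes can be shown on a flat list of frames. This is only a sketch: FrameInfo and select_seek_frame are hypothetical, frames are assumed to be sorted by timestamp, and treating "after the target" as strictly greater is an assumption. The real handle_seek() works incrementally against the demuxer and decoder rather than on a pre-built list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

enum class SeekMode { Accurate, FastBefore, FastAfter };

// Hypothetical frame description for illustration only.
struct FrameInfo {
    std::int64_t timestamp_ms;
    bool is_keyframe;
};

// Returns the index of the frame a seek would land on, if any.
std::optional<std::size_t> select_seek_frame(std::vector<FrameInfo> const& frames, std::int64_t target_ms, SeekMode mode)
{
    std::optional<std::size_t> selected;
    for (std::size_t i = 0; i < frames.size(); ++i) {
        bool const past_target = frames[i].timestamp_ms > target_ms;
        switch (mode) {
        case SeekMode::Accurate:
            // Keep decoding until a frame passes the target, then use the one before it.
            if (past_target)
                return selected;
            selected = i;
            break;
        case SeekMode::FastBefore:
            // Remember only keyframes at or before the target.
            if (past_target)
                return selected;
            if (frames[i].is_keyframe)
                selected = i;
            break;
        case SeekMode::FastAfter:
            // First keyframe strictly after the target (assumption).
            if (past_target && frames[i].is_keyframe)
                return i;
            break;
        }
    }
    return selected;
}
```

Accurate costs the most decoding work because every frame between the keyframe and the target must be decoded, which is why the fast modes exist for scrubbing.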
AudioDataProvider
Sources: Libraries/LibMedia/Providers/AudioDataProvider.h1-134 Libraries/LibMedia/Providers/AudioDataProvider.cpp1-499
Spawns an "Audio Decoder" thread. Uses FFmpegAudioDecoder to decode coded audio frames and FFmpegAudioConverter to resample/convert to the output SampleSpecification. Queues AudioBlock objects (up to m_queue_max_size, default 8).
retrieve_block() is called by the AudioMixingSink on the audio output thread to pull decoded samples.
DisplayingVideoSink
Sources: Libraries/LibMedia/Sinks/DisplayingVideoSink.h1-60 Libraries/LibMedia/Sinks/DisplayingVideoSink.cpp1-107
DisplayingVideoSink sits between VideoDataProvider and the renderer. Callers (e.g. the <video> element paint path) call update() on each vsync. update() checks the MediaTimeProvider's current time against m_next_frame.timestamp() and advances m_current_frame as needed.
current_frame() returns a RefPtr<Gfx::ImmutableBitmap> suitable for painting.
pause_updates() / resume_updates() are used during seeking to prevent stale frames from appearing. m_on_start_buffering is a callback invoked when the provider's cursor is blocked (no more frames in the queue and no data available yet).
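The frame-advancing logic of update() can be sketched as follows. This is a simplified model, assuming a local deque in place of the provider's frame queue and an integer image_id in place of a bitmap; VideoSinkSketch and TimedFrameSketch are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <optional>

// Hypothetical stand-in for a decoded frame with its presentation time.
struct TimedFrameSketch {
    std::int64_t timestamp_ms;
    int image_id;
};

class VideoSinkSketch {
public:
    // Stands in for frames arriving from the provider's queue.
    void enqueue(TimedFrameSketch frame) { m_pending.push_back(frame); }

    // Called once per vsync with the time provider's current time.
    // Returns true if the displayed frame changed.
    bool update(std::int64_t now_ms)
    {
        bool changed = false;
        // Advance past every frame whose timestamp has been reached, so late
        // frames are skipped instead of being shown one vsync at a time.
        while (!m_pending.empty() && m_pending.front().timestamp_ms <= now_ms) {
            m_current = m_pending.front();
            m_pending.pop_front();
            changed = true;
        }
        return changed;
    }

    std::optional<TimedFrameSketch> current_frame() const { return m_current; }

private:
    std::deque<TimedFrameSketch> m_pending;
    std::optional<TimedFrameSketch> m_current;
};
```

Returning whether the frame changed lets a caller repaint only when needed.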
AudioMixingSink
AudioMixingSink drives audio output and serves as the MediaTimeProvider when audio is present. Audio time is authoritative because the OS audio clock is the most stable time source for A/V sync.
When audio is not present, a GenericTimeProvider (simple wall-clock time) is used instead.
PlaybackManager
Sources: Libraries/LibMedia/PlaybackManager.h1-228 Libraries/LibMedia/PlaybackManager.cpp1-307
PlaybackManager is the top-level controller. It is not ref-counted — callers own it via NonnullOwnPtr returned from create().
Initialization sequence:
Sources: Libraries/LibMedia/PlaybackManager.cpp24-156
Format detection uses Matroska::Reader::is_matroska_or_webm() to probe the stream header; if that returns false, FFmpegDemuxer is used as the fallback.
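A probe of this kind typically checks for the EBML header element ID (1A 45 DF A3) at the start of the stream. The sketch below shows only that magic-number check and the fallback choice; it is an assumption about the general approach, and the real is_matroska_or_webm() may inspect more of the header, such as the DocType. DemuxerKind and both function names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

enum class DemuxerKind { Matroska, FFmpeg };

// Checks whether the buffer starts with the EBML header element ID.
bool starts_with_ebml_magic(std::vector<std::uint8_t> const& header)
{
    static constexpr std::array<std::uint8_t, 4> ebml_magic = { 0x1A, 0x45, 0xDF, 0xA3 };
    if (header.size() < ebml_magic.size())
        return false;
    return std::equal(ebml_magic.begin(), ebml_magic.end(), header.begin());
}

// Prefer the native Matroska path, otherwise fall back to the FFmpeg demuxer.
DemuxerKind choose_demuxer(std::vector<std::uint8_t> const& header)
{
    return starts_with_ebml_magic(header) ? DemuxerKind::Matroska : DemuxerKind::FFmpeg;
}
```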
Public API summary:
| Method | Description |
|---|---|
| add_media_source(stream) | Starts media init on a background thread |
| get_or_create_the_displaying_video_sink_for_track(track) | Creates a DisplayingVideoSink for the track and starts its provider |
| remove_the_displaying_video_sink_for_track(track) | Detaches the sink from its provider |
| enable_an_audio_track(track) | Connects the track's provider to AudioMixingSink |
| disable_an_audio_track(track) | Disconnects the track's provider |
| play() / pause() / seek(ts, mode) | Delegates to current PlaybackStateHandler |
| state() | Returns the current PlaybackState enum value |
| current_time() | Returns min(time_provider->current_time(), duration) |
Thread-safety: background threads hold a WeakPlaybackManager (backed by WeakPlaybackManagerLink, a separately heap-allocated link object). When posting callbacks back to the main thread via deferred_invoke, the callback re-checks is_alive() before accessing the manager.
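The weak-link pattern described above can be sketched with standard library types. All names here (LinkSketch, WeakManagerSketch, WeakLinkedManager, DeferredQueue, post_result) are hypothetical, std::shared_ptr stands in for the separately heap-allocated link, and a plain vector of closures stands in for the event loop's deferred_invoke queue. The sketch assumes the liveness check and the manager's destruction both happen on the main thread, so the link pointer is not atomic.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

class WeakLinkedManager;

// Separately heap-allocated link that can outlive the manager.
struct LinkSketch {
    WeakLinkedManager* manager { nullptr };
};

class WeakManagerSketch {
public:
    explicit WeakManagerSketch(std::shared_ptr<LinkSketch> link)
        : m_link(std::move(link))
    {
    }
    bool is_alive() const { return m_link->manager != nullptr; }
    WeakLinkedManager* get() const { return m_link->manager; }

private:
    std::shared_ptr<LinkSketch> m_link;
};

class WeakLinkedManager {
public:
    WeakLinkedManager()
        : m_link(std::make_shared<LinkSketch>())
    {
        m_link->manager = this;
    }
    // Clearing the link is what makes outstanding weak handles report dead.
    ~WeakLinkedManager() { m_link->manager = nullptr; }
    WeakLinkedManager(WeakLinkedManager const&) = delete;
    WeakLinkedManager& operator=(WeakLinkedManager const&) = delete;

    WeakManagerSketch weak() const { return WeakManagerSketch(m_link); }
    void handle_result(int& counter) { ++counter; }

private:
    std::shared_ptr<LinkSketch> m_link;
};

// Stand-in for the main event loop's queue of deferred closures.
using DeferredQueue = std::vector<std::function<void()>>;

// What a background thread would do: post a closure that re-checks liveness.
void post_result(DeferredQueue& queue, WeakManagerSketch weak, int& handled)
{
    queue.push_back([weak, &handled] {
        if (!weak.is_alive())
            return; // Manager was destroyed before the callback ran.
        weak.get()->handle_result(handled);
    });
}
```

The closure owns a share of the link, so checking is_alive() stays valid even after the manager itself is gone.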
States and handlers:
| PlaybackState | Handler class | Notes |
|---|---|---|
| Paused | PausedStateHandler | Providers suspended after DEFAULT_SUSPEND_TIMEOUT_MS (10 s) |
| Playing | PlayingStateHandler | Normal decode-and-display loop |
| Buffering | BufferingStateHandler | Waiting for the decoder queue to fill |
| Seeking | SeekingStateHandler | Coordinates seeks across all active tracks |
| Suspended | SuspendedStateHandler | Decoder resources released (long pause) |
Transitions are performed via PlaybackManager::replace_state_handler<T>(), which calls on_exit() on the outgoing handler, swaps the m_handler pointer, calls on_enter() on the new handler, and dispatches on_playback_state_change.
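The transition order described above (exit, swap, enter, notify) can be sketched as a small state machine. The class names below are hypothetical, only two states are modelled, and passing the new state to on_playback_state_change is an assumption about the callback's signature.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <vector>

enum class PlaybackState { Paused, Playing };

class StateHandlerSketch {
public:
    virtual ~StateHandlerSketch() = default;
    virtual void on_enter() { }
    virtual void on_exit() { }
    virtual PlaybackState state() const = 0;
};

class StateMachineSketch {
public:
    template<typename T>
    void replace_state_handler()
    {
        if (m_handler)
            m_handler->on_exit();               // 1. leave the old state
        m_handler = std::make_unique<T>(*this); // 2. swap the handler
        m_handler->on_enter();                  // 3. enter the new state
        if (on_playback_state_change)           // 4. notify observers
            on_playback_state_change(m_handler->state());
    }

    std::function<void(PlaybackState)> on_playback_state_change;
    std::vector<std::string> log; // Records hook calls for illustration.

private:
    std::unique_ptr<StateHandlerSketch> m_handler;
};

class PausedHandlerSketch final : public StateHandlerSketch {
public:
    explicit PausedHandlerSketch(StateMachineSketch& machine) : m_machine(machine) { }
    void on_enter() override { m_machine.log.push_back("enter Paused"); }
    void on_exit() override { m_machine.log.push_back("exit Paused"); }
    PlaybackState state() const override { return PlaybackState::Paused; }

private:
    StateMachineSketch& m_machine;
};

class PlayingHandlerSketch final : public StateHandlerSketch {
public:
    explicit PlayingHandlerSketch(StateMachineSketch& machine) : m_machine(machine) { }
    void on_enter() override { m_machine.log.push_back("enter Playing"); }
    void on_exit() override { m_machine.log.push_back("exit Playing"); }
    PlaybackState state() const override { return PlaybackState::Playing; }

private:
    StateMachineSketch& m_machine;
};
```

Keeping per-state behaviour in handler objects means play(), pause() and seek() can simply forward to whichever handler is current.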
State transition diagram:
Sources: Libraries/LibMedia/PlaybackStates/PlaybackState.h1-21 Libraries/LibMedia/PlaybackStates/Forward.h1-27 Libraries/LibMedia/PlaybackStates/SeekingStateHandler.h1-186 Libraries/LibMedia/PlaybackStates/PausedStateHandler.h1-40
Seeking state detail:
SeekingStateHandler::begin_seek() creates a SeekData ref-counted struct, pauses video sink updates, then launches seeks on every enabled track concurrently. Each seek completion callback increments video_seeks_completed or audio_seeks_completed. When all counts match their in-flight totals, possibly_complete_seek() sets the time provider to the resolved timestamp, resumes video sink updates, and calls resume() to transition out of seeking.
If a new seek() call arrives before the previous one completes, the seek_id counter mechanism ensures that stale completion callbacks are ignored.
Sources: Libraries/LibMedia/PlaybackStates/SeekingStateHandler.h63-178
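The interplay of completion counting and the seek_id counter can be sketched in a few lines. SeekCoordinatorSketch is a hypothetical class that merges the audio and video counters into one for brevity; it is not the structure of SeekData.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

class SeekCoordinatorSketch {
public:
    // Starts a new seek across track_count tracks and returns its id.
    // Any seek still in flight becomes stale from this point on.
    std::uint64_t begin_seek(std::size_t track_count)
    {
        m_pending = track_count;
        m_completed = 0;
        return ++m_seek_id;
    }

    // Called from each track's completion callback with the id it was
    // started under. Returns true when this call completes the current seek.
    bool on_track_seek_completed(std::uint64_t seek_id)
    {
        if (seek_id != m_seek_id)
            return false; // Stale callback from a superseded seek.
        ++m_completed;
        return m_completed == m_pending;
    }

private:
    std::uint64_t m_seek_id { 0 };
    std::size_t m_pending { 0 };
    std::size_t m_completed { 0 };
};
```

Only the call that returns true would go on to update the time provider and resume playback.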
Threads created by LibMedia:
Sources: Libraries/LibMedia/PlaybackManager.cpp138-156 Libraries/LibMedia/Providers/VideoDataProvider.cpp21-42 Libraries/LibMedia/Providers/AudioDataProvider.cpp21-44
All cross-thread data delivery uses lock-protected queues (m_queue in ThreadData). Decoder threads call invoke_on_main_thread_while_locked() to post error/completion callbacks as deferred_invoke closures on the main event loop.
CodecID (Libraries/LibMedia/CodecID.h1-151) enumerates the codecs LibMedia can represent. All decoding goes through FFmpeg wrappers. The mapping between CodecID and AVCodecID is in FFmpegHelpers.h (Libraries/LibMedia/FFmpeg/FFmpegHelpers.h19-128).
Supported video codecs:
| CodecID | Format |
|---|---|
| VP8, VP9 | On2 / Google |
| H261, MPEG1, H262, H263, H264, H265 | MPEG family |
| AV1 | AOMedia |
| Theora | Xiph |
Supported audio codecs:
| CodecID | Format |
|---|---|
| MP3, AAC | MPEG |
| Vorbis, Opus, FLAC | Xiph / open |
| U8, S16LE, S24LE, S32LE, S64LE, F32LE, F64LE, ALaw, MuLaw | PCM variants |
Matroska codec IDs (e.g. "V_VP9", "A_OPUS") are mapped to CodecID values in Libraries/LibMedia/Containers/Matroska/Utilities.h via codec_id_from_matroska_id_string().
Sources: Libraries/LibMedia/CodecID.h14-45 Libraries/LibMedia/FFmpeg/FFmpegHelpers.h19-128
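A string-to-enum mapping of this kind can be sketched as a small lookup table. The enum below is a reduced, hypothetical stand-in for CodecID, the function name carries a _sketch suffix to mark it as illustrative, and its signature is an assumption. The ID strings themselves are taken from the Matroska codec registry.

```cpp
#include <cassert>
#include <string_view>

// Reduced stand-in for LibMedia's CodecID enum.
enum class CodecIDSketch { Unknown, VP8, VP9, H264, H265, AV1, MP3, Vorbis, Opus, FLAC };

CodecIDSketch codec_id_from_matroska_id_string_sketch(std::string_view id)
{
    struct Entry {
        std::string_view matroska_id;
        CodecIDSketch codec;
    };
    // A subset of Matroska codec ID strings; the real table covers more codecs.
    static constexpr Entry table[] = {
        { "V_VP8", CodecIDSketch::VP8 },
        { "V_VP9", CodecIDSketch::VP9 },
        { "V_MPEG4/ISO/AVC", CodecIDSketch::H264 },
        { "V_MPEGH/ISO/HEVC", CodecIDSketch::H265 },
        { "V_AV1", CodecIDSketch::AV1 },
        { "A_MPEG/L3", CodecIDSketch::MP3 },
        { "A_VORBIS", CodecIDSketch::Vorbis },
        { "A_OPUS", CodecIDSketch::Opus },
        { "A_FLAC", CodecIDSketch::FLAC },
    };
    for (auto const& entry : table) {
        if (entry.matroska_id == id)
            return entry.codec;
    }
    return CodecIDSketch::Unknown;
}
```

Returning an explicit Unknown value lets the caller reject a track cleanly instead of handing an unsupported codec to the decoder.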
The audio output layer is platform-specific and selected at build time via LADYBIRD_AUDIO_BACKEND:
| Backend | Source file | Platform |
|---|---|---|
| PulseAudio | Audio/PlaybackStreamPulseAudio.cpp | Linux |
| AudioUnit | Audio/PlaybackStreamAudioUnit.cpp | macOS |
| WASAPI | Audio/PlaybackStreamWasapi.cpp | Windows |
| Stub | Audio/PlaybackStream.cpp | Fallback |