This document covers the media and file processing subsystem in g4f, which handles detection, validation, conversion, storage, and serving of various media types (images, audio, video) and document formats (PDF, DOCX, XLSX, etc.). This includes the response type hierarchy for media, the image processing pipeline, document text extraction, the bucket system for file context, and provider-specific upload/download protocols.
For information about using the client library to generate images, see Image Generation. For details on web GUI file management, see File Management & Buckets. For media serving endpoints in the HTTP API, see API Endpoints.
The media type system provides format detection, validation, and type classification for images, audio, and video files. It supports both file-based and data URI-based media inputs.
The system maintains mappings between file extensions and MIME types:
| Category | Extensions | MIME Types |
|---|---|---|
| Images | jpeg, jpg, png, gif, webp | image/jpeg, image/png, image/gif, image/webp |
| Audio | wav, mp3, flac, opus, ogg, m4a | audio/wav, audio/mpeg, audio/flac, audio/opus, audio/ogg, audio/m4a |
| Video | mkv, webm, mp4 | video/x-matroska, video/webm, video/mp4 |
The bidirectional mapping is defined in EXTENSIONS_MAP and MEDIA_TYPE_MAP g4f/image/__init__.py24-45
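As a rough sketch of how such a two-way lookup can be built, the snippet below derives an extension-to-MIME dictionary from the table above and inverts it. The dictionary contents and the helper `mime_for()` are illustrative assumptions; the actual EXTENSIONS_MAP and MEDIA_TYPE_MAP in g4f may be organized differently.

```python
# Illustrative subset built from the table above; not copied from g4f.
EXTENSION_TO_MIME = {
    "jpeg": "image/jpeg",
    "jpg": "image/jpeg",
    "png": "image/png",
    "gif": "image/gif",
    "webp": "image/webp",
    "wav": "audio/wav",
    "mp3": "audio/mpeg",
    "flac": "audio/flac",
    "opus": "audio/opus",
    "ogg": "audio/ogg",
    "m4a": "audio/m4a",
    "mkv": "video/x-matroska",
    "webm": "video/webm",
    "mp4": "video/mp4",
}

# Reverse direction. Two extensions share image/jpeg, so keep the first one seen.
MIME_TO_EXTENSION = {}
for ext, mime in EXTENSION_TO_MIME.items():
    MIME_TO_EXTENSION.setdefault(mime, ext)


def mime_for(filename: str):
    """Return the MIME type for a filename, or None if the extension is unknown."""
    _, dot, ext = filename.rpartition(".")
    if not dot:
        return None
    return EXTENSION_TO_MIME.get(ext.lower())
```

Because `jpeg` and `jpg` both map to `image/jpeg`, the reverse dictionary has to pick one canonical extension; this sketch simply keeps the first.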
Format Detection Flow
Sources: g4f/image/__init__.py85-201
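Two steps of this flow, header sniffing and data URI validation, can be illustrated with a minimal standalone sketch. The names `sniff_mime()` and `data_uri_image_mime()` are hypothetical and only a handful of well-known signatures are covered; this is not the g4f implementation.

```python
import re

# Well-known file signatures; JPEG's \xFF\xD8\xFF is the one cited in this document.
SIGNATURES = [
    (b"\xFF\xD8\xFF", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]

# Accept only base64-encoded image data URIs.
DATA_URI_PATTERN = re.compile(r"^data:(image/[\w.+-]+);base64,")


def sniff_mime(header: bytes):
    """Guess a MIME type from the first bytes of a file, or return None."""
    for magic, mime in SIGNATURES:
        if header.startswith(magic):
            return mime
    # WebP is a RIFF container with 'WEBP' at byte offset 8.
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "image/webp"
    return None


def data_uri_image_mime(uri: str):
    """Return the MIME type of a base64 image data URI, or None if it is not one."""
    match = DATA_URI_PATTERN.match(uri)
    return match.group(1) if match else None
```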
The system detects media types through multiple strategies:
- get_extension() extracts the file extension and looks it up in EXTENSIONS_MAP
- is_accepted_format() reads file headers to identify format (e.g., \xFF\xD8\xFF for JPEG)
- is_data_uri_an_image() validates and extracts MIME type from data URIs
- detect_file_type() uses magic numbers to detect 50+ file types g4f/image/__init__.py204-318

The ImageType type accepts multiple input formats, which are normalized through helper functions:
Media Conversion Functions
Sources: g4f/image/__init__.py47-475
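For orientation, the two base64 conversions in this group can be approximated with the standard library alone. The function names below are hypothetical, and the `{"data", "format"}` shape mirrors OpenAI's input_audio content part; whether to_input_audio() returns exactly this structure is an assumption.

```python
import base64


def bytes_to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def bytes_to_input_audio(data: bytes, audio_format: str) -> dict:
    """Build a dict holding base64 audio data and its format name (assumed shape)."""
    return {
        "data": base64.b64encode(data).decode("ascii"),
        "format": audio_format,
    }
```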
Key conversion functions:
- to_image(): Converts any input to a PIL Image object, with SVG support via cairosvg g4f/image/__init__.py47-83
- to_bytes(): Converts any input to raw bytes, handling URLs, data URIs, paths, and PIL Images g4f/image/__init__.py364-412
- to_data_uri(): Converts to base64-encoded data URI format g4f/image/__init__.py413-418
- to_input_audio(): Converts audio to OpenAI input format with base64 data and format g4f/image/__init__.py420-437

The response type system provides structured representations of media outputs from providers, with specialized classes for different media types.
Response Type Class Hierarchy
Sources: g4f/providers/response.py122-458
MediaResponse is the base class for image and video responses g4f/providers/response.py399-419:
Key Features:
- get_list(): Normalizes URLs to a list format
- get(key, default): Retrieves options with fallback

The options dictionary commonly contains:

- cookies: Cookies needed to access the media URL
- headers: HTTP headers for authenticated requests
- width, height: Media dimensions
- preview: Preview/thumbnail URL
- source_url: Original source URL before copying

ImageResponse extends MediaResponse with image-specific rendering g4f/providers/response.py420-433:
Rendering modes:
- When width and height are available, generates clickable thumbnails with full image links

The to_string() method selects the format based on the available metadata.
VideoResponse handles video content with optional poster images g4f/providers/response.py434-445:
Supports preview/poster images for video thumbnails when available in options.
AudioResponse represents audio content with optional transcription g4f/providers/response.py349-370:
Key Features:
- /media/ URL path
- to_uri(): Converts to data URI format or returns path
- <audio controls> element with optional transcript text

ImagePreview is a hidden response type that extends ImageResponse but doesn't display in final output g4f/providers/response.py447-448. It is used during generation progress updates.
Sources: g4f/providers/response.py122-466
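The behaviour described in this section (URL normalization, option lookup with fallback, and metadata-dependent rendering) can be sketched with two small classes. `MediaSketch` and `ImageSketch` are hypothetical stand-ins, and the exact markdown they emit is an assumption rather than the output of g4f's MediaResponse or ImageResponse.

```python
from typing import Any, Dict, List, Optional, Union


class MediaSketch:
    """Simplified stand-in for a media response; not g4f's MediaResponse."""

    def __init__(self, urls: Union[str, List[str]], alt: str,
                 options: Optional[Dict[str, Any]] = None) -> None:
        self.urls = urls
        self.alt = alt
        self.options = options or {}

    def get(self, key: str, default: Any = None) -> Any:
        """Look up an option such as 'cookies' or 'preview', with a fallback."""
        return self.options.get(key, default)

    def get_list(self) -> List[str]:
        """Always return a list, whether one URL or several were given."""
        return [self.urls] if isinstance(self.urls, str) else list(self.urls)


class ImageSketch(MediaSketch):
    def to_string(self) -> str:
        """Render markdown; use linked thumbnails when dimensions are known."""
        preview = self.get("preview")
        has_size = self.get("width") and self.get("height")
        lines = []
        for url in self.get_list():
            if has_size and preview:
                # Thumbnail that links to the full image (assumed format).
                lines.append(f"[![{self.alt}]({preview})]({url})")
            else:
                lines.append(f"![{self.alt}]({url})")
        return "\n".join(lines)
```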
The image processing pipeline handles image conversion, orientation correction, thumbnail generation, and format validation.
Image Processing Pipeline
Sources: g4f/image/__init__.py334-363
The process_image() function handles comprehensive image processing g4f/image/__init__.py334-363:
Processing steps:
- ImageOps.exif_transpose() to fix rotated images
- exif=b"" to remove metadata and reduce file size

The use_aspect_ratio() function calculates width and height from aspect ratio strings g4f/image/__init__.py439-453:
Maps common aspect ratios to pixel dimensions via get_width_height() g4f/image/__init__.py454-466:

- "1:1": 1024×1024 (square)
- "16:9": 832×480 (widescreen)
- "9:16": 480×832 (portrait)

Used extensively in provider implementations like PollinationsAI g4f/Provider/PollinationsAI.py356-363
During file upload, the backend API generates thumbnails for images g4f/gui/server/backend_api.py452-464:
- process_image() to generate thumbnail
- bucket_dir/thumbnail/ directory

Sources: g4f/image/__init__.py334-475 g4f/gui/server/backend_api.py448-464
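The aspect ratio mapping described in this section can be sketched as a plain lookup table. The three sizes come from the list above; the helper name `apply_aspect_ratio()` and the rule that caller-supplied dimensions take precedence are assumptions, not confirmed behaviour of use_aspect_ratio().

```python
from typing import Dict, Tuple

# Values taken from this section; other ratios are not covered here.
ASPECT_RATIO_SIZES: Dict[str, Tuple[int, int]] = {
    "1:1": (1024, 1024),
    "16:9": (832, 480),
    "9:16": (480, 832),
}


def apply_aspect_ratio(params: dict, aspect_ratio: str) -> dict:
    """Fill in width/height from an aspect ratio unless the caller already set them."""
    width, height = ASPECT_RATIO_SIZES.get(aspect_ratio, (None, None))
    result = dict(params)  # copy so the caller's dict is left untouched
    if width is not None:
        result.setdefault("width", width)
        result.setdefault("height", height)
    return result
```

Unknown ratios leave the parameters unchanged in this sketch, which keeps the helper safe to call unconditionally.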
The file processing subsystem extracts text content from various document formats for use as context in conversations.
Supported Document Types
Sources: g4f/tools/files.py84-122
File Upload and Processing Sequence
Sources: g4f/gui/server/backend_api.py413-486 g4f/tools/files.py149-213
The system uses MarkItDown for universal document conversion g4f/gui/server/backend_api.py429-438:
MarkItDown capabilities:
Converted content is saved as .md files for efficient caching.
The stream_read_files() function provides format-specific extraction g4f/tools/files.py149-213:
PDF extraction (three libraries, fallback order):
- extract_text() g4f/tools/files.py187-188

DOCX extraction (two libraries):
XLSX extraction:
HTML extraction:
- scrape_text() helper

ZIP archives:
To optimize repeated access, extracted text is cached g4f/tools/files.py215-244:
Caching Strategy
The cache_stream() function implements streaming cache:
- plain.cache exists
- plain.cache when complete

Chunking respects markdown code block boundaries to avoid breaking syntax g4f/tools/files.py228-229
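A streaming cache of this kind can be sketched as a generator that replays the cache file when it exists and otherwise writes chunks to a temporary file that is renamed only once the stream completes. The name `cached_stream()` and the `.tmp` suffix are assumptions; this sketch does not reproduce the code-block-aware chunking of cache_stream().

```python
from pathlib import Path
from typing import Iterable, Iterator


def cached_stream(chunks: Iterable[str], cache_file: Path) -> Iterator[str]:
    """Yield text chunks, replaying the cache if present and filling it otherwise."""
    if cache_file.exists():
        # Cache hit: replay the stored text line by line.
        with cache_file.open("r", encoding="utf-8") as f:
            for line in f:
                yield line
        return
    # Cache miss: write to a temporary file so that an interrupted run
    # never leaves behind something that looks like a finished cache.
    tmp_file = cache_file.with_suffix(".tmp")
    with tmp_file.open("w", encoding="utf-8") as f:
        for chunk in chunks:
            f.write(chunk)
            yield chunk
    tmp_file.replace(cache_file)
```

If the consumer stops iterating early, the rename never runs, so the next call starts extraction again instead of serving a truncated cache.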
For advanced text processing, the system supports spacy-based refinement g4f/tools/files.py124-141:
This extracts the two longest sentences per page as a summary. The refined text is cached separately in spacy_NNNN.cache files g4f/tools/files.py262-285
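The "two longest sentences" selection can be illustrated without spaCy. The sketch below uses a naive regex splitter in place of spaCy's sentence segmentation, and both the function name and the choice to return sentences longest-first are assumptions.

```python
import re
from typing import List


def two_longest_sentences(page_text: str) -> List[str]:
    """Pick the two longest sentences of a page as a crude summary."""
    # Naive split after '.', '!' or '?' followed by whitespace;
    # spaCy would segment sentences far more accurately.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", page_text) if s.strip()]
    return sorted(sentences, key=len, reverse=True)[:2]
```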
Sources: g4f/tools/files.py1-574 g4f/gui/server/backend_api.py413-486 g4f/integration/markitdown.py
The media storage subsystem manages persistent storage of generated and uploaded media files with organized directory structures and filename generation.
Storage Directory Structure
Sources: g4f/image/copy_images.py22-30 g4f/files.py
The system uses multiple storage directories g4f/image/copy_images.py22-30:
- generated_images/: Legacy directory, checked first for backwards compatibility
- generated_media/: Primary storage for all generated media
- har_and_cookies/.buckets/<bucket_id>/: Bucket-specific storage for uploads

The get_media_dir() function returns the appropriate directory, preferring generated_images/ if it exists.
Generated filenames follow a structured format g4f/image/copy_images.py110-117:
{timestamp}_{tags}_{alt}_{hash}{extension}
Components:
- int(time.time()) - Unix timestamp for uniqueness and sorting
- + separator (model name, aspect ratio, etc.)
- (.jpg, .png, .mp3, .mp4, etc.)

All components are sanitized with secure_filename() to ensure filesystem safety g4f/tools/files.py
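To make the filename format concrete, here is a minimal sketch that assembles the four components. The `sanitize()` helper is a hypothetical stand-in for secure_filename(), and hashing the source URL with a truncated SHA-256 digest is an assumption about what the hash component contains.

```python
import hashlib
import re
import time
from typing import List, Optional


def sanitize(part: str) -> str:
    """Hypothetical stand-in for secure_filename(): keep word characters, '+', '.' and '-'."""
    return re.sub(r"[^\w.+-]+", "_", part).strip("_")


def build_media_filename(source_url: str, alt: str, tags: List[str],
                         extension: str, now: Optional[float] = None) -> str:
    """Assemble {timestamp}_{tags}_{alt}_{hash}{extension}."""
    timestamp = int(time.time() if now is None else now)
    tag_part = sanitize("+".join(tags))
    alt_part = sanitize(alt)
    # Assumed: a short digest of the source URL keeps names unique per source.
    digest = hashlib.sha256(source_url.encode("utf-8")).hexdigest()[:16]
    return f"{timestamp}_{tag_part}_{alt_part}_{digest}{extension}"
```

The optional `now` argument exists only so the result is reproducible in tests.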
Media Copy and Download Flow
Sources: g4f/image/copy_images.py119-216
The copy_media() function is the core media storage utility g4f/image/copy_images.py119-216:
Features:
- asyncio.gather() for concurrent processing
- Last-Modified header for caching
- (uri, target_path) tuple

The function handles errors gracefully, returning the original URL if download fails.
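The concurrency and fallback behaviour can be sketched with asyncio.gather(). The `download` callable is a hypothetical placeholder for whatever fetches a URL and returns a local path; the real copy_media() does considerably more (headers, cookies, filename generation).

```python
import asyncio
from typing import Awaitable, Callable, List


async def copy_all(urls: List[str],
                   download: Callable[[str], Awaitable[str]]) -> List[str]:
    """Download every URL concurrently; keep the original URL when one fails."""

    async def copy_one(url: str) -> str:
        try:
            return await download(url)
        except Exception:
            # Fall back to the remote URL so the caller still has something usable.
            return url

    # gather() preserves input order, so results line up with urls.
    return list(await asyncio.gather(*(copy_one(url) for url in urls)))
```

Catching the exception inside `copy_one()` rather than around gather() means one failed download does not cancel or hide the others.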
Media files are served through Flask routes g4f/gui/server/backend_api.py271-283:
All routes map to serve_images() which calls send_from_directory() with the media directory g4f/gui/server/api.py148-150
When copying media, the system can track original URLs via query parameters g4f/image/copy_images.py204:
/media/filename.jpg?url=https%3A%2F%2Foriginal.com%2Fimage.jpg
The bucket file serving endpoint extracts and redirects to source URLs when local files don't exist g4f/gui/server/backend_api.py501-504:
This enables lazy loading: media is served from original URLs until explicitly downloaded.
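Encoding and recovering the source URL needs only urllib.parse. The helper names below are hypothetical; the sketch shows the query-parameter round trip, not the Flask route itself.

```python
from typing import Optional
from urllib.parse import parse_qs, quote, urlsplit


def with_source_url(local_path: str, source_url: str) -> str:
    """Append the percent-encoded original URL as a 'url' query parameter."""
    return f"{local_path}?url={quote(source_url, safe='')}"


def extract_source_url(request_url: str) -> Optional[str]:
    """Return the decoded 'url' parameter, or None when it is absent."""
    values = parse_qs(urlsplit(request_url).query).get("url")
    return values[0] if values else None
```

A serving route could call `extract_source_url()` when the local file is missing and redirect to the result.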
Sources: g4f/image/copy_images.py1-216 g4f/gui/server/backend_api.py148-504
The system supports multiple media upload and download patterns, including bucket-based uploads for conversation context and provider-specific upload protocols.
Bucket File Upload Sequence
Sources: g4f/gui/server/backend_api.py413-486
The bucket upload endpoint handles file uploads with automatic conversion g4f/gui/server/backend_api.py413-486:
Request:
POST /backend-api/v2/files/<bucket_id>
Content-Type: multipart/form-data
files: [File, File, ...]
x-recognition-language: en (optional, for OCR)
Response:
Processing steps:
- get_tempfile()
- .md, .json, or .zip: attempt MarkItDown conversion
- media/, generate thumbnail in thumbnail/
- .md
- files.txt with list of document filenames

The system supports automatic URL downloading for bucket context g4f/tools/files.py413-489:
Bucket Download Flow
Sources: g4f/tools/files.py413-527
Clients can upload a downloads.json file to trigger automatic URL fetching g4f/tools/files.py490-502:
Parameters:
The download_urls() function fetches URLs concurrently g4f/tools/files.py413-489:
- max_depth > 0, extracts and follows links
- get_filename_from_url() with URL hash
- is_allowed_extension() and supports_filename()

Providers implement different upload patterns for media inputs:
Provider-Specific Upload Patterns
The render_messages() function converts media references to provider-specific formats g4f/tools/media.py:
- to_data_uri() or to_bytes() as needed
- image_url or inline content
- to_input_audio() for audio files

Used extensively in PollinationsAI g4f/Provider/PollinationsAI.py475 and other providers to normalize media inputs.
The client library handles image generation with automatic downloading g4f/client/__init__.py392-434:
Response format options:
- None (default): Downloads media and returns local /media/ paths
- "url": Returns original URLs without downloading
- "b64_json": Downloads and encodes as base64 strings

The _process_image_response() method handles format conversion and downloading g4f/client/__init__.py550-588
Sources: g4f/gui/server/backend_api.py413-486 g4f/tools/files.py413-527 g4f/client/__init__.py392-588 g4f/Provider/PollinationsAI.py321-438
The bucket system provides a mechanism for injecting file context into conversations, allowing uploaded documents and media to be referenced in prompts.
Bucket System Architecture
Sources: g4f/tools/run_tools.py88-110 g4f/tools/files.py246-261 g4f/gui/server/backend_api.py386-505
The bucket tool is one of the built-in tool handlers g4f/tools/run_tools.py88-110:
How it works:

- {"bucket_id": "xyz"} pattern
- read_bucket() to load cached text

The BUCKET_INSTRUCTIONS constant g4f/tools/run_tools.py32-34:
Make sure to add the sources of cites using [[domain]](Url) notation
after the reference. Example: [[a-z0-9.]](http://example.com)
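The substitution outlined under "How it works" can be sketched as a regex replacement. The pattern, the function name, and the `read_text` callable (standing in for read_bucket()) are assumptions; the real handler may match and assemble the message differently.

```python
import re
from typing import Callable

# Matches an inline JSON reference such as {"bucket_id": "xyz"}.
BUCKET_PATTERN = re.compile(r'\{\s*"bucket_id"\s*:\s*"([^"]+)"\s*\}')


def inject_bucket_text(message: str, read_text: Callable[[str], str]) -> str:
    """Replace each bucket reference with the text that read_text() returns for it."""
    return BUCKET_PATTERN.sub(lambda m: read_text(m.group(1)), message)
```

Passing the reader as a callable keeps the sketch independent of where bucket text is stored.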
The read_bucket() function efficiently streams cached content g4f/tools/files.py246-261:
Cache file priority:
This design supports chunked processing for large documents while maintaining backwards compatibility.
Four endpoints manage bucket operations g4f/gui/server/backend_api.py386-505:
POST /backend-api/v2/files/<bucket_id>
Accepts multipart file uploads, processes with MarkItDown, generates thumbnails. See Media Upload & Download Flows
GET /backend-api/v2/files/<bucket_id>/stream
Streams extracted text with SSE events g4f/gui/server/backend_api.py386-388:
data: {"action": "download", "count": 5}
data: {"action": "load", "size": 12345}
data: {"action": "refine", "size": 8192}
data: {"action": "media", "filename": "photo.jpg"}
data: {"action": "done", "size": 12345}
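A client can consume these events by parsing each `data:` line as JSON. The sketch below works on any iterable of text lines; the function name is hypothetical and how the lines are obtained from the HTTP response is left to the caller.

```python
import json
from typing import Dict, Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield the JSON payload of each 'data:' line in an event stream."""
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())
```

Blank separator lines and any non-`data:` fields are simply skipped.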
GET /backend-api/v2/files/<bucket_id>
DELETE /backend-api/v2/files/<bucket_id>
GET returns plain text or SSE stream. DELETE removes entire bucket directory g4f/gui/server/backend_api.py390-411
GET /files/<bucket_id>/media/<filename>
GET /files/<bucket_id>/thumbnail/<filename>
Serves uploaded media and thumbnails, with fallback to source URL g4f/gui/server/backend_api.py487-504
The get_streaming() function orchestrates the complete flow g4f/tools/files.py559-568:
Bucket Content Streaming Flow
The system efficiently handles both initial processing and cached retrieval, with optional spacy refinement for large documents.
Sources: g4f/tools/run_tools.py32-146 g4f/tools/files.py149-574 g4f/gui/server/backend_api.py386-505
The media and file processing subsystem provides comprehensive handling of images, audio, video, and documents throughout the g4f platform:
- (ImageResponse, AudioResponse, VideoResponse) with HTML/markdown rendering

This infrastructure enables seamless media generation, document understanding, and file management across all interfaces (CLI, GUI, API, MCP).