This document describes the media storage system in g4f, which handles the persistence, organization, and serving of generated and uploaded media files (images, audio, video). The system manages local file storage, filename generation, format detection, and HTTP serving endpoints.
For information about media type detection and validation, see Media Type System. For image processing and transformation, see Image Processing Pipeline. For provider-specific upload protocols, see Media Upload Flows.
The media storage system uses a primary directory for storing all generated and downloaded media files. The storage location follows a fallback pattern prioritized by filesystem access.
Directory Selection Logic
Sources: g4f/image/copy_images.py26-30
The get_media_dir() function determines the active storage directory:
- ./generated_images (legacy path, checked first)
- ./generated_media (current standard)

The system checks read access (os.R_OK) on ./generated_images and uses it if accessible; otherwise, it defaults to ./generated_media. This fallback mechanism maintains backward compatibility with existing deployments.
The ensure_media_dir() function creates the target directory structure if it doesn't exist, using os.makedirs() with exist_ok=True.
Sources: g4f/image/copy_images.py44-47
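The selection and creation logic described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the base parameter is a hypothetical addition so the logic can be exercised outside the current working directory.

```python
import os

LEGACY_DIR = "generated_images"   # legacy path, checked first
CURRENT_DIR = "generated_media"   # current standard


def get_media_dir(base: str = ".") -> str:
    """Return the legacy directory if it is readable, else the current one."""
    legacy = os.path.join(base, LEGACY_DIR)
    if os.access(legacy, os.R_OK):
        return legacy
    return os.path.join(base, CURRENT_DIR)


def ensure_media_dir(base: str = ".") -> str:
    """Create the active media directory if it is missing and return its path."""
    media_dir = get_media_dir(base)
    os.makedirs(media_dir, exist_ok=True)
    return media_dir
```

Because os.access() returns False for a path that does not exist, a fresh deployment without ./generated_images falls through to ./generated_media.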
Media files are stored with composite filenames that encode metadata for identification, deduplication, and organization. The filename structure ensures uniqueness while maintaining searchability.
Filename Structure
Sources: g4f/image/copy_images.py110-117
The get_filename() function constructs filenames with four components:
- {int(time.time())}_: Unix epoch for chronological sorting and cache management
- {secure_filename(tags + alt)}_: Sanitized tags and alt text for human readability
- {sha256[:16]}: First 16 characters of a SHA-256 hash of the timestamp or image URL, for uniqueness
- .{ext}: File format indicator from EXTENSIONS_MAP

Example filename: 1704123456_flux+anime+character_a1b2c3d4e5f67890.jpg
The secure_filename() function sanitizes the descriptive segment.
Sources: g4f/tools/files.py77 (imported via g4f/image/copy_images.py17)
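A rough sketch of how such a composite filename could be assembled is shown below. The sanitize() helper is a hypothetical stand-in for secure_filename(), whose exact rules are not reproduced here. The now parameter is also an assumption, added so the output is reproducible, and extension is assumed to include the leading dot.

```python
import hashlib
import re
import time
from typing import Optional


def sanitize(text: str) -> str:
    """Hypothetical stand-in for secure_filename(): keep word characters,
    '+', '.', and '-', and collapse everything else to underscores."""
    return re.sub(r"[^\w.+-]+", "_", text).strip("_")


def get_filename(tags: list[str], alt: str, extension: str,
                 image: Optional[str] = None,
                 now: Optional[float] = None) -> str:
    """Build '{timestamp}_{description}_{hash}{extension}'."""
    now = time.time() if now is None else now
    prefix = "+".join(str(tag) for tag in tags if tag)
    description = sanitize(f"{prefix}+{alt}" if prefix and alt else prefix or alt)
    # Hash the source URL when there is one, otherwise the current timestamp.
    seed = image if image is not None else str(now)
    digest = hashlib.sha256(seed.encode()).hexdigest()[:16]
    return f"{int(now)}_{description}_{digest}{extension}"
```

Hashing the image URL rather than the clock means the same source yields the same hash segment, which is what makes the hash usable for deduplication.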
Dynamic Filename Updates
For downloaded media, filenames may be updated based on HTTP response headers:
Sources: g4f/image/copy_images.py57-60
This replaces the initial timestamp with the server's Last-Modified or Date header timestamp, enabling proper caching behavior.
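The timestamp replacement can be sketched as follows, assuming headers behaves like a mapping with lowercase keys and that the filename follows the structure described earlier. The header parsing via email.utils is a choice made for this sketch and may differ from the project's own parsing.

```python
from email.utils import parsedate_to_datetime
from typing import Mapping


def update_filename(headers: Mapping[str, str], filename: str) -> str:
    """Swap the leading timestamp of filename for the server-reported one."""
    date = headers.get("last-modified") or headers.get("date")
    if not date:
        return filename  # no usable header: keep the original name
    timestamp = int(parsedate_to_datetime(date).timestamp())
    # Everything after the first underscore is description, hash, and extension.
    rest = filename.split("_", maxsplit=1)[-1]
    return f"{timestamp}_{rest}"
```

For example, a Last-Modified value of "Mon, 01 Jan 2024 15:37:36 GMT" turns 1700000000_flux_abc.jpg into 1704123456_flux_abc.jpg.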
The storage system supports multiple media formats across images, audio, and video categories. Format detection occurs through both MIME type mapping and binary signature analysis.
| Category | Extensions | MIME Types |
|---|---|---|
| Images | jpeg, jpg, png, gif, webp | image/jpeg, image/png, image/gif, image/webp |
| Audio | wav, mp3, flac, opus, ogg, m4a | audio/wav, audio/mpeg, audio/flac, audio/opus, audio/ogg, audio/m4a |
| Video | mkv, webm, mp4 | video/x-matroska, video/webm, video/mp4 |
Sources: g4f/image/__init__.py24-45
Extension Resolution Strategy
Sources: g4f/image/copy_images.py32-42
The get_media_extension() function attempts to extract the extension from:
- the parsed URL path (urlparse().path)
- the filename suffix (os.path.splitext())

For downloaded content without clear extensions, the system reads the file header and uses is_accepted_format() to detect the format from binary signatures (magic numbers).
Sources: g4f/image/__init__.py177-201 g4f/image/copy_images.py189-200
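The two-stage resolution described above might look roughly like this. KNOWN_EXTENSIONS is copied from the table and may not match the real EXTENSIONS_MAP, and sniff_image_format() is a hypothetical helper that only illustrates signature checks for the four listed image formats. It is not the implementation of is_accepted_format().

```python
import os
from typing import Optional
from urllib.parse import urlparse

# Extensions taken from the table above; the real EXTENSIONS_MAP may differ.
KNOWN_EXTENSIONS = {
    "jpeg", "jpg", "png", "gif", "webp",
    "wav", "mp3", "flac", "opus", "ogg", "m4a",
    "mkv", "webm", "mp4",
}


def get_media_extension(media: str) -> str:
    """Return the extension (with dot) from a URL or filename, or '' if unknown."""
    extension = os.path.splitext(urlparse(media).path)[1]
    if not extension:
        extension = os.path.splitext(media)[1]
    if extension[1:].lower() not in KNOWN_EXTENSIONS:
        return ""
    return extension.lower()


def sniff_image_format(header: bytes) -> Optional[str]:
    """Guess an image format from well-known leading bytes (magic numbers)."""
    if header.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if header.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "webp"
    return None
```

Parsing the URL first keeps query strings such as ?size=large out of the extension.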
The save_response_media() function handles streaming media responses from AI providers and persists them to disk.
Sources: g4f/image/copy_images.py62-108
Function Signature and Behavior
The function:
- Reads content_type from the response dict or headers
- Builds the filename with get_filename(tags, prompt, extension, prompt)

Response Object Construction
The function yields different response types based on content:
- AudioResponse(media_url, transcript, source_url=source_url)
- VideoResponse(media_url, prompt, source_url=source_url)
- ImageResponse(media_url, prompt, source_url=source_url)

The media_url format is /media/{filename}, which maps to the Flask serving endpoint.
Sources: g4f/image/copy_images.py103-108
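As an illustration of the dispatch described above, the sketch below picks a response kind from the MIME type prefix. MediaResult is a hypothetical stand-in for the three response classes, and selecting by content-type prefix is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MediaResult:
    """Hypothetical stand-in for AudioResponse / VideoResponse / ImageResponse."""
    kind: str
    media_url: str
    text: str
    source_url: Optional[str] = None


def build_response(content_type: str, filename: str, prompt: str,
                   source_url: Optional[str] = None) -> MediaResult:
    """Pick the response kind from the MIME type prefix."""
    media_url = f"/media/{filename}"
    if content_type.startswith("audio/"):
        kind = "audio"
    elif content_type.startswith("video/"):
        kind = "video"
    else:
        kind = "image"  # anything else is treated as an image in this sketch
    return MediaResult(kind, media_url, prompt, source_url)
```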
The copy_media() function downloads external media URLs and stores them locally, with support for authentication, proxying, and format detection.
Sources: g4f/image/copy_images.py119-215
Function Parameters
| Parameter | Type | Purpose |
|---|---|---|
| images | list[str] | URLs or data URIs to download |
| cookies | Optional[Cookies] | Authentication cookies |
| headers | Optional[dict] | HTTP headers for requests |
| proxy | Optional[str] | Proxy server URL |
| alt | str | Alt text for filename generation |
| tags | list[str] | Tags for filename generation |
| add_url | Union[bool, str] | Append source URL as query param |
| target | str | Explicit target path (optional) |
| thumbnail | bool | Store in thumbnails subdirectory |
| ssl | bool | SSL verification setting |
| timeout | Optional[int] | Request timeout in seconds |
| return_target | bool | Return (url, filepath) tuples |
Source URL Preservation
When add_url=True and the media is not a data URI, the original source URL is appended as a query parameter:
/media/1704123456_image_abc123.jpg?url=https://example.com/original.jpg
This enables reconstruction of the original URL for attribution or re-downloading.
Sources: g4f/image/copy_images.py204
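A minimal sketch of the URL construction follows, handling only the boolean form of add_url. The function name is hypothetical, and percent-encoding the source URL is an assumption of this sketch; the example above shows it unencoded.

```python
from urllib.parse import quote


def media_url_with_source(filename: str, source: str, add_url: bool = True) -> str:
    """Return '/media/{filename}', plus '?url=...' for non-data-URI sources."""
    url = f"/media/{filename}"
    if add_url and not source.startswith("data:"):
        # Percent-encoding is an assumption of this sketch.
        url += "?url=" + quote(source)
    return url
```

Data URIs are skipped because they embed the content itself and carry no external location to reconstruct.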
The Flask backend provides multiple endpoints for serving stored media files, with support for thumbnails, search, and source URL redirection.
Sources: g4f/gui/server/backend_api.py271-282 g4f/gui/server/api.py148-150
All three routes (/images/, /media/, /thumbnail/) map to the same serve_images() handler.
Flask's send_from_directory() provides the secure file serving.
Sources: g4f/gui/server/api.py148-150
The bucket system provides structured storage for uploaded files with separate media and thumbnail directories.
Sources: g4f/gui/server/backend_api.py487-504
Fallback to Source URL
When a requested file doesn't exist in the bucket, the endpoint attempts to extract the original source URL from the query string:
The get_source_url() function parses url= parameters and validates that the decoded URL starts with http:// or https://.
Sources: g4f/image/copy_images.py49-55
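The parsing and validation just described can be sketched as below. The default parameter is an assumption, added so callers can supply a fallback value.

```python
from typing import Optional
from urllib.parse import unquote


def get_source_url(image: str, default: Optional[str] = None) -> Optional[str]:
    """Return the decoded url= value if it is an http(s) URL, else default."""
    if "url=" in image:
        decoded = unquote(image.split("url=", 1)[1])
        if decoded.startswith(("http://", "https://")):
            return decoded
    return default
```

The scheme check matters because the value comes from the request: without it, a crafted url= parameter could point a redirect at a javascript: or file: target.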
The /search/<search> endpoint provides tag-based media discovery across the entire media directory.
Sources: g4f/gui/server/backend_api.py508-537
Search Query Format
Searches use +-separated tokens: /search/image+anime+character
The search algorithm:
Query Parameters
| Parameter | Type | Purpose |
|---|---|---|
| min | int | Minimum match count threshold |
| skip | int | Offset for pagination |
| random | bool/str | Random selection (optionally seeded) |
Example: /search/image+flux?skip=5&random=seed123 returns a random image matching "flux" from the 6th result onward, using "seed123" for reproducibility.
Sources: g4f/gui/server/backend_api.py529-537
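One way to picture the token matching and the three query parameters is the sketch below. It is an assumption-laden illustration rather than the endpoint's actual algorithm: the function name, the tokenization of filenames, and the tie-breaking rule are all hypothetical.

```python
import random
import re
from typing import Optional


def search_media(filenames: list[str], query: str, min_matches: int = 1,
                 skip: int = 0, seed: Optional[str] = None) -> Optional[str]:
    """Rank filenames by how many '+'-separated query tokens they contain."""
    tokens = [t.lower() for t in query.split("+") if t]
    scored = []
    for name in filenames:
        words = set(re.split(r"[_+.\-]", name.lower()))
        count = sum(1 for token in tokens if token in words)
        if count >= min_matches:
            scored.append((count, name))
    # Best matches first; ties broken by name so the order is stable.
    scored.sort(key=lambda item: (-item[0], item[1]))
    candidates = [name for _, name in scored][skip:]
    if not candidates:
        return None
    if seed is not None:
        # A seeded generator makes the "random" pick reproducible.
        return random.Random(seed).choice(candidates)
    return candidates[0]
```

Passing the same seed with the same set of files returns the same pick, which is the reproducibility the random=seed123 example relies on.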
The bucket system provides isolated storage contexts for file uploads, document processing, and bucket-specific media management. Each bucket is identified by a sanitized bucket ID.
har_and_cookies/buckets/
├── {bucket_id}/
│ ├── files.txt # List of processed filenames
│ ├── {filename}.md # Extracted document text
│ ├── media/
│ │ └── {image}.jpg # Uploaded media files
│ ├── thumbnail/
│ │ └── {image}.jpg # Generated thumbnails
│ ├── plain.cache # Cached processed text
│ └── plain_0001.cache # Split cache chunks
Sources: g4f/gui/server/backend_api.py413-486
Bucket Creation and File Upload
Sources: g4f/gui/server/backend_api.py413-485
File Processing Logic
The upload endpoint processes files based on their type:
Media files (images, audio, video):
- Checked with is_allowed_extension(filename)
- Saved to the media/ subdirectory

Document files (PDF, DOCX, XLSX, etc.):
- Extracted text written to {filename}.md
- Listed in files.txt for text extraction

Unsupported files:
The files.txt manifest tracks all processed documents for later retrieval by the bucket streaming system.
Sources: g4f/gui/server/backend_api.py482-485
| Endpoint | Method | Purpose |
|---|---|---|
| POST /files/{bucket_id} | POST | Upload files to bucket |
| GET /files/{bucket_id} | GET | Retrieve processed text from bucket |
| GET /files/{bucket_id}/stream | GET | Stream bucket contents as SSE |
| DELETE /files/{bucket_id} | DELETE | Remove entire bucket directory |
Sources: g4f/gui/server/backend_api.py386-411
Streaming Bucket Contents
The GET /files/{bucket_id} endpoint streams processed document text, with optional file deletion after processing.
Query parameters:
- delete_files: Remove files after streaming (default: True)
- refine_chunks_with_spacy: Use spaCy for text refinement (default: False)

The response is sent as text/event-stream for real-time progress updates or text/plain for direct text output.
Sources: g4f/gui/server/backend_api.py386-411 g4f/tools/files.py559-568
The storage system uses multiple caching strategies for document processing and text extraction:
| Cache File | Purpose |
|---|---|
| plain.cache | Full concatenated text from all processed files |
| plain_NNNN.cache | Split cache chunks (~1MB each) |
| spacy_NNNN.cache | Refined text chunks after spaCy processing |
Sources: g4f/tools/files.py216-227 g4f/tools/files.py262-284
Cache Generation Flow
Sources: g4f/tools/files.py216-227 g4f/tools/files.py262-284 g4f/tools/files.py286-319
Cache File Splitting
The split_file_by_size_and_newline() function divides large cache files into manageable chunks. It uses is_complete(), which checks for closed code blocks, when deciding where a chunk may end.
Sources: g4f/tools/files.py286-319
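The splitting idea can be sketched on an in-memory string as follows. This is a simplified illustration under stated assumptions: it operates on text rather than files, and this is_complete() only checks that Markdown code fences are balanced, which may be narrower or broader than the real check.

```python
FENCE = "`" * 3  # Markdown code fence marker


def is_complete(chunk: str) -> bool:
    """Treat a chunk as complete when its code fences are balanced."""
    return chunk.count(FENCE) % 2 == 0


def split_text_by_size_and_newline(text: str,
                                   chunk_size_bytes: int = 1024 * 1024) -> list[str]:
    """Split text into chunks of roughly chunk_size_bytes, cutting only after newlines."""
    chunks: list[str] = []
    current = ""
    current_size = 0
    for line in text.splitlines(keepends=True):
        current += line
        current_size += len(line.encode("utf-8"))
        # Close the chunk once it is large enough and no code block is left open.
        if current_size >= chunk_size_bytes and is_complete(current):
            chunks.append(current)
            current, current_size = "", 0
    if current:
        chunks.append(current)
    return chunks
```

Chunks can exceed the target size when a code block straddles the boundary, since the sketch keeps reading until the block closes.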
Source URL Tracking
Media files preserve their original source URLs through two mechanisms:
Query parameter encoding: Appended to /media/ URLs when add_url=True
/media/1704123456_image_abc.jpg?url=https://example.com/image.jpg
Response object storage: ImageResponse, AudioResponse, and VideoResponse objects include a source_url option in their constructor.
Sources: g4f/image/copy_images.py98-108
Response Timestamp Updates
When downloading media from external sources, the system replaces the timestamp in the filename with the one from the response's Last-Modified or Date header.
Sources: g4f/image/copy_images.py57-60
Downloads Manifest
The bucket system maintains a downloads.json file for tracking external URL downloads.
This manifest drives the download_urls() function, which crawls and stores linked content recursively.
Sources: g4f/tools/files.py490-502
The media storage system integrates with the response processing pipeline to automatically persist media from provider responses.
Sources: g4f/gui/server/api.py226-254
Conditional Media Download
The backend API conditionally downloads media based on:
- download_media parameter: Explicitly controls download behavior

Sources: g4f/gui/server/api.py228-242
Target Path Returns
When return_target=True, copy_media() returns (url, filepath) tuples instead of just URLs.
Sources: g4f/gui/server/api.py244-253
The g4f media storage system manages generated and uploaded media files, storing them under ./generated_media with automatic directory creation. It integrates with the provider response pipeline, automatically persisting media while retaining original source URLs for attribution and cache validation.