Media & File Processing

Relevant source files

This document covers the media and file processing subsystem in g4f, which handles detection, validation, conversion, storage, and serving of various media types (images, audio, video) and document formats (PDF, DOCX, XLSX, etc.). This includes the response type hierarchy for media, the image processing pipeline, document text extraction, the bucket system for file context, and provider-specific upload/download protocols.

For information about using the client library to generate images, see Image Generation. For details on web GUI file management, see File Management & Buckets. For media serving endpoints in the HTTP API, see API Endpoints.

Media Type System

The media type system provides format detection, validation, and type classification for images, audio, and video files. It supports both file-based and data URI-based media inputs.

Supported Formats

The system maintains mappings between file extensions and MIME types:

Category	Extensions	MIME Types
Images	`jpeg`, `jpg`, `png`, `gif`, `webp`	`image/jpeg`, `image/png`, `image/gif`, `image/webp`
Audio	`wav`, `mp3`, `flac`, `opus`, `ogg`, `m4a`	`audio/wav`, `audio/mpeg`, `audio/flac`, `audio/opus`, `audio/ogg`, `audio/m4a`
Video	`mkv`, `webm`, `mp4`	`video/x-matroska`, `video/webm`, `video/mp4`

The bidirectional mapping is defined in EXTENSIONS_MAP and MEDIA_TYPE_MAP g4f/image/__init__.py24-45

Format Detection

Format Detection Flow

Sources: g4f/image/__init__.py85-201

The system detects media types through multiple strategies:

Extension-based: get_extension() extracts the file extension and looks it up in EXTENSIONS_MAP
Magic number: is_accepted_format() reads file headers to identify format (e.g., \xFF\xD8\xFF for JPEG)
Data URI parsing: is_data_uri_an_image() validates and extracts MIME type from data URIs
Comprehensive detection: detect_file_type() uses magic numbers to detect 50+ file types g4f/image/__init__.py204-318

Media Input Handling

The ImageType type accepts multiple input formats, which are normalized through helper functions:

Media Conversion Functions

Sources: g4f/image/__init__.py47-475

Key conversion functions:

to_image(): Converts any input to a PIL Image object, with SVG support via cairosvg g4f/image/__init__.py47-83
to_bytes(): Converts any input to raw bytes, handling URLs, data URIs, paths, and PIL Images g4f/image/__init__.py364-412
to_data_uri(): Converts to base64-encoded data URI format g4f/image/__init__.py413-418
to_input_audio(): Converts audio to OpenAI input format with base64 data and format g4f/image/__init__.py420-437

Response Type Hierarchy

The response type system provides structured representations of media outputs from providers, with specialized classes for different media types.

Response Type Architecture

Response Type Class Hierarchy

Sources: g4f/providers/response.py122-458

MediaResponse Base Class

MediaResponse is the base class for image and video responses g4f/providers/response.py399-419:

Key Features:

URLs: Single URL string or list of URLs
Alt text: Description/prompt used to generate the media
Options dictionary: Stores metadata like cookies, headers, width, height, thumbnails
get_list(): Normalizes URLs to a list format
get(key, default): Retrieves options with fallback

The options dictionary commonly contains:

cookies: Cookies needed to access the media URL
headers: HTTP headers for authenticated requests
width, height: Media dimensions
preview: Preview/thumbnail URL
source_url: Original source URL before copying

ImageResponse

ImageResponse extends MediaResponse with image-specific rendering g4f/providers/response.py420-433:

Rendering modes:

HTML with metadata: When width and height are available, generates clickable thumbnails with full image links
Markdown: Falls back to standard markdown image syntax with optional preview URLs

The to_string() method intelligently selects the format based on available metadata.

VideoResponse

VideoResponse handles video content with optional poster images g4f/providers/response.py434-445:

Supports preview/poster images for video thumbnails when available in options.

AudioResponse

AudioResponse represents audio content with optional transcription g4f/providers/response.py349-370:

Key Features:

Data: Base64-encoded audio or /media/ URL path
Transcript: Optional text transcription of the audio
to_uri(): Converts to data URI format or returns path
HTML rendering: Generates <audio controls> element with optional transcript text

ImagePreview

ImagePreview is a hidden response type that extends ImageResponse but doesn't display in final output g4f/providers/response.py447-448 Used during generation progress updates.

Sources: g4f/providers/response.py122-466

Image Processing Pipeline

The image processing pipeline handles image conversion, orientation correction, thumbnail generation, and format validation.

Image Processing Flow

Image Processing Pipeline

Sources: g4f/image/__init__.py334-363

process_image() Function

The process_image() function handles comprehensive image processing g4f/image/__init__.py334-363:

Processing steps:

EXIF orientation correction: Uses ImageOps.exif_transpose() to fix rotated images
Mode handling:
- RGBA: Preserves transparency (commented code shows white background alternative)
- Non-RGB: Converts to RGB for JPEG compatibility
Thumbnailing: Resizes image to fit within bounds while maintaining aspect ratio
EXIF stripping: Saves with exif=b"" to remove metadata and reduce file size
Return: Returns original size tuple if saving, otherwise returns processed Image

Aspect Ratio Handling

The use_aspect_ratio() function calculates width and height from aspect ratio strings g4f/image/__init__.py439-453:

Maps common aspect ratios to pixel dimensions via get_width_height() g4f/image/__init__.py454-466:

"1:1": 1024×1024 (square)
"16:9": 832×480 (widescreen)
"9:16": 480×832 (portrait)

Used extensively in provider implementations like PollinationsAI g4f/Provider/PollinationsAI.py356-363

Thumbnail Generation in Upload Flow

During file upload, the backend API generates thumbnails for images g4f/gui/server/backend_api.py452-464:

Opens image with PIL
Extracts original dimensions
Calls process_image() to generate thumbnail
Saves to bucket_dir/thumbnail/ directory
Returns dimensions for client-side rendering

Sources: g4f/image/__init__.py334-475 g4f/gui/server/backend_api.py448-464

File Processing & Document Handling

The file processing subsystem extracts text content from various document formats for use as context in conversations.

Supported File Types

Supported Document Types

Sources: g4f/tools/files.py84-122

Document Processing Flow

File Upload and Processing Sequence

Sources: g4f/gui/server/backend_api.py413-486 g4f/tools/files.py149-213

MarkItDown Integration

The system uses MarkItDown for universal document conversion g4f/gui/server/backend_api.py429-438:

MarkItDown capabilities:

Converts Office documents (DOCX, XLSX, PPTX)
Extracts text from PDFs
Processes images with OCR (when language parameter provided)
Handles audio transcription
Converts HTML and other formats to markdown

Converted content is saved as .md files for efficient caching.

Format-Specific Extraction

The stream_read_files() function provides format-specific extraction g4f/tools/files.py149-213:

PDF extraction (three libraries, fallback order):

pdfplumber: Page-by-page extraction with layout preservation g4f/tools/files.py183-186
PyPDF2: Basic text extraction g4f/tools/files.py175-182
pdfminer: High-quality extraction with extract_text() g4f/tools/files.py187-188

DOCX extraction (two libraries):

python-docx: Paragraph-by-paragraph extraction g4f/tools/files.py189-192
docx2txt: Single-pass extraction g4f/tools/files.py193-194

XLSX extraction:

Uses pandas to read Excel files
Iterates rows as tuples, joins cells with spaces g4f/tools/files.py205-208

HTML extraction:

Uses BeautifulSoup4 with scrape_text() helper
Extracts clean text without HTML tags g4f/tools/files.py209-210

ZIP archives:

Recursively extracts and processes supported files
Cleans up extracted files after processing g4f/tools/files.py157-173

Bucket Text Caching

To optimize repeated access, extracted text is cached g4f/tools/files.py215-244:

Caching Strategy

The cache_stream() function implements streaming cache:

Checks if plain.cache exists
If yes: streams from cache in 4-8KB chunks
If no: writes to temporary file while streaming to client
Renames temp file to plain.cache when complete

Chunking respects markdown code block boundaries to avoid breaking syntax g4f/tools/files.py228-229

For advanced text processing, the system supports spacy-based refinement g4f/tools/files.py124-141:

This extracts the two longest sentences per page as a summary. The refined text is cached separately in spacy_NNNN.cache files g4f/tools/files.py262-285

Sources: g4f/tools/files.py1-574 g4f/gui/server/backend_api.py413-486 g4f/integration/markitdown.py

Media Storage

The media storage subsystem manages persistent storage of generated and uploaded media files with organized directory structures and filename generation.

Storage Directory Structure

Storage Directory Structure

Sources: g4f/image/copy_images.py22-30 g4f/files.py

Storage Locations

The system uses multiple storage directories g4f/image/copy_images.py22-30:

generated_images/: Legacy directory, checked first for backwards compatibility
generated_media/: Primary storage for all generated media
har_and_cookies/.buckets/<bucket_id>/: Bucket-specific storage for uploads

The get_media_dir() function returns the appropriate directory, preferring generated_images/ if it exists.

Filename Generation

Generated filenames follow a structured format g4f/image/copy_images.py110-117:

{timestamp}_{tags}_{alt}_{hash}{extension}

Components:

Timestamp: int(time.time()) - Unix timestamp for uniqueness and sorting
Tags: Concatenated with + separator (model name, aspect ratio, etc.)
Alt text: Sanitized prompt/description
Hash: First 16 chars of SHA256 hash (image URL or timestamp)
Extension: File format (.jpg, .png, .mp3, .mp4, etc.)

All components are sanitized with secure_filename() to ensure filesystem safety g4f/tools/files.py

Copy and Download Flow

Media Copy and Download Flow

Sources: g4f/image/copy_images.py119-216

copy_media() Function

The copy_media() function is the core media storage utility g4f/image/copy_images.py119-216:

Features:

Parallel downloads: Uses asyncio.gather() for concurrent processing
Cookie/header forwarding: Passes authentication to aiohttp session
Timestamp updates: Updates filename with Last-Modified header for caching
Format detection: Validates content-type and uses magic numbers as fallback
Source URL preservation: Optionally appends original URL as query parameter
Target path support: Can save to specific paths (used for thumbnails)
Return options: Returns URI only or (uri, target_path) tuple

The function handles errors gracefully, returning the original URL if download fails.

Media Serving Endpoints

Media files are served through Flask routes g4f/gui/server/backend_api.py271-283:

All routes map to serve_images() which calls send_from_directory() with the media directory g4f/gui/server/api.py148-150

Source URL Tracking

When copying media, the system can track original URLs via query parameters g4f/image/copy_images.py204:

/media/filename.jpg?url=https%3A%2F%2Foriginal.com%2Fimage.jpg

The bucket file serving endpoint extracts and redirects to source URLs when local files don't exist g4f/gui/server/backend_api.py501-504:

This enables lazy loading: media is served from original URLs until explicitly downloaded.

Sources: g4f/image/copy_images.py1-216 g4f/gui/server/backend_api.py148-504

Media Upload & Download Flows

The system supports multiple media upload and download patterns, including bucket-based uploads for conversation context and provider-specific upload protocols.

Bucket Upload Flow

Bucket File Upload Sequence

Sources: g4f/gui/server/backend_api.py413-486

Bucket Upload Endpoint

The bucket upload endpoint handles file uploads with automatic conversion g4f/gui/server/backend_api.py413-486:

Request:

POST /backend-api/v2/files/<bucket_id>
Content-Type: multipart/form-data

files: [File, File, ...]
x-recognition-language: en  (optional, for OCR)

Response:

Processing steps:

Sanitize bucket_id and create directory structure
For each uploaded file:
- Save to temporary location with get_tempfile()
- If not .md, .json, or .zip: attempt MarkItDown conversion
- If media: save to media/, generate thumbnail in thumbnail/
- If document: save original to bucket root, save converted to .md
Write files.txt with list of document filenames
Return metadata including media dimensions

Bucket Download Flow

The system supports automatic URL downloading for bucket context g4f/tools/files.py413-489:

Bucket Download Flow

Sources: g4f/tools/files.py413-527

downloads.json Format

Clients can upload a downloads.json file to trigger automatic URL fetching g4f/tools/files.py490-502:

Parameters:

url/urls: Single URL or list of URLs to download
max_depth: Depth for link crawling (HTML pages only)
timeout: Request timeout in seconds
proxy: Proxy server URL

The download_urls() function fetches URLs concurrently g4f/tools/files.py413-489:

MarkItDown conversion: Attempts URL-to-text conversion first
Direct download: Falls back to HTTP GET with streaming
Link extraction: For HTML at max_depth > 0, extracts and follows links
File naming: Uses get_filename_from_url() with URL hash
Content validation: Checks is_allowed_extension() and supports_filename()
Canonical links: Injects canonical link into HTML for source tracking

Provider Upload Protocols

Providers implement different upload patterns for media inputs:

Provider-Specific Upload Patterns

Media Rendering in Tools

The render_messages() function converts media references to provider-specific formats g4f/tools/media.py:

URL detection: Checks for HTTP URLs, file paths, bucket references
Format conversion: Calls to_data_uri() or to_bytes() as needed
Vision message format: Constructs OpenAI-compatible message with image_url or inline content
Audio handling: Uses to_input_audio() for audio files

Used extensively in PollinationsAI g4f/Provider/PollinationsAI.py475 and other providers to normalize media inputs.

Client-Side Image Generation

The client library handles image generation with automatic downloading g4f/client/__init__.py392-434:

Response format options:

None (default): Downloads media and returns local /media/ paths
"url": Returns original URLs without downloading
"b64_json": Downloads and encodes as base64 strings

The _process_image_response() method handles format conversion and downloading g4f/client/__init__.py550-588

Sources: g4f/gui/server/backend_api.py413-486 g4f/tools/files.py413-527 g4f/client/__init__.py392-588 g4f/Provider/PollinationsAI.py321-438

Bucket System

The bucket system provides a mechanism for injecting file context into conversations, allowing uploaded documents and media to be referenced in prompts.

Bucket Architecture

Bucket System Architecture

Sources: g4f/tools/run_tools.py88-110 g4f/tools/files.py246-261 g4f/gui/server/backend_api.py386-505

Bucket Tool Handler

The bucket tool is one of the built-in tool handlers g4f/tools/run_tools.py88-110:

How it works:

Searches message content for {"bucket_id": "xyz"} pattern
Calls read_bucket() to load cached text
Replaces pattern with actual document content
Appends citation instructions if bucket was used

The BUCKET_INSTRUCTIONS constant g4f/tools/run_tools.py32-34:

Make sure to add the sources of cites using [[domain]](Url) notation
after the reference. Example: [[a-z0-9.]](http://example.com)

Bucket Content Reading

The read_bucket() function efficiently streams cached content g4f/tools/files.py246-261:

Cache file priority:

spacy_NNNN.cache: Refined/summarized chunks (if available)
plain_NNNN.cache: Split chunks (if spacy not available)
plain.cache: Full cached text (single file, legacy)

This design supports chunked processing for large documents while maintaining backwards compatibility.

Bucket Management Endpoints

Four endpoints manage bucket operations g4f/gui/server/backend_api.py386-505:

1. Upload Files

POST /backend-api/v2/files/<bucket_id>

Accepts multipart file uploads, processes with MarkItDown, generates thumbnails. See Media Upload & Download Flows

2. Stream Content

GET /backend-api/v2/files/<bucket_id>/stream

Streams extracted text with SSE events g4f/gui/server/backend_api.py386-388:

data: {"action": "download", "count": 5}
data: {"action": "load", "size": 12345}
data: {"action": "refine", "size": 8192}
data: {"action": "media", "filename": "photo.jpg"}
data: {"action": "done", "size": 12345}

3. Get/Delete Bucket

GET /backend-api/v2/files/<bucket_id>
DELETE /backend-api/v2/files/<bucket_id>

GET returns plain text or SSE stream. DELETE removes entire bucket directory g4f/gui/server/backend_api.py390-411

4. Get Media

GET /files/<bucket_id>/media/<filename>
GET /files/<bucket_id>/thumbnail/<filename>

Serves uploaded media and thumbnails, with fallback to source URL g4f/gui/server/backend_api.py487-504

Bucket Content Streaming

The get_streaming() function orchestrates the complete flow g4f/tools/files.py559-568:

Bucket Content Streaming Flow

The system efficiently handles both initial processing and cached retrieval, with optional spacy refinement for large documents.

Sources: g4f/tools/run_tools.py32-146 g4f/tools/files.py149-574 g4f/gui/server/backend_api.py386-505

Summary

The media and file processing subsystem provides comprehensive handling of images, audio, video, and documents throughout the g4f platform:

Media Type System: Detects and validates 40+ file formats using extensions, magic numbers, and data URIs
Response Types: Structured classes (ImageResponse, AudioResponse, VideoResponse) with HTML/markdown rendering
Image Processing: PIL-based pipeline with EXIF correction, format conversion, and thumbnail generation
Document Processing: Extracts text from PDFs, DOCX, XLSX, HTML, and more using MarkItDown and format-specific libraries
Storage: Organized directory structure with timestamp-based filenames, caching, and source URL tracking
Upload/Download: Supports bucket uploads, automatic URL fetching, provider-specific protocols, and parallel downloads
Bucket System: Tool-based file context injection with regex replacement and citation instructions

This infrastructure enables seamless media generation, document understanding, and file management across all interfaces (CLI, GUI, API, MCP).

Media & File Processing

Relevant source files

Media Type System

The media type system provides format detection, validation, and type classification for images, audio, and video files. It supports both file-based and data URI-based media inputs.

Supported Formats

The system maintains mappings between file extensions and MIME types:

Category	Extensions	MIME Types
Images	`jpeg`, `jpg`, `png`, `gif`, `webp`	`image/jpeg`, `image/png`, `image/gif`, `image/webp`
Audio	`wav`, `mp3`, `flac`, `opus`, `ogg`, `m4a`	`audio/wav`, `audio/mpeg`, `audio/flac`, `audio/opus`, `audio/ogg`, `audio/m4a`
Video	`mkv`, `webm`, `mp4`	`video/x-matroska`, `video/webm`, `video/mp4`

The bidirectional mapping is defined in EXTENSIONS_MAP and MEDIA_TYPE_MAP g4f/image/__init__.py24-45

Format Detection

Format Detection Flow

Sources: g4f/image/__init__.py85-201

The system detects media types through multiple strategies:

Extension-based: get_extension() extracts the file extension and looks it up in EXTENSIONS_MAP
Magic number: is_accepted_format() reads file headers to identify format (e.g., \xFF\xD8\xFF for JPEG)
Data URI parsing: is_data_uri_an_image() validates and extracts MIME type from data URIs
Comprehensive detection: detect_file_type() uses magic numbers to detect 50+ file types g4f/image/__init__.py204-318

Media Input Handling

The ImageType type accepts multiple input formats, which are normalized through helper functions:

Media Conversion Functions

Sources: g4f/image/__init__.py47-475

Key conversion functions:

to_image(): Converts any input to a PIL Image object, with SVG support via cairosvg g4f/image/__init__.py47-83
to_bytes(): Converts any input to raw bytes, handling URLs, data URIs, paths, and PIL Images g4f/image/__init__.py364-412
to_data_uri(): Converts to base64-encoded data URI format g4f/image/__init__.py413-418
to_input_audio(): Converts audio to OpenAI input format with base64 data and format g4f/image/__init__.py420-437

Response Type Hierarchy

The response type system provides structured representations of media outputs from providers, with specialized classes for different media types.

Response Type Architecture

Response Type Class Hierarchy

Sources: g4f/providers/response.py122-458

MediaResponse Base Class

MediaResponse is the base class for image and video responses g4f/providers/response.py399-419:

Key Features:

URLs: Single URL string or list of URLs
Alt text: Description/prompt used to generate the media
Options dictionary: Stores metadata like cookies, headers, width, height, thumbnails
get_list(): Normalizes URLs to a list format
get(key, default): Retrieves options with fallback

The options dictionary commonly contains:

cookies: Cookies needed to access the media URL
headers: HTTP headers for authenticated requests
width, height: Media dimensions
preview: Preview/thumbnail URL
source_url: Original source URL before copying

ImageResponse

ImageResponse extends MediaResponse with image-specific rendering g4f/providers/response.py420-433:

Rendering modes:

HTML with metadata: When width and height are available, generates clickable thumbnails with full image links
Markdown: Falls back to standard markdown image syntax with optional preview URLs

The to_string() method intelligently selects the format based on available metadata.

VideoResponse

VideoResponse handles video content with optional poster images g4f/providers/response.py434-445:

Supports preview/poster images for video thumbnails when available in options.

AudioResponse

AudioResponse represents audio content with optional transcription g4f/providers/response.py349-370:

Key Features:

Data: Base64-encoded audio or /media/ URL path
Transcript: Optional text transcription of the audio
to_uri(): Converts to data URI format or returns path
HTML rendering: Generates <audio controls> element with optional transcript text

ImagePreview

ImagePreview is a hidden response type that extends ImageResponse but doesn't display in final output g4f/providers/response.py447-448 Used during generation progress updates.

Sources: g4f/providers/response.py122-466

Image Processing Pipeline

The image processing pipeline handles image conversion, orientation correction, thumbnail generation, and format validation.

Image Processing Flow

Image Processing Pipeline

Sources: g4f/image/__init__.py334-363

process_image() Function

The process_image() function handles comprehensive image processing g4f/image/__init__.py334-363:

Processing steps:

EXIF orientation correction: Uses ImageOps.exif_transpose() to fix rotated images
Mode handling:
- RGBA: Preserves transparency (commented code shows white background alternative)
- Non-RGB: Converts to RGB for JPEG compatibility
Thumbnailing: Resizes image to fit within bounds while maintaining aspect ratio
EXIF stripping: Saves with exif=b"" to remove metadata and reduce file size
Return: Returns original size tuple if saving, otherwise returns processed Image

Aspect Ratio Handling

The use_aspect_ratio() function calculates width and height from aspect ratio strings g4f/image/__init__.py439-453:

Maps common aspect ratios to pixel dimensions via get_width_height() g4f/image/__init__.py454-466:

"1:1": 1024×1024 (square)
"16:9": 832×480 (widescreen)
"9:16": 480×832 (portrait)

Used extensively in provider implementations like PollinationsAI g4f/Provider/PollinationsAI.py356-363

Thumbnail Generation in Upload Flow

During file upload, the backend API generates thumbnails for images g4f/gui/server/backend_api.py452-464:

Opens image with PIL
Extracts original dimensions
Calls process_image() to generate thumbnail
Saves to bucket_dir/thumbnail/ directory
Returns dimensions for client-side rendering

Sources: g4f/image/__init__.py334-475 g4f/gui/server/backend_api.py448-464

File Processing & Document Handling

The file processing subsystem extracts text content from various document formats for use as context in conversations.

Supported File Types

Supported Document Types

Sources: g4f/tools/files.py84-122

Document Processing Flow

File Upload and Processing Sequence

Sources: g4f/gui/server/backend_api.py413-486 g4f/tools/files.py149-213

MarkItDown Integration

The system uses MarkItDown for universal document conversion g4f/gui/server/backend_api.py429-438:

MarkItDown capabilities:

Converts Office documents (DOCX, XLSX, PPTX)
Extracts text from PDFs
Processes images with OCR (when language parameter provided)
Handles audio transcription
Converts HTML and other formats to markdown

Converted content is saved as .md files for efficient caching.

Format-Specific Extraction

The stream_read_files() function provides format-specific extraction g4f/tools/files.py149-213:

PDF extraction (three libraries, fallback order):

pdfplumber: Page-by-page extraction with layout preservation g4f/tools/files.py183-186
PyPDF2: Basic text extraction g4f/tools/files.py175-182
pdfminer: High-quality extraction with extract_text() g4f/tools/files.py187-188

DOCX extraction (two libraries):

python-docx: Paragraph-by-paragraph extraction g4f/tools/files.py189-192
docx2txt: Single-pass extraction g4f/tools/files.py193-194

XLSX extraction:

Uses pandas to read Excel files
Iterates rows as tuples, joins cells with spaces g4f/tools/files.py205-208

HTML extraction:

Uses BeautifulSoup4 with scrape_text() helper
Extracts clean text without HTML tags g4f/tools/files.py209-210

ZIP archives:

Recursively extracts and processes supported files
Cleans up extracted files after processing g4f/tools/files.py157-173

Bucket Text Caching

To optimize repeated access, extracted text is cached g4f/tools/files.py215-244:

Caching Strategy

The cache_stream() function implements streaming cache:

Checks if plain.cache exists
If yes: streams from cache in 4-8KB chunks
If no: writes to temporary file while streaming to client
Renames temp file to plain.cache when complete

Chunking respects markdown code block boundaries to avoid breaking syntax g4f/tools/files.py228-229

For advanced text processing, the system supports spacy-based refinement g4f/tools/files.py124-141:

This extracts the two longest sentences per page as a summary. The refined text is cached separately in spacy_NNNN.cache files g4f/tools/files.py262-285

Sources: g4f/tools/files.py1-574 g4f/gui/server/backend_api.py413-486 g4f/integration/markitdown.py

Media Storage

The media storage subsystem manages persistent storage of generated and uploaded media files with organized directory structures and filename generation.

Storage Directory Structure

Storage Directory Structure

Sources: g4f/image/copy_images.py22-30 g4f/files.py

Storage Locations

The system uses multiple storage directories g4f/image/copy_images.py22-30:

generated_images/: Legacy directory, checked first for backwards compatibility
generated_media/: Primary storage for all generated media
har_and_cookies/.buckets/<bucket_id>/: Bucket-specific storage for uploads

The get_media_dir() function returns the appropriate directory, preferring generated_images/ if it exists.

Filename Generation

Generated filenames follow a structured format g4f/image/copy_images.py110-117:

{timestamp}_{tags}_{alt}_{hash}{extension}

Components:

Timestamp: int(time.time()) - Unix timestamp for uniqueness and sorting
Tags: Concatenated with + separator (model name, aspect ratio, etc.)
Alt text: Sanitized prompt/description
Hash: First 16 chars of SHA256 hash (image URL or timestamp)
Extension: File format (.jpg, .png, .mp3, .mp4, etc.)

All components are sanitized with secure_filename() to ensure filesystem safety g4f/tools/files.py

Copy and Download Flow

Media Copy and Download Flow

Sources: g4f/image/copy_images.py119-216

copy_media() Function

The copy_media() function is the core media storage utility g4f/image/copy_images.py119-216:

Features:

Parallel downloads: Uses asyncio.gather() for concurrent processing
Cookie/header forwarding: Passes authentication to aiohttp session
Timestamp updates: Updates filename with Last-Modified header for caching
Format detection: Validates content-type and uses magic numbers as fallback
Source URL preservation: Optionally appends original URL as query parameter
Target path support: Can save to specific paths (used for thumbnails)
Return options: Returns URI only or (uri, target_path) tuple

The function handles errors gracefully, returning the original URL if download fails.

Media Serving Endpoints

Media files are served through Flask routes g4f/gui/server/backend_api.py271-283:

All routes map to serve_images() which calls send_from_directory() with the media directory g4f/gui/server/api.py148-150

Source URL Tracking

When copying media, the system can track original URLs via query parameters g4f/image/copy_images.py204:

/media/filename.jpg?url=https%3A%2F%2Foriginal.com%2Fimage.jpg

The bucket file serving endpoint extracts and redirects to source URLs when local files don't exist g4f/gui/server/backend_api.py501-504:

This enables lazy loading: media is served from original URLs until explicitly downloaded.

Sources: g4f/image/copy_images.py1-216 g4f/gui/server/backend_api.py148-504

Media Upload & Download Flows

The system supports multiple media upload and download patterns, including bucket-based uploads for conversation context and provider-specific upload protocols.

Bucket Upload Flow

Bucket File Upload Sequence

Sources: g4f/gui/server/backend_api.py413-486

Bucket Upload Endpoint

The bucket upload endpoint handles file uploads with automatic conversion g4f/gui/server/backend_api.py413-486:

Request:

POST /backend-api/v2/files/<bucket_id>
Content-Type: multipart/form-data

files: [File, File, ...]
x-recognition-language: en  (optional, for OCR)

Response:

Processing steps:

Sanitize bucket_id and create directory structure
For each uploaded file:
- Save to temporary location with get_tempfile()
- If not .md, .json, or .zip: attempt MarkItDown conversion
- If media: save to media/, generate thumbnail in thumbnail/
- If document: save original to bucket root, save converted to .md
Write files.txt with list of document filenames
Return metadata including media dimensions

Bucket Download Flow

The system supports automatic URL downloading for bucket context g4f/tools/files.py413-489:

Bucket Download Flow

Sources: g4f/tools/files.py413-527

downloads.json Format

Clients can upload a downloads.json file to trigger automatic URL fetching g4f/tools/files.py490-502:

Parameters:

url/urls: Single URL or list of URLs to download
max_depth: Depth for link crawling (HTML pages only)
timeout: Request timeout in seconds
proxy: Proxy server URL

The download_urls() function fetches URLs concurrently g4f/tools/files.py413-489:

MarkItDown conversion: Attempts URL-to-text conversion first
Direct download: Falls back to HTTP GET with streaming
Link extraction: For HTML at max_depth > 0, extracts and follows links
File naming: Uses get_filename_from_url() with URL hash
Content validation: Checks is_allowed_extension() and supports_filename()
Canonical links: Injects canonical link into HTML for source tracking

Provider Upload Protocols

Providers implement different upload patterns for media inputs:

Provider-Specific Upload Patterns

Media Rendering in Tools

The render_messages() function converts media references to provider-specific formats g4f/tools/media.py:

URL detection: Checks for HTTP URLs, file paths, bucket references
Format conversion: Calls to_data_uri() or to_bytes() as needed
Vision message format: Constructs OpenAI-compatible message with image_url or inline content
Audio handling: Uses to_input_audio() for audio files

Used extensively in PollinationsAI g4f/Provider/PollinationsAI.py475 and other providers to normalize media inputs.

Client-Side Image Generation

The client library handles image generation with automatic downloading g4f/client/__init__.py392-434:

Response format options:

None (default): Downloads media and returns local /media/ paths
"url": Returns original URLs without downloading
"b64_json": Downloads and encodes as base64 strings

The _process_image_response() method handles format conversion and downloading g4f/client/__init__.py550-588

Sources: g4f/gui/server/backend_api.py413-486 g4f/tools/files.py413-527 g4f/client/__init__.py392-588 g4f/Provider/PollinationsAI.py321-438

Bucket System

The bucket system provides a mechanism for injecting file context into conversations, allowing uploaded documents and media to be referenced in prompts.

Bucket Architecture

Bucket System Architecture

Sources: g4f/tools/run_tools.py88-110 g4f/tools/files.py246-261 g4f/gui/server/backend_api.py386-505

Bucket Tool Handler

The bucket tool is one of the built-in tool handlers g4f/tools/run_tools.py88-110:

How it works:

Searches message content for {"bucket_id": "xyz"} pattern
Calls read_bucket() to load cached text
Replaces pattern with actual document content
Appends citation instructions if bucket was used

The BUCKET_INSTRUCTIONS constant g4f/tools/run_tools.py32-34:

Make sure to add the sources of cites using [[domain]](Url) notation
after the reference. Example: [[a-z0-9.]](http://example.com)

Bucket Content Reading

The read_bucket() function efficiently streams cached content g4f/tools/files.py246-261:

Cache file priority:

spacy_NNNN.cache: Refined/summarized chunks (if available)
plain_NNNN.cache: Split chunks (if spacy not available)
plain.cache: Full cached text (single file, legacy)

This design supports chunked processing for large documents while maintaining backwards compatibility.

Bucket Management Endpoints

Four endpoints manage bucket operations g4f/gui/server/backend_api.py386-505:

1. Upload Files

POST /backend-api/v2/files/<bucket_id>

Accepts multipart file uploads, processes with MarkItDown, generates thumbnails. See Media Upload & Download Flows

2. Stream Content

GET /backend-api/v2/files/<bucket_id>/stream

Streams extracted text with SSE events g4f/gui/server/backend_api.py386-388:

data: {"action": "download", "count": 5}
data: {"action": "load", "size": 12345}
data: {"action": "refine", "size": 8192}
data: {"action": "media", "filename": "photo.jpg"}
data: {"action": "done", "size": 12345}

3. Get/Delete Bucket

GET /backend-api/v2/files/<bucket_id>
DELETE /backend-api/v2/files/<bucket_id>

GET returns plain text or SSE stream. DELETE removes entire bucket directory g4f/gui/server/backend_api.py390-411

4. Get Media

GET /files/<bucket_id>/media/<filename>
GET /files/<bucket_id>/thumbnail/<filename>

Serves uploaded media and thumbnails, with fallback to source URL g4f/gui/server/backend_api.py487-504

Bucket Content Streaming

The get_streaming() function orchestrates the complete flow g4f/tools/files.py559-568:

Bucket Content Streaming Flow

The system efficiently handles both initial processing and cached retrieval, with optional spacy refinement for large documents.

Sources: g4f/tools/run_tools.py32-146 g4f/tools/files.py149-574 g4f/gui/server/backend_api.py386-505

Summary

The media and file processing subsystem provides comprehensive handling of images, audio, video, and documents throughout the g4f platform:

Media Type System: Detects and validates 40+ file formats using extensions, magic numbers, and data URIs
Response Types: Structured classes (ImageResponse, AudioResponse, VideoResponse) with HTML/markdown rendering
Image Processing: PIL-based pipeline with EXIF correction, format conversion, and thumbnail generation
Document Processing: Extracts text from PDFs, DOCX, XLSX, HTML, and more using MarkItDown and format-specific libraries
Storage: Organized directory structure with timestamp-based filenames, caching, and source URL tracking
Upload/Download: Supports bucket uploads, automatic URL fetching, provider-specific protocols, and parallel downloads
Bucket System: Tool-based file context injection with regex replacement and citation instructions

This infrastructure enables seamless media generation, document understanding, and file management across all interfaces (CLI, GUI, API, MCP).

Media & File Processing

Media Type System

Supported Formats

Format Detection

Media Input Handling

Response Type Hierarchy

Response Type Architecture

MediaResponse Base Class

ImageResponse

VideoResponse

AudioResponse

ImagePreview

Image Processing Pipeline

Image Processing Flow

process_image() Function

Aspect Ratio Handling

Thumbnail Generation in Upload Flow

File Processing & Document Handling

Supported File Types

Document Processing Flow

MarkItDown Integration

Format-Specific Extraction

Bucket Text Caching

Spacy Refinement (Optional)

Media Storage

Storage Directory Structure

Storage Locations

Filename Generation

Copy and Download Flow

copy_media() Function

Media Serving Endpoints

Source URL Tracking

Media Upload & Download Flows

Bucket Upload Flow

Bucket Upload Endpoint

Bucket Download Flow

downloads.json Format

Provider Upload Protocols

Media Rendering in Tools

Client-Side Image Generation

Bucket System

Bucket Architecture

Bucket Tool Handler

Bucket Content Reading

Bucket Management Endpoints

1. Upload Files

2. Stream Content

3. Get/Delete Bucket

4. Get Media

Bucket Content Streaming

Summary

On this page

Media & File Processing

Media Type System

Supported Formats

Format Detection

Media Input Handling

Response Type Hierarchy

Response Type Architecture

MediaResponse Base Class

ImageResponse

VideoResponse

AudioResponse

ImagePreview

Image Processing Pipeline

Image Processing Flow

process_image() Function

Aspect Ratio Handling

Thumbnail Generation in Upload Flow

File Processing & Document Handling

Supported File Types

Document Processing Flow

MarkItDown Integration

Format-Specific Extraction

Bucket Text Caching

Spacy Refinement (Optional)

Media Storage

Storage Directory Structure

Storage Locations

Filename Generation