This page documents local inference providers that run AI models on local infrastructure rather than cloud APIs. The primary local provider is Ollama, located at g4f/Provider/local/Ollama.py which supports both locally-installed models and remote Ollama registry models through a dual-mode architecture. For cloud-based OpenAI-compatible providers, see OpenAI-Compatible Providers. For implementing new local providers, see Creating Custom Providers.
The Ollama class (g4f/Provider/local/Ollama.py13-99) extends OpenaiTemplate (g4f/Provider/template/OpenaiTemplate.py17) to provide local model inference capabilities. Unlike cloud providers that require API keys and external network calls, Ollama connects to a locally-running Ollama server instance.
Key Attributes:
| Attribute | Value | Purpose |
|---|---|---|
| label | "Ollama 🦙" | Display name in provider selection |
| base_url | "https://g4f.space/api/ollama" | Backup URL for remote models |
| needs_auth | False | No authentication required |
| working | True | Provider is operational |
| active_by_default | True | Enabled in provider selection |
| local_models | [] | List of locally-installed models |
| model_aliases | {"gpt-oss-120b": "gpt-oss:120b", ...} | Model name mappings |
Sources: g4f/Provider/local/Ollama.py13-25 g4f/Provider/template/OpenaiTemplate.py17-33
The Ollama provider implements two distinct execution paths:
- Local path: when the requested model is in cls.local_models, uses Ollama's native /api/chat endpoint
- Remote path: otherwise delegates to OpenaiTemplate with backup_url pointing to g4f.space/api/ollama

Sources: g4f/Provider/local/Ollama.py54-99
The get_models() method (g4f/Provider/local/Ollama.py27-52) implements a two-phase discovery system that queries both remote Ollama registry and local Ollama server to populate the available models list.
Ollama.get_models() Method Flow:
Sources: g4f/Provider/local/Ollama.py27-52
The provider maintains two separate model lists:
| List | Variable | Population Source | Purpose |
|---|---|---|---|
| Remote Models | cls.models | https://ollama.com/api/tags | Models available via Ollama registry |
| Local Models | cls.local_models | http://{host}:{port}/api/tags | Models installed on local Ollama server |
| Combined | cls.models (updated) | Both sources merged | Total available models for provider selection |
Code Implementation (g4f/Provider/local/Ollama.py33-51):
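The population logic can be sketched as follows. This is a simplified stand-in, not the actual implementation: `fetch_remote` and `fetch_local` are hypothetical callables substituting for the HTTP requests to the registry and local server so the sketch runs offline.

```python
import os

def get_models_sketch(fetch_remote, fetch_local):
    """Two-phase model discovery (sketch, not the actual implementation).

    fetch_remote stands in for the https://ollama.com/api/tags request and
    fetch_local for the http://{host}:{port}/api/tags request.
    """
    models, local_models, live = [], [], 0
    try:
        models = fetch_remote()
        live += 1  # remote registry reachable
    except Exception:
        pass
    host = os.getenv("OLLAMA_HOST", "localhost")
    port = os.getenv("OLLAMA_PORT", "11434")
    try:
        local_models = fetch_local(host, port)
        live += 1  # local server reachable too
    except Exception:
        pass
    # Merge: local models first, then registry models not already present
    merged = local_models + [m for m in models if m not in local_models]
    return merged, local_models, live
```

Each successful phase bumps the health counter, matching the live values described under Provider Health Indication.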
Sources: g4f/Provider/local/Ollama.py27-52
When model in cls.local_models evaluates to True, the provider bypasses the OpenAI-compatible API and uses Ollama's native /api/chat endpoint.
Base URL Derivation (g4f/Provider/local/Ollama.py64-68):
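A stdlib-only sketch of this derivation, using the OLLAMA_HOST and OLLAMA_PORT environment variables:

```python
import os

# Derive the local server's OpenAI-style base URL (sketch)
host = os.getenv("OLLAMA_HOST", "localhost")
port = os.getenv("OLLAMA_PORT", "11434")
base_url = f"http://{host}:{port}/v1"
```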
Endpoint Transformation (g4f/Provider/local/Ollama.py70-73):
The method replaces the OpenAI-style /v1 path with Ollama's native /api/chat:
This transforms http://localhost:11434/v1 → http://localhost:11434/api/chat.
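A minimal sketch of the substitution:

```python
def to_native_endpoint(base_url: str) -> str:
    """Swap the OpenAI-style /v1 suffix for Ollama's native chat endpoint."""
    return base_url.replace("/v1", "/api/chat")
```

Calling `to_native_endpoint("http://localhost:11434/v1")` yields `http://localhost:11434/api/chat`.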
Sources: g4f/Provider/local/Ollama.py64-73
Native Ollama Format:
Unlike OpenAI's choices[0].delta.content structure, Ollama returns a custom format (g4f/Provider/local/Ollama.py76-89):
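An illustrative sketch of the chunk shape (NDJSON, one object per line; the field values here are hypothetical, with field names as described in this section):

```json
{"message": {"thinking": "Considering the question...", "content": ""}, "done": false}
{"message": {"content": "Hello!"}, "done": false}
{"message": {"content": ""}, "done": true, "prompt_eval_count": 12, "eval_count": 34}
```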
Streaming Response Processing:
Response Type Conversion (g4f/Provider/local/Ollama.py79-89):
| Ollama Field | g4f Response Type | Code |
|---|---|---|
| message.thinking | Reasoning | yield Reasoning(thinking) |
| message.content | Plain string | yield content |
| prompt_eval_count + eval_count | Usage | yield Usage(prompt_tokens=..., completion_tokens=..., total_tokens=...) |
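The conversion can be sketched as a pure-Python parser. Tuples stand in for the real Reasoning and Usage response objects, and the assumption that token counts arrive on the final ("done") chunk follows Ollama's documented native format:

```python
import json

def parse_ollama_chunk(line: str):
    """Convert one native streaming chunk into g4f-style outputs (sketch).

    Tuples stand in for the real Reasoning/Usage response objects.
    """
    data = json.loads(line)
    message = data.get("message", {})
    if message.get("thinking"):
        yield ("reasoning", message["thinking"])
    if message.get("content"):
        yield ("content", message["content"])
    if data.get("done"):  # final chunk carries the token counts
        prompt = data.get("prompt_eval_count", 0)
        completion = data.get("eval_count", 0)
        yield ("usage", {"prompt_tokens": prompt,
                         "completion_tokens": completion,
                         "total_tokens": prompt + completion})
```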
Sources: g4f/Provider/local/Ollama.py76-89 g4f/providers/response.py
When model not in cls.local_models, the provider delegates to the parent OpenaiTemplate.create_async_generator() method (g4f/Provider/local/Ollama.py91-99).
Code Path (g4f/Provider/local/Ollama.py90-99):
Local vs Remote Execution:
| Aspect | Local Path | Remote Path |
|---|---|---|
| Condition | model in cls.local_models | model not in cls.local_models |
| Method | Custom implementation | super().create_async_generator() |
| Base URL | http://{OLLAMA_HOST}:{OLLAMA_PORT} | https://g4f.space/api/ollama |
| Endpoint | /api/chat | /v1/chat/completions |
| Protocol | Ollama native JSON streaming | OpenAI-compatible SSE |
| Response Format | {"message": {"thinking": ..., "content": ...}} | {"choices": [{"delta": {"content": ...}}]} |
| Response Handler | Custom parser (g4f/Provider/local/Ollama.py76-89) | OpenaiTemplate.read_response() (g4f/Provider/template/OpenaiTemplate.py174-253) |
Sources: g4f/Provider/local/Ollama.py90-99 g4f/Provider/template/OpenaiTemplate.py74-161
Remote Model Request Flow:
Sources: g4f/Provider/local/Ollama.py90-99 g4f/Provider/template/OpenaiTemplate.py74-161
The Ollama provider reads configuration from environment variables to locate the local Ollama server.
| Variable | Default | Usage Location | Purpose |
|---|---|---|---|
| OLLAMA_HOST | "localhost" | g4f/Provider/local/Ollama.py38 g4f/Provider/local/Ollama.py65 | Local server hostname or IP |
| OLLAMA_PORT | "11434" | g4f/Provider/local/Ollama.py39 g4f/Provider/local/Ollama.py66 | Local server port |
Usage in Code:
Model Discovery (g4f/Provider/local/Ollama.py38-40):
Request Execution (g4f/Provider/local/Ollama.py65-67):
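Both call sites derive the server address the same way; a stdlib-only sketch:

```python
import os

host = os.getenv("OLLAMA_HOST", "localhost")
port = os.getenv("OLLAMA_PORT", "11434")

tags_url = f"http://{host}:{port}/api/tags"  # model discovery
base_url = f"http://{host}:{port}/v1"        # request execution (OpenAI-style base)
```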
The model_aliases dictionary (g4f/Provider/local/Ollama.py22-25) provides convenient name mappings:
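An abridged sketch of the mapping and its resolution; the real lookup is performed by the inherited get_model() method, so `resolve_model` here is purely illustrative:

```python
# Abridged alias table; the real dict lives on the Ollama class
model_aliases = {"gpt-oss-120b": "gpt-oss:120b"}

def resolve_model(name: str) -> str:
    """Sketch of alias resolution: fall back to the raw name if unmapped."""
    return model_aliases.get(name, name)
```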
This allows users to request gpt-oss-120b (without colon) and have it automatically resolve to gpt-oss:120b (with colon) via the get_model() method inherited from ProviderModelMixin.
Sources: g4f/Provider/local/Ollama.py22-25 g4f/Provider/local/Ollama.py38-40 g4f/Provider/local/Ollama.py65-67
When running g4f in Docker, the local Ollama server is typically accessed via host.docker.internal:
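For example (a hypothetical invocation; the image name, published ports, and other flags depend on your deployment):

```shell
# Point the containerized g4f at the Ollama server on the Docker host
docker run \
  -e OLLAMA_HOST=host.docker.internal \
  -e OLLAMA_PORT=11434 \
  hlohaus789/g4f
```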
This allows the containerized g4f instance to reach the Ollama server running on the host machine. See Docker Deployment for details.
Sources: High-level architecture diagrams
The Ollama class inherits from OpenaiTemplate, gaining access to OpenAI-compatible functionality while overriding specific methods for local execution.
Class Hierarchy:
Inherited Capabilities:
| Source Class | Inherited Feature | Purpose |
|---|---|---|
| BaseProvider | working, url, label | Provider metadata |
| AsyncGeneratorProvider | create_async_generator() signature | Async streaming interface |
| ProviderModelMixin | get_model(), model_aliases | Model name resolution |
| RaiseErrorMixin | raise_error() | Error handling |
| OpenaiTemplate | get_headers(), response parsing | OpenAI-compatible API utilities |
Sources: g4f/Provider/local/Ollama.py13 g4f/Provider/template/OpenaiTemplate.py17 g4f/Provider/base_provider.py
When Ollama.get_models() is called, the models are registered in the global provider system:
Registration Flow:
The cls.live counter (g4f/Provider/local/Ollama.py35 g4f/Provider/local/Ollama.py48) indicates provider health:
- 0: Provider not working
- 1: Remote models available
- 2: Both remote and local models available

This allows AnyProvider to prefer providers with live > 0 during provider selection. See Provider Selection & AnyProvider for routing logic.
Sources: g4f/Provider/local/Ollama.py27-52 g4f/Provider/template/OpenaiTemplate.py41-72
Using Client API:
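A sketch of a client call, assuming g4f is installed and that Client and Ollama are importable from the paths shown (import paths may differ between versions):

```python
def chat_with_ollama(prompt: str, model: str = "llama2") -> str:
    """Sketch only: requires g4f installed and an Ollama setup reachable."""
    from g4f.client import Client    # assumed import path
    from g4f.Provider import Ollama  # assumed import path
    client = Client(provider=Ollama)
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```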
If llama2 is in cls.local_models, this routes to http://localhost:11434/api/chat. Otherwise, it routes to https://g4f.space/api/ollama/v1/chat/completions.
Local Ollama responses include message.thinking fields that are yielded as Reasoning objects (g4f/Provider/local/Ollama.py79-81).
Using Environment Variables:
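For example, overriding the defaults before the provider runs (values chosen to illustrate a non-default host and port):

```python
import os

# Redirect the provider to a different Ollama server
os.environ["OLLAMA_HOST"] = "192.168.1.100"
os.environ["OLLAMA_PORT"] = "8080"

# The provider would now derive:
base_url = f"http://{os.environ['OLLAMA_HOST']}:{os.environ['OLLAMA_PORT']}/v1"
```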
This constructs base_url = "http://192.168.1.100:8080/v1" and endpoint http://192.168.1.100:8080/api/chat.
Bypassing Client Wrapper:
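A sketch of invoking the provider directly, assuming Ollama is importable from g4f.Provider and a server is reachable; this skips the client's response assembly and yields raw chunks:

```python
import asyncio

async def stream_direct(prompt: str, model: str = "llama2"):
    """Sketch: stream chunks from Ollama.create_async_generator directly."""
    from g4f.Provider import Ollama  # assumed import path
    async for chunk in Ollama.create_async_generator(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    ):
        print(chunk, end="", flush=True)

# asyncio.run(stream_direct("Hello"))  # requires a running setup
```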
Sources: g4f/client/client.py g4f/Provider/local/Ollama.py54-99
The architecture supports additional local providers. The high-level architecture diagram mentions gpt4all as a potential local provider. To implement a new local provider:
1. Create the provider class in g4f/Provider/local/, extending OpenaiTemplate or AsyncGeneratorProvider
2. Override create_async_generator() for local execution
3. Set needs_auth = False for authentication-free operation

See Creating Custom Providers for detailed implementation guidance.
Sources: High-level architecture diagrams, g4f/Provider/local/Ollama.py1-99