This page documents local inference providers that run AI models on local infrastructure rather than cloud APIs. The primary local provider is Ollama, located at g4f/Provider/local/Ollama.py which supports both locally-installed models and remote Ollama registry models through a dual-mode architecture. For cloud-based OpenAI-compatible providers, see OpenAI-Compatible Providers. For implementing new local providers, see Creating Custom Providers.
The Ollama class (g4f/Provider/local/Ollama.py13-99) extends OpenaiTemplate (g4f/Provider/template/OpenaiTemplate.py17) to provide local model inference capabilities. Unlike cloud providers that require API keys and external network calls, Ollama connects to a locally-running Ollama server instance.
Key Attributes:
| Attribute | Value | Purpose |
|---|---|---|
| label | "Ollama 🦙" | Display name in provider selection |
| base_url | "https://g4f.space/api/ollama" | Backup URL for remote models |
| needs_auth | False | No authentication required |
| working | True | Provider is operational |
| active_by_default | True | Enabled in provider selection |
| local_models | [] | List of locally-installed models |
| model_aliases | {"gpt-oss-120b": "gpt-oss:120b", ...} | Model name mappings |
Sources: g4f/Provider/local/Ollama.py13-25 g4f/Provider/template/OpenaiTemplate.py17-33
The Ollama provider implements two distinct execution paths:
- Local path: when the requested model is in cls.local_models, uses Ollama's native /api/chat endpoint
- Remote path: otherwise delegates to OpenaiTemplate with backup_url pointing to g4f.space/api/ollama

Sources: g4f/Provider/local/Ollama.py54-99
The get_models() method (g4f/Provider/local/Ollama.py27-52) implements a two-phase discovery system that queries both remote Ollama registry and local Ollama server to populate the available models list.
Ollama.get_models() Method Flow:
Sources: g4f/Provider/local/Ollama.py27-52
The provider maintains two separate model lists:
| List | Variable | Population Source | Purpose |
|---|---|---|---|
| Remote Models | cls.models | https://ollama.com/api/tags | Models available via Ollama registry |
| Local Models | cls.local_models | http://{host}:{port}/api/tags | Models installed on local Ollama server |
| Combined | cls.models (updated) | Both sources merged | Total available models for provider selection |
Code Implementation (g4f/Provider/local/Ollama.py33-51):
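The population logic can be sketched as follows. This is a simplified stand-in, not the actual implementation: `fetch_remote` and `fetch_local` are hypothetical callables substituting for the HTTP requests to the registry and local server so the sketch runs offline.

```python
import os

def get_models_sketch(fetch_remote, fetch_local):
    """Two-phase model discovery (sketch, not the actual implementation).

    fetch_remote stands in for the https://ollama.com/api/tags request and
    fetch_local for the http://{host}:{port}/api/tags request.
    """
    models, local_models, live = [], [], 0
    try:
        models = fetch_remote()
        live += 1  # remote registry reachable
    except Exception:
        pass
    host = os.getenv("OLLAMA_HOST", "localhost")
    port = os.getenv("OLLAMA_PORT", "11434")
    try:
        local_models = fetch_local(host, port)
        live += 1  # local server reachable too
    except Exception:
        pass
    # Merge: local models first, then registry models not already present
    merged = local_models + [m for m in models if m not in local_models]
    return merged, local_models, live
```

Each successful phase bumps the health counter, matching the live values described under Provider Health Indication.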
Sources: g4f/Provider/local/Ollama.py27-52
When model in cls.local_models evaluates to True, the provider bypasses the OpenAI-compatible API and uses Ollama's native /api/chat endpoint.
Base URL Derivation (g4f/Provider/local/Ollama.py64-68):
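A stdlib-only sketch of this derivation, using the OLLAMA_HOST and OLLAMA_PORT environment variables:

```python
import os

# Derive the local server's OpenAI-style base URL (sketch)
host = os.getenv("OLLAMA_HOST", "localhost")
port = os.getenv("OLLAMA_PORT", "11434")
base_url = f"http://{host}:{port}/v1"
```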
Endpoint Transformation (g4f/Provider/local/Ollama.py70-73):
The method replaces the OpenAI-style /v1 path with Ollama's native /api/chat:
This transforms http://localhost:11434/v1 → http://localhost:11434/api/chat.
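A minimal sketch of the substitution:

```python
def to_native_endpoint(base_url: str) -> str:
    """Swap the OpenAI-style /v1 suffix for Ollama's native chat endpoint."""
    return base_url.replace("/v1", "/api/chat")
```

Calling `to_native_endpoint("http://localhost:11434/v1")` yields `http://localhost:11434/api/chat`.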
Sources: g4f/Provider/local/Ollama.py64-73
Native Ollama Format:
Unlike OpenAI's choices[0].delta.content structure, Ollama returns a custom format (g4f/Provider/local/Ollama.py76-89):
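An illustrative sketch of the chunk shape (NDJSON, one object per line; the field values here are hypothetical, with field names as described in this section):

```json
{"message": {"thinking": "Considering the question...", "content": ""}, "done": false}
{"message": {"content": "Hello!"}, "done": false}
{"message": {"content": ""}, "done": true, "prompt_eval_count": 12, "eval_count": 34}
```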
Streaming Response Processing:
Response Type Conversion (g4f/Provider/local/Ollama.py79-89):
| Ollama Field | g4f Response Type | Code |
|---|---|---|
| message.thinking | Reasoning | yield Reasoning(thinking) |
| message.content | Plain string | yield content |
| prompt_eval_count + eval_count | Usage | yield Usage(prompt_tokens=..., completion_tokens=..., total_tokens=...) |
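The conversion can be sketched as a pure-Python parser. Tuples stand in for the real Reasoning and Usage response objects, and the assumption that token counts arrive on the final ("done") chunk follows Ollama's documented native format:

```python
import json

def parse_ollama_chunk(line: str):
    """Convert one native streaming chunk into g4f-style outputs (sketch).

    Tuples stand in for the real Reasoning/Usage response objects.
    """
    data = json.loads(line)
    message = data.get("message", {})
    if message.get("thinking"):
        yield ("reasoning", message["thinking"])
    if message.get("content"):
        yield ("content", message["content"])
    if data.get("done"):  # final chunk carries the token counts
        prompt = data.get("prompt_eval_count", 0)
        completion = data.get("eval_count", 0)
        yield ("usage", {"prompt_tokens": prompt,
                         "completion_tokens": completion,
                         "total_tokens": prompt + completion})
```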
Sources: g4f/Provider/local/Ollama.py76-89 g4f/providers/response.py
When model not in cls.local_models, the provider delegates to the parent OpenaiTemplate.create_async_generator() method (g4f/Provider/local/Ollama.py91-99).
Code Path (g4f/Provider/local/Ollama.py90-99):
Local vs Remote Execution:
| Aspect | Local Path | Remote Path |
|---|---|---|
| Condition | model in cls.local_models | model not in cls.local_models |
| Method | Custom implementation | super().create_async_generator() |
| Base URL | http://{OLLAMA_HOST}:{OLLAMA_PORT} | https://g4f.space/api/ollama |
| Endpoint | /api/chat | /v1/chat/completions |
| Protocol | Ollama native JSON streaming | OpenAI-compatible SSE |
| Response Format | {"message": {"thinking": ..., "content": ...}} | {"choices": [{"delta": {"content": ...}}]} |
| Response Handler | Custom parser (g4f/Provider/local/Ollama.py76-89) | OpenaiTemplate.read_response() (g4f/Provider/template/OpenaiTemplate.py174-253) |
Sources: g4f/Provider/local/Ollama.py90-99 g4f/Provider/template/OpenaiTemplate.py74-161
Remote Model Request Flow:
Sources: g4f/Provider/local/Ollama.py90-99 g4f/Provider/template/OpenaiTemplate.py74-161
The Ollama provider reads configuration from environment variables to locate the local Ollama server.
| Variable | Default | Usage Location | Purpose |
|---|---|---|---|
| OLLAMA_HOST | "localhost" | g4f/Provider/local/Ollama.py38 g4f/Provider/local/Ollama.py65 | Local server hostname or IP |
| OLLAMA_PORT | "11434" | g4f/Provider/local/Ollama.py39 g4f/Provider/local/Ollama.py66 | Local server port |
Usage in Code:
Model Discovery (g4f/Provider/local/Ollama.py38-40):
Request Execution (g4f/Provider/local/Ollama.py65-67):
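Both call sites derive the server address the same way; a stdlib-only sketch:

```python
import os

host = os.getenv("OLLAMA_HOST", "localhost")
port = os.getenv("OLLAMA_PORT", "11434")

tags_url = f"http://{host}:{port}/api/tags"  # model discovery
base_url = f"http://{host}:{port}/v1"        # request execution (OpenAI-style base)
```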
The model_aliases dictionary (g4f/Provider/local/Ollama.py22-25) provides convenient name mappings:
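An abridged sketch of the mapping and its resolution; the real lookup is performed by the inherited get_model() method, so `resolve_model` here is purely illustrative:

```python
# Abridged alias table; the real dict lives on the Ollama class
model_aliases = {"gpt-oss-120b": "gpt-oss:120b"}

def resolve_model(name: str) -> str:
    """Sketch of alias resolution: fall back to the raw name if unmapped."""
    return model_aliases.get(name, name)
```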
This allows users to request gpt-oss-120b (without colon) and have it automatically resolve to gpt-oss:120b (with colon) via the get_model() method inherited from ProviderModelMixin.
Sources: g4f/Provider/local/Ollama.py22-25 g4f/Provider/local/Ollama.py38-40 g4f/Provider/local/Ollama.py65-67
When running g4f in Docker, the local Ollama server is typically accessed via host.docker.internal:
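For example (a hypothetical invocation; the image name, published ports, and other flags depend on your deployment):

```shell
# Point the containerized g4f at the Ollama server on the Docker host
docker run \
  -e OLLAMA_HOST=host.docker.internal \
  -e OLLAMA_PORT=11434 \
  hlohaus789/g4f
```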
This allows the containerized g4f instance to reach the Ollama server running on the host machine. See Docker Deployment for details.
Sources: High-level architecture diagrams
The Ollama class inherits from OpenaiTemplate, gaining access to OpenAI-compatible functionality while overriding specific methods for local execution.
Class Hierarchy:
Inherited Capabilities:
| Source Class | Inherited Feature | Purpose |
|---|---|---|
| BaseProvider | working, url, label | Provider metadata |
| AsyncGeneratorProvider | create_async_generator() signature | Async streaming interface |
| ProviderModelMixin | get_model(), model_aliases | Model name resolution |
| RaiseErrorMixin | raise_error() | Error handling |
| OpenaiTemplate | get_headers(), response parsing | OpenAI-compatible API utilities |
Sources: g4f/Provider/local/Ollama.py13 g4f/Provider/template/OpenaiTemplate.py17 g4f/Provider/base_provider.py
When Ollama.get_models() is called, the models are registered in the global provider system:
Registration Flow:
The cls.live counter (g4f/Provider/local/Ollama.py35 g4f/Provider/local/Ollama.py48) indicates provider health:
- 0: Provider not working
- 1: Remote models available
- 2: Both remote and local models available

This allows AnyProvider to prefer providers with live > 0 during provider selection. See Provider Selection & AnyProvider for routing logic.
Sources: g4f/Provider/local/Ollama.py27-52 g4f/Provider/template/OpenaiTemplate.py41-72
Using Client API:
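A sketch of a client call, assuming g4f is installed and that Client and Ollama are importable from the paths shown (import paths may differ between versions):

```python
def chat_with_ollama(prompt: str, model: str = "llama2") -> str:
    """Sketch only: requires g4f installed and an Ollama setup reachable."""
    from g4f.client import Client    # assumed import path
    from g4f.Provider import Ollama  # assumed import path
    client = Client(provider=Ollama)
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```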
If llama2 is in cls.local_models, this routes to http://localhost:11434/api/chat. Otherwise, it routes to https://g4f.space/api/ollama/v1/chat/completions.
Local Ollama responses include message.thinking fields that are yielded as Reasoning objects (g4f/Provider/local/Ollama.py79-81).
Using Environment Variables:
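For example, overriding the defaults before the provider runs (values chosen to illustrate a non-default host and port):

```python
import os

# Redirect the provider to a different Ollama server
os.environ["OLLAMA_HOST"] = "192.168.1.100"
os.environ["OLLAMA_PORT"] = "8080"

# The provider would now derive:
base_url = f"http://{os.environ['OLLAMA_HOST']}:{os.environ['OLLAMA_PORT']}/v1"
```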
This constructs base_url = "http://192.168.1.100:8080/v1" and endpoint http://192.168.1.100:8080/api/chat.
Bypassing Client Wrapper:
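A sketch of invoking the provider directly, assuming Ollama is importable from g4f.Provider and a server is reachable; this skips the client's response assembly and yields raw chunks:

```python
import asyncio

async def stream_direct(prompt: str, model: str = "llama2"):
    """Sketch: stream chunks from Ollama.create_async_generator directly."""
    from g4f.Provider import Ollama  # assumed import path
    async for chunk in Ollama.create_async_generator(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    ):
        print(chunk, end="", flush=True)

# asyncio.run(stream_direct("Hello"))  # requires a running setup
```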
Sources: g4f/client/client.py g4f/Provider/local/Ollama.py54-99
The architecture supports additional local providers. The high-level architecture diagram mentions gpt4all as a potential local provider. To implement a new local provider:
1. Create the provider class in g4f/Provider/local/, extending OpenaiTemplate or AsyncGeneratorProvider
2. Override create_async_generator() for local execution
3. Set needs_auth = False for authentication-free operation

See Creating Custom Providers for detailed implementation guidance.
Sources: High-level architecture diagrams, g4f/Provider/local/Ollama.py1-99