This page documents the langchain-ollama partner package, covering the ChatOllama, OllamaLLM, and OllamaEmbeddings classes, how LangChain messages are converted to Ollama's wire format, model validation, reasoning mode, and tool/structured output support. For general information on how partner packages are structured and tested, see 1.1 and 5.1. For patterns around selecting and swapping providers, see 3.5.
The langchain-ollama package lives at libs/partners/ollama/ and depends on langchain-core>=1.0.0 and the ollama>=0.6.0 Python client library.
Package metadata: libs/partners/ollama/pyproject.toml1-28
The three public classes are exported from the top-level __init__.py:
| Export | Class | Module |
|---|---|---|
| ChatOllama | Chat model (messages in, message out) | langchain_ollama.chat_models |
| OllamaLLM | Text-completion LLM (string in, string out) | langchain_ollama.llms |
| OllamaEmbeddings | Embedding model | langchain_ollama.embeddings |
libs/partners/ollama/langchain_ollama/__init__.py1-42
Internal helpers are in langchain_ollama._utils (validate_model, parse_url_with_auth, merge_auth_headers).
Diagram: langchain-ollama class hierarchy
Sources: libs/partners/ollama/langchain_ollama/chat_models.py260-265 libs/partners/ollama/langchain_ollama/llms.py25-30 libs/partners/ollama/langchain_ollama/embeddings.py19-20
ChatOllama is the primary integration class. It extends BaseChatModel and communicates with a locally running Ollama server.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | required | Ollama model name (e.g. "llama3.1") |
| reasoning | bool \| str \| None | None | Controls reasoning/thinking mode |
| validate_model_on_init | bool | False | Validate that the model exists locally on construction |
| temperature | float \| None | None | Sampling temperature |
| num_predict | int \| None | None | Max tokens to generate |
| num_ctx | int \| None | None | Context window size |
| format | Literal["", "json"] \| JsonSchemaValue \| None | None | Output format constraint |
| keep_alive | int \| str \| None | None | How long to keep the model in memory |
| base_url | str \| None | None | Ollama server URL |
| client_kwargs | dict \| None | {} | Shared httpx client kwargs |
| sync_client_kwargs | dict \| None | {} | Sync-only httpx client kwargs |
| async_client_kwargs | dict \| None | {} | Async-only httpx client kwargs |
| stop | list[str] \| None | None | Stop tokens |
| seed | int \| None | None | Random seed for reproducibility |
Sampling-specific options (mirostat, mirostat_eta, mirostat_tau, top_k, top_p, tfs_z, repeat_last_n, repeat_penalty, num_gpu, num_thread) are all None by default and only forwarded to Ollama when explicitly set.
libs/partners/ollama/langchain_ollama/chat_models.py524-718
On construction, _set_clients() (a Pydantic model_validator) creates both a synchronous ollama.Client and an asynchronous ollama.AsyncClient. If validate_model_on_init=True, it immediately calls validate_model() from langchain_ollama._utils.
libs/partners/ollama/langchain_ollama/chat_models.py790-810
The base_url field supports basic auth credentials embedded in the URL (http://user:password@host:port). The parse_url_with_auth() utility strips credentials and injects them as an Authorization: Basic ... header.
libs/partners/ollama/langchain_ollama/_utils.py50-98
Diagram: LangChain to Ollama message conversion
Sources: libs/partners/ollama/langchain_ollama/chat_models.py812-929
The method _convert_messages_to_ollama_messages() iterates over a list of BaseMessage objects and maps them to dictionaries that the ollama client can consume:
- Text content is flattened into a content string.
- Image data (from image_url content blocks) is extracted into an images list.
- AIMessage.tool_calls are converted to the OpenAI-compatible format via _lc_tool_call_to_openai_tool_call().
- ToolMessage.tool_call_id is forwarded as tool_call_id.
- Messages with response_metadata["output_version"] == "v1" are re-serialized via _convert_from_v1_to_ollama() from langchain_ollama._compat before conversion.

libs/partners/ollama/langchain_ollama/chat_models.py812-929
_chat_params() combines the converted messages with model-level settings into the dict passed to ollama.Client.chat():
- None-valued keys are excluded by default. If the caller passes an explicit options dict, it is used as-is.
- The reasoning field maps to Ollama's think parameter.
- The strict key is stripped from tool definitions, since Ollama does not support it.

libs/partners/ollama/langchain_ollama/chat_models.py720-788
Diagram: Ollama stream response to LangChain output
Sources: libs/partners/ollama/langchain_ollama/chat_models.py970-1050
Each Ollama chunk contains:
- message.content — text output
- message.tool_calls — tool call objects (parsed by _get_tool_calls_from_response())
- message.thinking — reasoning text (only present when think is enabled)
- done_reason — one of stop, length, or load
- prompt_eval_count / eval_count — token counts used to build UsageMetadata

Chunks with done_reason == "load" and empty content are skipped with a warning.
libs/partners/ollama/langchain_ollama/chat_models.py101-115
_get_tool_calls_from_response() extracts tool_calls from the Ollama response. Each tool call goes through _parse_arguments_from_tool_call(), which handles Ollama's inconsistent argument formats:
- For dict arguments: each value is checked; string values are tried against json.loads, then ast.literal_eval. Metadata fields like functionName that echo the function name are filtered out.
- For string arguments: the string is parsed via _parse_json_string(), which raises OutputParserException on failure unless skip=True.

libs/partners/ollama/langchain_ollama/chat_models.py118-225
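The fallback chain for a single argument value can be sketched like this (parse_tool_argument is an illustrative stand-in, not the package's actual function):

```python
import ast
import json
from typing import Any

def parse_tool_argument(value: Any) -> Any:
    """Illustrative fallback chain for Ollama's inconsistent tool-call
    arguments: try JSON first, then a Python literal, else keep the raw
    string unchanged."""
    if not isinstance(value, str):
        return value
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        try:
            return ast.literal_eval(value)
        except (ValueError, SyntaxError):
            return value
```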
ChatOllama exposes a reasoning parameter that maps to Ollama's think parameter. Its behavior:
| reasoning value | think sent to Ollama | LangChain behavior |
|---|---|---|
| True | True | Reasoning captured in AIMessage.additional_kwargs["reasoning_content"]; <think> tags absent from content |
| False | False | No reasoning content |
| None (default) | None | Model default; <think> tags may appear in content if the model uses them by default; reasoning_content not populated |
| str (e.g. "low") | "low" | Enables reasoning with a named intensity (model-specific) |
reasoning can be set at construction time or overridden per-call in invoke() / stream() kwargs.
libs/partners/ollama/langchain_ollama/chat_models.py527-544
When reasoning=True, the stream iterator separates the thinking field from content and accumulates it into reasoning_content in additional_kwargs. The content_blocks property on the resulting AIMessage will include blocks with type == "reasoning".
libs/partners/ollama/tests/integration_tests/chat_models/test_chat_models_reasoning.py78-119
ChatOllama inherits bind_tools() and with_structured_output() from BaseChatModel.
bind_tools(): Tools are converted to the OpenAI tool schema via convert_to_openai_tool() and attached to the tools key in _chat_params().
with_structured_output(): Supports two methods:
| Method | Mechanism |
|---|---|
"function_calling" | Binds a tool representing the schema; uses PydanticToolsParser or JsonOutputKeyToolsParser |
"json_schema" | Sets format to the JSON schema on the request; uses PydanticOutputParser or JsonOutputParser |
Input schemas can be Pydantic BaseModel subclasses, TypedDicts, or raw JSON schema dicts.
libs/partners/ollama/langchain_ollama/chat_models.py1030-1200 (approximately; the with_structured_output method)
Note: Ollama does not support tool_choice, so has_tool_choice is False in the standard integration tests. Tool calling with Ollama can occasionally produce arguments as strings instead of numbers or have inconsistent key structures due to upstream issues.
libs/partners/ollama/tests/integration_tests/chat_models/test_chat_models_standard.py26-62
OllamaLLM extends BaseLLM and wraps the Ollama /api/generate endpoint for plain text-completion workflows.
Comparison with ChatOllama:

| Feature | ChatOllama | OllamaLLM |
|---|---|---|
| Input | list[BaseMessage] | str prompt |
| Output | AIMessage | str |
| Endpoint | /api/chat | /api/generate |
| Tool calling | Yes | No |
| reasoning param | bool \| str \| None | bool \| None |
| format default | None | "" |
When reasoning=True, the _stream() and _astream() methods check each chunk for a thinking field and place it in generation_info["reasoning_content"]. The full generate() and agenerate() flow aggregates thinking chunks in _stream_with_aggregation() / _astream_with_aggregation() via the generation_info["thinking"] key.
libs/partners/ollama/langchain_ollama/llms.py377-460
OllamaEmbeddings implements the Embeddings protocol by calling the Ollama /api/embed endpoint.
| Method | Description |
|---|---|
| embed_documents(texts) | Embeds a list of strings; returns list[list[float]] |
| embed_query(text) | Wraps embed_documents for a single string |
| aembed_documents(texts) | Async equivalent of embed_documents |
| aembed_query(text) | Async equivalent of embed_query |
The constructor accepts the same model, base_url, validate_model_on_init, client_kwargs, sync_client_kwargs, async_client_kwargs, and sampling option parameters as the other classes. All sampling options are bundled into the options dict passed to Client.embed().
libs/partners/ollama/langchain_ollama/embeddings.py297-332
validate_model() in langchain_ollama._utils calls Client.list() and checks if the provided model name appears in the locally available models (matching exactly or by prefix with a colon tag).
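The matching rule described above can be sketched as follows (model_is_available is an illustrative helper, not the package's actual code):

```python
def model_is_available(requested: str, local_models: list[str]) -> bool:
    """Sketch of the rule: a requested name counts as present if a local
    model matches it exactly, or the local model is the requested name
    plus a colon tag (e.g. "llama3.1" matches "llama3.1:8b")."""
    return any(
        name == requested or name.startswith(requested + ":")
        for name in local_models
    )
```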
Error handling:
| Exception | Cause | Raised as |
|---|---|---|
| ConnectError (httpx) | Ollama server unreachable | ValueError |
| ResponseError (ollama) | API-level error | ValueError |
| Model not in list | Model not pulled | ValueError |
Because validation runs inside a Pydantic model_validator at construction time, these ValueErrors surface to the caller as ValidationError.
libs/partners/ollama/langchain_ollama/_utils.py12-47
libs/partners/ollama/tests/integration_tests/chat_models/test_chat_models.py32-58
The base_url field supports embedding credentials in the URL using the userinfo format (http://username:password@host:port). The parse_url_with_auth() utility:
- Strips the username and password from the URL.
- Builds an Authorization: Basic header from the stripped credentials.

merge_auth_headers() injects the returned headers into client_kwargs before the ollama.Client and ollama.AsyncClient are constructed.
libs/partners/ollama/langchain_ollama/_utils.py50-114
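The behavior can be sketched with the standard library (split_url_auth is an illustrative reimplementation of the idea, not the package's actual parse_url_with_auth):

```python
import base64
from urllib.parse import urlsplit, urlunsplit

def split_url_auth(url: str) -> tuple[str, dict[str, str]]:
    """Strip userinfo credentials from a URL and return the clean URL plus
    an Authorization: Basic header built from those credentials."""
    parts = urlsplit(url)
    if parts.username is None:
        return url, {}
    credentials = f"{parts.username}:{parts.password or ''}"
    token = base64.b64encode(credentials.encode()).decode()
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    clean = urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
    return clean, {"Authorization": f"Basic {token}"}
```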
For different settings per sync and async client, use sync_client_kwargs and async_client_kwargs. The base client_kwargs is merged into both.
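A plausible illustration of that merge, assuming the per-client kwargs take precedence over the shared ones on key collisions (the precedence order here is an assumption, not confirmed from the source):

```python
# Shared settings for both httpx clients.
client_kwargs = {"timeout": 30, "verify": True}
# Per-client overrides.
sync_client_kwargs = {"timeout": 10}
async_client_kwargs = {"http2": True}

# Assumed merge: shared kwargs first, specific kwargs win on conflict.
sync_kwargs = {**client_kwargs, **sync_client_kwargs}
async_kwargs = {**client_kwargs, **async_client_kwargs}
```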
Diagram: Full request and response flow through ChatOllama
Sources: libs/partners/ollama/langchain_ollama/chat_models.py720-788 libs/partners/ollama/langchain_ollama/chat_models.py951-969
Unit tests use pytest-socket's --disable-socket flag to prevent network access. Integration tests require a locally running Ollama server.
| Test file | Scope |
|---|---|
| tests/unit_tests/test_chat_models.py | ChatOllama unit tests; argument parsing; reasoning param forwarding; load-response handling |
| tests/unit_tests/test_embeddings.py | OllamaEmbeddings initialization; options forwarding |
| tests/unit_tests/test_llms.py | OllamaLLM initialization; reasoning aggregation |
| tests/integration_tests/chat_models/test_chat_models_standard.py | Standard ChatModelIntegrationTests suite |
| tests/integration_tests/chat_models/test_chat_models.py | Ollama-specific: structured output, tool streaming, agent loop |
| tests/integration_tests/chat_models/test_chat_models_reasoning.py | Reasoning mode behavior across True/False/None |
| tests/integration_tests/test_llms.py | OllamaLLM generate/stream/batch; reasoning in stream |
The standard integration test class TestChatOllama (in test_chat_models_standard.py) sets supports_json_mode = True, has_tool_choice = False, and supports_image_inputs = True. Several tool-calling tests are marked xfail due to upstream Ollama inconsistencies with argument types.
libs/partners/ollama/tests/integration_tests/chat_models/test_chat_models_standard.py12-62
Default models used in tests:
| Variable | Default value |
|---|---|
| OLLAMA_TEST_MODEL | llama3.1 |
| OLLAMA_REASONING_TEST_MODEL | deepseek-r1:1.5b |