This page documents the langchain-ollama partner package, covering the ChatOllama, OllamaLLM, and OllamaEmbeddings classes, how LangChain messages are converted to Ollama's wire format, model validation, reasoning mode, and tool/structured output support. For general information on how partner packages are structured and tested, see 1.1 and 5.1. For patterns around selecting and swapping providers, see 3.5.
The langchain-ollama package lives at libs/partners/ollama/ and depends on langchain-core>=1.0.0 and the ollama>=0.6.0 Python client library.
Package metadata: libs/partners/ollama/pyproject.toml1-28
The three public classes are exported from the top-level __init__.py:
| Export | Class | Module |
|---|---|---|
| ChatOllama | Chat model (messages in, message out) | langchain_ollama.chat_models |
| OllamaLLM | Text-completion LLM (string in, string out) | langchain_ollama.llms |
| OllamaEmbeddings | Embedding model | langchain_ollama.embeddings |
libs/partners/ollama/langchain_ollama/__init__.py1-42
Internal helpers are in langchain_ollama._utils (validate_model, parse_url_with_auth, merge_auth_headers).
Diagram: langchain-ollama class hierarchy
Sources: libs/partners/ollama/langchain_ollama/chat_models.py260-265 libs/partners/ollama/langchain_ollama/llms.py25-30 libs/partners/ollama/langchain_ollama/embeddings.py19-20
ChatOllama is the primary integration class. It extends BaseChatModel and communicates with a locally running Ollama server.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | required | Ollama model name (e.g. "llama3.1") |
| reasoning | bool \| str \| None | None | Controls reasoning/thinking mode |
| validate_model_on_init | bool | False | Validate that the model exists locally on construction |
| temperature | float \| None | None | Sampling temperature |
| num_predict | int \| None | None | Max tokens to generate |
| num_ctx | int \| None | None | Context window size |
| format | Literal["", "json"] \| JsonSchemaValue \| None | None | Output format constraint |
| keep_alive | int \| str \| None | None | How long to keep the model in memory |
| base_url | str \| None | None | Ollama server URL |
| client_kwargs | dict \| None | {} | Shared httpx client kwargs |
| sync_client_kwargs | dict \| None | {} | Sync-only httpx client kwargs |
| async_client_kwargs | dict \| None | {} | Async-only httpx client kwargs |
| stop | list[str] \| None | None | Stop tokens |
| seed | int \| None | None | Random seed for reproducibility |
Sampling-specific options (mirostat, mirostat_eta, mirostat_tau, top_k, top_p, tfs_z, repeat_last_n, repeat_penalty, num_gpu, num_thread) are all None by default and only forwarded to Ollama when explicitly set.
libs/partners/ollama/langchain_ollama/chat_models.py524-718
On construction, _set_clients() (a Pydantic model_validator) creates both a synchronous ollama.Client and an asynchronous ollama.AsyncClient. If validate_model_on_init=True, it immediately calls validate_model() from langchain_ollama._utils.
libs/partners/ollama/langchain_ollama/chat_models.py790-810
The base_url field supports basic auth credentials embedded in the URL (http://user:password@host:port). The parse_url_with_auth() utility strips credentials and injects them as an Authorization: Basic ... header.
libs/partners/ollama/langchain_ollama/_utils.py50-98
Diagram: LangChain to Ollama message conversion
Sources: libs/partners/ollama/langchain_ollama/chat_models.py812-929
The method _convert_messages_to_ollama_messages() iterates over a list of BaseMessage objects and maps them to dictionaries that the ollama client can consume:
- Text content is flattened into a content string.
- Image data (from image_url content blocks) is extracted into an images list.
- AIMessage.tool_calls are converted to the OpenAI-compatible format via _lc_tool_call_to_openai_tool_call().
- ToolMessage.tool_call_id is forwarded as tool_call_id.
- Messages with response_metadata["output_version"] == "v1" are re-serialized via _convert_from_v1_to_ollama() from langchain_ollama._compat before conversion.

libs/partners/ollama/langchain_ollama/chat_models.py812-929
_chat_params() combines the converted messages with model-level settings into the dict passed to ollama.Client.chat():
- None-valued keys are excluded by default. If the caller passes an explicit options dict, it is used as-is.
- The reasoning field maps to Ollama's think parameter.
- The strict key is stripped from tool definitions, since Ollama does not support it.

libs/partners/ollama/langchain_ollama/chat_models.py720-788
Diagram: Ollama stream response to LangChain output
Sources: libs/partners/ollama/langchain_ollama/chat_models.py970-1050
Each Ollama chunk contains:
- message.content — text output
- message.tool_calls — tool call objects (parsed by _get_tool_calls_from_response())
- message.thinking — reasoning text (only present when think is enabled)
- done_reason — one of stop, length, or load
- prompt_eval_count / eval_count — token counts used to build UsageMetadata

Chunks with done_reason == "load" and empty content are skipped with a warning.
libs/partners/ollama/langchain_ollama/chat_models.py101-115
_get_tool_calls_from_response() extracts tool_calls from the Ollama response. Each tool call goes through _parse_arguments_from_tool_call(), which handles Ollama's inconsistent argument formats:
- For dict arguments: each value is checked; string values are tried against json.loads, then ast.literal_eval. Metadata fields like functionName that echo the function name are filtered out.
- For string arguments: the string is parsed via _parse_json_string(), which raises OutputParserException on failure unless skip=True.

libs/partners/ollama/langchain_ollama/chat_models.py118-225
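The fallback chain for a single argument value can be sketched like this (parse_tool_argument is an illustrative stand-in, not the package's actual function):

```python
import ast
import json
from typing import Any

def parse_tool_argument(value: Any) -> Any:
    """Illustrative fallback chain for Ollama's inconsistent tool-call
    arguments: try JSON first, then a Python literal, else keep the raw
    string unchanged."""
    if not isinstance(value, str):
        return value
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        try:
            return ast.literal_eval(value)
        except (ValueError, SyntaxError):
            return value
```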
ChatOllama exposes a reasoning parameter that maps to Ollama's think parameter. Its behavior:
| reasoning value | think sent to Ollama | LangChain behavior |
|---|---|---|
| True | True | Reasoning captured in AIMessage.additional_kwargs["reasoning_content"]; <think> tags absent from content |
| False | False | No reasoning content |
| None (default) | None | Model default; <think> tags may appear in content if the model uses them by default; reasoning_content not populated |
| str (e.g. "low") | "low" | Enables reasoning with a named intensity (model-specific) |
reasoning can be set at construction time or overridden per-call in invoke() / stream() kwargs.
libs/partners/ollama/langchain_ollama/chat_models.py527-544
When reasoning=True, the stream iterator separates the thinking field from content and accumulates it into reasoning_content in additional_kwargs. The content_blocks property on the resulting AIMessage will include blocks with type == "reasoning".
libs/partners/ollama/tests/integration_tests/chat_models/test_chat_models_reasoning.py78-119
ChatOllama inherits bind_tools() and with_structured_output() from BaseChatModel.
bind_tools(): Tools are converted to the OpenAI tool schema via convert_to_openai_tool() and attached to the tools key in _chat_params().
with_structured_output(): Supports two methods:
| Method | Mechanism |
|---|---|
"function_calling" | Binds a tool representing the schema; uses PydanticToolsParser or JsonOutputKeyToolsParser |
"json_schema" | Sets format to the JSON schema on the request; uses PydanticOutputParser or JsonOutputParser |
Input schemas can be Pydantic BaseModel subclasses, TypedDicts, or raw JSON schema dicts.
libs/partners/ollama/langchain_ollama/chat_models.py1030-1200 (approximately; the with_structured_output method)
Note: Ollama does not support tool_choice, so has_tool_choice is False in the standard integration tests. Tool calling with Ollama can occasionally produce arguments as strings instead of numbers or have inconsistent key structures due to upstream issues.
libs/partners/ollama/tests/integration_tests/chat_models/test_chat_models_standard.py26-62
OllamaLLM extends BaseLLM and wraps the Ollama /api/generate endpoint for plain text-completion workflows.
Comparison with ChatOllama:

| Feature | ChatOllama | OllamaLLM |
|---|---|---|
| Input | list[BaseMessage] | str prompt |
| Output | AIMessage | str |
| Endpoint | /api/chat | /api/generate |
| Tool calling | Yes | No |
| reasoning param | bool \| str \| None | bool \| None |
| format default | None | "" |
When reasoning=True, the _stream() and _astream() methods check each chunk for a thinking field and place it in generation_info["reasoning_content"]. The full generate() and agenerate() flow aggregates thinking chunks in _stream_with_aggregation() / _astream_with_aggregation() via the generation_info["thinking"] key.
libs/partners/ollama/langchain_ollama/llms.py377-460
OllamaEmbeddings implements the Embeddings protocol by calling the Ollama /api/embed endpoint.
| Method | Description |
|---|---|
| embed_documents(texts) | Embeds a list of strings; returns list[list[float]] |
| embed_query(text) | Wraps embed_documents for a single string |
| aembed_documents(texts) | Async equivalent of embed_documents |
| aembed_query(text) | Async equivalent of embed_query |
The constructor accepts the same model, base_url, validate_model_on_init, client_kwargs, sync_client_kwargs, async_client_kwargs, and sampling option parameters as the other classes. All sampling options are bundled into the options dict passed to Client.embed().
libs/partners/ollama/langchain_ollama/embeddings.py297-332
validate_model() in langchain_ollama._utils calls Client.list() and checks if the provided model name appears in the locally available models (matching exactly or by prefix with a colon tag).
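The matching rule described above can be sketched as follows (model_is_available is an illustrative helper, not the package's actual code):

```python
def model_is_available(requested: str, local_models: list[str]) -> bool:
    """Sketch of the rule: a requested name counts as present if a local
    model matches it exactly, or the local model is the requested name
    plus a colon tag (e.g. "llama3.1" matches "llama3.1:8b")."""
    return any(
        name == requested or name.startswith(requested + ":")
        for name in local_models
    )
```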
Error handling:
| Exception | Cause | Raised as |
|---|---|---|
| ConnectError (httpx) | Ollama server unreachable | ValueError |
| ResponseError (ollama) | API-level error | ValueError |
| Model not in list | Model not pulled | ValueError |
Because validation runs inside a Pydantic model_validator at construction time, these ValueErrors surface to the caller as ValidationError.
libs/partners/ollama/langchain_ollama/_utils.py12-47
libs/partners/ollama/tests/integration_tests/chat_models/test_chat_models.py32-58
The base_url field supports embedding credentials in the URL using the userinfo format (http://username:password@host:port). The parse_url_with_auth() utility:
- Strips the username and password from the URL.
- Builds an Authorization: Basic header from the stripped credentials.

merge_auth_headers() injects the returned headers into client_kwargs before the ollama.Client and ollama.AsyncClient are constructed.
libs/partners/ollama/langchain_ollama/_utils.py50-114
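The behavior can be sketched with the standard library (split_url_auth is an illustrative reimplementation of the idea, not the package's actual parse_url_with_auth):

```python
import base64
from urllib.parse import urlsplit, urlunsplit

def split_url_auth(url: str) -> tuple[str, dict[str, str]]:
    """Strip userinfo credentials from a URL and return the clean URL plus
    an Authorization: Basic header built from those credentials."""
    parts = urlsplit(url)
    if parts.username is None:
        return url, {}
    credentials = f"{parts.username}:{parts.password or ''}"
    token = base64.b64encode(credentials.encode()).decode()
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    clean = urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
    return clean, {"Authorization": f"Basic {token}"}
```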
For different settings per sync and async client, use sync_client_kwargs and async_client_kwargs. The base client_kwargs is merged into both.
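A plausible illustration of that merge, assuming the per-client kwargs take precedence over the shared ones on key collisions (the precedence order here is an assumption, not confirmed from the source):

```python
# Shared settings for both httpx clients.
client_kwargs = {"timeout": 30, "verify": True}
# Per-client overrides.
sync_client_kwargs = {"timeout": 10}
async_client_kwargs = {"http2": True}

# Assumed merge: shared kwargs first, specific kwargs win on conflict.
sync_kwargs = {**client_kwargs, **sync_client_kwargs}
async_kwargs = {**client_kwargs, **async_client_kwargs}
```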
Diagram: Full request and response flow through ChatOllama
Sources: libs/partners/ollama/langchain_ollama/chat_models.py720-788 libs/partners/ollama/langchain_ollama/chat_models.py951-969
Unit tests use pytest-socket's --disable-socket flag to prevent network access. Integration tests require a locally running Ollama server.
| Test file | Scope |
|---|---|
| tests/unit_tests/test_chat_models.py | ChatOllama unit tests; argument parsing; reasoning param forwarding; load-response handling |
| tests/unit_tests/test_embeddings.py | OllamaEmbeddings initialization; options forwarding |
| tests/unit_tests/test_llms.py | OllamaLLM initialization; reasoning aggregation |
| tests/integration_tests/chat_models/test_chat_models_standard.py | Standard ChatModelIntegrationTests suite |
| tests/integration_tests/chat_models/test_chat_models.py | Ollama-specific: structured output, tool streaming, agent loop |
| tests/integration_tests/chat_models/test_chat_models_reasoning.py | Reasoning mode behavior across True/False/None |
| tests/integration_tests/test_llms.py | OllamaLLM generate/stream/batch; reasoning in stream |
The standard integration test class TestChatOllama (in test_chat_models_standard.py) sets supports_json_mode = True, has_tool_choice = False, and supports_image_inputs = True. Several tool-calling tests are marked xfail due to upstream Ollama inconsistencies with argument types.
libs/partners/ollama/tests/integration_tests/chat_models/test_chat_models_standard.py12-62
Default models used in tests:
| Variable | Default value |
|---|---|
| OLLAMA_TEST_MODEL | llama3.1 |
| OLLAMA_REASONING_TEST_MODEL | deepseek-r1:1.5b |