This page documents Langflow's file management system: the storage abstraction layer, the file upload API, and the flow components that read and write file data. It covers BaseFileComponent, FileComponent, the Directory component, SaveToFileComponent, and the Docling integration for advanced document parsing.
For chat-session file attachments uploaded through the playground, see Chat Interface. For the REST API surface that exposes file endpoints, see API Endpoints. For the full list of storage-related environment variables, see Environment Variables.
All file I/O in Langflow is routed through the module lfx.base.data.storage_utils. This module provides a backend-agnostic interface used uniformly by file components and the upload API.
Two backends are supported:
| Backend | Description |
|---|---|
| Local filesystem | Default. Files stored in a configured directory on the server host. |
| S3-compatible object storage | Cloud storage, configured via environment variables. |
The public interface of storage_utils used by file components:
| Function | Description |
|---|---|
| parse_storage_path(path) | Normalizes a raw path for the active backend |
| read_file_bytes(path) | Returns file content as bytes from either backend |
| get_file_size(path) | Returns the size of a file at the given storage path |
| validate_image_content_type(data) | Validates that raw bytes represent a recognized image format |
Storage Backend Architecture
Sources: src/lfx/src/lfx/base/data/base_file.py:12-14, src/lfx/src/lfx/components/files_and_knowledge/file.py:23-24
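The idea of a backend-agnostic read path can be illustrated with a small standalone sketch. This is not the storage_utils implementation: the StorageBackend protocol, the LocalBackend and InMemoryObjectStore classes, and the normalize_key and read_any helpers are all hypothetical names introduced here, and the in-memory store only stands in for an S3-compatible bucket.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from typing import Protocol


class StorageBackend(Protocol):
    """Hypothetical interface for this sketch; not a Langflow class."""

    def read_bytes(self, key: str) -> bytes: ...

    def size(self, key: str) -> int: ...


class LocalBackend:
    """Stores files under a root directory on the local filesystem."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def read_bytes(self, key: str) -> bytes:
        return (self.root / key).read_bytes()

    def size(self, key: str) -> int:
        return (self.root / key).stat().st_size


class InMemoryObjectStore:
    """Stand-in for an S3-compatible bucket, backed by a dict."""

    def __init__(self, objects: dict[str, bytes]) -> None:
        self.objects = objects

    def read_bytes(self, key: str) -> bytes:
        return self.objects[key]

    def size(self, key: str) -> int:
        return len(self.objects[key])


def normalize_key(raw: str) -> str:
    """Unify separators and drop leading slashes (illustrative rule only)."""
    return raw.replace("\\", "/").lstrip("/")


def read_any(backend: StorageBackend, raw_path: str) -> bytes:
    """Callers use one function regardless of which backend is active."""
    return backend.read_bytes(normalize_key(raw_path))


def demo() -> tuple[bytes, bytes]:
    """Read the same logical file through both backends."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "flow1").mkdir()
        (root / "flow1" / "notes.txt").write_bytes(b"hello")
        local = read_any(LocalBackend(root), "/flow1/notes.txt")
    store = InMemoryObjectStore({"flow1/notes.txt": b"hello"})
    remote = read_any(store, "flow1\\notes.txt")
    return local, remote
```

The point of the design is that a component only ever holds a storage path; which backend resolves it is decided elsewhere.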
Files are uploaded through the files router registered in the FastAPI application. The router exposes endpoints for uploading, listing, and downloading files scoped to a specific flow. After upload, the backend writes the file to the active storage backend and returns a storage path. This path is then set as the file_path value of the component's FileInput field in the flow definition. At execution time the component calls read_file_bytes(path) to retrieve the content.
See API Endpoints for the complete router listing.
File Component Class Hierarchy
Sources: src/lfx/src/lfx/base/data/base_file.py:1-20, src/lfx/src/lfx/components/files_and_knowledge/file.py:1-28, src/lfx/src/lfx/components/files_and_knowledge/save_file.py:1-15, src/lfx/src/lfx/components/twelvelabs/video_file.py:1-5
BaseFileComponent in src/lfx/src/lfx/base/data/base_file.py is the abstract base class for all file-reading components. It provides:
- A FileInput named path with file_types drawn from TEXT_FILE_TYPES
- A BoolInput named parallel_loading enabling concurrent file processing via parallel_load_data from lfx.base.data.utils
- Archive handling: ZIP archives via ZipFile/is_zipfile; TAR archives via tarfile, both extracted into a TemporaryDirectory
- Storage-backed reads via read_file_bytes from storage_utils
- Settings access via get_settings_service()

Declared output types: Data, DataFrame, Message
Sources: src/lfx/src/lfx/base/data/base_file.py:1-20
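The archive handling described above (detect ZIP or TAR, extract into a TemporaryDirectory, then process each file) can be sketched with the standard library alone. The helper names extract_archive and load_texts are hypothetical, and the path-safety check for TAR members is an addition of this sketch rather than something the source confirms.

```python
import tarfile
import tempfile
import zipfile
from pathlib import Path


def _safe_target(dest: Path, name: str) -> Path:
    """Resolve a member name under dest and reject path traversal."""
    root = dest.resolve()
    target = (root / name).resolve()
    if not target.is_relative_to(root):
        raise ValueError(f"unsafe path in archive: {name}")
    return target


def extract_archive(archive: Path, dest: Path) -> list[Path]:
    """Unpack a ZIP or TAR archive into dest and return the extracted files."""
    if zipfile.is_zipfile(archive):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
    elif tarfile.is_tarfile(archive):
        with tarfile.open(archive) as tf:
            for member in tf.getmembers():
                if not member.isfile():
                    continue  # skip directories, links, and devices
                target = _safe_target(dest, member.name)
                target.parent.mkdir(parents=True, exist_ok=True)
                with tf.extractfile(member) as src:
                    target.write_bytes(src.read())
    else:
        raise ValueError(f"not a ZIP or TAR archive: {archive}")
    return sorted(p for p in dest.rglob("*") if p.is_file())


def load_texts(archive: Path) -> dict[str, str]:
    """Extract into a TemporaryDirectory and read each file as UTF-8 text."""
    with tempfile.TemporaryDirectory() as tmp:
        dest = Path(tmp)
        files = extract_archive(archive, dest)
        return {
            p.relative_to(dest).as_posix(): p.read_text(encoding="utf-8")
            for p in files
        }
```

Because the temporary directory is removed when the with block exits, the contents are read before returning.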
FileComponent in src/lfx/src/lfx/components/files_and_knowledge/file.py is the concrete file-reading component available in the flow editor. Its component index hash is 12a5841f1a03.
Key behaviors:
1. Dynamic outputs — The method update_outputs(frontend_node, field_name, file_list) is called when a user selects a file. It inspects the file extension and modifies the component's output port configuration at design time.
2. Docling subprocess isolation — PDF, DOCX, and other rich document formats are parsed by Docling running in a child OS process. This design is explicitly documented in the module header: Docling's native libraries can cause memory growth and state leakage if run in the main server process. The parent process communicates with the child via subprocess, NamedTemporaryFile, and JSON serialization.
3. Standard text parsing — Plain text and structured formats use parse_text_file_to_data from lfx.base.data.utils.
4. Image handling — Image files are validated via validate_image_content_type and returned as binary Message content.
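The subprocess isolation pattern from item 2 can be sketched as follows. This is an illustration of the general technique (child process, NamedTemporaryFile, JSON result), not the FileComponent code: the child script here simply reads a text file and stands in for the Docling parser, and parse_in_child is a hypothetical name.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Child script: a stand-in for the heavy parser. It reads the input path and
# the output path from argv, writes a JSON result, and exits, so any memory
# it allocated is released with the process.
CHILD_SCRIPT = """
import json, sys
src, out = sys.argv[1], sys.argv[2]
with open(src, encoding="utf-8") as f:
    text = f.read()
with open(out, "w", encoding="utf-8") as f:
    json.dump({"ok": True, "text": text, "chars": len(text)}, f)
"""


def parse_in_child(path: Path, timeout: float = 30.0) -> dict:
    """Run the parser in a child interpreter and return its JSON result."""
    # Reserve a result file; close it so the child can open it on any OS.
    with tempfile.NamedTemporaryFile(suffix=".json", delete=False) as tmp:
        out_path = Path(tmp.name)
    try:
        subprocess.run(
            [sys.executable, "-c", CHILD_SCRIPT, str(path), str(out_path)],
            check=True,       # raise if the child exits with an error
            timeout=timeout,  # do not let a stuck parse block the server
        )
        return json.loads(out_path.read_text(encoding="utf-8"))
    finally:
        out_path.unlink(missing_ok=True)
```

Passing the result through a file rather than stdout keeps the channel clean even if the child's libraries print to the console.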
Dynamic output configuration by file type:
| File Type(s) | Output Ports |
|---|---|
| csv | DataFrame (structured) + Message (raw) |
| json, yaml | Data (parsed) + Message (raw) |
| pdf, docx | Data (Docling-parsed) + Message (raw text) |
| txt, md, html, etc. | Message (raw text) |
| jpg, png, bmp | Message (image data) |
| zip, tar | Extracted contents processed per file type |
Sources: src/lfx/src/lfx/components/files_and_knowledge/file.py:1-57, src/backend/tests/unit/components/files_and_knowledge/test_file_component.py:10-17
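The extension-to-output mapping in the table can be expressed as a simple lookup. This is a sketch of the table only, not of update_outputs itself: the names OUTPUTS_BY_EXTENSION and outputs_for are hypothetical, archives are left out because their outputs depend on their contents, and falling back to a single Message output for unlisted extensions is an assumption.

```python
from pathlib import PurePosixPath

# Output port types per file extension, transcribed from the table above.
OUTPUTS_BY_EXTENSION = {
    "csv": ["DataFrame", "Message"],
    "json": ["Data", "Message"],
    "yaml": ["Data", "Message"],
    "pdf": ["Data", "Message"],
    "docx": ["Data", "Message"],
    "jpg": ["Message"],
    "png": ["Message"],
    "bmp": ["Message"],
}

# Assumption: plain-text and unlisted types expose only a raw Message output.
DEFAULT_OUTPUTS = ["Message"]


def outputs_for(filename: str) -> list[str]:
    """Return the output port types for a selected file name."""
    ext = PurePosixPath(filename).suffix.lower().lstrip(".")
    return list(OUTPUTS_BY_EXTENSION.get(ext, DEFAULT_OUTPUTS))
```

For example, outputs_for("report.CSV") yields the DataFrame and Message ports, while outputs_for("notes.md") yields only Message.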
File Parsing Pipeline Inside FileComponent
Sources: src/lfx/src/lfx/components/files_and_knowledge/file.py:1-57, src/lfx/src/lfx/base/data/base_file.py:1-50
The Directory component loads all files from a given filesystem directory path, applying the same per-file parsing as FileComponent. It is registered in the component index with hash 328e6f996926. It outputs a list of Data objects, one per file.
SaveToFileComponent in src/lfx/src/lfx/components/files_and_knowledge/save_file.py writes flow output to a file on the configured storage backend. Its component index hash is 6d0e4842271e.
Accepted input types: Data, DataFrame, Message, AsyncIterator, Iterator
Inputs:
| Input | Input Type | Description |
|---|---|---|
| fields | SortableListInput | Ordered list of fields to include in the output |
| overwrite | BoolInput | Whether to overwrite an existing file |
| format | DropdownInput | Output format (JSON, CSV, text) |
Serialization uses orjson for JSON output and pandas for tabular formats (CSV, Excel). The component also imports UploadFile and jsonable_encoder from FastAPI; jsonable_encoder converts values to JSON-compatible types before writing.
Sources: src/lfx/src/lfx/components/files_and_knowledge/save_file.py:1-20, src/backend/tests/unit/components/processing/test_save_file_component.py:1-10
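How the three inputs (fields, overwrite, format) interact can be sketched with a small standalone function. This is not the SaveToFileComponent implementation: save_records is a hypothetical helper, it writes to a local path only, and the standard-library json and csv modules are swapped in for orjson and pandas so the example has no third-party dependencies.

```python
from __future__ import annotations

import csv
import json
from pathlib import Path


def save_records(
    records: list[dict],
    path: Path,
    fmt: str,
    fields: list[str] | None = None,
    overwrite: bool = False,
) -> Path:
    """Write records as JSON, CSV, or text, keeping only the chosen fields."""
    if path.exists() and not overwrite:
        raise FileExistsError(path)

    # Keep the caller's field order; otherwise use all keys, sorted.
    columns = fields or sorted({key for rec in records for key in rec})
    rows = [{col: rec.get(col) for col in columns} for rec in records]

    if fmt == "json":
        path.write_text(json.dumps(rows, indent=2), encoding="utf-8")
    elif fmt == "csv":
        with path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=columns)
            writer.writeheader()
            writer.writerows(rows)
    elif fmt == "text":
        lines = [" ".join(str(row[col]) for col in columns) for row in rows]
        path.write_text("\n".join(lines), encoding="utf-8")
    else:
        raise ValueError(f"unsupported format: {fmt}")
    return path
```

Checking for an existing file before serializing means a refused overwrite never leaves a partially written file behind.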
Docling provides layout-aware document parsing that recovers reading order, tables, headings, and figures from complex formats such as PDF and DOCX.
Four Docling-related components are registered in the component system:
| Component Class | Registry Hash | Role |
|---|---|---|
| DoclingInline | 519d12bd6451 | Runs Docling in-process |
| DoclingRemote | 409d771a961e | Delegates to a remote Docling service endpoint |
| ChunkDoclingDocument | 49d762d97039 | Splits a parsed Docling document into chunks |
| ExportDoclingDocument | 32577a7e396b | Converts a Docling document to a target format (Markdown, HTML, etc.) |
FileComponent uses Docling through subprocess isolation for PDF/DOCX files selected on the component's path field. The standalone DoclingInline and DoclingRemote components are for flows where Docling is the primary ingestion step rather than a fallback path in FileComponent.
Docling Components in a Document Ingestion Flow
Sources: src/lfx/src/lfx/_assets/stable_hash_history.json:707-726
FileInput (exported from lfx.io) is the specialized input type used by all file-accepting components. Its distinguishing fields:
| Field | Python type | Description |
|---|---|---|
| file_types | list[str] | Allowed file extensions |
| file_path | str | Storage path of the uploaded file |
| is_list | bool | Whether multiple files are accepted simultaneously |
| temp_file | bool | Marks a file as a temporary per-request upload |
The temp_file flag has a specific semantic:
| temp_file value | Use case | Example |
|---|---|---|
| False | Persistent configuration in a saved flow | FileComponent.path |
| True | Per-session upload (chat attachment) | ChatInput.files |
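The four fields and the temp_file distinction can be modeled with a small dataclass. This is a sketch, not the FileInput class exported from lfx.io: FileInputSketch and its methods are hypothetical, and dropping temporary paths when a flow is saved is an assumption drawn from the table above rather than confirmed behavior.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class FileInputSketch:
    """Hypothetical model of the distinguishing FileInput fields."""

    name: str
    file_types: list[str] = field(default_factory=list)
    file_path: str = ""
    is_list: bool = False
    temp_file: bool = False

    def accepts(self, filename: str) -> bool:
        """Check the file extension against the allowed file_types."""
        ext = PurePosixPath(filename).suffix.lower().lstrip(".")
        return ext in self.file_types

    def persisted_value(self) -> str:
        """Path to keep in a saved flow (assumption: temp uploads are dropped)."""
        return "" if self.temp_file else self.file_path


# Persistent configuration, as on FileComponent.path
component_path = FileInputSketch("path", ["pdf", "txt"], "flow1/report.pdf")

# Per-session chat attachment, as on ChatInput.files
chat_files = FileInputSketch(
    "files", ["png"], "session/img.png", is_list=True, temp_file=True
)
```

Under this model, component_path.persisted_value() returns the stored path, while chat_files.persisted_value() returns an empty string.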
Supported file type constants from lfx.base.data.utils:
TEXT_FILE_TYPES:
csv json pdf txt md mdx yaml yml xml html htm docx py sh sql js ts tsx
IMG_FILE_TYPES:
jpg jpeg png bmp image
Sources: src/lfx/src/lfx/base/data/base_file.py:14-18, src/lfx/src/lfx/components/files_and_knowledge/file.py:23-25
File Upload to Flow Execution: Sequence Diagram
Sources: src/lfx/src/lfx/base/data/base_file.py:12-20, src/lfx/src/lfx/components/files_and_knowledge/file.py:22-27
The FileComponent appears in several bundled starter projects:
| Starter Flow | Connection |
|---|---|
| Vector Store RAG | File → SplitText → AstraDB: document ingested for vector search |
| Document Q&A | File → Prompt → LanguageModel: document injected as prompt context |
| Portfolio Website Code Generator | File → StructuredOutput → Parser: resume JSON parsed into structured data |
In all cases, FileComponent emits a Message output named message via its message_response method.
Sources: src/backend/base/langflow/initial_setup/starter_projects/Vector Store RAG.json:63-91, src/backend/base/langflow/initial_setup/starter_projects/Document Q&A.json:90-118, src/backend/base/langflow/initial_setup/starter_projects/Portfolio Website Code Generator.json:90-146
Storage backend selection and limits are configured through environment variables. See Environment Variables for the complete catalog. Relevant categories:
| Category | Description |
|---|---|
| Storage type | Selects between local filesystem and S3 |
| S3 settings | Bucket name, region, access key, secret key, optional endpoint URL |
| Upload limits | Maximum file size accepted per upload request |
| Temp file path | Local directory used for temporary file extraction (ZIP/TAR) |