This page documents the FastAPI-based HTTP server that exposes vLLM's inference capabilities through an OpenAI-compatible REST API. It covers the server's entry points, startup sequence, route registration, middleware stack, engine client lifecycle, and multi-process deployment modes.
For documentation on how chat messages and multimodal content are parsed before being dispatched to the engine, see Chat Utilities and Multimodal Input Handling. For structured output and tool-calling integration, see Structured Output Generation and Responses API and Tool Calling. For the underlying engine client and async engine architecture, see Engine Core and Client APIs.
High-Level Request Flow
Sources: vllm/entrypoints/cli/serve.py42-134 vllm/entrypoints/openai/api_server.py464-530
The vLLM CLI is defined in vllm/entrypoints/cli/main.py and dispatches subcommands through the CLISubcommand base class. The serve subcommand is registered by ServeSubcommand in vllm/entrypoints/cli/serve.py.
The dispatch logic in ServeSubcommand.cmd() examines args.api_server_count:
| Condition | Mode | Function |
|---|---|---|
| api_server_count == 1 | Single process | run_server(args) |
| api_server_count > 1 | Multi-process (data parallel) | run_multi_api_server(args) |
| api_server_count < 1 | Headless (engine only) | run_headless(args) |
The default value of api_server_count is derived from data_parallel_size unless overridden.
Sources: vllm/entrypoints/cli/main.py16-79 vllm/entrypoints/cli/serve.py42-134
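The dispatch above can be sketched as follows. This is a minimal illustration, not the actual ServeSubcommand.cmd() code; the function name dispatch_mode and the returned strings are placeholders naming the functions from the table:

```python
# Hypothetical sketch of the dispatch in ServeSubcommand.cmd(); the return
# strings name the functions from the table above and are not real vLLM values.
def dispatch_mode(api_server_count: int) -> str:
    if api_server_count == 1:
        return "run_server"            # single in-process API server
    if api_server_count > 1:
        return "run_multi_api_server"  # one OS process per API server
    return "run_headless"              # engine cores only, no HTTP server
```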
The key startup steps performed by run_server_worker() vllm/entrypoints/openai/api_server.py474-530:
1. setup_server(args) – validates arguments, binds the socket (TCP or Unix domain), sets ulimit, installs a SIGTERM handler.
2. build_async_engine_client(args) – async context manager that creates the AsyncLLM engine and yields an EngineClient for the lifetime of the server.
3. engine_client.get_supported_tasks() – queries the engine to discover which task types the loaded model supports.
4. build_app(args, supported_tasks) – constructs the FastAPI application, registers all route routers, and applies middleware.
5. init_app_state(engine_client, app.state, args, supported_tasks) – populates app.state with the engine client, serving models, tokenization service, and per-task serving handlers.
6. serve_http(app, sock=sock, ...) – starts the uvicorn server loop.

Sources: vllm/entrypoints/openai/api_server.py419-530
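The startup sequence can be condensed into a runnable sketch. All names besides the step order are simplified stand-ins, not the real vLLM signatures:

```python
import asyncio
import contextlib

# Illustrative stubs mirroring the startup steps above; the dict payloads
# are placeholders, not real vLLM objects.
@contextlib.asynccontextmanager
async def build_async_engine_client(args):
    yield {"engine": "client"}  # real code yields an EngineClient

async def run_server_worker(args):
    """Condensed sketch of the startup sequence."""
    sock = ("0.0.0.0", args["port"])              # setup_server(args)
    async with build_async_engine_client(args) as engine_client:
        supported_tasks = ("generate",)           # engine_client.get_supported_tasks()
        app = {"routes": list(supported_tasks)}   # build_app(args, supported_tasks)
        state = {"engine_client": engine_client}  # init_app_state(...)
        return sock, app, state                   # real code: serve_http(app, sock=sock)

sock, app, state = asyncio.run(run_server_worker({"port": 8000}))
```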
build_app() vllm/entrypoints/openai/api_server.py158-288 creates the FastAPI instance and conditionally registers routers based on supported_tasks.
Routes registered unconditionally:
| Router | Attachment function | Example paths |
|---|---|---|
| General serve routes | register_vllm_serve_api_routers(app) | /health, /version, /metrics, /tokenize, /detokenize |
| Model listing | register_models_api_router(app) | /v1/models |
| SageMaker compat | register_sagemaker_api_router(app, supported_tasks) | /invocations, /ping |
Routes registered when "generate" in supported_tasks:
| Router | Attachment function | Example paths |
|---|---|---|
| Text generation | register_generate_api_routers(app) | /v1/completions, /v1/chat/completions |
| Disaggregated serving | attach_disagg_router(app) | Internal disagg endpoints |
| RLHF | attach_rlhf_router(app) | RLHF endpoints |
| Elastic EP scaling | elastic_ep_attach_router(app) | /scale_elastic_ep, /pause, /resume |
Routes registered when "transcription" in supported_tasks:
| Router | Attachment function | Example paths |
|---|---|---|
| Speech to text | register_speech_to_text_api_router(app) | /v1/audio/transcriptions |
Routes registered when "realtime" in supported_tasks:
| Router | Attachment function | Example paths |
|---|---|---|
| Realtime | register_realtime_api_router(app) | WebSocket realtime endpoint |
Routes registered for pooling tasks (embeddings, scoring, reranking):
| Router | Attachment function | Example paths |
|---|---|---|
| Pooling | register_pooling_api_routers(app, supported_tasks) | /v1/embeddings, /v1/score, /v1/rerank |
Sources: vllm/entrypoints/openai/api_server.py158-288
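The conditional registration in the tables above can be sketched as follows. The App class stands in for FastAPI, and the router names and pooling-task names ("embed", "score") are illustrative, not the exact vLLM identifiers:

```python
# Minimal sketch of build_app()'s conditional router registration; App is a
# stand-in for FastAPI and the router names are simplified labels.
class App:
    def __init__(self):
        self.routers = []

    def include(self, name):
        self.routers.append(name)

def build_app(supported_tasks):
    app = App()
    for name in ("serve", "models", "sagemaker"):  # unconditional routers
        app.include(name)
    if "generate" in supported_tasks:
        app.include("generate")       # plus disagg, RLHF, elastic EP routers
    if "transcription" in supported_tasks:
        app.include("speech_to_text")
    if "realtime" in supported_tasks:
        app.include("realtime")
    if {"embed", "score"} & set(supported_tasks):  # pooling tasks (assumed names)
        app.include("pooling")
    return app
```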
Middleware is applied in this order (last applied runs first):
| Middleware | Condition | Description |
|---|---|---|
| CORSMiddleware | Always | Configurable via --allowed-origins, --allowed-methods, --allowed-headers, --allow-credentials |
| AuthenticationMiddleware | --api-key or VLLM_API_KEY set | Bearer token validation on /v1/ paths only |
| XRequestIdMiddleware | --enable-request-id-headers | Adds X-Request-Id header to responses |
| ScalingMiddleware | Always | Checks for scaling state before processing |
| Custom middleware | --middleware args | Loaded dynamically from import paths; can be a class or async function |
Exception handlers are registered for HTTPException (http_exception_handler) and RequestValidationError (validation_exception_handler).
Sources: vllm/entrypoints/openai/api_server.py241-288
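The class-versus-async-function distinction for --middleware entries can be sketched with a small resolver. load_middleware is a hypothetical helper illustrating the idea; the real logic is inline in build_app():

```python
import importlib
import inspect

# Hypothetical helper showing how a --middleware import path could be
# resolved and classified; not the actual vLLM code.
def load_middleware(import_path: str):
    module_path, _, attr = import_path.rpartition(".")
    obj = getattr(importlib.import_module(module_path), attr)
    if inspect.isclass(obj):
        return "class", obj        # added via app.add_middleware(obj)
    if inspect.iscoroutinefunction(obj):
        return "function", obj     # wrapped via app.middleware("http")
    raise ValueError(f"{import_path} is neither a class nor an async function")
```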
init_app_state() vllm/entrypoints/openai/api_server.py291-379 populates app.state with all serving objects. This runs after the engine is ready and before the HTTP server starts accepting requests.
| app.state field | Type | Description |
|---|---|---|
| engine_client | EngineClient | Connection to the underlying AsyncLLM |
| vllm_config | VllmConfig | Full engine configuration |
| args | Namespace | CLI args |
| openai_serving_models | OpenAIServingModels | Handles /v1/models and LoRA module registry |
| openai_serving_tokenization | OpenAIServingTokenization | Handles /tokenize, /detokenize |
| (generate-specific) | OpenAIServingCompletion, OpenAIServingChat, etc. | Initialized by init_generate_state() |
| log_stats | bool | Whether to log per-request stats |
| enable_server_load_tracking | bool | Whether to track server_load_metrics |
OpenAIServingModels also calls init_static_loras() to pre-load LoRA adapters listed in --lora-modules.
Sources: vllm/entrypoints/openai/api_server.py291-379
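The state-population step can be sketched as follows; field names follow the table above, but the serving objects are string placeholders rather than the real classes:

```python
from types import SimpleNamespace

# Sketch of init_app_state(); the string values stand in for the real
# serving objects (OpenAIServingModels, OpenAIServingChat, ...).
def init_app_state(state, engine_client, supported_tasks, log_stats=True):
    state.engine_client = engine_client
    state.log_stats = log_stats
    state.openai_serving_models = "OpenAIServingModels"
    state.openai_serving_tokenization = "OpenAIServingTokenization"
    if "generate" in supported_tasks:
        # real code delegates to init_generate_state() for chat/completions
        state.openai_serving_chat = "OpenAIServingChat"
    return state

state = init_app_state(SimpleNamespace(), "engine", ("generate",))
```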
The engine client is managed by build_async_engine_client() and build_async_engine_client_from_engine_args() as async context managers.
Key behaviors:
- When VLLM_WORKER_MULTIPROC_METHOD=forkserver, the forkserver is pre-loaded with vllm.v1.engine.async_llm before the engine is started.
- client_config (containing input_address, output_address, client_count, client_index) is passed through to AsyncLLM to configure ZMQ connections in multi-server mode.
- async_llm.shutdown() is called in the finally block of the context manager regardless of how the server exits.

Sources: vllm/entrypoints/openai/api_server.py69-155
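The shutdown guarantee can be demonstrated with a self-contained async context manager; FakeAsyncLLM is a stand-in that only models the shutdown() call:

```python
import asyncio
import contextlib

class FakeAsyncLLM:
    """Stand-in for AsyncLLM; only models the shutdown() call."""
    def __init__(self):
        self.shut_down = False

    def shutdown(self):
        self.shut_down = True

@contextlib.asynccontextmanager
async def build_async_engine_client():
    engine = FakeAsyncLLM()
    try:
        yield engine
    finally:
        engine.shutdown()  # runs regardless of how the server exits

async def main():
    async with build_async_engine_client() as client:
        pass  # server loop would run here
    return client.shut_down

clean_shutdown = asyncio.run(main())
```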
When api_server_count > 1, run_multi_api_server() vllm/entrypoints/cli/serve.py218-291 spawns separate OS processes for each API server, with each connected to a dedicated engine core.
Each worker process receives a client_config dict containing:
| Key | Description |
|---|---|
| input_address | ZMQ address for sending requests to the engine |
| output_address | ZMQ address for receiving results from the engine |
| client_count | Total number of API server processes |
| client_index | Index of this process |
| stats_update_address | Optional address for receiving load-balancing stats from DPCoordinator |
APIServerProcessManager vllm/v1/utils.py159-225 uses multiprocessing.get_context("spawn") so each worker starts in a clean state. A weakref.finalize ensures worker processes are terminated when the manager is garbage-collected.
Sources: vllm/entrypoints/cli/serve.py218-307 vllm/v1/utils.py159-225
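The weakref.finalize cleanup pattern can be sketched as follows. Lists stand in for worker Process objects so the example is self-contained; the real manager calls terminate()/join() on each worker:

```python
import weakref

# Sketch of the cleanup guarantee: weakref.finalize runs a callback when the
# manager is collected. Lists stand in for multiprocessing.Process objects.
class ProcessManager:
    def __init__(self, procs):
        self.procs = procs
        # the callback must not reference self, or the manager could never die
        self._finalizer = weakref.finalize(self, ProcessManager._terminate, procs)

    @staticmethod
    def _terminate(procs):
        for p in procs:
            p.append("terminated")  # real code: p.terminate(); p.join()

procs = [[], []]
manager = ProcessManager(procs)
del manager  # CPython refcounting fires the finalizer immediately
```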
Arguments are defined across two dataclasses in vllm/entrypoints/openai/cli_args.py:
BaseFrontendArgs contains the arguments that exclude host/port/SSL and other HTTP-server-specific settings. It is used in contexts where a frontend server runs embedded (e.g., Ray Serve integration).
| Argument | Type | Default | Description |
|---|---|---|---|
| --lora-modules | list[LoRAModulePath] | None | LoRA adapters in name=path or JSON format |
| --chat-template | str | None | Path or inline chat template |
| --chat-template-content-format | str | "auto" | "string" or "openai" |
| --response-role | str | "assistant" | Role returned in generation prompts |
| --enable-auto-tool-choice | bool | False | Enable auto tool calling |
| --tool-call-parser | str | None | Parser for tool call output |
| --tool-parser-plugin | str | "" | Import path for custom tool parser |
| --max-log-len | int | None | Max prompt chars to log |
| --disable-frontend-multiprocessing | bool | False | Run frontend in-process with engine |
FrontendArgs (extends BaseFrontendArgs)

| Argument | Type | Default | Description |
|---|---|---|---|
| --host | str | None | Bind host |
| --port | int | 8000 | Bind port |
| --uds | str | None | Unix domain socket path (overrides host/port) |
| --api-key | list[str] | None | Required Bearer tokens |
| --allowed-origins | list[str] | ["*"] | CORS allowed origins |
| --allowed-methods | list[str] | ["*"] | CORS allowed methods |
| --allowed-headers | list[str] | ["*"] | CORS allowed headers |
| --ssl-keyfile | str | None | TLS key file path |
| --ssl-certfile | str | None | TLS cert file path |
| --middleware | list[str] | [] | Additional ASGI middleware import paths |
| --uvicorn-log-level | str | "info" | Log verbosity for uvicorn |
| --root-path | str | None | FastAPI root_path for proxy deployments |
| --disable-fastapi-docs | bool | False | Disable Swagger/ReDoc UI |
| --enable-request-id-headers | bool | False | Emit X-Request-Id header |
make_arg_parser() vllm/entrypoints/openai/cli_args.py313-350 combines FrontendArgs.add_cli_args() and AsyncEngineArgs.add_cli_args() into a single parser. This is the parser used by both the vllm serve subcommand and the direct python -m vllm.entrypoints.openai.api_server invocation.
Sources: vllm/entrypoints/openai/cli_args.py69-373
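The parser-combining approach can be sketched with plain argparse; the flags shown are a tiny illustrative subset, not the real FrontendArgs/AsyncEngineArgs flag sets:

```python
import argparse

# Sketch of how make_arg_parser() layers frontend and engine flags onto one
# parser; the helper names and flag subset are illustrative only.
def add_frontend_args(parser):
    parser.add_argument("--host", type=str, default=None)
    parser.add_argument("--port", type=int, default=8000)
    return parser

def add_engine_args(parser):
    parser.add_argument("--model", type=str, default=None)
    return parser

def make_arg_parser():
    parser = argparse.ArgumentParser(description="vLLM OpenAI server (sketch)")
    return add_engine_args(add_frontend_args(parser))

args = make_arg_parser().parse_args(["--port", "9000", "--model", "my-model"])
```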
validate_api_server_args() vllm/entrypoints/openai/api_server.py401-416 runs before the engine starts and raises KeyError for:
- --enable-auto-tool-choice with an unregistered --tool-call-parser
- An unregistered --reasoning-parser (from structured_outputs_config)

validate_parsed_serve_args() vllm/entrypoints/openai/cli_args.py353-365 runs at CLI parse time and raises TypeError for:

- --enable-auto-tool-choice without --tool-call-parser
- --enable-log-outputs without --enable-log-requests

Tool parsers are registered via ToolParserManager and reasoning parsers via ReasoningParserManager. Plugins are loaded from paths specified in --tool-parser-plugin and --reasoning-parser-plugin.
Sources: vllm/entrypoints/openai/api_server.py401-433 vllm/entrypoints/openai/cli_args.py353-365
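The parse-time checks can be sketched as a standalone function; the argument names mirror the CLI flags, and TypeError matches the documented behavior, but the signature is simplified:

```python
# Sketch of the parse-time validation rules listed above; not the real
# validate_parsed_serve_args() signature, which takes a parsed Namespace.
def validate_parsed_serve_args(enable_auto_tool_choice, tool_call_parser,
                               enable_log_outputs, enable_log_requests):
    if enable_auto_tool_choice and not tool_call_parser:
        raise TypeError("--enable-auto-tool-choice requires --tool-call-parser")
    if enable_log_outputs and not enable_log_requests:
        raise TypeError("--enable-log-outputs requires --enable-log-requests")
```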
setup_server() vllm/entrypoints/openai/api_server.py419-461 binds the socket before the engine is initialized to avoid a race condition with Ray (see GitHub #8204).
| Mode | Function | Socket type |
|---|---|---|
| TCP (default) | create_server_socket((host, port)) | AF_INET or AF_INET6 |
| Unix domain socket (--uds) | create_server_unix_socket(path) | AF_UNIX |
Both functions set SO_REUSEADDR and SO_REUSEPORT on TCP sockets, enabling multiple workers to share the same port. set_ulimit() is also called to raise the open-file-descriptor limit so uvicorn doesn't silently drop connections under high concurrency.
Sources: vllm/entrypoints/openai/api_server.py382-461
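The TCP branch can be sketched with the standard socket module; this mirrors the SO_REUSEADDR/SO_REUSEPORT behavior described above but is a simplified stand-in, using port 0 so the demo binds an ephemeral port:

```python
import socket

# Sketch of TCP socket setup with the options described above; not the
# actual create_server_socket() implementation.
def create_server_socket(addr):
    family = socket.AF_INET6 if ":" in addr[0] else socket.AF_INET
    sock = socket.socket(family, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    if hasattr(socket, "SO_REUSEPORT"):  # not available on every platform
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind(addr)
    return sock

sock = create_server_socket(("127.0.0.1", 0))  # port 0: OS picks a free port
bound_port = sock.getsockname()[1]
sock.close()
```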
The AuthenticationMiddleware only enforces the API key on paths starting with /v1/. Several endpoints on the same HTTP server are not protected by this middleware:
Always unprotected:
- /health, /ping, /version, /metrics – operational endpoints
- /tokenize, /detokenize – utility endpoints
- /invocations – SageMaker-compatible inference (same capability as /v1/completions)
- /pause, /resume, /scale_elastic_ep – operational control

Conditionally available (not protected):

- /tokenizer_info – only when --enable-tokenizer-info-endpoint; may expose chat templates
- /server_info, /collective_rpc, /sleep, /wake_up, /reset_prefix_cache, etc. – only when VLLM_SERVER_DEV_MODE=1; must not be enabled in production

The recommended deployment pattern is to place vLLM behind a reverse proxy that allowlists only the endpoints clients should access.
Sources: docs/usage/security.md112-224
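The path-prefix rule can be sketched as a small check; authorize is a hypothetical helper illustrating why the endpoints above stay reachable without a key, not the middleware's actual code:

```python
# Sketch of the /v1/ path-prefix rule enforced by AuthenticationMiddleware;
# the authorize() helper and its signature are illustrative only.
def authorize(path, auth_header, api_keys):
    if not path.startswith("/v1/"):
        return True  # /health, /metrics, /invocations, ... pass through
    if not auth_header or not auth_header.startswith("Bearer "):
        return False
    return auth_header.removeprefix("Bearer ") in api_keys
```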
run_headless() vllm/entrypoints/cli/serve.py137-215 starts engine core processes without any API server. This is used in disaggregated or multi-node setups where the front-end runs on a different host.
- --headless implies api_server_count=0.
- vllm_config is created with headless=True.
- CoreEngineProcManager launches one engine process per local data-parallel rank.
- When node_rank_within_dp > 0 (a non-head node in a multi-node pipeline/tensor-parallel group), a MultiprocExecutor is started instead.

Sources: vllm/entrypoints/cli/serve.py137-215