This page provides practical code examples demonstrating common Crawl4AI usage patterns. Each example includes working code snippets you can adapt for your own projects.
For architectural details about the crawler system, see Core Architecture. For configuration reference, see Configuration System.
The simplest way to crawl a single URL and extract markdown content:
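A minimal sketch of this pattern (the URL is a placeholder):

```python
import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    # The context manager starts and tears down the browser automatically.
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url="https://example.com")
        print(result.markdown)  # markdown rendering of the page

if __name__ == "__main__":
    asyncio.run(main())
```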
Sources: README.md92-106
Control browser behavior with BrowserConfig:
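A sketch of common `BrowserConfig` options; the values shown are illustrative defaults, not requirements:

```python
from crawl4ai import AsyncWebCrawler, BrowserConfig

browser_config = BrowserConfig(
    browser_type="chromium",   # also "firefox" or "webkit"
    headless=True,             # run without a visible window
    viewport_width=1280,
    viewport_height=720,
    verbose=True,
)

crawler = AsyncWebCrawler(config=browser_config)
```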
Sources: README.md373-398 crawl4ai/async_configs.py354-450
Control caching behavior to optimize performance:
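For example, bypassing the cache for a fresh fetch:

```python
from crawl4ai import CacheMode, CrawlerRunConfig

# BYPASS skips the cache entirely; ENABLED (the default) reads and writes it.
config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)
# Other modes include DISABLED, READ_ONLY, and WRITE_ONLY.
```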
Sources: crawl4ai/cache_context.py1-20
Target specific page elements using CSS selectors:
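The selector value below is illustrative:

```python
from crawl4ai import CrawlerRunConfig

# Only content inside the matched element is kept for markdown and extraction.
config = CrawlerRunConfig(css_selector="article.main-content")
```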
Sources: README.md367-398
Access different content formats from the CrawlResult:
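A sketch of the main `CrawlResult` fields, shown as a helper that assumes an open crawler:

```python
async def show_formats(crawler):
    result = await crawler.arun(url="https://example.com")
    print(result.html[:200])          # raw page HTML
    print(result.cleaned_html[:200])  # sanitized HTML
    print(result.markdown[:200])      # markdown rendering
    print(result.links)               # internal/external links
    print(result.media)               # images, videos, audio found on the page
```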
Sources: crawl4ai/models.py129-159
Extract structured data using OpenAI with a Pydantic schema:
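A sketch adapted from the cited pricing example; note that newer releases move `provider`/`api_token` into an `LLMConfig` object, so check your installed version:

```python
import os
from pydantic import BaseModel, Field
from crawl4ai import CrawlerRunConfig
from crawl4ai.extraction_strategy import LLMExtractionStrategy

class OpenAIModelFee(BaseModel):
    model_name: str = Field(..., description="Name of the OpenAI model.")
    input_fee: str = Field(..., description="Fee for input token.")
    output_fee: str = Field(..., description="Fee for output token.")

strategy = LLMExtractionStrategy(
    provider="openai/gpt-4o",
    api_token=os.getenv("OPENAI_API_KEY"),
    schema=OpenAIModelFee.model_json_schema(),
    extraction_type="schema",
    instruction="Extract all model names with their input and output token fees.",
)
config = CrawlerRunConfig(extraction_strategy=strategy)
# result.extracted_content will hold a JSON string matching the schema.
```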
Sources: docs/examples/llm_extraction_openai_pricing.py1-56 README.md478-518
Crawl4AI supports multiple LLM providers through a unified interface:
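Provider strings follow the litellm-style `"provider/model"` convention; the model names below are examples:

```python
from crawl4ai.extraction_strategy import LLMExtractionStrategy

LLMExtractionStrategy(provider="openai/gpt-4o-mini", api_token="sk-...")
LLMExtractionStrategy(provider="anthropic/claude-3-5-sonnet-20240620", api_token="...")
LLMExtractionStrategy(provider="ollama/llama3", api_token="no-token")  # local model
```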
Sources: README.md478-518 crawl4ai/async_configs.py701-800
Extract structured data without LLM using CSS selectors:
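A sketch of a `JsonCssExtractionStrategy` schema; selectors are placeholders for the target site's markup:

```python
from crawl4ai import CrawlerRunConfig
from crawl4ai.extraction_strategy import JsonCssExtractionStrategy

schema = {
    "name": "News Teasers",
    "baseSelector": ".article-teaser",   # one extracted item per matched element
    "fields": [
        {"name": "headline", "selector": "h2", "type": "text"},
        {"name": "link", "selector": "a", "type": "attribute", "attribute": "href"},
    ],
}
config = CrawlerRunConfig(extraction_strategy=JsonCssExtractionStrategy(schema))
```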
Sources: README.md405-473 crawl4ai/extraction_strategy.py450-650
Execute JavaScript before extraction to interact with dynamic content:
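For example, clicking a "load more" button and waiting for the new content (selectors are illustrative):

```python
from crawl4ai import CrawlerRunConfig

config = CrawlerRunConfig(
    js_code=["document.querySelector('button.load-more')?.click();"],
    wait_for="css:.new-content",   # block until this selector appears
)
```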
Sources: README.md405-473
Crawl multiple URLs concurrently with automatic concurrency control:
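A minimal `arun_many()` sketch:

```python
import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
    async with AsyncWebCrawler() as crawler:
        results = await crawler.arun_many(urls=urls)
        for result in results:
            print(result.url, result.success)

if __name__ == "__main__":
    asyncio.run(main())
```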
Sources: README.md139-149
Process results as they complete using async generator:
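With `stream=True`, `arun_many()` yields results as they finish instead of returning a list; a sketch:

```python
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig

async def main():
    config = CrawlerRunConfig(stream=True)
    async with AsyncWebCrawler() as crawler:
        async for result in await crawler.arun_many(
            urls=["https://example.com/a", "https://example.com/b"], config=config
        ):
            print(result.url, result.success)  # handled as soon as each completes

if __name__ == "__main__":
    asyncio.run(main())
```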
Sources: crawl4ai/async_webcrawler.py600-800
Control crawl speed and concurrent requests:
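A sketch using a semaphore-based dispatcher with a rate limiter (parameter values are illustrative):

```python
from crawl4ai.async_dispatcher import SemaphoreDispatcher, RateLimiter

dispatcher = SemaphoreDispatcher(
    semaphore_count=5,              # at most 5 concurrent crawls
    rate_limiter=RateLimiter(
        base_delay=(1.0, 2.0),      # random per-request delay range, in seconds
        max_delay=30.0,             # back-off ceiling when rate-limited
    ),
)
# Pass it to arun_many:
# results = await crawler.arun_many(urls, dispatcher=dispatcher)
```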
Sources: crawl4ai/async_dispatcher.py1-200
Apply different configurations to different URL patterns:
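One way to express this is a small pattern-matching helper; `config_for` below is a hypothetical illustration (not a crawl4ai API), with strings standing in for `CrawlerRunConfig` objects:

```python
from fnmatch import fnmatch

def config_for(url, rules, default):
    """Return the first config whose glob pattern matches the URL."""
    for pattern, cfg in rules:
        if fnmatch(url, pattern):
            return cfg
    return default

rules = [
    ("*/blog/*", "blog-config"),   # stand-in for a blog-specific CrawlerRunConfig
    ("*/docs/*", "docs-config"),
]
print(config_for("https://example.com/blog/post-1", rules, "default-config"))
```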
Sources: crawl4ai/async_configs.py35-45
Automatic concurrency adjustment based on system memory:
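A configuration sketch (threshold values are illustrative):

```python
from crawl4ai.async_dispatcher import MemoryAdaptiveDispatcher

dispatcher = MemoryAdaptiveDispatcher(
    memory_threshold_percent=80.0,  # pause dispatch above this memory usage
    check_interval=1.0,             # seconds between memory checks
    max_session_permit=20,          # hard cap on concurrent sessions
)
```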
Sources: crawl4ai/async_dispatcher.py1-400
Use the synchronous endpoint for simple, blocking requests:
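A sketch following the cited example script; the endpoint name and payload shape come from the legacy Docker API there, so verify them against your deployed server version:

```python
import requests

response = requests.post(
    "http://localhost:11235/crawl_sync",
    json={"urls": "https://example.com", "priority": 10},
    timeout=60,
)
print(response.json().get("status"))
```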
Sources: docs/examples/docker_example.py49-59
Submit jobs to the queue for asynchronous processing:
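A submit-then-poll sketch; the `task_id` field and `/task/{id}` polling route are assumed from the cited example script:

```python
import time
import requests

base = "http://localhost:11235"

# Submit the job; the server responds with a task identifier.
task = requests.post(f"{base}/crawl", json={"urls": "https://example.com", "priority": 10}).json()
task_id = task["task_id"]

# Poll until the job reaches a terminal state.
while True:
    status = requests.get(f"{base}/task/{task_id}").json()
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(2)
print(status["status"])
```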
Sources: docs/examples/docker_example.py14-47
Receive notifications when jobs complete:
The webhook will receive a POST request with the full result:
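A minimal stdlib receiver sketch; the payload field names (`task_id`, `status`) are illustrative and should be matched to your server's actual callback body:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        print(payload.get("task_id"), payload.get("status"))
        self.send_response(200)
        self.end_headers()

# To run: HTTPServer(("0.0.0.0", 8000), WebhookHandler).serve_forever()
```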
Sources: docs/examples/docker_example.py1-300
Structure extraction requests for the Docker API:
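A request-body sketch whose field names follow the shape used in the cited example script; treat it as illustrative rather than a schema guarantee:

```python
request = {
    "urls": "https://example.com/pricing",
    "extraction_config": {
        "type": "json_css",
        "params": {
            "schema": {
                "name": "Prices",
                "baseSelector": ".plan",
                "fields": [
                    {"name": "tier", "selector": ".plan-name", "type": "text"},
                ],
            }
        },
    },
}
```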
Sources: docs/examples/docker_example.py159-221
Discover and crawl linked pages using BFS, DFS, or best-first strategies:
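A BFS sketch; class and parameter names reflect the `crawl4ai.deep_crawling` module, but verify against your installed version:

```python
from crawl4ai import CrawlerRunConfig
from crawl4ai.deep_crawling import BFSDeepCrawlStrategy

config = CrawlerRunConfig(
    deep_crawl_strategy=BFSDeepCrawlStrategy(
        max_depth=2,             # follow links up to two hops from the start URL
        include_external=False,  # stay on the same domain
    ),
)
# DFSDeepCrawlStrategy and BestFirstCrawlingStrategy are drop-in alternatives.
```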
Sources: README.md110-118 crawl4ai/deep_crawling.py1-500
Maintain browser state across multiple operations:
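A sketch of session reuse; the JavaScript bodies are placeholders:

```python
from crawl4ai import CrawlerRunConfig

session_id = "login_flow"
step1 = CrawlerRunConfig(session_id=session_id,
                         js_code=["/* fill and submit login form */"])
step2 = CrawlerRunConfig(session_id=session_id, js_only=True,
                         js_code=["/* navigate within the authenticated page */"])
# Reusing the same session_id keeps the page (cookies, DOM) alive across arun() calls.
# When finished: await crawler.crawler_strategy.kill_session(session_id)
```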
Sources: README.md523-558
Inject custom code at specific points in the crawl lifecycle:
Available hooks:
- `on_browser_created`: After browser launches
- `before_goto`: Before navigation
- `after_goto`: After navigation completes
- `before_retrieve_html`: Before extracting HTML
- `before_return_html`: Before returning result

Sources: crawl4ai/async_crawler_strategy.py161-206
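A registration sketch, assuming an already-constructed `crawler`; the hook bodies are illustrative:

```python
async def on_browser_created(browser, **kwargs):
    print("Browser launched")
    return browser

async def before_goto(page, context=None, **kwargs):
    # Example: attach a custom header before navigation.
    await page.set_extra_http_headers({"X-Example": "1"})
    return page

# Hooks are registered on the crawler strategy by name:
crawler.crawler_strategy.set_hook("on_browser_created", on_browser_created)
crawler.crawler_strategy.set_hook("before_goto", before_goto)
```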
Filter page content using heuristics or LLM:
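A heuristic-filter sketch (the threshold value is illustrative):

```python
from crawl4ai import CrawlerRunConfig
from crawl4ai.content_filter_strategy import PruningContentFilter
from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator

config = CrawlerRunConfig(
    markdown_generator=DefaultMarkdownGenerator(
        content_filter=PruningContentFilter(threshold=0.48, threshold_type="fixed"),
    ),
)
# The filtered output lands in result.markdown.fit_markdown.
```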
Sources: README.md367-398 crawl4ai/content_filter_strategy.py1-300
Use proxies with automatic rotation:
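A single-proxy sketch; the proxy address and credentials are placeholders:

```python
from crawl4ai import BrowserConfig

browser_config = BrowserConfig(
    proxy_config={
        "server": "http://proxy.example.com:8080",
        "username": "proxy_user",
        "password": "proxy_pass",
    },
)
# For rotation across several proxies, see RoundRobinProxyStrategy
# in crawl4ai.proxy_strategy.
```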
Sources: crawl4ai/async_configs.py229-353 crawl4ai/proxy_strategy.py1-100
Implement robust error handling:
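A sketch covering both failure paths: exceptions raised by the crawler and unsuccessful results:

```python
import asyncio
from crawl4ai import AsyncWebCrawler

async def safe_crawl(url: str):
    async with AsyncWebCrawler() as crawler:
        try:
            result = await crawler.arun(url=url)
        except Exception as exc:   # browser/network-level failures
            print(f"Crawler raised: {exc}")
            return None
        if not result.success:     # page-level failures
            print(f"Crawl failed ({result.status_code}): {result.error_message}")
            return None
        return result
```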
Sources: crawl4ai/async_webcrawler.py485-506
Create reusable configuration objects for consistent behavior:
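A sketch using `clone()` to derive variants from a shared base config:

```python
from crawl4ai import CacheMode, CrawlerRunConfig

base_config = CrawlerRunConfig(
    cache_mode=CacheMode.BYPASS,
    word_count_threshold=10,
)
# clone() copies the config with selected overrides, leaving the base untouched.
stream_config = base_config.clone(stream=True)
```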
Sources: docs/examples/quickstart.py45-66
Sources: docs/examples/docker_example.py14-47
Sources: tests/async/test_0.4.2_config_params.py139-149
Use hooks to inject custom logic at specific lifecycle points:
Available Hook Points:
- `on_browser_created`
- `before_goto`
- `after_goto`
- `before_retrieve_html`
- `before_return_html`
- `on_execution_started`
- `on_screenshot`
- `on_pdf`

Sources: docs/examples/quickstart.py132-143 crawl4ai/async_crawler_strategy.py1-200
The SDK automatically adjusts concurrency based on available memory:
The MemoryAdaptiveDispatcher monitors system memory and pauses task dispatch when thresholds are exceeded.
Sources: crawl4ai/async_dispatcher.py1-200
Create and reuse browser profiles for authenticated sessions:
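A sketch of a persistent profile; parameter names are taken from `BrowserConfig`, and the profile path is a placeholder:

```python
from crawl4ai import BrowserConfig

# A persistent context keeps cookies and local storage between runs,
# which is useful for authenticated sessions.
browser_config = BrowserConfig(
    use_persistent_context=True,
    user_data_dir="/path/to/my_profile",
)
```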
Sources: crawl4ai/browser_config.py1-100
Decision Tree: Which Interface to Use
Key Classes and Methods Reference
| Interface | Class | Key Methods | Import/URL |
|---|---|---|---|
| SDK | AsyncWebCrawler | arun(), arun_many(), start(), close() | from crawl4ai import AsyncWebCrawler |
| SDK Config | BrowserConfig | Constructor with browser settings | from crawl4ai import BrowserConfig |
| SDK Config | CrawlerRunConfig | Constructor with crawl settings | from crawl4ai import CrawlerRunConfig |
| API Client | Crawl4AiTester | submit_and_wait(), submit_sync() | Custom implementation |
| API Client | NBCNewsAPITest | submit_crawl(), wait_for_task(), check_health() | Custom implementation |
Sources: crawl4ai/async_webcrawler.py1-100 crawl4ai/async_configs.py1-100 docs/examples/docker_example.py10-60 tests/test_main.py9-52
For detailed usage examples and patterns, see Usage Examples and Patterns.
For Docker API client implementation details, see Docker API Client.