This page provides practical code examples demonstrating common Crawl4AI usage patterns. Each example includes working code snippets you can adapt for your own projects.
For architectural details about the crawler system, see Core Architecture. For configuration reference, see Configuration System.
The simplest way to crawl a single URL and extract markdown content:
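A minimal sketch of this pattern (the URL is a placeholder):

```python
import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    # The context manager starts and tears down the browser automatically.
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url="https://example.com")
        print(result.markdown)  # markdown rendering of the page

if __name__ == "__main__":
    asyncio.run(main())
```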
Sources: README.md92-106
Control browser behavior with BrowserConfig:
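A sketch of common `BrowserConfig` options; the values shown are illustrative defaults, not requirements:

```python
from crawl4ai import AsyncWebCrawler, BrowserConfig

browser_config = BrowserConfig(
    browser_type="chromium",   # also "firefox" or "webkit"
    headless=True,             # run without a visible window
    viewport_width=1280,
    viewport_height=720,
    verbose=True,
)

crawler = AsyncWebCrawler(config=browser_config)
```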
Sources: README.md373-398 crawl4ai/async_configs.py354-450
Control caching behavior to optimize performance:
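For example, bypassing the cache for a fresh fetch:

```python
from crawl4ai import CacheMode, CrawlerRunConfig

# BYPASS skips the cache entirely; ENABLED (the default) reads and writes it.
config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)
# Other modes include DISABLED, READ_ONLY, and WRITE_ONLY.
```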
Sources: crawl4ai/cache_context.py1-20
Target specific page elements using CSS selectors:
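The selector value below is illustrative:

```python
from crawl4ai import CrawlerRunConfig

# Only content inside the matched element is kept for markdown and extraction.
config = CrawlerRunConfig(css_selector="article.main-content")
```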
Sources: README.md367-398
Access different content formats from the CrawlResult:
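A sketch of the main `CrawlResult` fields, shown as a helper that assumes an open crawler:

```python
async def show_formats(crawler):
    result = await crawler.arun(url="https://example.com")
    print(result.html[:200])          # raw page HTML
    print(result.cleaned_html[:200])  # sanitized HTML
    print(result.markdown[:200])      # markdown rendering
    print(result.links)               # internal/external links
    print(result.media)               # images, videos, audio found on the page
```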
Sources: crawl4ai/models.py129-159
Extract structured data using OpenAI with a Pydantic schema:
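A sketch adapted from the cited pricing example; note that newer releases move `provider`/`api_token` into an `LLMConfig` object, so check your installed version:

```python
import os
from pydantic import BaseModel, Field
from crawl4ai import CrawlerRunConfig
from crawl4ai.extraction_strategy import LLMExtractionStrategy

class OpenAIModelFee(BaseModel):
    model_name: str = Field(..., description="Name of the OpenAI model.")
    input_fee: str = Field(..., description="Fee for input token.")
    output_fee: str = Field(..., description="Fee for output token.")

strategy = LLMExtractionStrategy(
    provider="openai/gpt-4o",
    api_token=os.getenv("OPENAI_API_KEY"),
    schema=OpenAIModelFee.model_json_schema(),
    extraction_type="schema",
    instruction="Extract all model names with their input and output token fees.",
)
config = CrawlerRunConfig(extraction_strategy=strategy)
# result.extracted_content will hold a JSON string matching the schema.
```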
Sources: docs/examples/llm_extraction_openai_pricing.py1-56 README.md478-518
Crawl4AI supports multiple LLM providers through a unified interface:
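Provider strings follow the litellm-style `"provider/model"` convention; the model names below are examples:

```python
from crawl4ai.extraction_strategy import LLMExtractionStrategy

LLMExtractionStrategy(provider="openai/gpt-4o-mini", api_token="sk-...")
LLMExtractionStrategy(provider="anthropic/claude-3-5-sonnet-20240620", api_token="...")
LLMExtractionStrategy(provider="ollama/llama3", api_token="no-token")  # local model
```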
Sources: README.md478-518 crawl4ai/async_configs.py701-800
Extract structured data without LLM using CSS selectors:
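A sketch of a `JsonCssExtractionStrategy` schema; selectors are placeholders for the target site's markup:

```python
from crawl4ai import CrawlerRunConfig
from crawl4ai.extraction_strategy import JsonCssExtractionStrategy

schema = {
    "name": "News Teasers",
    "baseSelector": ".article-teaser",   # one extracted item per matched element
    "fields": [
        {"name": "headline", "selector": "h2", "type": "text"},
        {"name": "link", "selector": "a", "type": "attribute", "attribute": "href"},
    ],
}
config = CrawlerRunConfig(extraction_strategy=JsonCssExtractionStrategy(schema))
```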
Sources: README.md405-473 crawl4ai/extraction_strategy.py450-650
Execute JavaScript before extraction to interact with dynamic content:
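For example, clicking a "load more" button and waiting for the new content (selectors are illustrative):

```python
from crawl4ai import CrawlerRunConfig

config = CrawlerRunConfig(
    js_code=["document.querySelector('button.load-more')?.click();"],
    wait_for="css:.new-content",   # block until this selector appears
)
```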
Sources: README.md405-473
Crawl multiple URLs concurrently with automatic concurrency control:
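A minimal `arun_many()` sketch:

```python
import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
    async with AsyncWebCrawler() as crawler:
        results = await crawler.arun_many(urls=urls)
        for result in results:
            print(result.url, result.success)

if __name__ == "__main__":
    asyncio.run(main())
```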
Sources: README.md139-149
Process results as they complete using async generator:
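With `stream=True`, `arun_many()` yields results as they finish instead of returning a list; a sketch:

```python
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig

async def main():
    config = CrawlerRunConfig(stream=True)
    async with AsyncWebCrawler() as crawler:
        async for result in await crawler.arun_many(
            urls=["https://example.com/a", "https://example.com/b"], config=config
        ):
            print(result.url, result.success)  # handled as soon as each completes

if __name__ == "__main__":
    asyncio.run(main())
```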
Sources: crawl4ai/async_webcrawler.py600-800
Control crawl speed and concurrent requests:
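A sketch using a semaphore-based dispatcher with a rate limiter (parameter values are illustrative):

```python
from crawl4ai.async_dispatcher import SemaphoreDispatcher, RateLimiter

dispatcher = SemaphoreDispatcher(
    semaphore_count=5,              # at most 5 concurrent crawls
    rate_limiter=RateLimiter(
        base_delay=(1.0, 2.0),      # random per-request delay range, in seconds
        max_delay=30.0,             # back-off ceiling when rate-limited
    ),
)
# Pass it to arun_many:
# results = await crawler.arun_many(urls, dispatcher=dispatcher)
```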
Sources: crawl4ai/async_dispatcher.py1-200
Apply different configurations to different URL patterns:
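One way to express this is a small pattern-matching helper; `config_for` below is a hypothetical illustration (not a crawl4ai API), with strings standing in for `CrawlerRunConfig` objects:

```python
from fnmatch import fnmatch

def config_for(url, rules, default):
    """Return the first config whose glob pattern matches the URL."""
    for pattern, cfg in rules:
        if fnmatch(url, pattern):
            return cfg
    return default

rules = [
    ("*/blog/*", "blog-config"),   # stand-in for a blog-specific CrawlerRunConfig
    ("*/docs/*", "docs-config"),
]
print(config_for("https://example.com/blog/post-1", rules, "default-config"))
```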
Sources: crawl4ai/async_configs.py35-45
Automatic concurrency adjustment based on system memory:
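A configuration sketch (threshold values are illustrative):

```python
from crawl4ai.async_dispatcher import MemoryAdaptiveDispatcher

dispatcher = MemoryAdaptiveDispatcher(
    memory_threshold_percent=80.0,  # pause dispatch above this memory usage
    check_interval=1.0,             # seconds between memory checks
    max_session_permit=20,          # hard cap on concurrent sessions
)
```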
Sources: crawl4ai/async_dispatcher.py1-400
Use the synchronous endpoint for simple, blocking requests:
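A sketch following the cited example script; the endpoint name and payload shape come from the legacy Docker API there, so verify them against your deployed server version:

```python
import requests

response = requests.post(
    "http://localhost:11235/crawl_sync",
    json={"urls": "https://example.com", "priority": 10},
    timeout=60,
)
print(response.json().get("status"))
```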
Sources: docs/examples/docker_example.py49-59
Submit jobs to the queue for asynchronous processing:
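A submit-then-poll sketch; the `task_id` field and `/task/{id}` polling route are assumed from the cited example script:

```python
import time
import requests

base = "http://localhost:11235"

# Submit the job; the server responds with a task identifier.
task = requests.post(f"{base}/crawl", json={"urls": "https://example.com", "priority": 10}).json()
task_id = task["task_id"]

# Poll until the job reaches a terminal state.
while True:
    status = requests.get(f"{base}/task/{task_id}").json()
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(2)
print(status["status"])
```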
Sources: docs/examples/docker_example.py14-47
Receive notifications when jobs complete:
The webhook will receive a POST request with the full result:
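A minimal stdlib receiver sketch; the payload field names (`task_id`, `status`) are illustrative and should be matched to your server's actual callback body:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        print(payload.get("task_id"), payload.get("status"))
        self.send_response(200)
        self.end_headers()

# To run: HTTPServer(("0.0.0.0", 8000), WebhookHandler).serve_forever()
```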
Sources: docs/examples/docker_example.py1-300
Structure extraction requests for the Docker API:
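A request-body sketch whose field names follow the shape used in the cited example script; treat it as illustrative rather than a schema guarantee:

```python
request = {
    "urls": "https://example.com/pricing",
    "extraction_config": {
        "type": "json_css",
        "params": {
            "schema": {
                "name": "Prices",
                "baseSelector": ".plan",
                "fields": [
                    {"name": "tier", "selector": ".plan-name", "type": "text"},
                ],
            }
        },
    },
}
```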
Sources: docs/examples/docker_example.py159-221
Discover and crawl linked pages using BFS, DFS, or best-first strategies:
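A BFS sketch; class and parameter names reflect the `crawl4ai.deep_crawling` module, but verify against your installed version:

```python
from crawl4ai import CrawlerRunConfig
from crawl4ai.deep_crawling import BFSDeepCrawlStrategy

config = CrawlerRunConfig(
    deep_crawl_strategy=BFSDeepCrawlStrategy(
        max_depth=2,             # follow links up to two hops from the start URL
        include_external=False,  # stay on the same domain
    ),
)
# DFSDeepCrawlStrategy and BestFirstCrawlingStrategy are drop-in alternatives.
```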
Sources: README.md110-118 crawl4ai/deep_crawling.py1-500
Maintain browser state across multiple operations:
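A sketch of session reuse; the JavaScript bodies are placeholders:

```python
from crawl4ai import CrawlerRunConfig

session_id = "login_flow"
step1 = CrawlerRunConfig(session_id=session_id,
                         js_code=["/* fill and submit login form */"])
step2 = CrawlerRunConfig(session_id=session_id, js_only=True,
                         js_code=["/* navigate within the authenticated page */"])
# Reusing the same session_id keeps the page (cookies, DOM) alive across arun() calls.
# When finished: await crawler.crawler_strategy.kill_session(session_id)
```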
Sources: README.md523-558
Inject custom code at specific points in the crawl lifecycle:
Available hooks:
- `on_browser_created`: After browser launches
- `before_goto`: Before navigation
- `after_goto`: After navigation completes
- `before_retrieve_html`: Before extracting HTML
- `before_return_html`: Before returning result

Sources: crawl4ai/async_crawler_strategy.py161-206
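A registration sketch, assuming an already-constructed `crawler`; the hook bodies are illustrative:

```python
async def on_browser_created(browser, **kwargs):
    print("Browser launched")
    return browser

async def before_goto(page, context=None, **kwargs):
    # Example: attach a custom header before navigation.
    await page.set_extra_http_headers({"X-Example": "1"})
    return page

# Hooks are registered on the crawler strategy by name:
crawler.crawler_strategy.set_hook("on_browser_created", on_browser_created)
crawler.crawler_strategy.set_hook("before_goto", before_goto)
```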
Filter page content using heuristics or LLM:
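A heuristic-filter sketch (the threshold value is illustrative):

```python
from crawl4ai import CrawlerRunConfig
from crawl4ai.content_filter_strategy import PruningContentFilter
from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator

config = CrawlerRunConfig(
    markdown_generator=DefaultMarkdownGenerator(
        content_filter=PruningContentFilter(threshold=0.48, threshold_type="fixed"),
    ),
)
# The filtered output lands in result.markdown.fit_markdown.
```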
Sources: README.md367-398 crawl4ai/content_filter_strategy.py1-300
Use proxies with automatic rotation:
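A single-proxy sketch; the proxy address and credentials are placeholders:

```python
from crawl4ai import BrowserConfig

browser_config = BrowserConfig(
    proxy_config={
        "server": "http://proxy.example.com:8080",
        "username": "proxy_user",
        "password": "proxy_pass",
    },
)
# For rotation across several proxies, see RoundRobinProxyStrategy
# in crawl4ai.proxy_strategy.
```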
Sources: crawl4ai/async_configs.py229-353 crawl4ai/proxy_strategy.py1-100
Implement robust error handling:
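A sketch covering both failure paths: exceptions raised by the crawler and unsuccessful results:

```python
import asyncio
from crawl4ai import AsyncWebCrawler

async def safe_crawl(url: str):
    async with AsyncWebCrawler() as crawler:
        try:
            result = await crawler.arun(url=url)
        except Exception as exc:   # browser/network-level failures
            print(f"Crawler raised: {exc}")
            return None
        if not result.success:     # page-level failures
            print(f"Crawl failed ({result.status_code}): {result.error_message}")
            return None
        return result
```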
Sources: crawl4ai/async_webcrawler.py485-506
Create reusable configuration objects for consistent behavior:
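A sketch using `clone()` to derive variants from a shared base config:

```python
from crawl4ai import CacheMode, CrawlerRunConfig

base_config = CrawlerRunConfig(
    cache_mode=CacheMode.BYPASS,
    word_count_threshold=10,
)
# clone() copies the config with selected overrides, leaving the base untouched.
stream_config = base_config.clone(stream=True)
```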
Sources: docs/examples/quickstart.py45-66
Sources: docs/examples/docker_example.py14-47
Sources: tests/async/test_0.4.2_config_params.py139-149
Use hooks to inject custom logic at specific lifecycle points:
Available Hook Points:
- `on_browser_created`
- `before_goto`
- `after_goto`
- `before_retrieve_html`
- `before_return_html`
- `on_execution_started`
- `on_screenshot`
- `on_pdf`

Sources: docs/examples/quickstart.py132-143 crawl4ai/async_crawler_strategy.py1-200
The SDK automatically adjusts concurrency based on available memory:
The MemoryAdaptiveDispatcher monitors system memory and pauses task dispatch when thresholds are exceeded.
Sources: crawl4ai/async_dispatcher.py1-200
Create and reuse browser profiles for authenticated sessions:
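A sketch of a persistent profile; parameter names are taken from `BrowserConfig`, and the profile path is a placeholder:

```python
from crawl4ai import BrowserConfig

# A persistent context keeps cookies and local storage between runs,
# which is useful for authenticated sessions.
browser_config = BrowserConfig(
    use_persistent_context=True,
    user_data_dir="/path/to/my_profile",
)
```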
Sources: crawl4ai/browser_config.py1-100
Decision Tree: Which Interface to Use
Key Classes and Methods Reference
| Interface | Class | Key Methods | Import/URL |
|---|---|---|---|
| SDK | AsyncWebCrawler | arun(), arun_many(), start(), close() | from crawl4ai import AsyncWebCrawler |
| SDK Config | BrowserConfig | Constructor with browser settings | from crawl4ai import BrowserConfig |
| SDK Config | CrawlerRunConfig | Constructor with crawl settings | from crawl4ai import CrawlerRunConfig |
| API Client | Crawl4AiTester | submit_and_wait(), submit_sync() | Custom implementation |
| API Client | NBCNewsAPITest | submit_crawl(), wait_for_task(), check_health() | Custom implementation |
Sources: crawl4ai/async_webcrawler.py1-100 crawl4ai/async_configs.py1-100 docs/examples/docker_example.py10-60 tests/test_main.py9-52
For detailed usage examples and patterns, see Usage Examples and Patterns.
For Docker API client implementation details, see Docker API Client.