Releases: unclecode/crawl4ai
v0.6.3
Release 0.6.3 (unreleased)
Features
- extraction: add `RegexExtractionStrategy` for pattern-based extraction, including built-in patterns for emails, URLs, phones, and dates, support for custom regexes, an LLM-assisted pattern generator, optimized HTML preprocessing via `fit_html`, and enhanced network response body capture (9b5ccac)
- docker-api: introduce job-based polling endpoints (`POST /crawl/job` & `GET /crawl/job/{task_id}` for crawls, `POST /llm/job` & `GET /llm/job/{task_id}` for LLM tasks) backed by Redis task management with configurable TTL; move schemas to `schemas.py` and add a `demo_docker_polling.py` example (94e9959)
- browser: improve profile management and cleanup; add process cleanup for existing Chromium instances on Windows/Unix, fix profile creation by passing the full browser config, ship detailed browser/CLI docs and an initial profile-creation test, and bump the version to 0.6.3 (9499164)
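The idea behind pattern-based extraction can be illustrated with plain `re`; this is a minimal sketch of what a regex strategy with named built-in patterns does, not the library's exact API (the pattern set, function name, and output shape here are illustrative):

```python
import re

# Illustrative built-in patterns; the library ships more robust equivalents.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "url": re.compile(r"https?://[^\s\"'<>]+"),
}

def extract(html: str, kinds=("email", "url")):
    """Return {kind: [matches]} for each requested pattern."""
    return {k: PATTERNS[k].findall(html) for k in kinds}

sample = '<a href="https://example.com/contact">mail me: team@example.com</a>'
print(extract(sample))
```

Because matching is pure regex over preprocessed HTML, this path avoids LLM calls entirely, which is what makes the strategy cheap to run at scale.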
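The job endpoints follow a submit-then-poll pattern: `POST` returns a task id, and the client polls the `GET` endpoint until the task finishes. A transport-agnostic sketch (the endpoint paths come from the notes above, but the callables, backoff values, and `task_id`/`status`/`result` field names are assumptions):

```python
import time

def poll_job(submit, get_status, payload, interval=1.0, timeout=30.0):
    """Submit a job, then poll its status until it completes or times out.

    `submit` and `get_status` stand in for HTTP calls to POST /crawl/job and
    GET /crawl/job/{task_id}.
    """
    task_id = submit(payload)["task_id"]
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = get_status(task_id)
        if job["status"] in ("completed", "failed"):
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {task_id} did not finish within {timeout}s")

# Fake server for demonstration: reports "completed" on the second poll.
_state = {"polls": 0}
def fake_submit(payload):
    return {"task_id": "abc123"}
def fake_status(task_id):
    _state["polls"] += 1
    if _state["polls"] < 2:
        return {"status": "processing"}
    return {"status": "completed", "result": {"url": "https://example.com"}}

print(poll_job(fake_submit, fake_status, {"urls": ["https://example.com"]}, interval=0.01))
```

Polling with a client-side timeout pairs naturally with the server-side Redis TTL mentioned above: both sides bound how long an abandoned job lingers.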
Fixes
- crawler: remove automatic page closure in `take_screenshot` and `take_screenshot_naive`, preventing premature teardown; callers must now explicitly close pages (BREAKING CHANGE) (a3e9ef9)
Documentation
- format bash scripts in `docs/apps/linkdin/README.md` so examples copy & paste cleanly (87d4b0f)
- update the same README with full `litellm` argument details for correct script usage (bd5a9ac)
Refactoring
- logger: centralize color codes behind an `Enum` in `async_logger`, `browser_profiler`, `content_filter_strategy`, and related modules for cleaner, type-safe formatting (cd2b490)
Experimental
- start migrating the logging stack to `rich` (work in progress) (b2f3cb0)
Crawl4AI 0.6.0
🚀 0.6.0 — 22 Apr 2025
Highlights
- World‑aware crawlers:

```python
from crawl4ai import CrawlerRunConfig, GeolocationConfig

crun_cfg = CrawlerRunConfig(
    url="https://browserleaks.com/geo",  # test page that shows your location
    locale="en-US",                      # Accept-Language & UI locale
    timezone_id="America/Los_Angeles",   # JS Date()/Intl timezone
    geolocation=GeolocationConfig(       # override GPS coords
        latitude=34.0522,
        longitude=-118.2437,
        accuracy=10.0,
    ),
)
```
- Table‑to‑DataFrame extraction: run `df = pd.DataFrame(result.media["tables"][0]["rows"], columns=result.media["tables"][0]["headers"])` and get CSV or pandas output without extra parsing.
- Crawler pool with pre‑warming: pages launch hot, with lower P90 latency and lower memory use.
- Network and console capture: full traffic log plus MHTML snapshot for audits and debugging.
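Given the `{"headers": [...], "rows": [[...], ...]}` table shape shown in the DataFrame snippet above, CSV export needs nothing beyond the stdlib. A sketch with illustrative stand-in data:

```python
import csv
import io

def table_to_csv(table: dict) -> str:
    """Serialize one extracted table ({"headers": [...], "rows": [[...], ...]}) to CSV."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(table["headers"])
    writer.writerows(table["rows"])
    return buf.getvalue()

# Illustrative stand-in for result.media["tables"][0]
table = {"headers": ["coin", "price"], "rows": [["BTC", "64000"], ["ETH", "3100"]]}
print(table_to_csv(table))
```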
Added
- Geolocation, locale, and timezone flags for every crawl.
- Browser pooling with page pre‑warming.
- Table extractor that exports to CSV or pandas.
- Crawler pool manager in SDK and Docker API.
- Network & console log capture, plus MHTML snapshot.
- MCP socket and SSE endpoints with playground UI.
- Stress‑test framework (`tests/memory`) for 1k+ URL runs.
- Docs v2: TOC, GitHub badge, copy‑code buttons, Docker API demo.
- “Ask AI” helper button, work in progress, shipping soon.
- New examples: geo location, network/console capture, Docker API, markdown source selection, crypto analysis.
Changed
- Browser strategy consolidation; legacy docker modules removed.
- `ProxyConfig` moved to `async_configs`.
- Server migrated to pool‑based crawler management.
- FastAPI validators replace custom query validation.
- Docker build now uses a Chromium base image.
- Repo cleanup: ≈36k insertions, ≈5k deletions across 121 files.
Fixed
- Session leaks, duplicate visits, URL normalisation.
- Target‑element regressions in scraping strategies.
- Logged URL readability, encoded URL decoding, middle truncation.
- Closed issues: #701 #733 #756 #774 #804 #822 #839 #841 #842 #843 #867 #902 #911.
Removed
- Obsolete modules in `crawl4ai/browser/*`.
Deprecated
- Old markdown generator names now alias `DefaultMarkdownGenerator` and emit a deprecation warning.
Upgrade notes
- Update any imports from `crawl4ai/browser/*` to the new pooled browser modules.
- If you override `AsyncPlaywrightCrawlerStrategy.get_page`, adopt the new signature.
- Rebuild Docker images to pick up the Chromium layer.
- Switch to `DefaultMarkdownGenerator` to silence deprecation warnings.
121 files changed, ≈36 223 insertions, ≈4 975 deletions
Crawl4AI v0.5.0.post1
Release Theme: Power, Flexibility, and Scalability
Crawl4AI v0.5.0 is a major release focused on significantly enhancing the library's power, flexibility, and scalability.
Key Features
- Deep Crawling System - Explore websites beyond initial URLs with BFS, DFS, and BestFirst strategies, with page limiting and scoring capabilities
- Memory-Adaptive Dispatcher - Scale to thousands of URLs with intelligent memory monitoring and concurrency control
- Multiple Crawling Strategies - Choose between browser-based (Playwright) or lightweight HTTP-only crawling
- Docker Deployment - Easy deployment with FastAPI server, JWT authentication, and streaming/non-streaming endpoints
- Command-Line Interface - New `crwl` CLI provides convenient access to all features with intuitive commands
- Browser Profiler - Create and manage persistent browser profiles to save authentication states for protected content
- Crawl4AI Coding Assistant - Interactive chat interface for asking questions about Crawl4AI and generating Python code examples
- LXML Scraping Mode - Fast HTML parsing using the `lxml` library for a 10-20x speedup on complex pages
- Proxy Rotation - Built-in support for dynamic proxy switching with authentication and session persistence
- PDF Processing - Extract and process data from PDF files (both local and remote)
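The BFS strategy with a page limit can be sketched independently of the library; the link graph, callable, and `max_pages` knob below are illustrative, and the real strategies additionally score and prioritize URLs (the BestFirst variant):

```python
from collections import deque

def bfs_crawl(start: str, get_links, max_pages: int = 10):
    """Breadth-first traversal from `start`, visiting at most `max_pages` URLs.

    `get_links(url)` stands in for fetching a page and extracting its links.
    """
    seen, order = {start}, []
    queue = deque([start])
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order

# Tiny in-memory site standing in for real pages
site = {"/": ["/a", "/b"], "/a": ["/c"], "/b": ["/c"], "/c": []}
print(bfs_crawl("/", lambda u: site.get(u, []), max_pages=3))
```

Swapping the `deque` for a stack gives DFS, and a priority queue keyed on a score gives best-first traversal, which is the essential difference between the three strategies.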
Additional Improvements
- LLM Content Filter for intelligent markdown generation
- URL redirection tracking
- LLM-powered schema generation utility for extraction templates
- robots.txt compliance support
- Enhanced browser context management
- Improved serialization and config handling
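robots.txt compliance of the kind listed above can be checked with the stdlib's `urllib.robotparser`; the rules below are illustrative, and a real crawler would fetch the live `robots.txt` instead of parsing hard-coded lines:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# Illustrative rules; normally fetched from https://<host>/robots.txt
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

print(rp.can_fetch("crawl4ai", "https://example.com/public/page"))   # allowed
print(rp.can_fetch("crawl4ai", "https://example.com/private/page"))  # disallowed
```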
Breaking Changes
This release contains several breaking changes. Please review the full release notes for migration guidance.
For complete details, visit: https://docs.crawl4ai.com/blog/releases/0.5.0/