
[Bug]: DFS Deep crawling only crawling the 1st link #1071


Open

margauxallee opened this issue May 4, 2025 · 0 comments
Labels: 🐞 Bug (Something isn't working) · 🩺 Needs Triage (Needs attention of maintainers)

Comments

margauxallee commented May 4, 2025

crawl4ai version

0.6.2

Expected Behavior

The crawler should follow internal links in DFS order down to the specified max_depth, up to the specified max_pages limit.

Current Behavior

Only one page is crawled: the first (root) URL.

Is this reproducible?

Yes

Inputs Causing the Bug

Steps to Reproduce

Run any deep crawling code with the DFS strategy, expecting more than one page to be scraped (max_pages > 1). The issue only occurs with DFS, not with BFS (a BFS comparison sketch is included after the snippet below). Deep crawling worked as expected on v0.5.0.

Code snippets

import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode
from crawl4ai.content_scraping_strategy import LXMLWebScrapingStrategy
from crawl4ai.deep_crawling import DFSDeepCrawlStrategy


async def crawler(
    url: str = "https://www.coca-colacompany.com/"
):

    # DFS deep crawl: follow internal links up to depth 2, visiting at most 20 pages.
    dfs_config = CrawlerRunConfig(
        deep_crawl_strategy=DFSDeepCrawlStrategy(
            max_depth=2,
            include_external=False, 
            max_pages=20
        ),
        scraping_strategy=LXMLWebScrapingStrategy(),
        verbose=True,
        cache_mode=CacheMode.BYPASS,
    )

    print(f"\n===== CRAWLING ...=====")

    async with AsyncWebCrawler() as crawler:

        # In batch (non-streaming) mode, a deep crawl returns a list of results, one per page.
        results = await crawler.arun(url=url, config=dfs_config)

        print(f" Crawled {len(results)} pages")
        for result in results:
            depth = result.metadata.get("depth", 0)
            print(f"  → Depth: {depth} | {result.url}")



if __name__ == "__main__":
    asyncio.run(crawler())
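
For comparison, here is the equivalent BFS run, which per the report above does return multiple pages on 0.6.2. This is a minimal sketch under the same settings; only the strategy class differs, assuming BFSDeepCrawlStrategy accepts the same max_depth / include_external / max_pages arguments as the DFS strategy.

import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode
from crawl4ai.content_scraping_strategy import LXMLWebScrapingStrategy
from crawl4ai.deep_crawling import BFSDeepCrawlStrategy


async def bfs_crawler(url: str = "https://www.coca-colacompany.com/"):
    # Same settings as the DFS config above; only the strategy class differs.
    bfs_config = CrawlerRunConfig(
        deep_crawl_strategy=BFSDeepCrawlStrategy(
            max_depth=2,
            include_external=False,
            max_pages=20
        ),
        scraping_strategy=LXMLWebScrapingStrategy(),
        verbose=True,
        cache_mode=CacheMode.BYPASS,
    )

    async with AsyncWebCrawler() as crawler:
        results = await crawler.arun(url=url, config=bfs_config)
        print(f"Crawled {len(results)} pages")  # expected: more than 1 page
        for result in results:
            print(f"  Depth: {result.metadata.get('depth', 0)} | {result.url}")


if __name__ == "__main__":
    asyncio.run(bfs_crawler())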

OS

macOS

Python version

3.13.2

Browser

No response

Browser version

No response

Error logs & Screenshots (if applicable)

===== CRAWLING ...=====
[INIT].... → Crawl4AI 0.6.2
[FETCH]... ↓ https://www.coca-colacompany.com/ | ✓ | ⏱: 2.42s
[SCRAPE].. ◆ https://www.coca-colacompany.com/ | ✓ | ⏱: 0.02s
[COMPLETE] ● https://www.coca-colacompany.com/ | ✓ | ⏱: 2.44s
Crawled 1 pages
→ Depth: 0 | https://www.coca-colacompany.com/
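
One way to narrow this down (a diagnostic sketch, not a confirmed cause; it assumes stream=True on CrawlerRunConfig works with deep crawl strategies as documented) is to run the same DFS config in streaming mode and check whether any result beyond depth 0 is ever yielded:

import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode
from crawl4ai.deep_crawling import DFSDeepCrawlStrategy


async def stream_dfs(url: str = "https://www.coca-colacompany.com/"):
    config = CrawlerRunConfig(
        deep_crawl_strategy=DFSDeepCrawlStrategy(
            max_depth=2,
            include_external=False,
            max_pages=20
        ),
        stream=True,  # yield results as each page finishes instead of one final batch
        cache_mode=CacheMode.BYPASS,
        verbose=True,
    )

    async with AsyncWebCrawler() as crawler:
        # With stream=True, arun returns an async generator of results.
        async for result in await crawler.arun(url=url, config=config):
            print(f"Depth: {result.metadata.get('depth', 0)} | {result.url}")


if __name__ == "__main__":
    asyncio.run(stream_dfs())

If only the depth-0 result is ever printed, the DFS strategy appears to stop after the root page rather than merely losing results at collection time.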

margauxallee added the 🐞 Bug (Something isn't working) and 🩺 Needs Triage (Needs attention of maintainers) labels on May 4, 2025