Fix: Local HTML Files crawling bug #1073
Open
+2
−1
Summary
This PR fixes #1072, which prevents users from scraping local HTML files.
List of files changed and why
async_crawler_strategy.py -- Inside the crawl() function (line 421), the elif branch on line 448 handles file:// URLs and uses the captured_console variable. That variable is only assigned when capture_console_messages is True in the config, and capture_console_messages is False by default, so captured_console is never defined on this path unless the user sets capture_console_messages to True. To fix this, I initialized captured_console = [] on line 443, matching the other functions that initialize captured_console at the start of the function.
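The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the actual code from async_crawler_strategy.py: the function names crawl_buggy and crawl_fixed and the returned dictionary are hypothetical, and only the names captured_console and capture_console_messages come from this PR.

```python
def crawl_buggy(url: str, capture_console_messages: bool = False) -> dict:
    """Hypothetical reduction of the bug: captured_console is only
    assigned when console capture is enabled."""
    if capture_console_messages:
        captured_console = []

    if url.startswith("file://"):
        # With the default config, captured_console was never assigned,
        # so this line raises UnboundLocalError.
        return {"url": url, "console_messages": captured_console}
    raise ValueError("only file:// URLs are handled in this sketch")


def crawl_fixed(url: str, capture_console_messages: bool = False) -> dict:
    """Same logic with the fix applied: initialize before any branch."""
    captured_console = []  # always defined, regardless of config

    if url.startswith("file://"):
        return {"url": url, "console_messages": captured_console}
    raise ValueError("only file:// URLs are handled in this sketch")


if __name__ == "__main__":
    try:
        crawl_buggy("file:///tmp/page.html")
    except UnboundLocalError as exc:
        print(f"buggy version failed: {exc}")

    print(crawl_fixed("file:///tmp/page.html"))
```

Initializing the list unconditionally keeps the default path safe without changing behavior when console capture is enabled.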
How Has This Been Tested?
After applying the fix, I tested the change by scraping 50-60 different local HTML files using file:// URLs.
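For anyone repeating this kind of test, the file:// URLs for a folder of HTML files can be generated with the standard library. This is only a sketch of one way to do it, not the script used for this PR; the helper name local_html_urls is hypothetical, and the resulting URLs would then be passed to the crawler.

```python
from pathlib import Path


def local_html_urls(directory: str) -> list[str]:
    """Return sorted file:// URLs for every .html file directly
    inside `directory` (hypothetical helper for manual testing)."""
    # as_uri() requires an absolute path, so resolve() first.
    return sorted(p.resolve().as_uri() for p in Path(directory).glob("*.html"))


if __name__ == "__main__":
    for url in local_html_urls("."):
        print(url)
```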
Checklist: