A simple web crawler application built with HTML, CSS, and JavaScript.
This project allows users to crawl a website, extract content, and save it as Markdown.
- Crawls a website to a specified depth.
- Extracts headings, links, and paragraphs from crawled pages.
- Converts extracted content to Markdown format.
- Allows users to save the Markdown content to a file.
- Displays status updates and logs during the crawling process.
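The Markdown conversion step listed above can be sketched roughly as follows. This is only an illustration: the `toMarkdown` name and the shape of the `page` object (`headings`, `paragraphs`, `links`) are assumptions, not taken from the project's code.

```javascript
// Hypothetical input shape (an assumption for this sketch):
// {
//   headings:   [{ level: 1, text: 'Title' }],
//   paragraphs: ['Some text.'],
//   links:      [{ text: 'Home', href: 'https://example.com/' }]
// }
function toMarkdown(page) {
  const lines = [];

  // Headings become "#"-prefixed lines; clamp the level to the 1..6 range.
  for (const h of page.headings) {
    const level = Math.min(Math.max(h.level, 1), 6);
    lines.push(`${'#'.repeat(level)} ${h.text.trim()}`, '');
  }

  // Paragraphs are separated by blank lines.
  for (const p of page.paragraphs) {
    lines.push(p.trim(), '');
  }

  // Links are emitted as a bullet list of inline Markdown links.
  for (const link of page.links) {
    lines.push(`- [${link.text.trim()}](${link.href})`);
  }

  // Drop leading/trailing blank lines and end with a single newline.
  return lines.join('\n').trim() + '\n';
}
```

This sketch groups output by element type rather than preserving the order in which elements appear on the page; a converter that walks the DOM in document order would keep that order.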
- Enter the URL of the website you want to crawl in the URL input field.
- Specify the crawl depth in the depth input field.
- Click the "Start Crawl" button to begin crawling.
- View the extracted Markdown content in the output section.
- Click the "Save Markdown" button to save the content to a file.
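For reference, a "Save Markdown" button in a browser-only app is often implemented with a `Blob` and a temporary download link. The sketch below shows that common pattern; the function names and the default file name `crawl-output.md` are assumptions, and the project may do this differently.

```javascript
// Wrap the Markdown text in a Blob with a Markdown MIME type.
function buildMarkdownBlob(markdown) {
  return new Blob([markdown], { type: 'text/markdown' });
}

// Trigger a browser download (requires a DOM; hypothetical helper).
function saveMarkdown(markdown, filename = 'crawl-output.md') {
  const url = URL.createObjectURL(buildMarkdownBlob(markdown));

  // A temporary <a download> element makes the browser save the file.
  const a = document.createElement('a');
  a.href = url;
  a.download = filename;
  document.body.appendChild(a);
  a.click();

  // Clean up the temporary element and release the object URL.
  a.remove();
  URL.revokeObjectURL(url);
}
```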
- HTML: Provides the structure and user interface of the application.
- CSS: Styles the application for a better user experience.
- JavaScript: Handles the crawling logic, content extraction, and Markdown conversion.
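As an illustration of the crawling logic handled in JavaScript, here is a minimal sketch of a depth-limited, breadth-first crawl. It is not the project's actual code: `crawl` and the injected `fetchLinks` callback are hypothetical names, and the sketch assumes that depth 0 means "the start page only".

```javascript
// fetchLinks(url) is a hypothetical async callback that loads a page
// and resolves to the href values found on it.
async function crawl(startUrl, maxDepth, fetchLinks) {
  const visited = new Set([startUrl]);
  const queue = [{ url: startUrl, depth: 0 }];
  const order = [];

  while (queue.length > 0) {
    const { url, depth } = queue.shift();
    const links = await fetchLinks(url);
    order.push(url);

    // Pages at the maximum depth are visited, but their links are not followed.
    if (depth >= maxDepth) continue;

    for (const href of links) {
      let absolute;
      try {
        // Resolve relative links against the page they were found on.
        absolute = new URL(href, url).href;
      } catch {
        continue; // Skip hrefs that cannot be parsed as URLs.
      }
      if (!visited.has(absolute)) {
        visited.add(absolute);
        queue.push({ url: absolute, depth: depth + 1 });
      }
    }
  }

  // URLs in the order they were visited.
  return order;
}
```

Passing the page loader in as a callback keeps the traversal separate from how pages are fetched and parsed, so the same loop can be exercised with a stub instead of real network requests.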
- index.html: The main HTML file, containing the application's structure, styles, and JavaScript code.
No external libraries or frameworks are required.
- The crawler may miss content on websites that rely heavily on JavaScript or load content dynamically.
- The crawler may be blocked by websites with anti-scraping measures.
- The generated Markdown may not preserve the original page's formatting exactly.
- The crawler does not support crawling content behind login pages or form submissions.