GitHub - NerdBaba/webcrawler-md: A simple webcrawler application built with HTML, CSS, and JavaScript.

NerdBaba/webcrawler-md

Webcrawler Markdown

A simple webcrawler application built with HTML, CSS, and JavaScript.

Overview

This project allows users to crawl a website, extract content, and save it as Markdown.

Preview

image

Features

  • Crawls a website to a specified depth.
  • Extracts headings, links, and paragraphs from crawled pages.
  • Converts extracted content to Markdown format.
  • Allows users to save the Markdown content to a file.
  • Displays status updates and logs during the crawling process.

Usage

  1. Enter the URL of the website you want to crawl in the URL input field.
  2. Specify the crawl depth in the depth input field.
  3. Click the "Start Crawl" button to begin crawling.
  4. View the extracted Markdown content in the output section.
  5. Click the "Save Markdown" button to save the content to a file.
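The final step, saving the Markdown to a file, is commonly done in the browser with a `Blob` and a temporary download link. The sketch below assumes that approach; the function names `markdownFilename` and `saveMarkdown` are hypothetical, and the project may name the file or trigger the download differently.

```javascript
// Hypothetical helper: derive a download filename from the crawled URL.
function markdownFilename(url) {
  const host = new URL(url).hostname.replace(/[^a-z0-9.-]/gi, "_");
  return `${host}.md`;
}

// Browser-only sketch: offer `markdown` as a file download.
function saveMarkdown(markdown, filename) {
  const blob = new Blob([markdown], { type: "text/markdown" });
  const href = URL.createObjectURL(blob);
  const link = document.createElement("a");
  link.href = href;
  link.download = filename; // suggests the filename to the browser
  document.body.appendChild(link);
  link.click();
  link.remove();
  URL.revokeObjectURL(href); // release the temporary object URL
}
```

A "Save Markdown" click handler could then call `saveMarkdown(output, markdownFilename(crawledUrl))`, where `output` and `crawledUrl` stand in for whatever the page holds at that point.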

Implementation Details

  • HTML: Provides the structure and user interface of the application.
  • CSS: Styles the application for a better user experience.
  • JavaScript: Handles the crawling logic, content extraction, and Markdown conversion.
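To illustrate the Markdown conversion step, here is a minimal sketch that turns already-extracted content into Markdown text. The input shape (an ordered array of `heading`, `paragraph`, and `link` blocks) and the function name `toMarkdown` are assumptions made for this example. In a browser, such blocks might be collected with `DOMParser` and `querySelectorAll`, but how the project actually extracts them is not shown here.

```javascript
// Hypothetical sketch: convert extracted blocks to Markdown.
// Assumed block shapes:
//   { type: "heading", level: 1-6, text }
//   { type: "paragraph", text }
//   { type: "link", text, href }
function toMarkdown(blocks) {
  const out = [];
  for (const block of blocks) {
    // Collapse runs of whitespace left over from the HTML source.
    const text = (block.text || "").replace(/\s+/g, " ").trim();
    if (block.type === "heading") {
      const level = Math.min(Math.max(block.level || 1, 1), 6);
      if (text) out.push(`${"#".repeat(level)} ${text}`);
    } else if (block.type === "paragraph") {
      if (text) out.push(text);
    } else if (block.type === "link") {
      // Fall back to the URL when the link has no visible text.
      out.push(`- [${text || block.href}](${block.href})`);
    }
  }
  return out.join("\n\n") + "\n";
}
```

Keeping the blocks in document order means headings, paragraphs, and links appear in the output in the same sequence as on the crawled page.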

Files

  • index.html: The main HTML file containing the application's structure, styles, and JavaScript code.

Dependencies

No external libraries or frameworks are required.

Limitations

  • The crawler may not work correctly on websites with complex JavaScript or dynamic content.
  • The crawler may be blocked by websites with anti-scraping measures.
  • The extracted content may not be perfectly formatted.
  • The crawler does not support crawling behind login pages or forms.
