GitHub - temoto/heroshi: Heroshi – open source web crawler.
This repository was archived by the owner on Apr 20, 2019. It is now read-only.

temoto/heroshi


Heroshi, open source web crawler.

Motivation 1: learn HTTP, libraries, and real-world quirks.
Motivation 2: build a collection of libraries and tools for building custom crawlers.
Motivation 3: provide access to a representative subset of the Web for educational and research purposes.

As of 2012-10-12, the last goal has not even been started, but these guys did an amazing job at it: http://commoncrawl.org/

See http://temoto.github.com/heroshi/ for more information.
