MSpider

Talk

The information security department of 360 company has been recruiting for a long time and is interested in contacting the mailbox zhangxin1[at]360.cn.

Installation

In Ubuntu, you need to install some libraries.

You can use pip or easy_install or apt-get to do this.

lxml
chardet
splinter
gevent
phantomjs

Example

Use MSpider collect the vulnerability information on the wooyun.org.

	python mspider.py -u "http://www.wooyun.org/bugs/" --focus-domain "wooyun.org" --filter-keyword "xxx" --focus-keyword "bugs" -t 15 --random-agent true

Use MSpider collect the news information on the news.sina.com.cn.

	python mspider.py -u "http://news.sina.com.cn/c/2015-12-20/doc-ifxmszek7395594.shtml" --focus-domain "news.sina.com.cn"  -t 15 --random-agent true

ToDo

Crawl and storage of information.
Distributed crawling.

MSpider's help

Usage:
  __  __  _____       _     _
 |  \/  |/ ____|     (_)   | |
 | \  / | (___  _ __  _  __| | ___ _ __
 | |\/| |\___ \| '_ \| |/ _` |/ _ \ '__|
 | |  | |____) | |_) | | (_| |  __/ |
 |_|  |_|_____/| .__/|_|\__,_|\___|_|
               | |
               |_|
                        Author: Manning23


Options:
  -h, --help            show this help message and exit
  -u MSPIDER_URL, --url=MSPIDER_URL
                        Target URL (e.g. "http://www.site.com/")
  -t MSPIDER_THREADS_NUM, --threads=MSPIDER_THREADS_NUM
                        Max number of concurrent HTTP(s) requests (default 10)
  --depth=MSPIDER_DEPTH
                        Crawling depth
  --count=MSPIDER_COUNT
                        Crawling number
  --time=MSPIDER_TIME   Crawl time
  --referer=MSPIDER_REFERER
                        HTTP Referer header value
  --cookies=MSPIDER_COOKIES
                        HTTP Cookie header value
  --spider-model=MSPIDER_MODEL
                        Crawling mode: Static_Spider: 0  Dynamic_Spider: 1
                        Mixed_Spider: 2
  --spider-policy=MSPIDER_POLICY
                        Crawling strategy: Breadth-first 0  Depth-first 1
                        Random-first 2
  --focus-keyword=MSPIDER_FOCUS_KEYWORD
                        Focus keyword in URL
  --filter-keyword=MSPIDER_FILTER_KEYWORD
                        Filter keyword in URL
  --filter-domain=MSPIDER_FILTER_DOMAIN
                        Filter domain
  --focus-domain=MSPIDER_FOCUS_DOMAIN
                        Focus domain
  --random-agent=MSPIDER_AGENT
                        Use randomly selected HTTP User-Agent header value
  --print-all=MSPIDER_PRINT_ALL
                        Will show more information

Name		Name	Last commit message	Last commit date
Latest commit History 50 Commits
doc		doc
lib		lib
plugins		plugins
tools		tools
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
mspider.py		mspider.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MSpider

Talk

Installation

Example

ToDo

MSpider's help

About

Releases

Packages

Languages

License

manning23/MSpider

Folders and files

Latest commit

History

Repository files navigation

MSpider

Talk

Installation

Example

ToDo

MSpider's help

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages