GitHub - youtux/wikidump: Framework for the extraction of features from Wikipedia XML dumps.
This repository was archived by the owner on Aug 29, 2020. It is now read-only.

youtux/wikidump

wikidump

Framework for the extraction of features from Wikipedia XML dumps.

Installation

This project has been tested with Python 3.5.0, but should also work with Python 3.4.3.

You need to install dependencies first, as usual.

pip install -r requirements.txt

Usage

You need to download Wikipedia dumps first:

./download.sh

Then run the extractor:

python -m wikidump FILE [FILE ...] OUTPUT_DIR

It will take some time... RAM will not suffer, I promise.
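
The low memory use promised above is typical of extractors that stream the XML rather than load a whole dump at once. The following is only an illustrative sketch of that general technique using the standard library; it is not taken from wikidump's source, and the names `open_dump`, `iter_page_titles`, and `local_name`, as well as the tiny sample document, are hypothetical.

```python
import bz2
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit('}', 1)[-1]


def open_dump(path):
    """Open a dump for binary reading; decompress on the fly if it is .bz2."""
    if path.endswith('.bz2'):
        return bz2.open(path, 'rb')
    return open(path, 'rb')


def iter_page_titles(stream):
    """Yield the title of each <page> element, one page at a time."""
    for _event, elem in ET.iterparse(stream, events=('end',)):
        if local_name(elem.tag) == 'page':
            title = next(
                (child.text for child in elem
                 if local_name(child.tag) == 'title'),
                None,
            )
            yield title
            # Drop the page's children so processed text does not accumulate.
            elem.clear()


# Hypothetical two-page sample standing in for a real dump file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Alpha</title><revision><text>first</text></revision></page>
  <page><title>Beta</title><revision><text>second</text></revision></page>
</mediawiki>"""

titles = list(iter_page_titles(io.BytesIO(SAMPLE)))
print(titles)  # ['Alpha', 'Beta']
```

With a real file, the same generator could be fed from `open_dump(path)`, so that only one page's worth of parsed XML is held at a time.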
