A Python script that downloads documents from web pages. It can process a single URL from the command line or multiple URLs from a text file, detect downloadable documents, and let you choose which ones to fetch through an interactive interface.
- Process a single URL via the command line, or multiple URLs from a text file
- Detect documents based on MIME types and file extensions
- Interactive document selection with checkbox interface
- Filter documents by name
- Select/deselect all documents with one click
- Configurable output directory via environment variables
- Progress feedback during downloads
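The README doesn't show the script's internals, but the MIME-type and extension detection could work roughly like this sketch. The extension set, the `application/` MIME heuristic, and both function names are assumptions, not the actual implementation (the real script fetches pages itself; here the HTML is passed in to keep the example self-contained):

```python
import mimetypes
from urllib.parse import urljoin, urlparse

# Hypothetical extension list; the actual script may recognize a different set.
DOC_EXTENSIONS = {".pdf", ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx", ".csv"}

def looks_like_document(url):
    """Heuristic check: match by extension first, then by guessed MIME type."""
    path = urlparse(url).path.lower()
    if any(path.endswith(ext) for ext in DOC_EXTENSIONS):
        return True
    mime, _ = mimetypes.guess_type(path)
    # Treat application/* MIME types (zip archives, etc.) as documents too.
    return bool(mime and mime.startswith("application/"))

def find_document_links(page_url, html):
    """Return absolute URLs of links on the page that look like documents."""
    from bs4 import BeautifulSoup  # beautifulsoup4, from requirements.txt
    soup = BeautifulSoup(html, "html.parser")
    return [
        urljoin(page_url, a["href"])
        for a in soup.find_all("a", href=True)
        if looks_like_document(urljoin(page_url, a["href"]))
    ]
```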
```
url_downloader/
├── document_downloader.py   # Main script
├── requirements.txt         # Python dependencies
├── urls.txt                 # (Optional) List of URLs to process
├── .env                     # Environment configuration
└── output/                  # Downloaded documents directory
```
Create a `.env` file in the project root:

```
OUTPUT_DIR=/path/to/your/download/folder
```
- Clone the repository:

  ```
  git clone https://github.com/j-chacko/url_downloader.git
  cd url_downloader
  ```
- Create a virtual environment (optional but recommended):

  ```
  python -m venv venv
  source venv/bin/activate
  ```
- Install dependencies:

  ```
  pip install -r requirements.txt
  ```
- Create and configure the `.env` file with your preferred output directory
To process a single URL:

```
python document_downloader.py "https://example.com/documents"
```
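The "progress feedback during downloads" mentioned in the features could be implemented by streaming each file in chunks. This is an illustrative sketch, not the script's actual code; the function names and chunk size are assumptions:

```python
import requests

def progress_line(done, total):
    """Render progress text; total may be 0 when Content-Length is absent."""
    return "{}%".format(done * 100 // total) if total else "{} bytes".format(done)

def download(url, dest_path, chunk_size=8192):
    """Stream a document to disk, printing progress after each chunk."""
    with requests.get(url, stream=True, timeout=30) as resp:
        resp.raise_for_status()
        total = int(resp.headers.get("Content-Length", 0))
        done = 0
        with open(dest_path, "wb") as fh:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                fh.write(chunk)
                done += len(chunk)
                print("\r" + progress_line(done, total), end="", flush=True)
    print()
```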
To process multiple URLs:

- Create a `urls.txt` file with one URL per line
- Run the script:

  ```
  python document_downloader.py
  ```
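Parsing `urls.txt` could look like this sketch. Skipping blank lines and `#` comments is an assumption; the actual script may treat every non-empty line as a URL:

```python
from pathlib import Path

def load_urls(path="urls.txt"):
    """Read one URL per line, ignoring blank lines and '#' comments (assumed)."""
    lines = Path(path).read_text().splitlines()
    return [ln.strip() for ln in lines
            if ln.strip() and not ln.strip().startswith("#")]
```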
- Use arrow keys to navigate
- Press SPACE to select/deselect documents
- Use the filter to search for specific documents
- Select "*** SELECT/DESELECT ALL ***" to toggle all documents
- Press ENTER to confirm selection and start downloading
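The checkbox interface above matches what the `questionary` dependency provides. A simplified sketch of how the selection step could be wired up; treating the toggle entry as "select everything" is an assumption (the real UI toggles checkboxes in place), and the function names are hypothetical:

```python
TOGGLE_ALL = "*** SELECT/DESELECT ALL ***"

def resolve_selection(picked, urls):
    """Map the raw checkbox result to the final download list."""
    if not picked:           # user cancelled or selected nothing
        return []
    if TOGGLE_ALL in picked:  # simplification: toggle entry means select all
        return urls
    return picked

def choose_documents(urls):
    """Show an interactive checkbox list and return the chosen URLs."""
    import questionary  # interactive prompt library, from requirements.txt
    picked = questionary.checkbox(
        "Select documents to download:", choices=[TOGGLE_ALL] + urls
    ).ask()
    return resolve_selection(picked, urls)
```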
- Python 3.6+
- Required packages (installed via `requirements.txt`):
- requests
- beautifulsoup4
- python-dotenv
- questionary
This project is licensed under the MIT License - see the LICENSE file for details.