A Streamlit application that leverages the power of SmolDocling for advanced document OCR (Optical Character Recognition). This app extracts text from document images with high accuracy and produces structured output in both DocTags and Markdown formats.
- Single or Multiple Image Processing: Upload one image or batch process multiple documents
- Specialized Document Processing: Support for various document types including:
- General document conversion
- Table extraction (OTSL format)
- Code extraction
- Formula conversion to LaTeX
- Chart data extraction
- Section header extraction
- Structured Output Formats: Results provided in both DocTags format and rendered Markdown
- Download Options: Save extraction results for further use
- Python 3.12+
- Hugging Face account with API token (for model access)
-
Clone this repository:
git clone https://github.com/AIAnytime/SmolDocling-OCR-App cd smoldocling
-
Install dependencies using UV (recommended):
uv pip install -r requirements.txt
Alternatively, using pip:
pip install -r requirements.txt
-
Create a
.env
file in the project root with your Hugging Face token:HF_TOKEN=your_huggingface_token_here
-
Start the Streamlit app:
streamlit run main.py
-
Open your browser and navigate to the displayed URL (typically http://localhost:8501)
-
Use the sidebar to:
- Select single or multiple image upload mode
- Choose the processing task type
- Upload your document image(s)
-
Click "Process Image(s)" to start the OCR
-
View and download results in DocTags and Markdown formats
- streamlit - Web application framework
- torch - Deep learning framework
- transformers - Hugging Face Transformers library
- docling-core - Document processing toolkit
- huggingface_hub - Hugging Face model hub integration
- Pillow - Image processing library
- python-dotenv - Environment variable management
- accelerate - Optional for hardware acceleration
- PyMuPDF - pdf data extraction, analysis, conversion & manipulation library
MIT License
This project is licensed under the MIT License - see the LICENSE file for details.
This project uses the SmolDocling model from DS4SD/SmolDocling-256M-preview.