SmolDocling OCR App

A Streamlit application that uses SmolDocling for document OCR (Optical Character Recognition). The app extracts text from document images and produces structured output in both DocTags and Markdown formats.

Features

- Single or Multiple Image Processing: Upload one image or batch process multiple documents
- Specialized Document Processing: Support for various document types, including:
  - General document conversion
  - Table extraction (OTSL format)
  - Code extraction
  - Formula conversion to LaTeX
  - Chart data extraction
  - Section header extraction
- Structured Output Formats: Results provided in both DocTags format and rendered Markdown
- Download Options: Save extraction results for further use
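
The task types listed above could be wired to model instructions with a simple lookup. The sketch below is illustrative only: the `TASK_PROMPTS` name, the `prompt_for` helper, and the exact instruction strings are assumptions patterned on the instructions published alongside the SmolDocling model, not code taken from this app's `main.py`. Check the strings against the model card before relying on them.

```python
# Hypothetical mapping from the app's task types to SmolDocling instructions.
# The instruction strings are assumptions; verify them against the model card.
TASK_PROMPTS = {
    "General document conversion": "Convert this page to docling.",
    "Table extraction (OTSL)": "Convert table to OTSL.",
    "Code extraction": "Convert code to text.",
    "Formula conversion to LaTeX": "Convert formula to LaTeX.",
    "Chart data extraction": "Convert chart to table.",
    "Section header extraction": "Extract all section header elements on the page.",
}


def prompt_for(task: str) -> str:
    """Return the instruction for a task, falling back to full-page conversion."""
    return TASK_PROMPTS.get(task, TASK_PROMPTS["General document conversion"])


if __name__ == "__main__":
    print(prompt_for("Code extraction"))
```

Keeping the prompts in one dictionary means a sidebar selector can be populated from `TASK_PROMPTS.keys()`, so adding a task type only requires one new entry.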

Requirements

Python 3.12+
Hugging Face account with API token (for model access)

Installation

Clone this repository:

git clone https://github.com/AIAnytime/SmolDocling-OCR-App
cd SmolDocling-OCR-App

Install dependencies using UV (recommended):
```
uv pip install -r requirements.txt
```
Alternatively, using pip:
```
pip install -r requirements.txt
```
Create a .env file in the project root with your Hugging Face token:
```
HF_TOKEN=your_huggingface_token_here
```
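
For reference, here is a minimal sketch of how a token stored this way can be read at startup. It uses only the standard library as a stand-in for `python-dotenv` (which the app lists as a dependency); the helper names `read_env_file` and `get_hf_token` are hypothetical and are not part of this repository.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; a stdlib stand-in for python-dotenv."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_hf_token(path: str = ".env") -> Optional[str]:
    """Prefer an exported HF_TOKEN, then fall back to the .env file."""
    return os.environ.get("HF_TOKEN") or read_env_file(path).get("HF_TOKEN")
```

If `get_hf_token()` returns `None`, the token is neither exported in the shell nor present in `.env`, which is worth checking before the app tries to download the model.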

Usage

Start the Streamlit app:
```
streamlit run main.py
```
Open your browser and navigate to the displayed URL (typically http://localhost:8501)
Use the sidebar to:
- Select single or multiple image upload mode
- Choose the processing task type
- Upload your document image(s)
Click "Process Image(s)" to start the OCR
View and download results in DocTags and Markdown formats

Dependencies

streamlit - Web application framework
torch - Deep learning framework
transformers - Hugging Face Transformers library
docling-core - Document processing toolkit
huggingface_hub - Hugging Face model hub integration
Pillow - Image processing library
python-dotenv - Environment variable management
accelerate - Optional, for hardware acceleration
PyMuPDF - PDF data extraction, analysis, conversion, and manipulation library
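
Several of these packages are imported under a different name than the one used to install them, which can make a failed install hard to spot. The sketch below is one way to check the environment before launching the app. The `REQUIRED`, `OPTIONAL`, and `check_packages` names are hypothetical, and the script is not part of this repository.

```python
from importlib.util import find_spec

# pip package name -> import name (they differ for several packages).
REQUIRED = {
    "streamlit": "streamlit",
    "torch": "torch",
    "transformers": "transformers",
    "docling-core": "docling_core",
    "huggingface_hub": "huggingface_hub",
    "Pillow": "PIL",
    "python-dotenv": "dotenv",
    "PyMuPDF": "fitz",
}
OPTIONAL = {"accelerate": "accelerate"}


def check_packages(packages: dict) -> dict:
    """Map each pip name to True if its module can be located, else False."""
    return {
        pip_name: find_spec(module) is not None
        for pip_name, module in packages.items()
    }


if __name__ == "__main__":
    results = {**check_packages(REQUIRED), **check_packages(OPTIONAL)}
    for name, ok in results.items():
        print(f"{name}: {'installed' if ok else 'missing'}")
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy packages such as `torch`.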

License

MIT License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

This project uses the SmolDocling model, published on Hugging Face as DS4SD/SmolDocling-256M-preview.
