Docowling is a fork of Docling, an IBM project, developed to extend its functionality and add new document processing capabilities.
Like an owl watching for any prey, Docowling is intended to tackle all types of documents.
- 📄 Converts popular formats (CSV, PDF, DOCX, PPTX, XLSX, Images, HTML, AsciiDoc & Markdown) to HTML, Markdown and JSON with embedded/referenced images
- 🧩 Unified DoclingDocument format for standardized representation
- 🤖 Ready-to-use integrations with LangChain, LlamaIndex, Crew AI & Haystack
- 💻 Intuitive CLI for efficient batch processing with customizable export parameters
- 📄 Compatibility with more formats
- 🤖 Optimized integrations with LangChain, Crew AI & Weaviate
To use Docowling, simply install docowling from your package manager, e.g. pip or uv:
pip install docowling
uv pip install docowling
Works on macOS, Linux and Windows, on both x86_64 and arm64 architectures.
To convert individual documents, use convert(), for example:
from docowling.document_converter import DocumentConverter
source = "https://arxiv.org/pdf/2408.09869" # document per local path or URL
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown()) # output: "## Docling Technical Report[...]"
from docowling.document_converter import DocumentConverter
source = "/content/drive/MyDrive/TESLA.csv" # document per local path or URL
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())
# output: "| Date | Open | High [...]"
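The two examples above convert one document at a time. To handle a whole folder, the same convert() and export_to_markdown() calls can be wrapped in a small loop. The following is a minimal sketch, not part of Docowling itself: find_sources, convert_all, docowling_to_markdown and the ASSUMED_SUFFIXES set are hypothetical helpers introduced here for illustration, and the suffix list is only a guess at file extensions for the formats listed earlier.

```python
from pathlib import Path
from typing import Callable, Iterable, List

# Assumption: file suffixes for the formats listed above; adjust to your setup.
ASSUMED_SUFFIXES = {".csv", ".pdf", ".docx", ".pptx", ".xlsx",
                    ".html", ".adoc", ".md", ".png", ".jpg"}


def find_sources(root: Path, suffixes: Iterable[str] = ASSUMED_SUFFIXES) -> List[Path]:
    """Return the files directly inside root whose suffix matches, in sorted order."""
    wanted = {s.lower() for s in suffixes}
    return sorted(p for p in root.iterdir()
                  if p.is_file() and p.suffix.lower() in wanted)


def docowling_to_markdown(source: str) -> str:
    """Convert one document using the convert() call shown in the examples above."""
    # Imported lazily so the helpers can be used without docowling installed.
    from docowling.document_converter import DocumentConverter

    result = DocumentConverter().convert(source)
    return result.document.export_to_markdown()


def convert_all(sources: Iterable[Path], out_dir: Path,
                to_markdown: Callable[[str], str] = docowling_to_markdown) -> List[Path]:
    """Write one Markdown file per source into out_dir and return the written paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sources:
        # Keep the original suffix in the name so report.pdf and report.csv do not collide.
        target = out_dir / (src.name + ".md")
        target.write_text(to_markdown(str(src)), encoding="utf-8")
        written.append(target)
    return written
```

Calling convert_all(find_sources(Path("docs")), Path("out")) would write out/report.pdf.md, out/TESLA.csv.md and so on. The sketch creates a new DocumentConverter for every file to stay short; for large batches, reusing a single converter instance is likely cheaper.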
The Docowling codebase is under the MIT license. For individual model usage, please refer to the model licenses found in the original packages.
Thank you, IBM, for creating Docling, the base of Docowling.