Docowling

Docowling is a fork of Docling, an IBM project, developed to extend its functionality and add new document processing capabilities.

Why Docowling?

Like an owl watching for every kind of prey, Docowling is a fork intended to tackle all types of documents.

Features

📄 Converts popular formats (CSV, PDF, DOCX, PPTX, XLSX, Images, HTML, AsciiDoc & Markdown) to HTML, Markdown and JSON with embedded/referenced images
🧩 Unified DoclingDocument format for standardized representation
🤖 Ready-to-use integrations with LangChain, LlamaIndex, Crew AI & Haystack
💻 Intuitive CLI for efficient batch processing with customizable export parameters
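
As a rough illustration of the input formats listed above, the sketch below picks out the files in a folder that look convertible, judging by file extension only. The extension set and the helper name `find_convertible` are assumptions made for this example. They are not taken from the Docowling codebase, which may detect formats differently.

```python
from pathlib import Path

# Assumed mapping from the formats listed above to common file extensions.
# Illustrative only: Docowling itself may accept more or fewer extensions.
ASSUMED_EXTENSIONS = {
    ".csv", ".pdf", ".docx", ".pptx", ".xlsx",
    ".png", ".jpg", ".jpeg", ".html", ".adoc", ".md",
}


def find_convertible(folder):
    """Return files under `folder` whose extension is in the assumed set, sorted."""
    root = Path(folder)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in ASSUMED_EXTENSIONS
    )
```

The extension check is case-insensitive, so `REPORT.PDF` and `report.pdf` are treated alike.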

Coming Soon

📄 Support for more formats
🤖 Optimized integrations with LangChain, Crew AI & Weaviate

Installation

To use Docowling, simply install docowling from your package manager, e.g. pip or uv:

pip install docowling

uv pip install docowling

Docowling works on macOS, Linux and Windows, on both x86_64 and arm64 architectures.
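
To confirm that the install succeeded, one option is to ask Python for the installed package version. This is a minimal sketch using only the standard library. The helper name `installed_version` is hypothetical, and the sketch assumes the distribution is registered under the name `docowling`, as in the pip command above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("docowling")  # assumed distribution name
    if version is None:
        print("docowling is not installed in this environment")
    else:
        print(f"docowling {version} is installed")
```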

Getting started

To convert individual documents, use convert(), for example:

from docowling.document_converter import DocumentConverter

source = "https://arxiv.org/pdf/2408.09869"  # local path or URL of the document
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())  # output: "## Docling Technical Report[...]"

CSV files are converted the same way:

from docowling.document_converter import DocumentConverter

source = "/content/drive/MyDrive/TESLA.csv"  # local path or URL of the document
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())
# output: "| Date     |      Open |      High [...]"
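
For more than one document, the same convert() call can be wrapped in a small loop that writes each result to a Markdown file. The sketch below is one possible arrangement, not part of Docowling: the names `convert_to_markdown_files` and `make_docowling_adapter` are hypothetical, and the conversion step is passed in as a plain callable so the loop can be exercised without Docowling installed. Output names are derived from the file stem, so it is meant for local files with distinct names.

```python
from pathlib import Path


def convert_to_markdown_files(sources, to_markdown, out_dir):
    """Convert each source with `to_markdown` and write <stem>.md into `out_dir`.

    `to_markdown` is any callable that takes a source string and returns Markdown.
    Returns (written_paths, failures); failures maps source -> error message.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written, failures = [], {}
    for source in sources:
        try:
            markdown = to_markdown(source)
        except Exception as exc:  # keep going so one bad file does not stop the batch
            failures[source] = str(exc)
            continue
        target = out / (Path(source).stem + ".md")
        target.write_text(markdown, encoding="utf-8")
        written.append(target)
    return written, failures


def make_docowling_adapter():
    """Build a `to_markdown` callable from the calls shown above (needs docowling)."""
    from docowling.document_converter import DocumentConverter  # imported lazily

    converter = DocumentConverter()  # reuse one converter for the whole batch

    def to_markdown(source):
        return converter.convert(source).document.export_to_markdown()

    return to_markdown
```

With Docowling installed, a call such as `convert_to_markdown_files(["TESLA.csv"], make_docowling_adapter(), "out")` would write `out/TESLA.md` and return any per-file errors instead of raising.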

License

The Docowling codebase is released under the MIT license. For individual model usage, please refer to the model licenses found in the original packages.

IBM ❤️ Thanks

Thank you IBM for creating Docling, the base of Docowling.

