GitHub - mouraworks/docowling: Get your documents MORE ready for gen AI

Docowling

Docowling is a fork of Docling, an IBM project, developed to enhance its functionality and add new document processing capabilities.

Why Docowling?

Like an owl on the lookout for any prey, Docowling is a fork built to go after all types of documents.

Features

  • 📄 Converts popular formats (CSV, PDF, DOCX, PPTX, XLSX, Images, HTML, AsciiDoc & Markdown) to HTML, Markdown and JSON with embedded/referenced images
  • 🧩 Unified DoclingDocument format for standardized representation
  • 🤖 Ready-to-use integrations with LangChain, LlamaIndex, Crew AI & Haystack
  • 💻 Intuitive CLI for efficient batch processing with customizable export parameters
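
As an illustration of the format list above, the sketch below filters a set of paths down to those whose file suffix matches one of the listed input formats. The suffix set is an assumption derived from the format names (for example, which image suffixes count as "Images"); it is not taken from the Docowling code, and `SUPPORTED_SUFFIXES` and `select_supported` are hypothetical names.

```python
from pathlib import Path
from typing import Iterable, List, Union

# Assumed mapping from the formats named in the feature list to common
# file suffixes. This is illustrative only, not Docowling's own list.
SUPPORTED_SUFFIXES = frozenset({
    ".csv", ".pdf", ".docx", ".pptx", ".xlsx",
    ".png", ".jpg", ".jpeg",   # "Images" (assumed suffixes)
    ".html", ".htm",
    ".adoc", ".asciidoc",      # AsciiDoc
    ".md",                     # Markdown
})


def select_supported(paths: Iterable[Union[str, Path]]) -> List[Path]:
    """Keep only paths whose suffix (case-insensitive) is in SUPPORTED_SUFFIXES."""
    return [Path(p) for p in paths if Path(p).suffix.lower() in SUPPORTED_SUFFIXES]
```

For example, `select_supported(Path("docs").rglob("*"))` would collect candidate files from a `docs` folder before handing them to a converter.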

Coming Soon

  • 📄 Support for more formats
  • 🤖 Optimized integrations with LangChain, Crew AI & Weaviate

Installation

To use Docowling, simply install docowling from your package manager, e.g. pip or uv:

pip install docowling
uv pip install docowling
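
After installing, one way to confirm that the package is visible to the current interpreter is to ask the standard library for its installed version. This is a minimal sketch that relies only on `importlib.metadata`, not on any Docowling API; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "docowling") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"docowling {found}" if found else "docowling is not installed")
```

If the script reports that the package is not installed, the install command was probably run in a different environment than the interpreter being used.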

Docowling works on macOS, Linux and Windows, on both x86_64 and arm64 architectures.

Getting started

To convert individual documents, use convert(), for example:

from docowling.document_converter import DocumentConverter

source = "https://arxiv.org/pdf/2408.09869"  # document per local path or URL
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())  # output: "## Docling Technical Report[...]"

Local files, such as a CSV, are converted the same way:

from docowling.document_converter import DocumentConverter

source = "/content/drive/MyDrive/TESLA.csv"  # document per local path or URL
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())  
# output: "| Date     |      Open |      High [...]"
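
The examples above convert one document at a time. A minimal sketch of looping over several sources and saving each result as a Markdown file could look like the following. It uses only the `convert()` and `export_to_markdown()` calls shown above; `convert_to_markdown_files` is a hypothetical helper, and the converter is passed in as a parameter so the function works with any object exposing that interface.

```python
from pathlib import Path
from typing import Any, Dict, Iterable, Union


def convert_to_markdown_files(
    converter: Any,
    sources: Iterable[Union[str, Path]],
    output_dir: Union[str, Path],
) -> Dict[str, Union[Path, Exception]]:
    """Convert each source and write its Markdown into output_dir.

    Returns a mapping from each source to the written file path, or to the
    exception raised while converting that source.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    results: Dict[str, Union[Path, Exception]] = {}
    for source in sources:
        key = str(source)
        try:
            result = converter.convert(key)
            markdown = result.document.export_to_markdown()
        except Exception as exc:
            # Record the failure and continue with the remaining sources.
            results[key] = exc
            continue
        # Keep the full original name so report.pdf and report.docx
        # do not overwrite each other.
        name = Path(key).name or "document"
        target = out / f"{name}.md"
        target.write_text(markdown, encoding="utf-8")
        results[key] = target
    return results
```

To use it with Docowling, pass a `DocumentConverter()` instance as `converter` together with a list of paths or URLs. Failed conversions show up as exception objects in the returned mapping rather than stopping the whole batch.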

License

The Docowling codebase is released under the MIT license. For individual model usage, please refer to the model licenses found in the original packages.

IBM ❤️ Thanks

Thank you IBM for creating Docling, the base of Docowling.
