GitHub - morphik-org/morphik-core: Open source multi-modal RAG for building AI apps over private knowledge.

Docs - Community - Why Morphik? - Bug reports

Migration Required for Existing Installations: If you installed Morphik before June 8, 2025, our most recent changes affect how content is stored in multivector chunks. Run the scripts/migrate_multivector_to_external_storage.py script before launching Morphik.

Morphik is an AI-native toolset for visually rich documents and multimodal data

We are building the best way for developers to integrate context, however complex and nuanced, into their AI applications. We offer an end-to-end set of tools to store, represent, and search (both shallow and deep) unstructured data.

Why?

Building AI applications that interact with data shouldn't require duct-taping together a dozen different tools just to get relevant results to your LLM.

Traditional RAG approaches that work in proofs of concept often fail spectacularly in production. Cobbling together separate systems for text extraction, OCR, embeddings, vector databases, and retrieval creates fragile pipelines that break under real-world load. Each component brings its own APIs, configurations, and failure modes, so what starts as a simple demo becomes an unmaintainable mess at scale.

Even worse, these pipelines fundamentally fail at understanding visually rich documents. Charts become meaningless text fragments. Critical diagrams lose their spatial relationships. Tables get mangled into unreadable strings. Technical specifications with mixed text and visuals? Forget about accuracy.

The result is AI applications that confidently return wrong answers because they never truly understood the documents. They miss crucial information embedded in images, misinterpret technical diagrams, and treat visual data as an afterthought. And performance? Watch your infrastructure costs explode as your LLM re-processes the same 500-page manual for every single query.

What?

Morphik provides developers the tools to ingest, search (deep and shallow), transform, and manage unstructured and multimodal documents. Some of our features include:

Multimodal Search: We employ techniques such as ColPali to build search that actually understands the visual content of documents you provide. Search over images, PDFs, videos, and more with a single endpoint.
Knowledge Graphs: Build knowledge graphs for domain-specific use cases in a single line of code. Use our battle-tested system prompts, or use your own.
Fast and Scalable Metadata Extraction: Extract metadata from documents - including bounding boxes, labeling, classification, and more.
Integrations: Integrate with existing tools and workflows, including (but not limited to) Google Suite, Slack, and Confluence.
Cache-Augmented Generation: Create persistent KV-caches of your documents to speed up generation.
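
To make the Multimodal Search item more concrete: ColPali-style retrievers typically score a query against a page with late interaction, where each query-token embedding is matched to its most similar page-patch embedding and the best matches are summed. The sketch below illustrates that scoring rule in plain Python. It is not Morphik's implementation; the function names (`maxsim_score`, `rank_pages`) and the toy 2-d vectors are assumptions made for illustration only.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], page_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: for each query vector take the best-matching
    page vector, then sum those maxima. Assumes page_vecs is non-empty."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(
    query_vecs: Sequence[Vector], pages: Dict[str, Sequence[Vector]]
) -> List[Tuple[str, float]]:
    """Return (page name, score) pairs sorted from best to worst."""
    scored = [(name, maxsim_score(query_vecs, vecs)) for name, vecs in pages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy embeddings (hypothetical): two query tokens, two patches per page.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "chart_page": [[0.9, 0.1], [0.1, 0.8]],
    "text_page": [[0.5, 0.5], [0.4, 0.4]],
}
ranking = rank_pages(query, pages)
print(ranking[0][0])  # name of the best-scoring page
```

In a real system the vectors would come from a vision-language model and the search would run inside a vector store rather than a Python loop; the scoring rule is the part this sketch is meant to show.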

The best part? Morphik has a free tier and is open source! Get started by signing up at Morphik.

Getting Started with Morphik (Recommended)

The fastest and easiest way to get started with Morphik is by signing up for free at Morphik. We have a generous free tier and transparent, compute-usage based pricing if you're looking to ingest a lot of data.

Self-hosting Morphik

If you'd like to self-host Morphik, you can find the dedicated instructions here. We offer options for direct installation and installation via Docker.

Important: Due to limited resources, we cannot guarantee full support for self-hosted deployments. An installation guide and a Discord community are available to help.

Using Morphik

Once you've signed up for Morphik, you can get started with ingesting and searching your data right away.

Code (Example: Python SDK)

For programmers, we offer a Python SDK and a REST API. Ingesting a file is as simple as:

from morphik import Morphik

morphik = Morphik("<your-morphik-uri>")
morphik.ingest_file("path/to/your/super/complex/file.pdf")

Searching and querying your data is just as easy:

morphik.query("What's the height of screw 14-A in the chair assembly instructions?")
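
The two calls above can be combined into one small helper. The sketch below is a minimal illustration that relies only on the `ingest_file` and `query` methods shown in this README; the helper name `ingest_and_ask` and the `StubClient` class are hypothetical and exist only so the example runs without a Morphik server. The README does not describe what `query` returns or whether ingestion finishes before `ingest_file` returns, so the helper passes the result through unchanged and makes no attempt to wait.

```python
def ingest_and_ask(client, path: str, question: str):
    """Ingest one file, then ask a question about it.

    `client` is any object exposing ingest_file(path) and query(question),
    such as the Morphik client created earlier in this section.
    Returns whatever client.query returns.
    """
    client.ingest_file(path)
    return client.query(question)


class StubClient:
    """Hypothetical stand-in for a Morphik client (not part of the SDK)."""

    def __init__(self):
        self.ingested = []

    def ingest_file(self, path):
        # Record the path instead of uploading anything.
        self.ingested.append(path)

    def query(self, question):
        # Echo the question so the call flow is visible.
        return f"stub answer for: {question}"


stub = StubClient()
answer = ingest_and_ask(
    stub,
    "path/to/manual.pdf",
    "What's the height of screw 14-A?",
)
print(answer)
```

With a real client you would pass the `morphik` object in place of `stub`. If ingestion is processed in the background in your deployment, the query may need to run after processing completes.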

Morphik Console

You can also interact with Morphik via the Morphik Console. This is a web-based interface that allows you to ingest, search, and query your data. You can upload files, connect to different data sources, and chat with your data all within the same place.

Model Context Protocol

Finally, you can also access Morphik via MCP. Instructions are available here.

Contributing

You're welcome to contribute to the project! We love:

Bug reports via GitHub issues
Feature requests via GitHub issues
Pull requests

Currently, we're focused on improving speed, integrating with more tools, and finding the research papers that provide the most value to our users. If you have thoughts, let us know on Discord or on GitHub!

License

Morphik Core is source-available under the Business Source License 1.1.

Personal / Indie use: free.
Commercial production use: free if your Morphik deployment generates less than $2,000/month in gross revenue.
Otherwise, purchase a commercial key at https://morphik.ai/pricing.
Future open source: each code version automatically re-licenses to Apache 2.0 exactly four years after its first release.

See the full license text for details.
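
The revenue threshold and the four-year re-licensing rule summarized above reduce to simple arithmetic. The sketch below is one reading of that summary, not of the license text itself; the function names and the February 29 fallback are assumptions.

```python
from datetime import date

REVENUE_THRESHOLD_USD = 2_000  # monthly gross revenue limit from the summary above
RELICENSE_AFTER_YEARS = 4      # years until a version re-licenses to Apache 2.0


def needs_commercial_key(monthly_gross_revenue_usd: float) -> bool:
    """True when a commercial production deployment is at or above the threshold."""
    return monthly_gross_revenue_usd >= REVENUE_THRESHOLD_USD


def apache_relicense_date(first_release: date) -> date:
    """Date four years after a code version's first release."""
    target_year = first_release.year + RELICENSE_AFTER_YEARS
    try:
        return first_release.replace(year=target_year)
    except ValueError:
        # Feb 29 with no matching day in the target year: use Feb 28 (assumption).
        return first_release.replace(year=target_year, day=28)


print(needs_commercial_key(1_500))
print(apache_relicense_date(date(2025, 6, 8)))
```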

Contributors

Visit our special thanks page dedicated to our contributors.

