Welcome to VLM Run Hub, a comprehensive repository of pre-defined Pydantic schemas for extracting structured data from unstructured visual domains such as images, videos, and documents. Designed for Vision Language Models (VLMs) and optimized for real-world use cases, VLM Run Hub simplifies the integration of visual ETL into your workflows.
While vision models like OpenAI's GPT-4o and Anthropic's Claude Vision excel at exploratory tasks like "chat with images," they are often impractical for automation and integration, where strongly-typed, validated outputs are crucial.
The Structured Outputs API (popularized by GPT-4o and Gemini) addresses this by constraining LLMs to return data in precise, strongly-typed formats such as Pydantic models. This eliminates the need for complex parsing and validation, ensuring outputs conform to expected types and structures. These schemas can be nested and include complex types like lists and dictionaries, enabling seamless integration with existing systems while leveraging the full capabilities of the model.
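To make the idea of a nested, strongly-typed schema concrete, here is a minimal sketch assuming Pydantic v2. `InvoiceSketch` and `LineItem` are illustrative names invented for this example, not the hub's actual `Invoice` schema.

```python
from typing import Dict, List, Optional

from pydantic import BaseModel, Field


class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float


class InvoiceSketch(BaseModel):
    invoice_id: str
    items: List[LineItem]                                   # list of nested models
    metadata: Dict[str, str] = Field(default_factory=dict)  # free-form key/value pairs
    notes: Optional[str] = None


# JSON Schema that a provider can use to constrain generation
schema = InvoiceSketch.model_json_schema()

# A raw JSON reply from a model, parsed back into typed Python objects
raw = '{"invoice_id": "INV-001", "items": [{"description": "Widget", "quantity": 2, "unit_price": 9.5}]}'
invoice = InvoiceSketch.model_validate_json(raw)
print(invoice.items[0].quantity * invoice.items[0].unit_price)
```

`model_json_schema()` produces the schema that is sent to the provider, and `model_validate_json()` turns the model's JSON reply back into nested Python objects with no hand-written parsing.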
- Easy to use: Pydantic is a well-understood and battle-tested data model for structured data.
- Batteries included: Each schema in this repo has been validated across real-world industry use cases, from healthcare to finance to media, saving you weeks of development effort.
- Automatic data validation: Built-in Pydantic validation ensures your extracted data is well-formed and conforms to the schema, reducing errors and simplifying downstream workflows.
- Type-safety: With Pydantic's type-safety and compatibility with tools like `mypy` and `pyright`, you can build composable, modular systems that are robust and maintainable.
- Model-agnostic: Use the same schema with multiple VLM providers; there is no need to rewrite prompts for different VLMs.
- Optimized for Visual ETL: Purpose-built for extracting structured data from images, videos, and documents, this repo bridges the gap between unstructured data and actionable insights.
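The validation point can be illustrated with a small sketch, assuming Pydantic v2. `Receipt` and `failed_fields` are hypothetical names for this example (not the hub's `document.receipt` schema): malformed model output raises `ValidationError` instead of silently flowing downstream, while compatible values such as a numeric string are coerced.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class Receipt(BaseModel):
    merchant: str
    total: float = Field(ge=0)                         # negative totals are rejected
    currency: str = Field(min_length=3, max_length=3)  # e.g. "USD"


def failed_fields(data: dict) -> List[str]:
    """Return the names of fields that fail validation (empty list if valid)."""
    try:
        Receipt.model_validate(data)
    except ValidationError as exc:
        return sorted(str(err["loc"][0]) for err in exc.errors())
    return []


# "12.50" is coerced to a float in Pydantic's default (lax) mode
good = Receipt.model_validate({"merchant": "Cafe", "total": "12.50", "currency": "USD"})
print(good.total)
print(failed_fields({"merchant": "Cafe", "total": -3, "currency": "US"}))
```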
The VLM Run Hub maintains a comprehensive catalog of all available schemas in the `vlmrun/hub/catalog.yaml` file. The catalog is automatically validated to ensure consistency and completeness of schema documentation. See `catalog-spec.yaml` for the full YAML specification. Here are some featured schemas:
- Documents: document.bank-statement, document.invoice, document.receipt, document.resume, document.us-drivers-license, document.utility-bill, document.w2-form
- Other industry-specific schemas: healthcare.medical-insurance-card, retail.ecommerce-product-caption, media.tv-news, aerospace.remote-sensing
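Judging from the directory layout and the `vlmrun.hub.schemas.document.invoice` import path used in this README, a catalog domain appears to map to a module by replacing hyphens with underscores (e.g. `document.us-drivers-license` to `document/us_drivers_license.py`). The sketch below encodes that inferred convention; `domain_to_module` and `load_schema_module` are hypothetical helpers, not part of the package, and the catalog itself remains the authoritative mapping.

```python
import importlib
from types import ModuleType

# Assumed base package, taken from the import path used in this README
SCHEMA_PACKAGE = "vlmrun.hub.schemas"


def domain_to_module(domain: str) -> str:
    """Map a domain such as 'document.us-drivers-license' to a dotted module path."""
    industry, _, use_case = domain.partition(".")
    if not industry or not use_case:
        raise ValueError(f"expected '<industry>.<use-case>', got {domain!r}")
    return f"{SCHEMA_PACKAGE}.{industry}.{use_case.replace('-', '_')}"


def load_schema_module(domain: str) -> ModuleType:
    # Requires `pip install vlmrun-hub`; raises ModuleNotFoundError otherwise
    return importlib.import_module(domain_to_module(domain))


print(domain_to_module("healthcare.medical-insurance-card"))
```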
If you have a new schema you want to add to the catalog, please refer to SCHEMA-GUIDELINES.md for the full guidelines.
Let's say we want to extract invoice metadata from an invoice image. You can readily use the `Invoice` schema defined under `vlmrun.hub.schemas.document.invoice` with any VLM of your choosing.
For a comprehensive walkthrough of available schemas and their usage, check out our Schema Showcase Notebook.
```bash
pip install vlmrun-hub
```
With Instructor / OpenAI
```python
import instructor
from openai import OpenAI
from vlmrun.hub.schemas.document.invoice import Invoice

IMAGE_URL = "https://storage.googleapis.com/vlm-data-public-prod/hub/examples/document.invoice/invoice_1.jpg"

# Patch the OpenAI client so `response_model` is parsed into a Pydantic object
client = instructor.from_openai(
    OpenAI(), mode=instructor.Mode.MD_JSON
)
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": [
            {"type": "text", "text": "Extract the invoice in JSON."},
            # `detail` belongs inside the `image_url` object
            {"type": "image_url", "image_url": {"url": IMAGE_URL, "detail": "auto"}},
        ]}
    ],
    response_model=Invoice,
    temperature=0,
)
# `response` is a validated `Invoice` instance
print(response.model_dump_json(indent=2))
```
With VLM Run
```python
import requests
from vlmrun.hub.schemas.document.invoice import Invoice

IMAGE_URL = "https://storage.googleapis.com/vlm-data-public-prod/hub/examples/document.invoice/invoice_1.jpg"

# Send the JSON schema alongside the image so the response follows `Invoice`
json_data = {
    "image": IMAGE_URL,
    "model": "vlm-1",
    "domain": "document.invoice",
    "json_schema": Invoice.model_json_schema(),
}
response = requests.post(
    "https://api.vlm.run/v1/image/generate",
    headers={"Authorization": "Bearer <your-api-key>"},
    json=json_data,
)
response.raise_for_status()
print(response.json())
```
With OpenAI Structured Outputs
```python
from openai import OpenAI
from vlmrun.hub.schemas.document.invoice import Invoice

IMAGE_URL = "https://storage.googleapis.com/vlm-data-public-prod/hub/examples/document.invoice/invoice_1.jpg"

client = OpenAI()
completion = client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": [
            {"type": "text", "text": "Extract the invoice in JSON."},
            # `detail` belongs inside the `image_url` object
            {"type": "image_url", "image_url": {"url": IMAGE_URL, "detail": "auto"}},
        ]},
    ],
    response_format=Invoice,
    temperature=0,
)
# The parsed `Invoice` instance (None if the model refused)
invoice = completion.choices[0].message.parsed
print(invoice)
```
When working with the OpenAI Structured Outputs API, ensure that the `response_format` is a valid Pydantic model that uses only the types the API supports.
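Before passing a model as `response_format`, it can help to list which JSON Schema keywords the model actually produces and compare them against the subset that OpenAI documents as supported. This is a rough sketch assuming Pydantic v2; `schema_keywords`, `LineItem`, and `Order` are hypothetical names, and the helper does not itself know which keywords OpenAI accepts.

```python
from typing import Any, List, Optional, Set

from pydantic import BaseModel, Field


def schema_keywords(node: Any, found: Optional[Set[str]] = None) -> Set[str]:
    """Collect every JSON Schema keyword used anywhere in `node` (rough check)."""
    found = set() if found is None else found
    if isinstance(node, dict):
        for key, value in node.items():
            found.add(key)
            if key in ("properties", "$defs") and isinstance(value, dict):
                # Keys of these maps are field/definition names, not keywords,
                # so only recurse into their values.
                for sub_schema in value.values():
                    schema_keywords(sub_schema, found)
            else:
                schema_keywords(value, found)
    elif isinstance(node, list):
        for item in node:
            schema_keywords(item, found)
    return found


class LineItem(BaseModel):
    name: str
    quantity: int = Field(ge=0)  # emitted as the `minimum` keyword


class Order(BaseModel):
    items: List[LineItem]        # emitted via `$ref` into `$defs`
    note: Optional[str] = None   # emitted as `anyOf` with a null branch


keywords = schema_keywords(Order.model_json_schema())
print(sorted(keywords))
```

Keywords in the result that are absent from OpenAI's documented list (for example a `minimum` coming from `Field(ge=0)`) are candidates to drop or restate in a field description.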
Locally with Ollama
Note: For certain `vlmrun.common` utilities, you will need to install our main Python SDK via `pip install vlmrun`.
```python
from ollama import chat
from vlmrun.common.image import encode_image
from vlmrun.common.utils import remote_image
from vlmrun.hub.schemas.document.invoice import Invoice

IMAGE_URL = "https://storage.googleapis.com/vlm-data-public-prod/hub/examples/document.invoice/invoice_1.jpg"

img = remote_image(IMAGE_URL)
chat_response = chat(
    model="llama3.2-vision:11b",
    # Constrain the output to the Invoice JSON schema
    format=Invoice.model_json_schema(),
    messages=[
        {
            "role": "user",
            "content": "Extract the invoice in JSON.",
            # Keep only the base64 payload after the data-URI prefix
            "images": [encode_image(img, format="JPEG").split(",")[1]],
        },
    ],
    options={
        "temperature": 0
    },
)
# Validate the raw JSON reply against the schema
response = Invoice.model_validate_json(
    chat_response.message.content
)
print(response)
```
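In the Ollama example, `encode_image(...).split(",")[1]` keeps only the text after the first comma, which suggests that `encode_image` returns a data URI (`data:image/jpeg;base64,<payload>`) while Ollama's `images` field takes the bare base64 payload. The stdlib-only sketch below illustrates that conversion; `to_data_uri` and `strip_data_uri` are hypothetical helpers, not part of `vlmrun.common`.

```python
import base64


def to_data_uri(raw: bytes, mime: str = "image/jpeg") -> str:
    """Wrap raw bytes in a base64 data URI, e.g. 'data:image/jpeg;base64,...'."""
    return f"data:{mime};base64," + base64.b64encode(raw).decode("ascii")


def strip_data_uri(uri: str) -> str:
    """Return only the base64 payload after the first comma of a data URI."""
    header, sep, payload = uri.partition(",")
    if not sep or not header.startswith("data:"):
        raise ValueError("not a data URI")
    return payload


uri = to_data_uri(b"\xff\xd8fake-jpeg-bytes")
print(strip_data_uri(uri))
```

Using `partition` instead of `split(",")[1]` gives a clearer error when the input is not a data URI.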
We periodically run popular VLMs on each of the examples & schemas in the catalog.yaml file and publish the results in the benchmarks directory.
Provider | Model | Date | Results |
---|---|---|---|
OpenAI | gpt-4o-2024-11-20 | 2025-01-09 | link |
OpenAI | gpt-4o-mini-2024-07-18 | 2025-01-09 | link |
Gemini | gemini-2.0-flash-exp | 2025-01-10 | link |
Ollama | llama3.2-vision:11b | 2025-01-10 | link |
Ollama | Qwen2.5-VL-7B-Instruct:Q4_K_M_benxh | 2025-02-20 | link |
Ollama + Instructor | Qwen2.5-VL-7B-Instruct:Q4_K_M_benxh | 2025-02-20 | link |
Microsoft | phi-4 | 2025-01-10 | link |
Schemas are organized by industry for easy navigation:
```
vlmrun
└── hub
    ├── schemas
    │   ├── <industry>
    │   │   ├── <use-case-1>.py
    │   │   ├── <use-case-2>.py
    │   │   └── ...
    │   ├── aerospace
    │   │   └── remote_sensing.py
    │   ├── document  # all document schemas are here
    │   │   ├── invoice.py
    │   │   ├── us_drivers_license.py
    │   │   └── ...
    │   ├── healthcare
    │   │   └── medical_insurance_card.py
    │   ├── retail
    │   │   └── ecommerce_product_caption.py
    │   └── contrib  # all contributions are welcome here!
    │       └── <schema-name>.py
    └── version.py
```
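As an illustration of the layout, a module contributed under `contrib/` might look like the hypothetical sketch below (Pydantic v2 assumed). The model name, fields, and descriptions are invented for this example, and SCHEMA-GUIDELINES.md is the authority on what a real submission needs. Every field carries a `description` because descriptions end up in the JSON Schema that is passed to providers in the examples above.

```python
from datetime import date
from typing import List, Optional

from pydantic import BaseModel, Field


class ParkingTicket(BaseModel):
    """Hypothetical schema for a parking citation (illustrative only)."""

    ticket_number: str = Field(..., description="Citation number printed on the ticket")
    issue_date: Optional[date] = Field(None, description="Date the citation was issued")
    violations: List[str] = Field(default_factory=list, description="Violation codes or short descriptions")
    fine_amount: Optional[float] = Field(None, description="Total fine, in the currency printed on the ticket")


ticket = ParkingTicket.model_validate({"ticket_number": "A-1042", "issue_date": "2025-01-10"})
print(ticket.model_dump_json(indent=2))
```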
We're building this hub for the community, and contributions are always welcome! Follow the CONTRIBUTING and SCHEMA-GUIDELINES.md guides to get started.
- Send us an email at support@vlm.run or join our Discord for help.
- Follow us on Twitter and LinkedIn to keep up-to-date on our products.