Unstructured · GitHub

Unstructured.IO: ETL for LLMs

Welcome to Unstructured.IO! We're here on a mission to make all of your documents available for LLM applications, from PDFs and Word Docs to emails and markdown. To get started, check out our open source offerings.

Tried the open source library and ready for more power? Check out our products page to learn more about our paid API and the Unstructured Platform, an ETL tool built around our core file transformation capabilities.

Learn more

Section          Description
Company Website  Unstructured.io product and company info
Documentation    Full unstructured documentation

Popular repositories

  1. unstructured Public

    Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website …

    HTML 11.6k 962

  2. unstructured-api Public

    Python 756 164

  3. unstructured-inference Public

    Python 186 62

  4. pipeline-sec-filings Public archive

    Preprocessing pipeline notebooks and API supporting text extraction from SEC documents

    Jupyter Notebook 146 32

  5. unstructured-python-client Public

    A Python client for the Unstructured Platform API

    Python 104 17

  6. unstructured-ingest Public

    HTML 93 46

Repositories

Showing 10 of 38 repositories
  • unstructured Public

    Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.

    HTML 11,557 Apache-2.0 962 166 (3 issues need help) 49 Updated Jun 20, 2025
  • docs Public

    Documentation for all Unstructured products and libraries

    MDX 6 23 0 7 Updated Jun 20, 2025
  • unstructured-js-client Public

    A JavaScript/TypeScript client for the Unstructured Platform API

    TypeScript 54 MIT 17 5 2 Updated Jun 20, 2025
  • unstructured-python-client Public

    A Python client for the Unstructured Platform API

    Python 104 MIT 17 11 6 Updated Jun 19, 2025
  • unstructured-platform-plugins
    Python 5 Apache-2.0 1 0 2 Updated Jun 19, 2025
  • unstructured-ingest Public
    HTML 93 Apache-2.0 46 55 22 Updated Jun 18, 2025
  • unstructured-api Public
    Python 756 Apache-2.0 164 33 7 Updated Jun 18, 2025
  • base-images Public

    Store Dockerfiles and Packer configs for images to use as a base to build upon

    Shell 4 Apache-2.0 2 1 1 Updated Jun 17, 2025
  • unstructured-inference Public
    Python 186 Apache-2.0 62 22 12 Updated Jun 13, 2025
  • notebooks Public
    Jupyter Notebook 1 0 0 0 Updated Jun 13, 2025

People

This organization has no public members. You must be a member to see who’s a part of this organization.