I'm assuming you have set up a working environment with your triplestore and any other systems you want. Note that this repo uses the uv package manager (see: https://docs.astral.sh/uv/).
Just playing with some established patterns to quickly load the graph, using this notebook: typeTypeView.ipynb. Eventually, you get to a type-to-type style network.
NOTE: This is basically just the jsonldToTriple.ts read into Python. ref: https://ai-docs.bio.xyz/developers/knowledge-graphs
First, set up your triplestore. I'll use Oxigraph, but later we can make this work for others too.
There is JSON-LD for the groups to start with at https://github.com/bio-xyz/BioAgents, specifically:
- https://github.com/bio-xyz/BioAgents/tree/main/sampleJsonLds
- https://github.com/bio-xyz/BioAgents/tree/main/sampleJsonLdsNew
Since this is on GitHub, it's easy to convert a directory to a sitemap format pointing to the raw URLs. We will use a tool from the Gleaner.io Archetype repo.
We can run these:
./scripts/github_jsonld_sitemap.py --output output/jld-sitemap.xml https://github.com/bio-xyz/BioAgents sampleJsonLds
./scripts/github_jsonld_sitemap.py --output output/jldnew-sitemap.xml https://github.com/bio-xyz/BioAgents sampleJsonLdsNew
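To make the idea concrete, here is a minimal sketch of what a directory-to-sitemap conversion amounts to. It is not the github_jsonld_sitemap.py script itself; the function names, the `main` branch, and the example file name are assumptions for illustration, and a real tool would list the directory contents through the GitHub API rather than take a hard-coded list.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def raw_url(owner, repo, branch, path):
    """Build the raw.githubusercontent.com URL for a file in a repo."""
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{path}"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    # Register the sitemap namespace as the default so tags are unprefixed.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for u in urls:
        url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = u
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical file name; the real directory listing comes from GitHub.
    files = ["sampleJsonLds/example.json"]
    print(build_sitemap(raw_url("bio-xyz", "BioAgents", "main", f) for f in files))
```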
To load our JSON-LD now, we can use the sitemap to pull the resources directly from GitHub.
./scripts/loadSitemapToTriplestore.sh ./output/jld-sitemap.xml http://homelab.lan:7878/store
and
./scripts/loadSitemapToTriplestore.sh ./output/jldnew-sitemap.xml http://homelab.lan:7878/store
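For readers who want to see the moving parts, the sitemap-driven load can be sketched in Python roughly as follows. This is not the loadSitemapToTriplestore.sh script; the function names are hypothetical, and whether the store accepts `application/ld+json` directly (or needs a graph parameter such as `?default` on the URL) depends on your triplestore and its version, so treat both as assumptions.

```python
import urllib.request
from xml.etree import ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locs(sitemap_xml):
    """Extract every <loc> URL from a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def store_request(store_url, payload, content_type="application/ld+json"):
    """Build (but do not send) a POST request carrying one RDF document.

    Assumption: the store can parse the given content type as-is.
    """
    return urllib.request.Request(
        store_url,
        data=payload,
        method="POST",
        headers={"Content-Type": content_type},
    )


def load_sitemap(sitemap_xml, store_url):
    """Fetch each resource listed in the sitemap and POST it to the store."""
    for url in sitemap_locs(sitemap_xml):
        with urllib.request.urlopen(url) as resp:
            payload = resp.read()
        with urllib.request.urlopen(store_request(store_url, payload)) as resp:
            print(url, resp.status)
```

If the store does not accept JSON-LD, the payload would need converting to a format it does accept (for example N-Quads) before the POST.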
If you have been working with and testing your triplestore, you can reset it to empty with:
WARNING: use this with caution!
curl -i -X POST -H 'Content-Type: application/sparql-update' --data 'DROP ALL' http://homelab.lan:7878/update
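The same reset can be sketched in Python if you would rather script it. The function names and the `confirm` guard are assumptions added for illustration; the request itself mirrors the curl call above (a SPARQL Update `DROP ALL` posted to the update endpoint).

```python
import urllib.request


def drop_all_request(update_url):
    """Build (but do not send) a SPARQL Update request that empties the store."""
    return urllib.request.Request(
        update_url,
        data=b"DROP ALL",
        method="POST",
        headers={"Content-Type": "application/sparql-update"},
    )


def reset_store(update_url, confirm=False):
    """Send DROP ALL, but only when the caller explicitly confirms."""
    if not confirm:
        raise RuntimeError("Refusing to DROP ALL without confirm=True")
    with urllib.request.urlopen(drop_all_request(update_url)) as resp:
        return resp.status
```

Calling `reset_store("http://homelab.lan:7878/update", confirm=True)` would then wipe the store, so the guard is there to make accidental calls fail loudly.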
To run a SPARQL query from a file against the triplestore's query endpoint:
python biohack.py query --source http://homelab.lan:7878/query --sink foo --query ./sparql/getsubjects.rq --table bar
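As a rough picture of what a query step like this involves, the sketch below posts a SELECT query to a SPARQL endpoint and flattens the JSON results into rows. It is not the biohack.py implementation; the function names are hypothetical, and the query text is a stand-in since the contents of getsubjects.rq are not shown here.

```python
import json
import urllib.parse
import urllib.request


def query_request(endpoint, sparql):
    """Build a SPARQL Protocol POST request asking for JSON results."""
    body = urllib.parse.urlencode({"query": sparql}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
    )


def bindings_to_rows(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    variables = results["head"]["vars"]
    return [
        {v: (b[v]["value"] if v in b else None) for v in variables}
        for b in results["results"]["bindings"]
    ]


def run_select(endpoint, sparql):
    """Send the query and return the flattened rows."""
    with urllib.request.urlopen(query_request(endpoint, sparql)) as resp:
        return bindings_to_rows(json.load(resp))


if __name__ == "__main__":
    # Stand-in query; replace with the contents of your .rq file.
    q = "SELECT DISTINCT ?s WHERE { ?s ?p ?o } LIMIT 10"
    for row in run_select("http://homelab.lan:7878/query", q):
        print(row)
```

The rows are plain dictionaries, so they can be handed to whatever table or sink you prefer.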
The convert mode allows you to convert HTML or PDF documents to Markdown format. You can specify either a URL or a local file as the input source.
Convert an HTML document from a URL:
python biohack.py convert -url https://example.com -output output/example.md
Convert a local PDF file:
python biohack.py convert -local path/to/document.pdf -output output/document.md
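The conversion itself can be pictured with a short sketch like the one below. It assumes the html2text and PyPDF2 packages are installed, and it is not the biohack.py code: the function names and the extension-based detection are illustrative assumptions. Note that PyPDF2 extracts plain text, so the PDF path yields unformatted text rather than rich Markdown.

```python
import urllib.request
from pathlib import Path


def detect_kind(source):
    """Guess the input type from its name: 'pdf' for .pdf, otherwise 'html'."""
    return "pdf" if source.lower().split("?")[0].endswith(".pdf") else "html"


def html_to_markdown(html):
    """Convert an HTML string to Markdown using html2text."""
    import html2text  # third-party; imported lazily
    return html2text.html2text(html)


def pdf_to_text(path):
    """Extract text from every page of a local PDF using PyPDF2."""
    from PyPDF2 import PdfReader  # third-party; imported lazily
    reader = PdfReader(str(path))
    return "\n\n".join((page.extract_text() or "") for page in reader.pages)


def convert(source, output):
    """Convert a URL or local file to Markdown-ish text and write it out."""
    if detect_kind(source) == "pdf":
        text = pdf_to_text(source)
    elif source.startswith(("http://", "https://")):
        with urllib.request.urlopen(source) as resp:
            text = html_to_markdown(resp.read().decode("utf-8", errors="replace"))
    else:
        text = html_to_markdown(Path(source).read_text(encoding="utf-8"))
    out = Path(output)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(text, encoding="utf-8")
```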
This functionality uses html2text for HTML conversion and PyPDF2 for PDF conversion. Make sure to install the required dependencies:
pip install html2text PyPDF2
Use bamlTest.py to call OpenAI. Set the key with something like:
export OPENAI_API_KEY="..."
Note: Since this uses BAML, it's easy to modify clients.baml and add any client: Ollama for local use, xAI, Google Gemini, etc. You will then need to modify the
client "openai/gpt-4o"
line in hypothesis.baml and rerun baml-cli generate.
Then run with
python bamlTest.py --input input.md --output output.json
I also worked up a notebook (typeTypeView.ipynb) to play around with search-to-visualization approaches.
References:
- BioAgent repo: https://github.com/bio-xyz/plugin-bioagent
- DKG (OriginTrail): https://docs.origintrail.io/build-with-dkg/quickstart-test-drive-the-dkg-in-5-mins