BioHack

References

Setup

This assumes you already have a working environment with your triplestore and any other systems you want. Note that this repo uses the uv package manager (see: https://docs.astral.sh/uv/).

Notebook prototype

This is just playing with some established patterns to quickly load the graph, using the notebook typeTypeView.ipynb. Eventually, you get to a type-to-type style network.
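As a rough illustration of what a type-to-type view means, here is a minimal sketch that lifts instance-level triples to edges between their rdf:type values. It is an assumption about the general idea, not the code in typeTypeView.ipynb; the function name and the `ex:` sample data are hypothetical.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def type_to_type_edges(triples):
    """Count (subject type, predicate, object type) combinations."""
    # First pass: record the rdf:type values declared for each node.
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)

    # Second pass: lift each instance-level link to a type-level edge.
    # Objects with no declared type (e.g. literals) produce no edge.
    edges = Counter()
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        for s_type in types.get(s, ()):
            for o_type in types.get(o, ()):
                edges[(s_type, p, o_type)] += 1
    return edges


# Hypothetical sample data, not taken from the BioAgents graph.
sample = [
    ("ex:paper1", RDF_TYPE, "ex:Paper"),
    ("ex:gene1", RDF_TYPE, "ex:Gene"),
    ("ex:gene2", RDF_TYPE, "ex:Gene"),
    ("ex:paper1", "ex:mentions", "ex:gene1"),
    ("ex:paper1", "ex:mentions", "ex:gene2"),
]
edges = type_to_type_edges(sample)
```

The resulting counter can be fed to whatever graph plotting library you prefer, with the count as an edge weight.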

RDF load Quick start

NOTE: This is basically just the jsonldToTriple.ts read into python. ref: https://ai-docs.bio.xyz/developers/knowledge-graphs

First, set up your triplestore. I'll use Oxigraph, but later we can make this work for others too.

There is JSON-LD for the groups to start with at https://github.com/bio-xyz/BioAgents, specifically in the sampleJsonLds and sampleJsonLdsNew directories.

Since this is on GitHub, it's easy to convert a directory to a sitemap format pointing to the raw URLs. We will use a tool from the Gleaner.io Archetype repo.
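To make the idea concrete, here is a minimal sketch of turning file paths in a GitHub repo into a sitemap of raw URLs. This is not the Gleaner.io script itself; the function names, the `main` branch, and the example file names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def raw_url(repo_url, branch, path):
    """Map a GitHub repo URL plus a file path to its raw.githubusercontent.com URL."""
    owner, repo = urlparse(repo_url).path.strip("/").split("/")[:2]
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{path}"


def build_sitemap(urls):
    """Serialize a list of URLs as a sitemap <urlset> document."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical file names; the branch name "main" is also an assumption.
paths = ["sampleJsonLds/example1.json", "sampleJsonLds/example2.json"]
urls = [raw_url("https://github.com/bio-xyz/BioAgents", "main", p) for p in paths]
sitemap_xml = build_sitemap(urls)
```

A real tool would also need to list the directory contents (for example through the GitHub API), which is left out here.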

We can run these:

./scripts/github_jsonld_sitemap.py --output output/jld-sitemap.xml https://github.com/bio-xyz/BioAgents sampleJsonLds

./scripts/github_jsonld_sitemap.py --output output/jldnew-sitemap.xml https://github.com/bio-xyz/BioAgents sampleJsonLdsNew

To load our JSON-LD now, we can use the sitemap to pull the resources directly from GitHub.

./scripts/loadSitemapToTriplestore.sh ./output/jld-sitemap.xml http://homelab.lan:7878/store

and

./scripts/loadSitemapToTriplestore.sh ./output/jldnew-sitemap.xml http://homelab.lan:7878/store
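For a sense of what a loader has to do first, here is a minimal sketch that reads a sitemap and pulls out the resource URLs, plus a helper to fetch one JSON-LD document. It is not a port of loadSitemapToTriplestore.sh; the function names and the example.org URLs are hypothetical, and the conversion to triples and the upload to the store are deliberately left out.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locs(xml_text):
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text]


def fetch_jsonld(url):
    """Download one JSON-LD document and parse it as JSON (network call)."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical sitemap content for illustration.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/a.json</loc></url>
  <url><loc>https://example.org/b.json</loc></url>
</urlset>"""
locs = sitemap_locs(sample)
```

Each fetched document would then be converted to triples and sent to the store endpoint, which is the part the shell script handles.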

If you have been working and testing your triplestore, we can reset it to empty with:

WARNING: use this with caution!

curl -i -X POST -H 'Content-Type: application/sparql-update' --data 'DROP ALL' http://homelab.lan:7878/update
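If you would rather do the reset from Python, the same request can be sketched with the standard library. The function name is hypothetical, and the send is left commented out on purpose so that running the snippet does not wipe anything.

```python
import urllib.request


def build_drop_all_request(update_endpoint):
    """Build (but do not send) a SPARQL Update request that drops all graphs."""
    return urllib.request.Request(
        update_endpoint,
        data=b"DROP ALL",
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )


req = build_drop_all_request("http://homelab.lan:7878/update")

# Uncomment only when you really want to empty the store:
# urllib.request.urlopen(req)
```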

biohack.py notes

query mode

python biohack.py query --source http://homelab.lan:7878/query  --sink foo  --query ./sparql/getsubjects.rq --table bar
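As a rough picture of the query step, here is a minimal sketch that builds a SPARQL Protocol request and flattens a SPARQL JSON results document into rows. It is an assumption about the general pattern, not the code in biohack.py; the function names and sample data are hypothetical, and what --sink and --table do with the rows is not shown.

```python
import json
import urllib.parse
import urllib.request


def build_query_request(endpoint, query):
    """Build a SPARQL Protocol POST request asking for JSON results."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def bindings_to_rows(results):
    """Flatten SPARQL JSON results into one dict per row (None for unbound variables)."""
    variables = results["head"]["vars"]
    return [
        {var: (binding[var]["value"] if var in binding else None) for var in variables}
        for binding in results["results"]["bindings"]
    ]


# Hypothetical results document in the standard SPARQL JSON results format.
sample = json.loads("""{"head": {"vars": ["s", "label"]}, "results": {"bindings": [
  {"s": {"type": "uri", "value": "https://example.org/a"},
   "label": {"type": "literal", "value": "A"}},
  {"s": {"type": "uri", "value": "https://example.org/b"}}
]}}""")
rows = bindings_to_rows(sample)
```

Rows in this shape are easy to hand to a table writer or a dataframe library.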

convert mode

The convert mode allows you to convert HTML or PDF documents to Markdown format. You can specify either a URL or a local file as the input source.

Convert an HTML document from a URL:

python biohack.py convert -url https://example.com -output output/example.md

Convert a local PDF file:

python biohack.py convert -local path/to/document.pdf -output output/document.md

This functionality uses html2text for HTML conversion and PyPDF2 for PDF conversion. Make sure to install the required dependencies (the pip install command is listed under Notes).
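A minimal sketch of how such a conversion could be wired up with those two libraries follows. It is not the code in biohack.py; the function names are hypothetical, and picking the converter from the file extension is an assumption. The third-party imports sit inside the functions so the snippet loads even before the dependencies are installed.

```python
from pathlib import Path
from urllib.parse import urlparse


def detect_kind(source):
    """Guess 'pdf' or 'html' from the path or URL extension (assumed heuristic)."""
    path = urlparse(source).path if "://" in source else source
    return "pdf" if Path(path).suffix.lower() == ".pdf" else "html"


def html_to_markdown(html):
    """Convert an HTML string to Markdown with html2text."""
    import html2text  # third-party: pip install html2text

    return html2text.html2text(html)


def pdf_to_text(path):
    """Extract plain text page by page with PyPDF2 (no Markdown structure is recovered)."""
    from PyPDF2 import PdfReader  # third-party: pip install PyPDF2

    reader = PdfReader(path)
    return "\n\n".join(page.extract_text() or "" for page in reader.pages)
```

Note that PDF extraction only yields text, so headings and lists in the output depend on how clean the PDF is.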

Hypothesis from LLM

Use bamlTest.py to generate a hypothesis with OpenAI. Set the API key with something like:

export OPENAI_API_KEY="..."

Note: Since this uses BAML, it's easy to modify clients.baml and add any client: Ollama for local models, xAI, Google Gemini, etc. You will then need to change the client "openai/gpt-4o" in hypothesis.baml and rerun baml-cli generate.

Then run with

python bamlTest.py --input input.md --output output.json

Notes

pip install html2text PyPDF2

I also worked up a notebook (typeTypeView.ipynb) to play around with search-to-visualization approaches.

References:

BioAgent repo: https://github.com/bio-xyz/plugin-bioagent
DKG (OriginTrail): https://docs.origintrail.io/build-with-dkg/quickstart-test-drive-the-dkg-in-5-mins
