fils/biohack: Some notes and ideas related to the biohack event

BioHack

Setup

I'm assuming you have set up a working environment with your triplestore and any other systems you want. Note that this repo uses the uv package manager (see: https://docs.astral.sh/uv/).

Notebook prototype

Just playing with some established patterns to quickly load the graph, using this notebook: typeTypeView.ipynb. Eventually, you get to this type-to-type style network.

img.png

RDF load Quick start

NOTE: This is basically just the jsonldToTriple.ts read into python. ref: https://ai-docs.bio.xyz/developers/knowledge-graphs

First set up your triplestore. I'll use Oxigraph, but later we can make this work for others too.

So there is JSON-LD for the groups to start with at https://github.com/bio-xyz/BioAgents, specifically in the sampleJsonLds and sampleJsonLdsNew directories passed to the commands below.

Since this is on GitHub, it's easy to convert a directory to a sitemap format pointing to the raw URLs. We will use a tool from the Gleaner.io Archetype repo.

We can run these:

./scripts/github_jsonld_sitemap.py --output output/jld-sitemap.xml https://github.com/bio-xyz/BioAgents sampleJsonLds 
./scripts/github_jsonld_sitemap.py --output output/jldnew-sitemap.xml https://github.com/bio-xyz/BioAgents sampleJsonLdsNew 
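
For a rough idea of what "directory to sitemap" means here, the following is a minimal sketch and not the actual github_jsonld_sitemap.py. It assumes the files live on a branch named main, and the file names are hypothetical placeholders. The helpers raw_url and build_sitemap are made up for illustration.

```python
from urllib.parse import urlparse
from xml.sax.saxutils import escape


def raw_url(repo_url: str, branch: str, path: str) -> str:
    """Map a repository URL plus a file path to its raw.githubusercontent.com URL."""
    owner, repo = urlparse(repo_url).path.strip("/").split("/")[:2]
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{path}"


def build_sitemap(urls: list[str]) -> str:
    """Render a minimal sitemap with one <url><loc> entry per URL."""
    entries = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>\n"
    )


# Hypothetical file names; a real run would list the directory on GitHub.
paths = ["sampleJsonLds/example1.json", "sampleJsonLds/example2.json"]
# Assumption: the default branch is called "main".
urls = [raw_url("https://github.com/bio-xyz/BioAgents", "main", p) for p in paths]
sitemap_xml = build_sitemap(urls)
print(sitemap_xml)
```

The point of the sitemap is only to give the loader a flat list of raw URLs, so nothing beyond loc entries is needed.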

To load our JSON-LD now, we can use the sitemap to pull the resources directly from GitHub.

./scripts/loadSitemapToTriplestore.sh ./output/jld-sitemap.xml http://homelab.lan:7878/store

and

./scripts/loadSitemapToTriplestore.sh ./output/jldnew-sitemap.xml http://homelab.lan:7878/store
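
The loading step can be pictured roughly as follows. This is a sketch under assumptions, not a description of what loadSitemapToTriplestore.sh does internally: it reads the loc entries from a sitemap and prepares a POST to the store endpoint, assuming each JSON-LD document has already been converted to N-Quads by some other step (not shown). The helper names sitemap_locs and store_request are hypothetical, and the request is only built, not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locs(xml_text: str) -> list[str]:
    """Return every <loc> URL found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def store_request(store_url: str, nquads: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST that adds N-Quads to the store."""
    return urllib.request.Request(
        store_url,
        data=nquads,
        method="POST",
        headers={"Content-Type": "application/n-quads"},
    )


# Hypothetical one-entry sitemap for illustration.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://raw.githubusercontent.com/bio-xyz/BioAgents/main/sampleJsonLds/example1.json</loc></url>
</urlset>"""
locs = sitemap_locs(sample)

# Assumption: the JSON-LD was converted to N-Quads elsewhere.
req = store_request(
    "http://homelab.lan:7878/store",
    b'<http://example.org/s> <http://example.org/p> "o" .\n',
)
print(locs, req.get_method(), req.full_url)
```

Sending the request would be a matter of passing req to urllib.request.urlopen once the triplestore is reachable.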

If you have been working and testing your triplestore, we can reset it to empty with:

WARNING: use this with caution!

curl -i -X POST -H 'Content-Type: application/sparql-update' --data 'DROP ALL' http://homelab.lan:7878/update
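
If you would rather do the reset from Python, something like the following sketch mirrors the curl call above. The confirm guard and the function names are additions for illustration only, not part of this repo.

```python
import urllib.request


def drop_all_request(update_url: str) -> urllib.request.Request:
    """Build the same SPARQL Update request as the curl command."""
    return urllib.request.Request(
        update_url,
        data=b"DROP ALL",
        method="POST",
        headers={"Content-Type": "application/sparql-update"},
    )


def reset_store(update_url: str, confirm: bool = False) -> int:
    """Empty the triplestore; refuses to run unless confirm=True."""
    if not confirm:
        raise ValueError("refusing to DROP ALL without confirm=True")
    with urllib.request.urlopen(drop_all_request(update_url)) as resp:
        return resp.status


req = drop_all_request("http://homelab.lan:7878/update")
print(req.get_method(), req.full_url, req.data)
```

Calling reset_store("http://homelab.lan:7878/update", confirm=True) would actually send the update, so the same caution applies.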

biohack.py notes

query mode

python biohack.py query --source http://homelab.lan:7878/query --sink foo --query ./sparql/getsubjects.rq --table bar
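
As a rough illustration of the query side only, here is a sketch that sends a SELECT query to a SPARQL endpoint and flattens the standard SPARQL JSON results into rows. It is not the biohack.py implementation; how biohack.py handles --sink and --table is not covered, and the function names and sample data are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def run_select(endpoint: str, query: str) -> dict:
    """POST a SELECT query (form-encoded) and return the JSON results."""
    req = urllib.request.Request(
        endpoint,
        data=urllib.parse.urlencode({"query": query}).encode(),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def bindings_to_rows(results: dict) -> list[dict]:
    """Turn SPARQL JSON results into one plain dict per solution."""
    variables = results["head"]["vars"]
    return [
        {v: (b[v]["value"] if v in b else None) for v in variables}
        for b in results["results"]["bindings"]
    ]


# Hypothetical response, shaped like the SPARQL 1.1 JSON results format.
sample = {
    "head": {"vars": ["s", "label"]},
    "results": {
        "bindings": [
            {"s": {"type": "uri", "value": "http://example.org/a"}},
            {
                "s": {"type": "uri", "value": "http://example.org/b"},
                "label": {"type": "literal", "value": "B"},
            },
        ]
    },
}
rows = bindings_to_rows(sample)
print(rows)
```

Unbound variables come back as None so every row has the same keys, which makes the rows easy to write into a table.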

convert mode

The convert mode allows you to convert HTML or PDF documents to Markdown format. You can specify either a URL or a local file as the input source.

Convert an HTML document from a URL:

python biohack.py convert -url https://example.com -output output/example.md

Convert a local PDF file:

python biohack.py convert -local path/to/document.pdf -output output/document.md

This functionality uses html2text for HTML conversion and PyPDF2 for PDF conversion. Make sure the required dependencies are installed; the pip command is listed under Notes below.
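
A minimal sketch of how the two libraries could be combined for local files is shown below. It is an illustration under assumptions, not the code in biohack.py: the function names are hypothetical, the PDF branch assumes a PyPDF2 version that provides PdfReader, and PyPDF2 returns plain extracted text rather than structured Markdown.

```python
from pathlib import Path


def detect_kind(source: str) -> str:
    """Return "pdf" for sources ending in .pdf, otherwise "html"."""
    return "pdf" if source.lower().split("?")[0].endswith(".pdf") else "html"


def html_to_markdown(html: str) -> str:
    import html2text  # third-party: pip install html2text

    return html2text.html2text(html)


def pdf_to_text(path: str) -> str:
    from PyPDF2 import PdfReader  # third-party: pip install PyPDF2

    reader = PdfReader(path)
    # extract_text() may return an empty result for image-only pages.
    return "\n\n".join((page.extract_text() or "") for page in reader.pages)


def convert_local(path: str, output: str) -> None:
    """Convert a local HTML or PDF file and write the result to output."""
    if detect_kind(path) == "pdf":
        text = pdf_to_text(path)
    else:
        text = html_to_markdown(Path(path).read_text(encoding="utf-8"))
    out = Path(output)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(text, encoding="utf-8")


print(detect_kind("path/to/document.pdf"), detect_kind("https://example.com"))
```

The third-party imports sit inside the functions so that the HTML path still works when PyPDF2 is not installed, and the reverse.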

Hypothesis from LLM

Use bamlTest.py to generate a hypothesis with OpenAI. Set the API key first with something like:

export OPENAI_API_KEY="..."

Note: Since this uses BAML, it's easy to modify clients.baml and add any client: Ollama for local use, xAI, Google Gemini, etc. You will then need to change the client "openai/gpt-4o" in hypothesis.baml and rerun baml-cli generate.

Then run with

python bamlTest.py --input input.md --output output.json

Notes

pip install html2text PyPDF2

Also worked up a notebook (typeTypeView.ipynb) to play around with search-to-visualization approaches.
