Prototype for an integrated content-based language learning environment.
What it looks like right now, with working multilingual sentence segmentation and tokenization: (screenshot omitted)

The stack:
- SurrealDB + Axum + Disk as backend service exposing an API
- Python + Stanza via PyO3 for NLP (see the sketch after this list)
- Svelte frontend that interacts with the API
- Tauri as a desktop client
- fsrs-rs for SRS algorithm
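A minimal sketch of the PyO3 bridge for sentence segmentation, assuming `pyo3` with the `auto-initialize` feature and Stanza installed in the system Python; the function name and pipeline options are illustrative, not the project's actual code:

```rust
use pyo3::prelude::*;
use pyo3::types::IntoPyDict;

// Segment `text` into sentences with a tokenize-only Stanza pipeline.
// Assumes the model was fetched beforehand, e.g. stanza.download("fr").
fn segment(text: &str) -> PyResult<Vec<String>> {
    Python::with_gil(|py| {
        let stanza = py.import("stanza")?;
        // Equivalent to: stanza.Pipeline(lang="fr", processors="tokenize")
        let kwargs = [("lang", "fr"), ("processors", "tokenize")].into_py_dict(py);
        let pipeline = stanza.getattr("Pipeline")?.call((), Some(kwargs))?;

        // Run the pipeline and collect the text of each detected sentence
        let doc = pipeline.call1((text,))?;
        let mut sentences = Vec::new();
        for sent in doc.getattr("sentences")?.iter()? {
            sentences.push(sent?.getattr("text")?.extract::<String>()?);
        }
        Ok(sentences)
    })
}

fn main() -> PyResult<()> {
    for s in segment("Bonjour le monde. Ceci est une phrase.")? {
        println!("{s}");
    }
    Ok(())
}
```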
Roadmap (only a partial plan):
Phase I - Project Skeleton
- file system content access
- working vocabulary database
- allow Python scripting for extensible language support
- text processing: tokenization, lemmatization, and sentence segmentation
- document query api
- basic text reader
- token data write requests and confirmations
- svelte routing structure
- read toml language configurations
- read toml application configurations
- language-specific file listing
- ensure uniqueness of vocabulary database entries (see the sketch after this list)
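For the uniqueness item, one option is a unique index enforced by the database itself. A minimal sketch, assuming the surrealdb 1.x Rust client and a local dev server with root credentials; the table, column, and namespace names are illustrative:

```rust
use surrealdb::engine::remote::ws::Ws;
use surrealdb::opt::auth::Root;
use surrealdb::Surreal;

#[tokio::main]
async fn main() -> surrealdb::Result<()> {
    // Connect and select a namespace/database
    let db = Surreal::new::<Ws>("127.0.0.1:8000").await?;
    db.signin(Root { username: "root", password: "root" }).await?;
    db.use_ns("influx").use_db("influx").await?;

    // After this, inserting a duplicate (lang_identifier, orthography)
    // pair into `vocab` fails at the database level
    db.query("DEFINE INDEX vocab_unique ON TABLE vocab COLUMNS lang_identifier, orthography UNIQUE")
        .await?
        .check()?;
    Ok(())
}
```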
Phase II - Packaging
- tauri wrapper
- figure out how to package python dependencies (check https://pyo3.rs/v0.14.2/building_and_distribution.html)
- document the setup process
- build CI
Phase III - Frontend Usability
- feedback messages
Phase IV - Frontend Language Learning Features
- dictionary
- translation
- TTS
- sentence structure analysis?
Phase V - Code Quality
- error handling
- documentation
- security and accounts?
Phase ? - Future
- markdown rendering?
- video support
- audio support
- pdf + ocr support?
- Use `toml = "0.8.8"` for TOML settings parsing and editing (see the sketch after this list).
- The current implementation is geared toward rapid development; replace all `unwrap` calls with proper error handling.
- Files on disk could lead to race conditions, but these probably won't occur in a single-user situation.
- Language settings could be on disk
- Security? Accounts? Not a concern for now, as everything runs on localhost.
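A minimal sketch of TOML config loading with proper error propagation instead of `unwrap`, assuming `toml = "0.8.8"` plus serde; the config fields and file path are hypothetical:

```rust
use serde::Deserialize;
use std::{error::Error, fs, path::Path};

// Hypothetical shape of a per-language config; the real schema may differ.
#[derive(Debug, Deserialize)]
struct LanguageConfig {
    identifier: String,
    display_name: String,
}

fn load_language_config(path: &Path) -> Result<LanguageConfig, Box<dyn Error>> {
    // Propagate I/O and parse errors with `?` rather than panicking
    let raw = fs::read_to_string(path)?;
    let config = toml::from_str(&raw)?;
    Ok(config)
}

fn main() -> Result<(), Box<dyn Error>> {
    let config = load_language_config(Path::new("settings/fr.toml"))?;
    println!("{config:?}");
    Ok(())
}
```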
Python environment notes:
- Try not to use conda; it didn't work.
- Try not to use macOS's built-in Python; it didn't work.
- Installing Stanza in a virtual environment doesn't work for some reason; it has to be installed into the system Python.

What worked on macOS:
```sh
brew install python@3.10
brew install pipenv
python3.10 -m pip install stanza
pipenv install
pipenv shell
# If pip refuses to install into the Homebrew Python because it is
# "externally managed" (PEP 668), remove the marker file:
rm /opt/homebrew/Cellar/python\@3*/**/EXTERNALLY-MANAGED
```
To run the development builds:

```sh
# Run the backend service
cd influx_api
cargo run
```

```sh
# In another terminal, run the frontend dev server
cd influx_ui
npm run dev
```
API routes (methods default to GET if unspecified):
- `GET /` returns something random
- `GET /settings` returns app settings as JSON
- `GET /langs` returns the list of languages in the settings
- `/vocab` to work with vocabulary
    - `GET /vocab/token/{lang_identifier}/{orthography}` to query for a single token
    - `POST /vocab/create_token` to create a token
    - `POST /vocab/update_token` to update a token
    - `DELETE /vocab/delete_token` to delete a token
- `/docs` to work with documents
    - `GET /docs/{lang_identifier}` returns the list of content, with metadata, for the language specified by `lang_identifier`. Currently only markdown content is supported.
    - `GET /docs/{lang_identifier}/{filename}` returns a specific piece of content, with metadata, text, tokenised text, and the results of querying the vocabulary database
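For illustration, a minimal sketch of how the single-token route might be wired up, assuming axum 0.7; the handler, response shape, and port are illustrative rather than the project's actual code:

```rust
use axum::{extract::Path, routing::get, Json, Router};
use serde_json::{json, Value};

// Hypothetical handler: echo the path parameters instead of hitting the DB
async fn get_token(Path((lang_identifier, orthography)): Path<(String, String)>) -> Json<Value> {
    Json(json!({ "lang": lang_identifier, "orthography": orthography }))
}

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let app = Router::new().route("/vocab/token/:lang_identifier/:orthography", get(get_token));
    let listener = tokio::net::TcpListener::bind("127.0.0.1:3000").await?;
    axum::serve(listener, app).await
}
```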