videoToVoice

This project takes a sequence of lip images and predicts the phonemes being spoken.

Reimplemented in Nim with some improvements.

Requirements

videoToVoice.nimble lists all requirements and their versions. In addition, install Arraymancer_vision with my patch.

TODO

- Test and train the CNNs.
- Depend less on external software:
  - Replace face_recognition with Dlib.
  - Port textgrid-parser to Nim.
- Optimize lip detection and cropping, which is very slow (about 4 seconds per image).
- Use a Markov chain or an RNN for better results.
- Instead of using convolutions, pass the lip landmark points from face_recognition directly to a simple neural network.
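
As an illustration of the Markov chain idea, here is a minimal sketch of how per-frame phoneme scores could be smoothed with a first-order Markov chain using Viterbi decoding. It is not code from this repository: it is written in plain Python for brevity, the function name `viterbi_smooth`, the phoneme labels `AA` and `M`, and all probabilities are made up for the example, and it assumes the classifier's per-frame outputs can be used as emission scores.

```python
import math


def viterbi_smooth(frame_probs, transition, initial):
    """Return the most likely phoneme sequence for a list of frames.

    frame_probs: list of dicts, phoneme -> score from a per-frame classifier
    transition:  dict of dicts, transition[prev][cur] -> probability
    initial:     dict, phoneme -> probability of starting in that phoneme
    (All three structures are hypothetical; adapt them to the real model.)
    """
    if not frame_probs:
        return []

    states = list(initial)

    def log(p):
        # Clamp to avoid log(0) for impossible transitions or scores.
        return math.log(max(p, 1e-12))

    # Best log-score of any path ending in each state at the first frame.
    best = {s: log(initial[s]) + log(frame_probs[0].get(s, 0.0)) for s in states}
    back = []  # back[t][cur] = best previous state for frame t + 1

    for probs in frame_probs[1:]:
        new_best, pointers = {}, {}
        for cur in states:
            prev = max(
                states,
                key=lambda p: best[p] + log(transition[p].get(cur, 0.0)),
            )
            new_best[cur] = (
                best[prev]
                + log(transition[prev].get(cur, 0.0))
                + log(probs.get(cur, 0.0))
            )
            pointers[cur] = prev
        back.append(pointers)
        best = new_best

    # Walk the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    # Made-up example: the middle frame is a weak, isolated "M" prediction.
    frames = [
        {"AA": 0.9, "M": 0.1},
        {"AA": 0.4, "M": 0.6},
        {"AA": 0.9, "M": 0.1},
    ]
    sticky = {
        "AA": {"AA": 0.9, "M": 0.1},
        "M": {"AA": 0.1, "M": 0.9},
    }
    start = {"AA": 0.5, "M": 0.5}
    print(viterbi_smooth(frames, sticky, start))  # ['AA', 'AA', 'AA']
```

Picking the best phoneme per frame would give `AA, M, AA` here. Because the transition table favors staying in the same phoneme, the decoder treats the isolated `M` as noise. In practice the transition probabilities could be estimated from the phoneme alignments used for training.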
