
endes0/videoToVoice

videoToVoice

These programs take a sequence of lip images and predict the phonemes being spoken.

Reimplemented in Nim, with some improvements.

Requirements for running

Requirements for compiling

videoToVoice.nim lists all requirements and their versions. Note that Arraymancer_vision must be installed with my patch applied.

TODOs

  • Test and train the CNNs
  • Depend less on external software
    • Replace face_recognition with Dlib
    • Port textgrid-parser to Nim
  • Optimize: identifying and cropping the lips is very slow (about 4 seconds per image)
  • Use a Markov chain or an RNN for better results
  • Instead of using convolutions, pass the lip landmark points from face_recognition directly to a simple neural network
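The Markov chain idea above could look roughly like the following minimal sketch, which is not code from this repository. It assumes the CNN produces, for each frame, a probability for every phoneme class, and that a phoneme-to-phoneme transition matrix is available (it would presumably have to be estimated from phoneme-aligned training data). Viterbi decoding then picks the single most likely phoneme sequence instead of taking each frame's argmax independently, which suppresses one-frame flickers. The proc name `viterbi` and the `Matrix` type are hypothetical.

```nim
import std/math

type
  Matrix = seq[seq[float]]  # hypothetical alias: m[row][col]

proc viterbi*(emissions, transitions: Matrix, initial: seq[float]): seq[int] =
  ## Hypothetical helper, not part of videoToVoice.
  ## emissions[t][s]:   probability of phoneme s at frame t (e.g. CNN softmax)
  ## transitions[a][b]: probability of moving from phoneme a to phoneme b
  ## initial[s]:        probability that the sequence starts with phoneme s
  ## Returns the most likely phoneme index for every frame.
  let nFrames = emissions.len
  if nFrames == 0:
    return @[]
  let nStates = initial.len
  # Scores are kept in log space so long sequences do not underflow.
  var score = newSeq[seq[float]](nFrames)
  var back = newSeq[seq[int]](nFrames)
  for t in 0 ..< nFrames:
    score[t] = newSeq[float](nStates)
    back[t] = newSeq[int](nStates)
  for s in 0 ..< nStates:
    score[0][s] = ln(initial[s]) + ln(emissions[0][s])
  # For each frame and phoneme, keep the best-scoring predecessor.
  for t in 1 ..< nFrames:
    for s in 0 ..< nStates:
      var best = NegInf
      var bestPrev = 0
      for p in 0 ..< nStates:
        let cand = score[t - 1][p] + ln(transitions[p][s])
        if cand > best:
          best = cand
          bestPrev = p
      score[t][s] = best + ln(emissions[t][s])
      back[t][s] = bestPrev
  # Choose the best final phoneme, then follow the back-pointers.
  result = newSeq[int](nFrames)
  var last = 0
  for s in 1 ..< nStates:
    if score[nFrames - 1][s] > score[nFrames - 1][last]:
      last = s
  result[nFrames - 1] = last
  for t in countdown(nFrames - 1, 1):
    result[t - 1] = back[t][result[t]]

when isMainModule:
  # Two made-up phoneme classes; frame 1 alone would favour class 1.
  let frames = @[@[0.9, 0.1], @[0.4, 0.6], @[0.9, 0.1]]
  let sticky = @[@[0.9, 0.1], @[0.1, 0.9]]  # phonemes tend to persist
  echo viterbi(frames, sticky, @[0.5, 0.5])  # prints @[0, 0, 0]
```

With uniform transitions the result reduces to the per-frame argmax (`@[0, 1, 0]` for the same frames). The "sticky" matrix is what removes the isolated flip. An RNN, the other option in the TODO, would instead learn this temporal context from data.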

About

takes in a sequence of lip images, and predicts the phonemes being said.

Languages

  • Nim 100.0%