-
proneval Public
Forked from KoelLabs/ML
Koel Labs innovates real-time pronunciation feedback for language learners! This repo contains the ML training, evaluation, and data processing code
Jupyter Notebook GNU Affero General Public License v3.0 Updated Jan 14, 2025 -
watermark-detection Public
Forked from boomb0om/watermark-detection
Model for watermark classification implemented with PyTorch
Jupyter Notebook Updated Sep 19, 2024 -
LookOnceToHear Public
Forked from vb000/LookOnceToHear
A novel human-interaction method for real-time speech extraction on headphones.
Python Other Updated May 10, 2024 -
The Whisper Hindi ASR (Automatic Speech Recognition) model utilizes the KathBath dataset, a comprehensive collection of speech samples in Hindi. Trained on this dataset, Whisper employs advanced de…
Jupyter Notebook Eclipse Public License 2.0 Updated Apr 23, 2024 -
supervoice-dataset Public
Forked from ex3ndr/supervoice-librilight-preprocessed
60k hours of phoneme-aligned audio from audio books
Python Updated Apr 12, 2024 -
Amphion Public
Forked from open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
Python MIT License Updated Nov 28, 2023 -
ConsistencyVC-voive-conversion Public
Forked from ConsistencyVC/ConsistencyVC-voive-conversion
Joint training of a speaker encoder with a consistency loss to achieve cross-lingual voice conversion and expressive voice conversion
Python MIT License Updated Oct 16, 2023 -
MLnotebook Public
Forked from udlbook/udlbook
Understanding Deep Learning - Simon J.D. Prince
Jupyter Notebook Other Updated Oct 13, 2023 -
PhonoQ Public
Forked from TAriasVergara/PhonoQ
PhonoQ is a deep learning model used to compute phonetic-based features related to duration, rate, rhythm*, and goodness of pronunciation* of 18 phonological classes
Python MIT License Updated Aug 31, 2023 -
VoskIdentification Public
Forked from virex-84/VoskIdentification
Test example of using a model for voice identification with the "Vosk" speech recognition library: https://alphacephei.com/vosk/
Java Updated Aug 14, 2023 -
FastSAM Public
Forked from CASIA-IVA-Lab/FastSAM
Fast Segment Anything
Python Apache License 2.0 Updated Jul 30, 2023 -
Real-time-wake-word-detection Public
Forked from matron2017/Real-time-wake-word-detection
Spoken wake-word detection for conversational avatar
Jupyter Notebook Updated Jan 31, 2023 -
vall-e Public
Forked from enhuiz/vall-e
An unofficial PyTorch implementation of the audio LM VALL-E, WIP
Python MIT License Updated Jan 17, 2023 -
StyleTTS Public
Forked from yl4579/StyleTTS
Official Implementation of StyleTTS
Python MIT License Updated Jan 9, 2023 -
recurrent-interface-network-pytorch Public
Forked from lucidrains/recurrent-interface-network-pytorch
Implementation of Recurrent Interface Network (RIN), for highly efficient generation of images and video without cascading networks, in PyTorch
Python MIT License Updated Jan 8, 2023 -
langdetect Public
Language detection algorithm that can be extended to support any number of languages
Python Apache License 2.0 Updated Jan 5, 2023 -
langchain Public
Forked from langchain-ai/langchain
⚡ Building applications with LLMs through composability ⚡
Python MIT License Updated Jan 3, 2023 -
self-supervised-phone-segmentation Public
Forked from lstrgar/self-supervised-phone-segmentation
Phoneme segmentation using pre-trained speech models
Python GNU General Public License v3.0 Updated Nov 4, 2022 -
Deep-Learning-in-Production Public
Forked from ahkarami/Deep-Learning-in-Production
In this repository, I will share some useful notes and references about deploying deep learning-based models in production.
Updated Oct 14, 2022 -
you-only-hear-once Public
Forked from satvik-venkatesh/you-only-hear-once
Jupyter Notebook MIT License Updated Oct 13, 2022 -
ULCA-asr-dataset-corpus Public
Forked from Open-Speech-EkStep/ULCA-asr-dataset-corpus
Creative Commons Attribution 4.0 International Updated Sep 6, 2021 -
pifuhd Public
Forked from facebookresearch/pifuhd
High-Resolution 3D Human Digitization from A Single Image.
Python Other Updated Nov 8, 2020 -
transformer-cnn-emotion-recognition Public
Forked from IliaZenkov/transformer-cnn-emotion-recognition
Speech Emotion Classification with novel Parallel CNN-Transformer model built with PyTorch, plus thorough explanations of CNNs, Transformers, and everything in between
-
conv-emotion Public
Forked from declare-lab/conv-emotion
This repo contains implementations of different architectures for emotion recognition in conversations
Python MIT License Updated Feb 5, 2020 -
ddsp Public
Forked from magenta/ddsp
DDSP: Differentiable Digital Signal Processing
Python Apache License 2.0 Updated Jan 16, 2020 -
whisper-to-normal-speech-conversion Public
Forked from Maitreyapatel/speech-conversion-between-different-modalities
Whisper-to-Normal Speech Conversion Using Generative Adversarial Networks
Python MIT License Updated Jan 2, 2020 -
Nepali-Ai-Anchor Public
Forked from kshitijsubedi/Nepali-Ai-Anchor
Nepali AI Anchor Using LSTM & Pix2Pix. [Itonics Hackathon 2019]
Python Updated Dec 15, 2019 -
melgan-neurips Public
Forked from descriptinc/melgan-neurips
GAN-based Mel-Spectrogram Inversion Network for Text-to-Speech Synthesis
Python MIT License Updated Oct 26, 2019 -
Resemblyzer Public
Forked from resemble-ai/Resemblyzer
A Python package to analyze and compare voices with deep learning
Python Apache License 2.0 Updated Oct 23, 2019