Voice of Teochew (潮州话之声) is an open-source project that uses speech technology to preserve and empower 潮州话 (the Teochew/Chaozhou dialect), which is passed down mainly by word of mouth.
Inspired by Meta AI’s work on speech-to-speech translation (S2ST) for unwritten languages, this initiative combines community-driven audio collection with advanced AI models to build practical tools for recognizing, translating, and revitalizing Teochew.
“A language lives when it’s spoken — and remembered.”
“A language endures by passing from mouth to ear, and becomes immortal by being remembered.”
- 🎤 Collect native Chaozhou speech samples from diverse speakers
- 📝 Transcribe the recordings and translate them into Mandarin and/or English
- 🤖 Train ASR and S2ST models using modern self-supervised learning techniques
- 📂 Open-source datasets, tools, and models to advance dialect AI research
- 🧑‍🤝‍🧑 Build a global network of contributors and advocates