soundraw.top
- Beijing
Starred repositories
🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
8000🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing.
A curated list of speech datasets (110+ datasets, 75+ easy to download)
AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio a…
Code to accompany "A Method for Animating Children's Drawings of the Human Figure"
🎨 A powerful cross-platform drawing board that combines many creative brushes for a whole new range of drawing effects!
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
Fine-tuning the UnifiedVoice autoregressor for TortoiseTTS.
FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3
✨✨Latest Advances on Multimodal Large Language Models
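Chinese and English sensitive-word lists, language detection, Chinese and international mobile/landline number location and carrier lookup, gender inference from names, phone number extraction, ID card number extraction, email extraction, Chinese and Japanese personal-name lexicons, Chinese abbreviation lexicon, character-decomposition dictionary, word sentiment scores, stop words, politically sensitive word list, violence and terrorism word list, traditional/simplified Chinese conversion, simulating Chinese pronunciation in English, Wang Feng lyrics generator, job-title lexicon, synonym lexicon, antonym lexicon, negation-word lexicon, car-brand lexicon, car-parts lexicon, segmentation of run-together English text, various Chinese word vectors, company-name collection, classical Chinese poetry corpus, IT lexicon, finance lexicon, idiom lexicon, place-name lexicon, …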
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
pycorrector is a toolkit for text error correction. It applies models such as Kenlm, T5, MacBERT, ChatGLM3, and Qwen2.5 to error-correction tasks and works out of the box.
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
The PyTorch-based audio source separation toolkit for researchers
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
SALMONN: Speech Audio Language Music Open Neural Network
Skywork series models are pre-trained on 3.2TB of high-quality multilingual (mainly Chinese and English) and code data. We have open-sourced the model, training data, evaluation data, evaluation me…
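The first open-source, commercially usable dialogue model to support bilingual (Chinese and English) speech-text multimodal conversation. Convenient speech input greatly improves the experience of using text-input large models, while avoiding the cumbersome pipeline of ASR-based solutions and the errors it can introduce.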
Implementation of Spear-TTS - multi-speaker text-to-speech attention network, in PyTorch
A large-scale 7B pretraining language model developed by BaiChuan-Inc.
High-Resolution Image Synthesis with Latent Diffusion Models