- Academia Sinica
- Taipei, Taiwan
- UTC +08:00
- blueburnband
Highlights
- Pro
Stars
SALMONN: Speech Audio Language Music Open Neural Network
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
DSPy: The framework for programming—not prompting—language models
ACE-Step: A Step Towards Music Generation Foundation Model
Use any LLMs (Large Language Models) for Deep Research.
Code for the ISMIR 2022 paper: Beat Transformer: Demixed Beat and Downbeat Tracking with Dilated Self-Attention
Chordify Annotator Subjectivity Dataset - A chord-label harmony dataset with multiple reference annotations per song
This repo develops and compares Deep Learning algorithms to recognize music emotion.
This repository collects information about different data sets for Music Emotion Recognition.
Music emotion recognition in the MIDI and WAV domains [ISMIR 2021]
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
Self-supervised key estimation model that matches the performance of supervised state-of-the-art models.
Extension of the music21 library for working with music chords encoded according to the Harte Notation.
Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, Mistral Small 3.1 and other large language models.
Pre-trained models for the ISMIR 2019 paper "Large-Vocabulary Chord Transcription via Chord Structure Decomposition"
Beat annotations for the beat tracker Beat This!
NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms
An invisible desktop application to help you pass your technical interviews.
Di♪♪Rhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
The official repository for "Piano score rearrangement into multiple difficulty levels via notation-to-notation approach" incl. ST+ tokenizer / detokenizer
LLaQo, a Large Language Query-based Coach in the domain of expressive performance
Encode and decode audio samples to/from compressed latent representations!
Self-supervised learning for fast pitch estimation
Code release for "LLMs can see and hear without any training"