Sarawak Malay

This repository contains Sarawak Malay conversational speech data for speech technology research. The data is currently experimental and is being used to investigate speaker diarization. It was collected by the Faculty of Computer Science and Information Technology, Universiti Malaysia Sarawak.

The data consists of 38 conversations, each containing two speakers, that have been transcribed using Transcriber (see the TextGrid folder). The conversations were recorded by different individuals using microphones from mobile devices or laptops, so the data collectors submitted files in different formats. All audio was then standardized to mono, 16 kHz (16,000 Hz) WAV format.
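To confirm that a recording matches the standardized format described above, the header of a WAV file can be inspected with Python's standard `wave` module. This is a minimal sketch, not part of the released tooling: the function names `describe_wav` and `matches_corpus_format` are hypothetical, and it assumes the files are uncompressed PCM WAV.

```python
import wave

# Target format stated in this README: mono audio sampled at 16,000 Hz.
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_RATE = 16000


def describe_wav(path):
    """Return (channels, sample_rate, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        channels = wav_file.getnchannels()
        sample_rate = wav_file.getframerate()
        duration = wav_file.getnframes() / sample_rate
    return channels, sample_rate, duration


def matches_corpus_format(path):
    """Return True if the file is mono and sampled at 16 kHz."""
    channels, sample_rate, _ = describe_wav(path)
    return channels == EXPECTED_CHANNELS and sample_rate == EXPECTED_SAMPLE_RATE
```

Calling `matches_corpus_format` on each file in the wav folder is a quick sanity check before passing the audio to a diarization pipeline.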

We provide the following files:

wav
rttm
textgrid
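The rttm files hold the speaker-turn annotations used for diarization. The sketch below shows one way to read them in plain Python. It assumes the files follow the common NIST RTTM layout (`SPEAKER <file-id> <channel> <onset> <duration> <NA> <NA> <speaker> <NA> <NA>`), which has not been verified against this repository; the function names and the sample lines are made up for illustration.

```python
from collections import defaultdict


def parse_rttm(text):
    """Parse RTTM text into (file_id, speaker, start, end) tuples.

    Assumes the common NIST layout, where field 3 is the onset and
    field 4 is the duration, both in seconds, and field 7 is the speaker.
    """
    segments = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and any non-SPEAKER records.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        onset = float(fields[3])
        duration = float(fields[4])
        segments.append((fields[1], fields[7], onset, onset + duration))
    return segments


def speaking_time(segments):
    """Sum the total speaking time in seconds for each speaker label."""
    totals = defaultdict(float)
    for _, speaker, start, end in segments:
        totals[speaker] += end - start
    return dict(totals)


# Illustrative lines only; they are not taken from the corpus.
SAMPLE_RTTM = """\
SPEAKER conv01 1 0.0 2.5 <NA> <NA> spk_A <NA> <NA>
SPEAKER conv01 1 2.5 1.5 <NA> <NA> spk_B <NA> <NA>
SPEAKER conv01 1 4.0 1.0 <NA> <NA> spk_A <NA> <NA>
"""

if __name__ == "__main__":
    for segment in parse_rttm(SAMPLE_RTTM):
        print(segment)
    print(speaking_time(parse_rttm(SAMPLE_RTTM)))
```

To use it on a real annotation, read the file contents with `open(path, encoding="utf-8").read()` and pass the resulting string to `parse_rttm`.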

This data was used in a speaker diarization task to evaluate our speaker diarization models. Our work was presented at IALP 2023 in Singapore.

Please cite our work when using this data:

@INPROCEEDINGS{
10337314,
author={Rahim, Mohd Zulhafiz and Juan, Sarah Samson and Mohamad, Fitri Suraya},
booktitle={2023 International Conference on Asian Language Processing (IALP)},
title={Improving Speaker Diarization for Low-Resourced Sarawak Malay Language Conversational Speech Corpus},
year={2023},
pages={228-233},
keywords={Training;Oral communication;Data models;Usability;Speech processing;Testing;Speaker diarization;x-vectors;clustering;low-resource;auto-labeling;pseudo-labeling;unsupervised},
doi={10.1109/IALP61005.2023.10337314}}

For further details, contact:

Sarah Samson Juan sjsflora@unimas.my

Mohd Zulhafiz bin Rahim mzhafiz1999@gmail.com
