Computer Science > Computation and Language

arXiv:2301.06736 (cs)

[Submitted on 17 Jan 2023]

Title: Syllable Subword Tokens for Open Vocabulary Speech Recognition in Malayalam

Authors: Kavya Manohar, A. R. Jayan, Rajeev Rajan

Abstract: In a hybrid automatic speech recognition (ASR) system, a pronunciation lexicon (PL) and a language model (LM) are essential to correctly retrieve spoken word sequences. Because Malayalam is a morphologically complex language, its vocabulary is so large that it is impossible to build a PL and an LM that cover all of its diverse word forms. Using subword tokens to build the PL and LM, and combining them to form words after decoding, enables the recovery of many out-of-vocabulary words. In this work we investigate the impact of using syllables as subword tokens instead of words in Malayalam ASR, and evaluate the relative improvement in lexicon size, model memory requirement, and word error rate.
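The step of combining subword tokens into words after decoding can be illustrated with a minimal sketch. It assumes a convention in which a token ending in a continuation marker (`+` here) joins the token that follows it; the marker, the function name `join_syllables`, and the romanized example tokens are all hypothetical and are not taken from the paper.

```python
def join_syllables(tokens, marker="+"):
    """Merge decoded subword tokens into words.

    Assumed convention (not from the paper): a token that ends with
    `marker` continues into the next token; a token without the
    marker closes the current word.
    """
    words = []
    current = ""
    for tok in tokens:
        if tok.endswith(marker):
            # Word-internal syllable: strip the marker and keep building.
            current += tok[: -len(marker)]
        else:
            # Word-final syllable: finish the word and start a new one.
            words.append(current + tok)
            current = ""
    if current:
        # The decoder output ended mid-word; keep what was accumulated.
        words.append(current)
    return words


# Hypothetical romanized syllable tokens from a decoder.
decoded = ["ma+", "la+", "ya+", "lam", "bha+", "sha"]
print(join_syllables(decoded))
```

Because words are rebuilt from syllables, a word form absent from the training vocabulary can still be produced as long as each of its syllables is in the lexicon.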

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2301.06736 [cs.CL]
	(or arXiv:2301.06736v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2301.06736

Submission history

From: Kavya Manohar
[v1] Tue, 17 Jan 2023 07:29:47 UTC (450 KB)
