research-article

Real-time Assistive Reader Pen for Arabic Language

Published: 31 March 2021

Abstract

Disability is an impairment that affects an individual's livelihood and independence. Assistive technology helps people with disabilities overcome barriers to learning, access information, contribute to the community, and live independently. This article proposes an assistive device that gives people with visual or learning disabilities real-time access to printed Arabic material and helps them participate in the education system and the professional workforce.
The proposed device combines Optical Character Recognition (OCR) with Text-To-Speech (TTS) conversion. OCR is achieved through image processing, character extraction, and classification. Arabic speech synthesis is achieved through concatenation synthesis, followed by Multi-Band Resynthesis Overlap-Add (MBROLA). Waveform generation in the second phase produces the vocal output that the user hears. OCR character and word accuracy tests were conducted for nine Arabic fonts: six fonts were recognized with over 60% character accuracy, and two fonts were recognized with over 88% accuracy. A Mean Opinion Score (MOS) test for speech quality yielded an overall score of 3.53/5 and indicated that users were able to understand the speech. A real-time usability test with 10 subjects yielded an overall average agreement score of 3.9/5, indicating that the proposed Arabic reader pen meets the real-time constraints, is pleasant and satisfying to use, and can help make printed Arabic material accessible to people with visual impairments or learning disabilities.
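The abstract reports character accuracy and MOS figures without stating how they were computed. As a rough illustration only, the sketch below assumes a common definition of OCR character accuracy (one minus the Levenshtein edit distance divided by the reference length) and treats the MOS as the plain arithmetic mean of 1-to-5 listener ratings. The function names and the sample strings and ratings are hypothetical and are not taken from the article.

```python
from statistics import mean


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_accuracy(ref: str, hyp: str) -> float:
    """Assumed definition: 1 - edit_distance / len(ref), clamped at 0."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


def mean_opinion_score(ratings: list[int]) -> float:
    """Arithmetic mean of listener ratings on a 1 (bad) to 5 (excellent) scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return mean(ratings)


if __name__ == "__main__":
    # Hypothetical OCR output with one misread letter (jeem read as haa).
    reference = "كتاب جديد"
    recognized = "كتاب حديد"
    print(f"character accuracy: {character_accuracy(reference, recognized):.3f}")
    # Hypothetical ratings from four listeners.
    print(f"MOS: {mean_opinion_score([4, 3, 4, 3]):.2f}")
```

Word accuracy could be computed the same way by running the edit distance over lists of whitespace-separated tokens instead of characters; whether the article used this definition is not stated in the abstract.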

Published In

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 20, Issue 1
Special issue on Deep Learning for Low-Resource Natural Language Processing, Part 1 and Regular Papers
January 2021
332 pages
ISSN:2375-4699
EISSN:2375-4702
DOI:10.1145/3439335
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 31 March 2021
Accepted: 01 September 2020
Revised: 01 August 2020
Received: 01 March 2020
Published in TALLIP Volume 20, Issue 1

Author Tags

  1. Arabic language
  2. assistive embedded systems
  3. optical character recognition
  4. reader pen
  5. real-time systems
  6. text-to-speech

Qualifiers

  • Research-article
  • Research
  • Refereed
