Abstract
The MoveOn speech and noise database was purpose-built to support research on spoken dialogue interaction in a motorcycle environment. Its distinctiveness results from the requirements of the application domain (an information support and operational command and control system for the two-wheel police force) and from the specifics of the adverse open-air acoustic environment. In this article, we first outline the target application, which motivates the database design and purpose, and then report on the implementation details. We discuss the main challenges related to the choice of equipment and the organization of recording sessions, along with some difficulties experienced during this effort. We give a detailed account of the database statistics and the suggested split of the data into subsets, and we discuss results from automatic speech recognition experiments that illustrate the complexity of the operational environment.
Notes
In earlier work (Winkler et al. 2008), published before the completion of the speech annotations, we estimated the amount of speech based on the speaker tier, i.e. including pauses at the beginning and end of each utterance, leading to a higher number of hours compared to the more precise number here.
References
Athanaselis, T., Bakamidis, S., Dologlou, I., Cowie, R., Douglas-Cowie, E., & Cox, C. (2005). ASR for emotional speech: Clarifying the issues and enhancing performance. Neural Networks, 18(4), 437–444.
Boersma, P. (2001). Praat, a system for doing phonetics by computer. Glot International, 5(9/10), 341–345.
Bohus, D., Raux, A., Harris, T. K., Eskenazi, M., & Rudnicky, A. I. (2007). Olympus: An open-source framework for conversational spoken language interface research. In: Bridging the Gap: Academic and Industrial Research in Dialog Technology workshop at HLT/NAACL.
Bohus, D., & Rudnicky, A. I. (2003). RavenClaw: Dialog management using hierarchical task decomposition and an expectation agenda. In: Proceedings Eurospeech 2003 (pp. 597–600).
Gong, Y. (1995). Speech recognition in noisy environments: A survey. Speech Communication, 16(3), 261–291.
Junqua, J. C., Fincke, S., & Field, K. (1999). The Lombard effect: A reflex to better communicate with others in noise. In: Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing (pp. 2083–2086).
Kaiser, M., Mögele, H., & Schiel, F. (2006). Bikers accessing the web: The SmartWeb Motorbike Corpus. In: Proceedings LREC 2006 (pp. 1628–1631).
Kalapanidas, E., Davarakis, C., Nani, M., Winkler, T., Ganchev, T., Kocsis, O., et al. (2008). MoveON: A multimodal information management application for police motorcyclists. In: Proceedings System Demonstrations of the 18th European Conference on Artificial Intelligence.
Kawaguchi, N., Matsubara, S., Kajita, H., Iwa, S., Takeda, K., Itakura, F., & Inagaki, Y. (2000). Construction of speech corpus in moving car environment. In: Proceedings ICSLP 2000 (pp. 362–365).
Lee, B., Hasegawa-Johnson, M., Goudeseune, C., Kamdar, S., Borys, S., Liu, M., & Huang, T. (2004a). AVICAR: Audio-visual speech corpus in a car environment. In: Proceedings ICSLP 2004 (pp. 2489–2492).
Lee, Y. J., Kim, B. W., Kim, Y. I., Choi, D. L., Lee, K. H., & Um, Y. (2004b). Creation and assessment of Korean speech and noise DB in car environment. In: Proceedings LREC 2004 (pp. 1403–1406).
Moreno, A., Lindberg, B., Draxler, C., Richard, G., Choukri, K., Euler, S., & Allen, J. (2000). SPEECHDAT-CAR: A large speech database for automotive environments. In: Proceedings LREC 2000.
Schiel, F., & Draxler, C. (2003). Production and validation of speech corpora. Munich: Bastard Verlag.
Van den Heuvel, H. (1999). Validation criteria, INCO-COP-977017.
Van den Heuvel, H. (2000). SLR validation: Evaluation of the SpeechDat approach. In: Proceedings LREC 2000 Satellite workshop XLDB - Very Large Telephone Speech Databases.
Van den Heuvel, H. (2001). The art of validation. ELRA Newsletter, 5(4), 4–6.
Van den Heuvel, H., Boves, L., Moreno, A., Omologo, M., Richard, G., & Sanders, E. (2001). Annotation in the SpeechDat projects. International Journal of Speech Technology, 4, 127–143.
Wakao, A., Takeda, K., & Itakura, F. (1996). Variability of Lombard effects under different noise conditions. In: Proceedings of Fourth International Conference on Spoken Language (pp. 2009–2012).
Wells, J. (1997). Standards, Assessment, and methods: Phonetic Alphabets. London: University College.
Wheatley, S. J., & Ascham, S. R. (1998). SpeechDat English database for the fixed telephone network, Technical Report.
Whissell, C. (1989). The dictionary of affect in language. In: R. Plutchik & H. Kellerman (Eds.), Emotion: Theory, research and experience (Vol. 4). New York: Academic Press.
Winkler, T., Kostoulas, T., Adderley, R., Bonkowski, C., Ganchev, T., Köhler, J., & Fakotakis, N. (2008). The MoveOn Motorcycle Speech Corpus. In: Proceedings LREC 2008 (pp. 2201–2205).
Acknowledgments
This work was supported by the FP6 MoveOn project (IST-2005-034753), which was co-funded by the European Commission. The authors would like to acknowledge the significant effort that Dr. Rick Adderley from A E Solutions (BI) invested in the recruitment of professional police officers and in the supervision of the data recording campaign. Furthermore, the authors would like to thank Patrick Seidler and Mr. Ali Khan from the University of Reading as well as Mr. Christian Bonkowski from the Fraunhofer Institute for Intelligent Analysis and Information Systems, who performed major parts of the annotation of the speech and noise tiers of the database. Sincere thanks also to the University of Reading, Systema Technologies S.A. and the whole MoveOn project team for supporting the development of the database through detailed definitions and discussions of the project requirements, as well as to all other colleagues who directly or indirectly contributed to the successful implementation of the MoveOn speech and noise database.
About this article
Cite this article
Kostoulas, T., Winkler, T., Ganchev, T. et al. The MoveOn database: motorcycle environment speech and noise database for command and control applications. Lang Resources & Evaluation 47, 539–563 (2013). https://doi.org/10.1007/s10579-013-9222-7
DOI: https://doi.org/10.1007/s10579-013-9222-7