The MoveOn database: motorcycle environment speech and noise database for command and control applications

  • Original Paper
Language Resources and Evaluation

Abstract

The MoveOn speech and noise database was purposely designed and implemented in support of research on spoken dialogue interaction in a motorcycle environment. The distinctiveness of the MoveOn database results from the requirements of the application domain (an information support and operational command and control system for the two-wheel police force) and from the specifics of the adverse open-air acoustic environment. In this article, we first outline the target application, which motivates the database design and purpose, and then report on the implementation details. We discuss the main challenges related to the choice of equipment and the organization of recording sessions, along with some difficulties experienced during this effort. We offer a detailed account of the database statistics and the suggested data splits into subsets, and discuss results from automatic speech recognition experiments that illustrate the degree of complexity of the operational environment.

Notes

  1. http://showcase.m0ve0n.net/.

  2. http://www.zoom.co.jp.

  3. http://www.akg.com.

  4. http://www.torkworld.com/tork_max.html.

  5. http://www.alan-electronics.de.

  6. http://www.cs.cmu.edu/afs/cs.cmu.edu/user/lenzo/html/areas/t2p/.

  7. In earlier work (Winkler et al. 2008), published before the completion of the speech annotations, we estimated the amount of speech based on the speaker tier, i.e. including pauses at the beginning and end of each utterance, leading to a higher number of hours compared to the more precise number here.

  8. http://htk.eng.cam.ac.uk/.

  9. http://www.elra.info/.

References

  • Athanaselis, T., Bakamidis, S., Dologlou, I., Cowie, R., Douglas-Cowie, E., & Cox, C. (2005). ASR for emotional speech: Clarifying the issues and enhancing performance. Neural Networks, 18(4), 437–444.

  • Boersma, P. (2001). Praat, a system for doing phonetics by computer. Glot International, 5(9/10), 341–345.

  • Bohus, D., Raux, A., Harris, T. K., Eskenazi, M., & Rudnicky, A. I. (2007). Olympus: An open-source framework for conversational spoken language interface research. In: Bridging the Gap: Academic and Industrial Research in Dialog Technology workshop at HLT/NAACL.

  • Bohus, D., & Rudnicky, A. I. (2003). RavenClaw: Dialog Management Using Hierarchical Task Decomposition and an Expectation Agenda. In: Proceedings Eurospeech 2003 (pp. 597–600).

  • Gong, Y. (1995). Speech recognition in noisy environments: A survey. Speech Communication, 16(3), 261–291.

  • Junqua, J. C., Fincke, S., & Field, K. (1999). The Lombard effect: A reflex to better communicate with others in noise. In: Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing (pp. 2083–2086).

  • Kaiser, M., Mögele, H., & Schiel, F. (2006). Bikers accessing the web: The SmartWeb Motorbike Corpus. In: Proceedings LREC 2006 (pp. 1628–1631).

  • Kalapanidas, E., Davarakis, C., Nani, M., Winkler, T., Ganchev, T., Kocsis, O., et al. (2008). MoveON: A multimodal information management application for police motorcyclists. In: Proceedings System Demonstrations of the 18th European Conference on Artificial Intelligence.

  • Kawaguchi, N., Matsubara, S., Kajita, H., Iwa, S., Takeda, K., Itakura, F., & Inagaki, Y. (2000). Construction of speech corpus in moving car environment. In: Proceedings ICSLP 2000 (pp. 362–365).

  • Lee, B., Hasegawa-Johnson, M., Goudeseune, C., Kamdar, S., Borys, S., Liu, M., & Huang, T. (2004a). AVICAR: Audio-visual speech corpus in a car environment. In: Proceedings ICSLP 2004 (pp. 2489–2492).

  • Lee, Y. J., Kim, B. W., Kim, Y. I., Choi, D. L., Lee, K. H., & Um, Y. (2004b). Creation and assessment of Korean speech and noise DB in Car Environment. In: Proceedings LREC 2004 (pp. 1403–1406).

  • Moreno, A., Lindberg, B., Draxler, C., Richard, G., Choukri, K., Euler, S., & Allen, J. (2000). SPEECHDAT-CAR: A large speech database for automotive environments. In: Proceedings LREC 2000.

  • Schiel, F., & Draxler, C. (2003). Production and validation of speech corpora. Munich: Bastard Verlag.

  • Van den Heuvel, H. (1999). Validation criteria, INCO-COP-977017.

  • Van den Heuvel, H. (2000). SLR validation: Evaluation of the SpeechDat approach. In: Proceedings LREC 2000 Satellite workshop XLDB: Very Large Telephone Speech Databases.

  • Van den Heuvel, H. (2001). The art of validation. ELRA Newsletter, 5(4), 4–6.

  • Van den Heuvel, H., Boves, L., Moreno, A., Omologo, M., Richard, G., & Sanders, E. (2001). Annotation in the SpeechDat projects. International Journal of Speech Technology, 4, 127–143.

  • Wakao, A., Takeda, K., & Itakura, F. (1996). Variability of Lombard effects under different noise conditions. In: Proceedings of Fourth International Conference on Spoken Language (pp. 2009–2012).

  • Wells, J. (1997). Standards, Assessment, and methods: Phonetic Alphabets. London: University College.

  • Wheatley, S. J., & Ascham, S. R. (1998). SpeechDat English database for the fixed telephone network, Technical Report.

  • Whissell, C. (1989). The dictionary of affect in language. In: R. Plutchik & H. Kellerman (Eds.), Emotion: Theory, research and experience (Vol. 4). New York: Academic Press.

  • Winkler, T., Kostoulas, T., Adderley, R., Bonkowski, C., Ganchev, T., Köhler, J., & Fakotakis, N. (2008). The MoveOn Motorcycle Speech Corpus. In: Proceedings LREC 2008 (pp. 2201–2205).

Acknowledgments

This work was supported by the FP6 MoveOn project (IST-2005-034753), which was co-funded by the European Commission. The authors would like to acknowledge the significant effort that Dr. Rick Adderley from A ESolutions (BI) invested in the recruitment of professional police officers and in the supervision of the data recording campaign. Furthermore, the authors would like to thank Patrick Seidler and Mr. Ali Khan from University of Reading as well as Mr. Christian Bonkowski from the Fraunhofer Institute for Intelligent Analysis and Information Systems, who performed major parts of the annotation of the speech and noise tiers of the database. Sincere thanks also to University of Reading, Systema Technologies S.A. and the whole MoveOn project team for supporting the development of the database by detailed definitions and discussions of the project requirements, as well as all other colleagues who directly or indirectly contributed to the successful implementation of the MoveOn speech and noise database.

Author information

Corresponding author

Correspondence to Theodoros Kostoulas.

About this article

Cite this article

Kostoulas, T., Winkler, T., Ganchev, T. et al. The MoveOn database: motorcycle environment speech and noise database for command and control applications. Lang Resources & Evaluation 47, 539–563 (2013). https://doi.org/10.1007/s10579-013-9222-7

  • DOI: https://doi.org/10.1007/s10579-013-9222-7
