DOI: 10.1145/2701336.2701634

Monolingual and Crosslingual SMS-based FAQ Retrieval

Published: 04 December 2013

Abstract

This paper presents results for DCU's second participation in the SMS-based FAQ Retrieval task at FIRE. For FIRE 2012, we submitted runs for the monolingual English, monolingual Hindi, and crosslingual English-to-Hindi subtasks. Compared to our experiments for FIRE 2011, we simplified our system by using a single retrieval engine (instead of three) and a single approach for detecting out-of-domain queries (instead of three). In our approach, the SMS queries are transformed into a normalized, corrected form and submitted to a retrieval engine to obtain a ranked list of FAQ results. A classifier trained on features extracted from the training data then determines which queries are out of domain. For our crosslingual English-to-Hindi experiments, we trained a statistical machine translation system for Hindi-to-English translation and used it to translate the full Hindi FAQ documents into English. Retrieval then operates on the corrected English input and retrieves results from the translated Hindi FAQ documents.
Our best experiments achieved a mean reciprocal rank (MRR) of 0.949 for the monolingual English subtask, 0.880 for the monolingual Hindi subtask, and 0.450 for the crosslingual subtask.
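
For readers unfamiliar with the metric, the following is a minimal sketch of how MRR is commonly computed over a set of queries. The function names and the toy FAQ identifiers are hypothetical and purely illustrative. The sketch assumes one correct FAQ per query and does not model how the FIRE task scores out-of-domain queries, which the abstract does not specify.

```python
from typing import Hashable, List, Sequence, Tuple


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Hashable) -> float:
    """Return 1/rank of the relevant item, or 0.0 if it was not retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item == relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: List[Tuple[Sequence[Hashable], Hashable]]) -> float:
    """Average the reciprocal rank over (ranked_list, relevant_item) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, gold) for ranked, gold in runs) / len(runs)


# Hypothetical toy data: three SMS queries, each with one correct FAQ id.
toy_runs = [
    (["faq_7", "faq_2", "faq_9"], "faq_7"),  # correct FAQ at rank 1
    (["faq_4", "faq_1", "faq_3"], "faq_1"),  # correct FAQ at rank 2
    (["faq_5", "faq_6", "faq_8"], "faq_0"),  # correct FAQ not retrieved
]
mrr = mean_reciprocal_rank(toy_runs)
print(f"MRR = {mrr:.3f}")
```

Under this definition, an MRR close to 1 means the correct FAQ is usually returned at the top of the ranked list, while a value near 0.5 corresponds to the correct FAQ appearing at about rank 2 on average.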


Cited By

  • (2019) Supporting Creation of FAQ Dataset for E-Learning Chatbot. Intelligent Decision Technologies 2019, pages 3-13. DOI: 10.1007/978-981-13-8311-3_1. Online publication date: 17-Jul-2019.

Published In

FIRE '12 & '13: Proceedings of the 4th and 5th Annual Meetings of the Forum for Information Retrieval Evaluation
December 2013
105 pages
ISBN: 9781450328302
DOI: 10.1145/2701336
Editors: Prasenjit Majumder, Mandar Mitra, Madhulika Agrawal, Parth Mehta
Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. CLIR
  2. FAQ retrieval
  3. SMS normalization

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

FIRE '13
FIRE '13: Forum for Information Retrieval Evaluation
December 4 - 6, 2013
New Delhi, India

Acceptance Rates

Overall Acceptance Rate 19 of 64 submissions, 30%
