research-article

Monolingual and Crosslingual SMS-based FAQ Retrieval

Author:

Johannes Leveling

FIRE '12 & '13: Proceedings of the 4th and 5th Annual Meetings of the Forum for Information Retrieval Evaluation

Article No.: 3, Pages 1-6

https://doi.org/10.1145/2701336.2701634

Published: 04 December 2013

Abstract

This paper presents results from DCU's second participation in the SMS-based FAQ Retrieval task at FIRE. For FIRE 2012, we submitted runs for the monolingual English and Hindi subtasks and for the crosslingual English-to-Hindi subtask. Compared to our FIRE 2011 experiments, the system was simplified to use a single retrieval engine (instead of three) and a single approach to detecting out-of-domain queries (instead of three). In our approach, SMS queries are transformed into a normalized, corrected form and submitted to the retrieval engine to obtain a ranked list of FAQ results. A classifier trained on features extracted from the training data then determines which queries are out of domain. For the crosslingual English-to-Hindi experiments, we trained a Hindi-to-English statistical machine translation system and used it to translate the full Hindi FAQ documents into English. Retrieval then matches the corrected English SMS queries against the translated Hindi FAQ documents.
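The monolingual pipeline outlined above (normalize the SMS, rank FAQs, flag out-of-domain queries) can be sketched as follows. This is a minimal illustration under stated assumptions, not DCU's system: the SMS lexicon is a hypothetical toy table, Okapi BM25 is assumed as the ranking function (the abstract does not name one), and a fixed threshold on the top retrieval score (the hypothetical `ood_threshold` parameter) stands in for the trained out-of-domain classifier. All function names are illustrative.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Hypothetical SMS-to-English lookup table; the paper's actual
# normalization and spelling-correction resources are not shown here.
SMS_LEXICON = {"u": "you", "r": "are", "hw": "how", "2": "to", "pls": "please", "acnt": "account"}


def normalize_sms(text: str) -> list[str]:
    """Lower-case, tokenize, and expand common SMS abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [SMS_LEXICON.get(tok, tok) for tok in tokens]


def bm25_scores(query: list[str], docs: list[list[str]], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every tokenized FAQ document for the query (assumed ranking model)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # The "1 +" inside the log keeps idf non-negative for very common terms.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def retrieve(sms: str, faqs: list[str], ood_threshold: float = 1.0) -> list[int] | None:
    """Return FAQ indices ranked by BM25, or None if the query looks out of domain.

    The fixed threshold on the top score is a hypothetical stand-in for the
    trained out-of-domain classifier described in the abstract.
    """
    query = normalize_sms(sms)
    docs = [normalize_sms(f) for f in faqs]
    scores = bm25_scores(query, docs)
    ranking = sorted(range(len(faqs)), key=lambda i: scores[i], reverse=True)
    if scores[ranking[0]] < ood_threshold:
        return None
    return ranking
```

With a small FAQ list, a query such as "hw 2 open acnt" is expanded to "how to open account" before ranking, and a query that shares no terms with any FAQ is rejected as out of domain. In the crosslingual setting, `faqs` would hold the machine-translated English versions of the Hindi FAQ documents, so the same code path applies unchanged.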

Our best runs achieved a mean reciprocal rank (MRR) of 0.949 for the monolingual English subtask, 0.880 for the monolingual Hindi subtask, and 0.450 for the crosslingual subtask.
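MRR, the measure reported above, averages over all queries the reciprocal rank of the first correct FAQ in each returned list, scoring 0 for a query when no correct FAQ is retrieved. A minimal sketch of the standard computation follows; the run format (a ranked list of FAQ IDs paired with the set of correct IDs) is an assumption, and how the official FIRE evaluation scores out-of-domain queries is not modelled here.

```python
from __future__ import annotations

from collections.abc import Collection, Sequence


def reciprocal_rank(ranking: Sequence[str], relevant: Collection[str]) -> float:
    """1/position of the first relevant FAQ ID in the ranking, or 0.0 if none is found."""
    for position, faq_id in enumerate(ranking, start=1):
        if faq_id in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs: Sequence[tuple[Sequence[str], Collection[str]]]) -> float:
    """Average reciprocal rank over all (ranking, relevant_ids) pairs."""
    if not runs:
        raise ValueError("MRR is undefined for an empty set of queries")
    return sum(reciprocal_rank(ranking, relevant) for ranking, relevant in runs) / len(runs)
```

For three queries whose first correct FAQ appears at rank 2, at rank 1, and not at all, MRR = (0.5 + 1 + 0) / 3 = 0.5.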

Cited By

Sumikawa Y, Fujiyoshi M, Hatakeyama H, and Nagai M. (2019). Supporting Creation of FAQ Dataset for E-Learning Chatbot. In Intelligent Decision Technologies 2019, pages 3-13. https://doi.org/10.1007/978-981-13-8311-3_1. Online publication date: 17-Jul-2019.

Index Terms

Monolingual and Crosslingual SMS-based FAQ Retrieval
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
2. Information systems
  1. Information retrieval
    1. Document representation

Information & Contributors

Information

Published In

ACM Other conferences

FIRE '12 & '13: Proceedings of the 4th and 5th Annual Meetings of the Forum for Information Retrieval Evaluation

December 2013

105 pages

ISBN:9781450328302

DOI:10.1145/2701336

Editors:
Prasenjit Majumder, Dhirubhai Ambani Institute of Information and Communication Technology, Gujarat, India
Mandar Mitra, Indian Statistical Institute, Kolkata, India
Madhulika Agrawal, Dhirubhai Ambani Institute of Information and Communication Technology, Gujarat, India
Parth Mehta, Dhirubhai Ambani Institute of Information and Communication Technology, Gujarat, India

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

Indian Statistical Institute, Kolkata
Google India
SIGIR: ACM Special Interest Group on Information Retrieval
Microsoft Research
IBM

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 December 2013

Qualifiers

Research-article
Research
Refereed limited

Conference

FIRE '13

FIRE '13: Forum for Information Retrieval Evaluation

December 4 - 6, 2013

New Delhi, India

Acceptance Rates

Overall Acceptance Rate 19 of 64 submissions, 30%

Bibliometrics & Citations

Article Metrics

Total Citations: 1
Total Downloads: 68
Downloads (last 12 months): 8
Downloads (last 6 weeks): 0

Reflects downloads up to 03 Mar 2025
