DOI: 10.1145/1822258.1822277
research-article

wikiBABEL: community creation of multilingual data

Published: 08 September 2008

Abstract

In this paper, we present wikiBABEL, a collaborative framework for the efficient and effective creation of multilingual content by a community of users. The wikiBABEL framework leverages the availability of fairly stable content in a source language (typically English) and a reasonable, though not necessarily perfect, machine translation system between the source language and a given target language to create rough initial content in the target language, which is published on a collaborative platform. The platform provides an intuitive user interface and a set of linguistic tools for collaborative correction of the rough content by a community of users, aiding the creation of clean content in the target language. We describe the architectural components implementing the wikiBABEL framework, namely the systems for source and target language content management, the mechanisms for coordination and collaboration, and the intuitive user interface for multilingual editing and review. Importantly, we discuss the integrated linguistic resources and tools, such as bilingual dictionaries and machine translation and transliteration systems, that help users during the content correction and creation process. In addition, we analyze and present the prime factors, whether user-interface features or linguistic tools and resources, that significantly influence the user experience in multilingual content creation.
Beyond the creation of multilingual content, a significant motivation for the wikiBABEL framework is the creation of parallel corpora as a by-product. Parallel linguistic corpora are very valuable resources for both Statistical Machine Translation (SMT) and Crosslingual Information Retrieval (CLIR) research, and may be mined effectively from multilingual data with significant content overlap, such as the data created in the wikiBABEL framework. Creation of parallel corpora by professional translators is very expensive, and hence SMT and CLIR research has been largely confined to a handful of languages. Our attempt to engage the large and diverse Internet user population may aid the economical creation of such linguistic resources, and may make computational linguistics research possible and practical in many languages of the world.

Published In

WikiSym '08: Proceedings of the 4th International Symposium on Wikis
September 2008
219 pages
ISBN:9781605581286
DOI:10.1145/1822258

Sponsors

  • University of Porto

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. human aided machine translation
  2. linguistic data creation
  3. multilingual content creation
  4. multilingual wiki
  5. user-centered design

Qualifiers

  • Research-article

Conference

WikiSym08: 2008 International Symposium on Wikis
September 8 - 10, 2008
Porto, Portugal

Acceptance Rates

Overall Acceptance Rate 69 of 145 submissions, 48%

Cited By

  • (2017) Crowdsourcing and Online Collaborative Translations. Online publication date: 9-Mar-2017
  • (2014) VidWiki. Proceedings of the 17th ACM conference on Computer supported cooperative work & social computing, pp. 1167-1175. DOI: 10.1145/2531602.2531670. Online publication date: 15-Feb-2014
  • (2014) Quality evaluation in community post-editing. Machine Translation 28:3-4, pp. 237-262. DOI: 10.1007/s10590-014-9160-1. Online publication date: 1-Dec-2014
  • (2013) Tracking Inconsistencies in Parallel Multilingual Documents. Proceedings of the 2013 International Conference on Culture and Computing, pp. 15-20. DOI: 10.1109/CultureComputing.2013.11. Online publication date: 16-Sep-2013
  • (2011) Multi-Language Discussion Platform for Wikipedia Translation. The Language Grid, pp. 231-244. DOI: 10.1007/978-3-642-21178-2_15. Online publication date: 16-Jul-2011
  • (2010) Enabling monolingual translators. Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 537-545. DOI: 10.5555/1857999.1858077. Online publication date: 2-Jun-2010
  • (2010) Real-time content translation framework for interactive public display systems. 2010 International Conference on User Science and Engineering (i-USEr), pp. 307-310. DOI: 10.1109/IUSER.2010.5716771. Online publication date: Dec-2010
  • (2010) A process study of computer-aided translation. Machine Translation 23:4, pp. 241-263. DOI: 10.1007/s10590-010-9076-3. Online publication date: 8-Jul-2010
  • (2009) WikiBABEL. Proceedings of the ACL-IJCNLP 2009 Software Demonstrations, pp. 29-32. DOI: 10.5555/1667872.1667880. Online publication date: 3-Aug-2009
