DOI: 10.1145/544220.544259
Article

Harvesting translingual vocabulary mappings for multilingual digital libraries

Published: 14 July 2002

Abstract

This paper presents a method of information harvesting and consolidation that supports the multilingual information requirements of cross-language information retrieval within digital library systems. We describe a way to create both customized bilingual dictionaries and multilingual query mappings from a source language to many target languages. We also describe a multilingual conceptual mapping resource with broad coverage (over 100 written languages can be supported) that is truly multilingual, as opposed to the bilingual pairings usually derived from machine translation. This resource is derived from the online library catalog of the University of California, which holds more than 10 million titles. It is created statistically via maximum likelihood associations from words and phrases in book titles of many languages to human-assigned subject headings in English. The 150,000 subject headings can form interlingua mappings between pairs of languages or from one language to several languages. While our current demonstration prototype maps between ten languages (English, Arabic, Chinese, French, German, Italian, Japanese, Portuguese, Russian, Spanish), extensions to additional languages are straightforward. We also describe how this resource is being expanded for languages where linguistic coverage is limited in our initial database, by automatically harvesting new information from international online library catalogs using the Z39.50 networked library search protocol.
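
The mapping idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the exact association statistic, so the code below assumes a plain relative-frequency (maximum likelihood) estimate of P(heading | word), and the toy records, the function names `train`, `heading_probs`, and `translate`, and the scoring rule are all hypothetical. Title words in each language are linked to English subject headings, and the headings then act as the pivot between languages.

```python
from collections import Counter, defaultdict

# Hypothetical toy catalog: (language, title words, English subject headings).
RECORDS = [
    ("en", ["economic", "theory"], ["Economics"]),
    ("en", ["forest", "ecology"], ["Forest ecology"]),
    ("fr", ["théorie", "économique"], ["Economics"]),
    ("fr", ["écologie", "forestière"], ["Forest ecology"]),
    ("de", ["wirtschaftstheorie"], ["Economics"]),
    ("de", ["waldökologie"], ["Forest ecology"]),
]


def train(records):
    """Count co-occurrences of (language, title word) with subject headings."""
    pair_counts = defaultdict(Counter)
    for lang, words, headings in records:
        for word in set(words):  # count each word once per record
            for heading in headings:
                pair_counts[(lang, word)][heading] += 1
    return pair_counts


def heading_probs(pair_counts, lang, word):
    """Relative-frequency estimate of P(heading | word) for one language."""
    counts = pair_counts.get((lang, word), Counter())
    total = sum(counts.values())
    return {h: c / total for h, c in counts.items()} if total else {}


def translate(pair_counts, src_lang, word, tgt_lang, top_k=3):
    """Rank target-language words that share subject headings with `word`.

    Assumed scoring rule: sum over headings of
    P(heading | source word) * P(heading | target word).
    """
    src = heading_probs(pair_counts, src_lang, word)
    scores = Counter()
    for (lang, tgt_word), counts in pair_counts.items():
        if lang != tgt_lang:
            continue
        total = sum(counts.values())
        for heading, p_src in src.items():
            if heading in counts:
                scores[tgt_word] += p_src * counts[heading] / total
    return [w for w, _ in scores.most_common(top_k)]
```

Calling `translate(train(RECORDS), "en", "forest", "de")` ranks German title words by how strongly they share subject headings with "forest". Because every language is tied to the same heading vocabulary, adding a language only requires more catalog records, not a new bilingual dictionary for each language pair.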


Published In

JCDL '02: Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries
July 2002
448 pages
ISBN:1581135130
DOI:10.1145/544220
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. controlled vocabularies
  2. cross-lingual information retrieval
  3. entry vocabulary indexes

Qualifiers

  • Article

Conference

JCDL02: Joint Conference on Digital Libraries 2002
July 14 - 18, 2002
Portland, Oregon, USA

Acceptance Rates

JCDL '02 Paper Acceptance Rate 69 of 240 submissions, 29%;
Overall Acceptance Rate 415 of 1,482 submissions, 28%

Cited By

  • (2014) Multilingual Digital Libraries: A review of issues in system-centered and user-centered studies, information retrieval and user behavior. International Information & Library Review 45:1-2 (3-19). DOI: 10.1080/10572317.2013.10766367. Online publication date: 8-Jan-2014
  • (2012) Multilinguality in the digital library. The Electronic Library 30:2 (165-181). DOI: 10.1108/02640471211221313. Online publication date: 6-Apr-2012
  • (2011) Exploiting multi-agent platform for indirect alignment between multilingual ontologies. Expert Systems with Applications: An International Journal 38:5 (5774-5780). DOI: 10.1016/j.eswa.2010.10.055. Online publication date: 1-May-2011
  • (2010) Transferring structural markup across translations using multilingual alignment and projection. Proceedings of the 10th annual joint conference on Digital libraries (11-20). DOI: 10.1145/1816123.1816126. Online publication date: 21-Jun-2010
  • (2010) Information access across languages on the web: From search engines to digital libraries. Proceedings of the American Society for Information Science and Technology 46:1 (1-14). DOI: 10.1002/meet.2009.1450460278. Online publication date: 18-Nov-2010
  • (2009) Indirect Alignment between Multilingual Ontologies. Proceedings of the Third KES International Symposium on Agent and Multi-Agent Systems: Technologies and Applications (233-241). DOI: 10.1007/978-3-642-01665-3_24. Online publication date: 30-May-2009
  • (2008) Query Classification and Expansion for Translation Mining Via Search Engines. Proceedings of the 10th Pacific Rim International Conference on Artificial Intelligence: Trends in Artificial Intelligence (1121-1126). DOI: 10.1007/978-3-540-89197-0_117. Online publication date: 15-Dec-2008
  • (2006) Multi-lingual detection of terrorist content on the web. Proceedings of the 2006 international conference on Intelligence and Security Informatics (16-30). DOI: 10.1007/11734628_3. Online publication date: 9-Apr-2006
  • (2006) Exploiting the Web as the multilingual corpus for unknown query translation. Journal of the American Society for Information Science and Technology 57:5 (660-670). DOI: 10.1002/asi.20328. Online publication date: Feb-2006
  • (2004) Translating unknown cross-lingual queries in digital libraries using a web-based approach. Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries (108-116). DOI: 10.1145/996350.996378. Online publication date: 7-Jun-2004
