DOI: 10.1145/1390334.1390372
research-article

Crosslingual location search

Published: 20 July 2008

Abstract

Address geocoding, the process of finding the map location for a structured postal address, is a relatively well-studied problem. In this paper we consider the more general problem of crosslingual location search, where the queries are not limited to postal addresses, and the language and script used in the search query are different from those in which the underlying data is stored. To the best of our knowledge, our system is the first crosslingual location search system that is able to geocode complex addresses. We use a statistical machine transliteration system to convert location names from the script of the query to that of the stored data. However, we show that it is not sufficient to simply feed the resulting transliterations into a monolingual geocoding system, as the ambiguity inherent in the conversion drastically expands the location search space and significantly lowers the quality of results. The strength of our approach lies in its integrated, end-to-end nature: we use abstraction and fuzzy search (in the text domain) to achieve maximum coverage despite transliteration ambiguities, while applying spatial constraints (in the geographic domain) to focus only on viable interpretations of the query. Our experiments with structured and unstructured queries in a set of diverse languages and scripts (Arabic, English, Hindi and Japanese), searching for locations in different regions of the world, show full crosslingual location search accuracy at levels comparable to that of commercial monolingual systems. We achieve these levels of performance using techniques that may be applied to crosslingual searches in any language/script, and over arbitrary spatial data.
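The interplay between fuzzy text matching and spatial constraints described in the abstract can be illustrated with a small sketch. This is not the authors' system: the gazetteer entries, the function names, the similarity threshold, and the 30 km radius are all hypothetical, and Python's `difflib.SequenceMatcher` stands in for whatever fuzzy search the paper actually uses. The transliteration step is assumed to have already produced several candidate spellings per query term.

```python
from difflib import SequenceMatcher
from itertools import product
from math import asin, cos, radians, sin, sqrt

# Hypothetical toy gazetteer: (name, kind, latitude, longitude).
GAZETTEER = [
    ("mg road", "street", 12.9756, 77.6050),  # a street in Bengaluru
    ("mg road", "street", 18.5167, 73.8790),  # a same-named street in Pune
    ("bengaluru", "city", 12.9716, 77.5946),
    ("pune", "city", 18.5204, 73.8567),
]


def similarity(a, b):
    """Text-domain similarity in [0, 1]; a stand-in for a real fuzzy matcher."""
    return SequenceMatcher(None, a, b).ratio()


def fuzzy_candidates(variants, kind, threshold=0.6):
    """Return (score, entry) pairs for entries of `kind` close to any variant."""
    hits = []
    for entry in GAZETTEER:
        if entry[1] != kind:
            continue
        score = max(similarity(v, entry[0]) for v in variants)
        if score >= threshold:
            hits.append((score, entry))
    return hits


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))


def search(street_variants, city_variants, max_km=30.0):
    """Pick the best (street, city) pair that is also spatially consistent."""
    best = None
    streets = fuzzy_candidates(street_variants, "street")
    cities = fuzzy_candidates(city_variants, "city")
    for (s_score, street), (c_score, city) in product(streets, cities):
        # Geographic-domain constraint: the street must lie near the city.
        if haversine_km(street[2], street[3], city[2], city[3]) > max_km:
            continue
        total = s_score * c_score
        if best is None or total > best[0]:
            best = (total, street, city)
    return best


if __name__ == "__main__":
    # Assumed transliteration output: several noisy spellings per term.
    print(search(["em ji rod", "mg rod"], ["bangalor", "bengalooru"]))
```

In this toy data the text match alone cannot choose between the two streets named "mg road"; the distance check discards the one in Pune because it is hundreds of kilometres from the matched city.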




    Published In

    SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
    July 2008
    934 pages
    ISBN:9781605581644
    DOI:10.1145/1390334
    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. address geocoding
    2. crosslingual information retrieval
    3. location search

    Conference

    SIGIR '08
    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%


    Cited By

    • (2020) Is cross‐lingual readability assessment possible? Journal of the Association for Information Science and Technology 71:6 (644-656). DOI: 10.1002/asi.24293. Online publication date: 7-May-2020
    • (2019) Spatial Keyword Query of Region-Of-Interest Based on the Distributed Representation of Point-Of-Interest. ISPRS International Journal of Geo-Information 8:6 (287). DOI: 10.3390/ijgi8060287. Online publication date: 20-Jun-2019
    • (2015) A new approach to geocoding. Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems (1-10). DOI: 10.1145/2820783.2820827. Online publication date: 3-Nov-2015
    • (2013) Automatic gazetteer enrichment with user-geocoded data. Proceedings of the Second ACM SIGSPATIAL International Workshop on Crowdsourced and Volunteered Geographic Information (87-94). DOI: 10.1145/2534732.2534736. Online publication date: 5-Nov-2013
    • (2013) Map search via a factor graph model. Proceedings of the 22nd ACM international conference on Information & Knowledge Management (69-78). DOI: 10.1145/2505515.2505674. Online publication date: 27-Oct-2013
    • (2013) Improving Cross-Language Information Retrieval by Transliteration Mining and Generation. Multilingual Information Access in South Asian Languages (310-333). DOI: 10.1007/978-3-642-40087-2_29. Online publication date: 2013
    • (2009) Custom local search. Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (424-427). DOI: 10.1145/1653771.1653835. Online publication date: 4-Nov-2009
    • (2009) "They Are Out There, If You Know Where to Look". Proceedings of the 31th European Conference on IR Research on Advances in Information Retrieval (437-448). DOI: 10.1007/978-3-642-00958-7_39. Online publication date: 18-Apr-2009
