DOI: 10.1145/2348283.2348379
research-article

Mining the web for points of interest

Published: 12 August 2012

Abstract

A point of interest (POI) is a focused geographic entity such as a landmark, a school, a historical building, or a business. Points of interest are the basis for most of the data supporting location-based applications. In this paper we propose to curate POIs from online sources by bootstrapping training data from Web snippets, seeded by POIs gathered from social media. The resulting large corpus is used to train a sequential tagger to recognize mentions of POIs in text. Using Wikipedia data as training data, we can identify POIs in free text with a precision 116% higher and a recall 50% higher than those of the state-of-the-art POI identifier. We show that by using Foursquare and Gowalla check-ins as seeds to bootstrap training data from Web snippets, we can improve precision by between 16% and 52%, and recall by between 48% and 187%, over the state of the art. Recognizing the name of a POI is not sufficient, as the POI must also be associated with a set of geographic coordinates. Our method increases the number of POIs that can be localized nearly three-fold, from 134 to 395 in a sample of 400, with a median localization error of less than one kilometer.
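The abstract does not spell out how seed POIs become training data. One common way to do this kind of bootstrapping (often called distant supervision) is to mark every occurrence of a known POI name in a snippet with BIO labels and treat the result as noisy training data for a sequential tagger. The Python sketch below assumes that approach; the function names `tokenize`, `bio_label`, `haversine_km`, and `median_error_km`, the regex tokenizer, and the example seed names are illustrative assumptions, not the authors' code. It also sketches how a median localization error could be computed, using the haversine great-circle formula as a spherical approximation in place of a more accurate ellipsoidal method such as Vincenty's.

```python
import math
import re
from statistics import median

# Assumption: a spherical Earth with this mean radius (haversine approximation).
EARTH_RADIUS_KM = 6371.0


def tokenize(text):
    """Split text into word tokens and single punctuation marks (illustrative tokenizer)."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)


def bio_label(snippet, seed_names):
    """Tag snippet tokens with BIO labels wherever a seed POI name occurs.

    Hypothetical helper: matching is exact and case-insensitive. Longer seed
    names are tried first, so "Golden Gate Bridge" takes precedence over a
    shorter seed such as "Golden Gate".
    """
    tokens = tokenize(snippet)
    lowered = [t.lower() for t in tokens]
    seeds = sorted((tokenize(name.lower()) for name in seed_names), key=len, reverse=True)
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        for seed in seeds:
            n = len(seed)
            # Skip empty seeds; otherwise an empty slice would "match" forever.
            if n and lowered[i:i + n] == seed:
                labels[i] = "B-POI"                        # first token of the mention
                labels[i + 1:i + n] = ["I-POI"] * (n - 1)  # remaining tokens of the mention
                i += n
                break
        else:
            i += 1  # no seed name starts at this token; move on
    return list(zip(tokens, labels))


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def median_error_km(predicted, truth):
    """Median distance (km) between predicted and true (lat, lon) pairs."""
    return median(haversine_km(*p, *t) for p, t in zip(predicted, truth))
```

For example, `bio_label("Lunch at the Golden Gate Bridge today", ["Golden Gate Bridge"])` labels `Golden`, `Gate`, `Bridge` as `B-POI`, `I-POI`, `I-POI` and every other token as `O`. The resulting (token, label) pairs would then be passed to a sequence labeler such as a linear-chain conditional random field. Exact string matching is deliberately naive: it misses name variants and can mislabel ambiguous names, so labels produced this way are noisy.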

Cited By

  • (2024) Housing prices and points of interest in three Polish cities. Journal of Housing and the Built Environment 39(3):1509-1540. DOI: 10.1007/s10901-024-10124-7. Online publication date: 15-May-2024
  • (2023) Automatic construction of POI address lists at city streets from geo-tagged photos and web data: a case study of San Jose City. Multimedia Tools and Applications 82(22):34749-34770. DOI: 10.1007/s11042-023-14862-8. Online publication date: 13-Mar-2023
  • (2022) Web Mining to Inform Locations of Charging Stations for Electric Vehicles. Companion Proceedings of the Web Conference 2022, pages 166-170. DOI: 10.1145/3487553.3524264. Online publication date: 25-Apr-2022

    Published In

    SIGIR '12: Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
    August 2012
    1236 pages
    ISBN:9781450314725
    DOI:10.1145/2348283
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. geo-localisation
    2. geographic information extraction
    3. location-based applications
    4. points of interest

    Conference

    SIGIR '12

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Article Metrics

    • Downloads (last 12 months): 9
    • Downloads (last 6 weeks): 2
    Reflects downloads up to 30 Dec 2024

    Citations

    • (2022) Leveraging Textual Descriptions for House Price Valuation. Intelligent Systems, pages 355-369. DOI: 10.1007/978-3-031-21686-2_25. Online publication date: 19-Nov-2022
    • (2021) Spatial Layout and Coupling of Urban Cultural Relics: Analyzing Historical Sites and Commercial Facilities in District III of Shaoxing. Sustainability 13(12):6877. DOI: 10.3390/su13126877. Online publication date: 18-Jun-2021
    • (2021) Conflation of Geospatial POI Data and Ground-level Imagery via Link Prediction on Joint Semantic Graph. Proceedings of the 4th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery, pages 5-8. DOI: 10.1145/3486635.3491068. Online publication date: 2-Nov-2021
    • (2021) GEDIT. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pages 4135-4144. DOI: 10.1145/3459637.3481924. Online publication date: 26-Oct-2021
    • (2021) POI and Future Visitors Recommendation. 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT), pages 1-4. DOI: 10.1109/ICCCNT51525.2021.9579767. Online publication date: 6-Jul-2021
    • (2021) A Latent Customer Flow Model for Interpretable Predictions of Check-In Counts. 2021 IEEE International Conference on Big Data (Big Data), pages 529-539. DOI: 10.1109/BigData52589.2021.9671946. Online publication date: 15-Dec-2021
    • (2020) On the Construction of Web NER Model Training Tool based on Distant Supervision. ACM Transactions on Asian and Low-Resource Language Information Processing 19(6):1-28. DOI: 10.1145/3422817. Online publication date: 15-Nov-2020