Abstract
Points of Interest (POIs) are crucial data sources for location-based applications. Social media and the traditional web can contain up-to-date information about POIs that emerge in the real world and are shared by users and organizations. Both have been shown to be potential data sources for enriching existing POI databases and online map services. A POI is commonly associated with a street for navigation, access, indexing and searching, and this association is captured in the POI address. This paper proposes a novel approach for automatically constructing POI address lists for the streets of a city from geo-tagged social media photos and web data. The proposed method can yield POI addresses that are missing from Google Maps, OpenStreetMap or Wikimapia, so it can potentially be applied to enrich POI data and enhance online digital map services. In our approach, we first specify the relation between a POI name discovered from geo-tagged photos and its related streets and candidate addresses; we then use this relation to mine the POI address from web snippets returned by a search engine. We present a case study of San Jose City, California, USA. The analysis results demonstrate the effectiveness of the proposed method, which offers a promising solution for automatically constructing POI address lists for city streets from geo-tagged social media photos and web data.
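As an illustration of the second step described above (mining a POI address from web snippets using the POI's related streets), the following is a minimal sketch. It is not the method evaluated in the paper: the abstract does not specify the relation or the mining rules, so the function name `mine_poi_address`, the "house number + street name" pattern, the majority-vote rule, and all sample data are assumptions made for this example.

```python
import re
from collections import Counter
from typing import Iterable, Optional


def mine_poi_address(
    poi_name: str,
    candidate_streets: Iterable[str],
    snippets: Iterable[str],
) -> Optional[str]:
    """Return the most frequent 'number + street' string, or None.

    Assumption: a candidate address is a 1-5 digit house number directly
    followed by one of the streets related to the POI.
    """
    streets = list(candidate_streets)
    votes: Counter = Counter()
    for snippet in snippets:
        # Only consider snippets that actually mention the POI name.
        if poi_name.lower() not in snippet.lower():
            continue
        for street in streets:
            pattern = re.compile(
                r"\b(\d{1,5})\s+" + re.escape(street) + r"\b",
                re.IGNORECASE,
            )
            for match in pattern.finditer(snippet):
                votes[f"{match.group(1)} {street}"] += 1
    if not votes:
        return None
    # Majority vote over all candidate addresses seen in the snippets.
    return votes.most_common(1)[0][0]


# Invented sample data, for illustration only.
sample_snippets = [
    "Example Cafe - 123 Main Street, San Jose, CA",
    "Visit Example Cafe at 123 Main Street",
    "Another shop, 456 First Street",
    "Example Cafe near 99 First Street parking",
]
result = mine_poi_address(
    "Example Cafe", ["Main Street", "First Street"], sample_snippets
)
print(result)  # 123 Main Street
```

In this sketch the candidate streets would come from the locations of the geo-tagged photos that mention the POI, and the snippets from a search-engine query on the POI name. Restricting matches to the related streets is what keeps unrelated addresses in the snippets from being picked up.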
Acknowledgements
This research is funded by the University of Economics Ho Chi Minh City, Vietnam.
Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
About this article
Cite this article
Bui, TH. Automatic construction of POI address lists at city streets from geo-tagged photos and web data: a case study of San Jose City. Multimed Tools Appl 82, 34749–34770 (2023). https://doi.org/10.1007/s11042-023-14862-8
DOI: https://doi.org/10.1007/s11042-023-14862-8