Ontology-driven discovery of geospatial evidence in web pages

Published: 01 October 2011

Abstract

When users need to find something on the Web that is related to a place, chances are place names will be submitted along with some other keywords to a search engine. However, automatic recognition of geographic characteristics embedded in Web documents, which would allow for a better connection between documents and places, remains a difficult task. We propose an ontology-driven approach to facilitate the process of recognizing, extracting, and geocoding partial or complete references to places embedded in text. Our approach combines an extraction ontology with urban gazetteers and geocoding techniques. This ontology, called OnLocus, is used to guide the discovery of geospatial evidence from the contents of Web pages. We show that addresses and positioning expressions, along with fragments such as postal codes or telephone area codes, provide satisfactory support for local search applications, since they make it possible to approximate the physical location of services and activities named within Web pages. Our experiments show the feasibility of performing automated address extraction and geocoding to identify locations associated with Web pages. Combining location identifiers with basic addresses improved the precision of extractions and reduced the number of false positive results.
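The kinds of evidence named in the abstract (basic addresses, postal codes, telephone area codes) can be illustrated with a small sketch. This is not the OnLocus implementation: the regular expressions, the `Evidence` class, the functions `extract_evidence` and `geocode`, and the three toy gazetteer tables are all hypothetical, and the postal code and coordinates are rough illustrative values for Belo Horizonte. The sketch assumes Brazilian-style formatting and simply prefers the most specific evidence it can resolve.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for Brazilian-style text; not taken from OnLocus.
BASIC_ADDRESS_RE = re.compile(
    r"\b(?P<type>Rua|Avenida|Av\.|Praça)\s+"   # thoroughfare type
    r"(?P<name>[^\d,\n]+?),?\s+"               # thoroughfare name
    r"(?P<number>\d+)\b"                       # building number
)
POSTAL_CODE_RE = re.compile(r"\b\d{5}-\d{3}\b")                   # CEP, e.g. 30130-005
AREA_CODE_RE = re.compile(r"\((?P<code>\d{2})\)\s*\d{4,5}-\d{4}")  # e.g. (31) 3222-0000

# Toy gazetteer tables with approximate, illustrative coordinates (assumptions).
STREETS = {"afonso pena": (-19.93, -43.94)}
POSTAL_CODES = {"30130-005": (-19.93, -43.94)}
AREA_CODES = {"31": (-19.92, -43.94)}


@dataclass
class Evidence:
    kind: str   # "address", "postal_code", or "area_code"
    value: str


def extract_evidence(text):
    """Collect address, postal code, and area code fragments found in text."""
    found = []
    for m in BASIC_ADDRESS_RE.finditer(text):
        found.append(Evidence("address", f"{m.group('name').strip()}, {m.group('number')}"))
    for m in POSTAL_CODE_RE.finditer(text):
        found.append(Evidence("postal_code", m.group(0)))
    for m in AREA_CODE_RE.finditer(text):
        found.append(Evidence("area_code", m.group("code")))
    return found


def geocode(evidence):
    """Return (coordinates, level), preferring the most specific resolvable evidence."""
    by_kind = {e.kind: e.value for e in evidence}
    if "address" in by_kind:
        street = by_kind["address"].split(",")[0].lower()
        if street in STREETS:
            return STREETS[street], "street"
    if by_kind.get("postal_code") in POSTAL_CODES:
        return POSTAL_CODES[by_kind["postal_code"]], "postal_code"
    if by_kind.get("area_code") in AREA_CODES:
        return AREA_CODES[by_kind["area_code"]], "area_code"
    return None, "unknown"


page = ("Visite nossa loja na Avenida Afonso Pena, 1500 - CEP 30130-005, "
        "Belo Horizonte. Tel: (31) 3222-0000")
evidence = extract_evidence(page)
coords, level = geocode(evidence)
```

In this sketch a page that yields only a telephone area code still receives a coarse, city-level location, while a page with a full basic address resolves at street level. A real system would also need to handle ambiguous street names and score how certain each match is.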

Reviews

Carlos Linares Lopez

The ability to correctly interpret geospatial information is undoubtedly a difficult exercise for artificial intelligence. While humans easily deal with this problem, it is an enormous challenge for computers. The importance of this problem is even more acute when considering that the Internet is crowded with geospatial references. More importantly, with the advent of mobile devices at a very large scale, the ability to find and classify locations has become a highly demanded feature. This need has not been overlooked by the most prominent Internet providers, who already offer a wide range of services for accessing geospatial information. However, the quality of the results is still far from our own ability to deal with the same pieces of information.

As usually happens with automated systems that have to reason with incomplete and imperfect information, a first step consists of creating an ontology that recognizes the most important concepts and relates them in a consistent manner. This paper introduces OnLocus, an ontology that recognizes the following concepts: place, territorial division, landmark, place descriptor, address, place name, and positioning expressions. In contrast with other similar ontologies, it is domain-dependent since it exploits concepts that are found in most urban communities. As a matter of fact, the ontology has been tested on a number of Brazilian Web pages. It can also be seen as an extraction ontology since it provides services for extracting geospatial information from the pages visited.

All in all, the paper is very well organized and makes an important contribution to this type of knowledge representation schema. It is very well written and is easy to read and follow.

Online Computing Reviews Service

Information & Contributors

Information

Published In

Geoinformatica, Volume 15, Issue 4
October 2011
191 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. Extraction ontologies
  2. Geocoding
  3. Geographic information retrieval
  4. Geospatial evidence in text
  5. Positioning expressions
