DOI: 10.1145/1367798.1367803
research-article

Urban web crawling

Published: 22 April 2008

Abstract

Local search is increasingly becoming a major focus point of research interest. It is a widely-recognized speciality search with a large application area. Its data is usually aggregated from a variety of sources. One as yet largely untapped source of location data is the WWW. Today, the Web does not explicitly reveal its location-relation; rather this information is hidden somewhere within pages' contents. To exploit such location information, we need to find, extract and geo-spatially index relevant Web pages. For an effective retrieval of such content, this paper examines the application of focused Web crawling to the geospatial domain. We describe our approach for a geo-aware focused crawling of urban areas and other regions with a high building density. We present our experimental results that give us insight into spatial Web information such as location density and link distance between topical pages. Our crawls and evaluations back our hypothesis that geospatially focused crawling is suitable for the urban geospatial topic.

    Information & Contributors

    Information

    Published In

    LOCWEB '08: Proceedings of the first international workshop on Location and the web
    April 2008
    192 pages
    ISBN: 9781605581606
    DOI: 10.1145/1367798
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. adaptive crawling
    2. focused crawling
    3. geographic web information retrieval
    4. location-based web search
    5. topical search

    Qualifiers

    • Research-article

    Conference

    WWW '08

    Acceptance Rates

    Overall Acceptance Rate 4 of 5 submissions, 80%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 4
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 17 Jan 2025

    Citations

    Cited By

    • (2021) Postal address extraction from the web: a comprehensive survey. Artificial Intelligence Review. DOI: 10.1007/s10462-021-09983-1. Online publication date: 14-Mar-2021
    • (2017) Information Retrieval in Web Crawling Using Population Based, and Local Search Based Meta-heuristics: A Review. Proceedings of Sixth International Conference on Soft Computing for Problem Solving, pp. 87-104. DOI: 10.1007/978-981-10-3325-4_10. Online publication date: 13-Apr-2017
    • (2011) A Comparison over Focused Web Crawling Strategies. Proceedings of the 2011 15th Panhellenic Conference on Informatics, pp. 245-249. DOI: 10.1109/PCI.2011.53. Online publication date: 30-Sep-2011
    • (2011) Ad-Hoc Georeferencing of Web-Pages Using Street-Name Prefix Trees. Web Information Systems and Technologies, pp. 259-271. DOI: 10.1007/978-3-642-22810-0_19. Online publication date: 2011
    • (2010) Location-based search engine for multimedia phones. 2010 IEEE International Conference on Multimedia and Expo, pp. 558-563. DOI: 10.1109/ICME.2010.5583538. Online publication date: Jul-2010
    • (2010) A Crawler for Local Search. Proceedings of the 2010 Fourth International Conference on Digital Society, pp. 86-91. DOI: 10.1109/ICDS.2010.23. Online publication date: 10-Feb-2010
    • (2009) Adaptive geospatially focused crawling. Proceedings of the 18th ACM conference on Information and knowledge management, pp. 445-454. DOI: 10.1145/1645953.1646011. Online publication date: 2-Nov-2009
    • (2009) Web mining based OALF model for context-aware mobile advertising system. 2009 IFIP/IEEE International Symposium on Integrated Network Management-Workshops, pp. 211-216. DOI: 10.1109/INMW.2009.5195962. Online publication date: Jun-2009
    • (2008) Retrieving address-based locations from the web. Proceedings of the 5th Workshop on Geographic Information Retrieval, pp. 27-34. DOI: 10.1145/1460007.1460015. Online publication date: 29-Oct-2008
