Abstract
The search results currently offered by the majority of search portals are horizontal in nature: these search engines aim to index as many web pages as possible and present results drawn from all of them. Such results are often generic. Focused crawlers were built to download only those web pages relevant to a pre-specified topic. Searching over such pages is called vertical search, as it attempts to drill down on a single topic rather than exploring the plethora of other web pages that are related to the search query in one way or another. In this paper, we propose an algorithm that helps a focused crawler decide whether or not a web page should be downloaded. The proposed selection algorithm uses semantic properties of the content to arrive at its decision.
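The download decision described above can be illustrated with a minimal sketch. The abstract does not specify the selection algorithm, so this is not the authors' method: it substitutes a plain lexical cosine similarity between a page's term counts and a weighted topic vocabulary, with no semantic metadata involved. The names `tokenize`, `cosine_similarity`, `should_download`, the `topic_terms` weights, and the `threshold` value are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-to-weight mappings."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def should_download(page_text, topic_terms, threshold=0.3):
    """Hypothetical selection rule: accept a page only if its term-count
    vector is similar enough to the topic vocabulary.

    The threshold is an arbitrary illustrative value, not one taken
    from the paper.
    """
    page_vector = Counter(tokenize(page_text))
    return cosine_similarity(page_vector, topic_terms) >= threshold


# Illustrative topic vocabulary (assumed, not from the paper).
topic = {"crawler": 1.0, "search": 1.0, "web": 1.0}

on_topic = "A focused crawler downloads web pages for a vertical search engine"
off_topic = "Recipe for chocolate cake with butter"

print(should_download(on_topic, topic))
print(should_download(off_topic, topic))
```

A crawler would typically apply such a rule to candidate pages (or to the anchor text and context of their links) before fetching them. A semantic approach like the one the paper proposes would replace the lexical score with one derived from the content's semantic properties.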
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wadwekar, S., Mukhopadhyay, D. (2013). A Selection Algorithm for Focused Crawlers Incorporating Semantic Metadata. In: Hota, C., Srimani, P.K. (eds) Distributed Computing and Internet Technology. ICDCIT 2013. Lecture Notes in Computer Science, vol 7753. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36071-8_45
DOI: https://doi.org/10.1007/978-3-642-36071-8_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36070-1
Online ISBN: 978-3-642-36071-8
eBook Packages: Computer Science (R0)