DOI: 10.5555/782096.782107

Node similarity in networked information spaces

Published: 05 November 2001

Abstract

Networked information spaces contain information entities, corresponding to nodes, which are connected by associations, corresponding to links in the network. Examples of networked information spaces are the World Wide Web, where information entities are web pages and associations are hyperlinks, and the scientific literature, where information entities are articles and associations are references to other articles. Similarity between information entities in a networked information space can be defined not only based on the content of the information entities, but also based on the connectivity established by the associations present. This paper explores the definition of similarity based on connectivity only, and proposes several algorithms for this purpose. Our metrics take advantage of the local neighbourhoods of the nodes in the networked information space. Therefore, explicit availability of the whole networked information space is not required, as long as a query engine is available for following links and extracting the local neighbourhoods needed for similarity estimation. Two variations of similarity estimation between two nodes are described: one based on the separate local neighbourhoods of the nodes, and another based on the joint local neighbourhood expanded from both nodes at the same time. The algorithms are implemented and evaluated on the citation graph of computer science. The immediate application of this work is in finding papers similar to a given paper in a digital library, but the algorithms are also applicable to other networked information spaces, such as the Web.
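To make the two variations concrete, here is a minimal sketch in Python. It is not the paper's algorithm: the abstract does not give the metrics, so the overlap score below is plain Jaccard similarity, used as an assumed stand-in. The names `expand`, `separate_similarity`, `joint_neighbourhood`, the `radius` parameter, and the `links` query function are all hypothetical; `links` plays the role of the query engine that follows links from a node.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, Set

Node = Hashable
# Hypothetical stand-in for the query engine: returns the nodes linked to a node.
LinkQuery = Callable[[Node], Iterable[Node]]


def expand(starts: Iterable[Node], links: LinkQuery, radius: int) -> Set[Node]:
    """Breadth-first expansion up to `radius` links away from any start node."""
    seen = set(starts)
    frontier = deque((s, 0) for s in seen)
    while frontier:
        node, depth = frontier.popleft()
        if depth == radius:
            continue  # do not follow links beyond the chosen radius
        for nxt in links(node):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen


def separate_similarity(a: Node, b: Node, links: LinkQuery, radius: int = 1) -> float:
    """Assumed metric: Jaccard overlap of the two separately expanded neighbourhoods."""
    na = expand([a], links, radius) - {a, b}
    nb = expand([b], links, radius) - {a, b}
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


def joint_neighbourhood(a: Node, b: Node, links: LinkQuery, radius: int = 1) -> Set[Node]:
    """Neighbourhood grown from both nodes at the same time (multi-source expansion)."""
    return expand([a, b], links, radius)


# Toy citation graph (made-up data): each paper maps to the papers it references.
graph = {
    "p1": ["r1", "r2", "r3"],
    "p2": ["r2", "r3", "r4"],
    "p3": ["r5"],
}


def links(node: Node) -> Iterable[Node]:
    return graph.get(node, [])


print(separate_similarity("p1", "p2", links))  # two shared references out of four
print(separate_similarity("p1", "p3", links))  # no shared references
print(sorted(joint_neighbourhood("p1", "p2", links)))
```

Only `links` touches the graph, so the full graph never has to be held in memory, which matches the abstract's point that a query engine for following links is sufficient. How a score is derived from the joint neighbourhood is left open here because the abstract does not specify it.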



Published In

CASCON '01: Proceedings of the 2001 conference of the Centre for Advanced Studies on Collaborative research
November 2001
230 pages

Sponsors

  • IBM Canada: IBM Canada
  • NRC: National Research Council - Canada

Publisher

IBM Press

Author Tags

  1. citation graph
  2. digital libraries
  3. document similarity metric
  4. networked information spaces

Qualifiers

  • Article

Acceptance Rates

Overall Acceptance Rate 24 of 90 submissions, 27%

Cited By

  • (2017) Measuring similarity of users with qualitative preferences for service selection. Knowledge and Information Systems 51:2 (561-594). DOI: 10.1007/s10115-016-0985-1. Online publication date: 1-May-2017
  • (2011) Axiomatic ranking of network role similarity. Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining (922-930). DOI: 10.1145/2020408.2020561. Online publication date: 21-Aug-2011
  • (2010) A new closeness metric for social networks based on the k shortest paths. Proceedings of the 7th international conference on Advances in Neural Networks - Volume Part II (282-291). DOI: 10.1007/978-3-642-13318-3_36. Online publication date: 6-Jun-2010
  • (2009) MatchSim. Proceedings of the 18th ACM conference on Information and knowledge management (1613-1616). DOI: 10.1145/1645953.1646185. Online publication date: 2-Nov-2009
  • (2009) Relating web pages to enable information-gathering tasks. Proceedings of the 20th ACM conference on Hypertext and hypermedia (109-118). DOI: 10.1145/1557914.1557935. Online publication date: 29-Jun-2009
  • (2008) Accuracy estimate and optimization techniques for SimRank computation. Proceedings of the VLDB Endowment 1:1 (422-433). DOI: 10.14778/1453856.1453904. Online publication date: 1-Aug-2008
  • (2007) Practical Algorithms and Lower Bounds for Similarity Search in Massive Graphs. IEEE Transactions on Knowledge and Data Engineering 19:5 (585-598). DOI: 10.1109/TKDE.2007.1008. Online publication date: 1-May-2007
  • (2005) Enhanced searching algorithms for relevant web pages using hyperlink graphs. Proceedings of the 43rd annual ACM Southeast Conference - Volume 1 (159-161). DOI: 10.1145/1167350.1167400. Online publication date: 18-Mar-2005
  • (2005) Scaling link-based similarity search. Proceedings of the 14th international conference on World Wide Web (641-650). DOI: 10.1145/1060745.1060839. Online publication date: 10-May-2005
  • (2005) Algorithmic detection of semantic similarity. Proceedings of the 14th international conference on World Wide Web (107-116). DOI: 10.1145/1060745.1060765. Online publication date: 10-May-2005
