Abstract
The similarity of the semantic content of web pages is displayed using interactive graphs that present fragments of minimum spanning trees. Homepages of people are analyzed, parsed into XML documents, and visualized using TouchGraph LinkBrowser, which displays clusters of people who share common interests. The structure of these graphs is strongly affected by the selection of information used to calculate similarity. The influence of simple selection and of Latent Semantic Analysis (LSA) on the structure of such graphs is analyzed. Homepages and lists of publications are converted to word frequency vectors, which are filtered and weighted; the similarity matrix between the normalized vectors is then used to create separate minimum sub-trees showing the clustering of people's interests. Results show that in this application simple selection of important keywords is as good as LSA but has much lower algorithmic complexity.
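The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the raw term-frequency weighting, the cosine similarity measure, the `min_similarity` cut-off, and all function names are assumptions made for illustration, since the abstract does not specify them. Kruskal's algorithm is used here as one standard way to build a minimum spanning tree; dropping edges at or below the cut-off leaves separate sub-trees.

```python
import math
from collections import Counter

def term_vector(text, stopwords=frozenset()):
    """Word frequency vector (assumed: lowercase, whitespace tokens, letters only)."""
    words = [w for w in text.lower().split() if w.isalpha() and w not in stopwords]
    return Counter(words)

def cosine(a, b):
    """Cosine similarity, i.e. the dot product of the two normalized vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def minimum_spanning_forest(names, vectors, min_similarity=0.0):
    """Kruskal's algorithm on distance = 1 - similarity.

    Pairs with similarity <= min_similarity (a hypothetical cut-off) are
    never linked, so unrelated groups end up in separate sub-trees.
    """
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            sim = cosine(vectors[i], vectors[j])
            if sim > min_similarity:
                edges.append((1.0 - sim, i, j))
    edges.sort()  # shortest distance (highest similarity) first

    parent = list(range(len(names)))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # edge joins two components, so no cycle
            parent[root_i] = root_j
            tree.append((names[i], names[j], 1.0 - dist))
    return tree

# Toy "homepages" (invented data, for illustration only).
pages = {
    "anna": "neural networks learning neural models",
    "bob": "neural networks clustering",
    "carol": "medieval history archives",
    "dave": "history archives manuscripts",
}
names = list(pages)
vectors = [term_vector(pages[n]) for n in names]
tree = minimum_spanning_forest(names, vectors)
for a, b, sim in tree:
    print(f"{a} -- {b}  (similarity {sim:.2f})")
```

In this toy data the two interest groups share no words, so the result is two disconnected sub-trees rather than one spanning tree. A keyword-selection step would correspond to passing a larger `stopwords` set or keeping only chosen terms before the similarity is computed.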
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Duch, W., Matykiewicz, P. (2005). Minimum Spanning Trees Displaying Semantic Similarity. In: Kłopotek, M.A., Wierzchoń, S.T., Trojanowski, K. (eds) Intelligent Information Processing and Web Mining. Advances in Soft Computing, vol 31. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-32392-9_4
DOI: https://doi.org/10.1007/3-540-32392-9_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25056-2
Online ISBN: 978-3-540-32392-1
eBook Packages: Engineering, Engineering (R0)