[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/1255175.1255186acmconferencesArticle/Chapter ViewAbstractPublication PagesjcdlConference Proceedingsconference-collections
Article

Document clustering using small world communities

Published: 18 June 2007 Publication History

Abstract

Words in natural language documents exhibit a small world network structure. Thus the physics community provides us with an extensive supply of algorithms for extracting community structure. We present a novel method for semantically clustering a large collection of documents using small world communities. This method combines modified physics algorithms with traditional information retrieval techniques. A term network is generated from the document collection, the terms are clustered into small world communities, the semantic term clusters are used to generate overlapping document clusters. The algorithm combines the speed of single link with the quality of complete link. Clustering takes place in nearly real-time and the results are judged to be coherent by expert users. Our algorithm occupies a middle ground between speed and quality of document clustering.

References

[1]
Caldarelli, G., Cartozo, C. C., De Los Rios, P. and Servedio, V. D. P. Widespread occurrence of the inverse square distribution in social sciences and taxonomy. Phys. Rev. E, 69(3), 35101-1-3, 2004.
[2]
Chung, Y., He, Q., Powell, K. and Schatz, B. Semantic indexing for a complete subject discipline. In Proc. of the Fourth ACM Conference on Digital Libraries, (Berkeley, CA, 1999), ACM Press, 39--48.
[3]
Church, K. W. and Hanks, P. Word association norms, mutual information, and lexicography. In Proc. of the 27th Annual Conference of the Association of Computational Linguistics, (Vancouver, B.C., 1989), ACM Press, 76--83.
[4]
Clauset, A., Newman, M. E. J., and Moore, C. Finding community structure in very large networks. Phys. Rev. E, 70(6), 066111, 2004.
[5]
Fernandess, Y. and Malkhi, D. K-clustering in wireless ad hoc networks. In Proc. of the second ACM international workshop on Principles of mobile computing (Toulouse, France, 2002), ACM Press, 31--37.
[6]
Ferrer i Cancho, R. and Solé, R. V. Patterns in syntactic dependency networks. Phys. Rev. E, 69(5), 051915, 2004.
[7]
Ferrer i Cancho, R. and Solé, R. V. The small world of human language. Proc. R. Soc. Lond. B, 268(1482), 2261--2265, 2001.
[8]
Firth, J. R. A synopsis of linguistic theory, 1930-1955. in Palmer F. R. ed. Selected Papers of J. R. Firth 1952-59, Ed., Longmans, London, 1968, 168--205.
[9]
Girvan, M. and Newman, M. E. J. Community structure in social and biological networks. PNAS, 99(12), 7821--7826, 2002.
[10]
Ismail, N., Robinson, G. E, and Fahrbach S. E. Stimulation of muscarinic receptors mimics experience-dependent plasticity in the honey bee brain. PNAS, 103(1), 207--211, 2006.
[11]
Jain, A. K., Murty, M. N., and Flynn, P. J. Data clustering: A Review. ACM Computing Surveys, 31(3), 265--323, 1999.
[12]
Manning, C. D. and Schutze, H. Foundations of statistical natural language processing. MIT Press, Cambridge, MA, 1999.
[13]
Newman, M. E. J. Analysis of weighted networks. Phys. Rev. E, 70, 056131, 2004.
[14]
Newman, M. E. J. Fast algorithm for detecting community structure in networks. Phys. Rev. E, 69(6), 066133, 2004.
[15]
Newman, M. E. J. Models of the small world: A review. J. Stat. Phys., 101(3-4), 819--841, 2000.
[16]
Newman, M. E. J. The structure and function of complex networks. SIAM Rev., 45(2) 167--256, 2003.
[17]
Newman, M. E. J. and Girvan, M. Finding and evaluating community structure in networks. Phys. Rev. E, 69(2), 026113, 2004.
[18]
Newman, M. E. J., Watts, D. J., and Strogatz, S. H. Random graph models of social networks. PNAS, 99, Suppl. 1, Feb., 2566--2572, 2002.
[19]
Osiński, S. Improving quality of search results clustering with approximate matrix factorizations. In Proc. of the 28th European Conference on IR Research (ECIR 2006), (London, UK, 2006), 167--178.
[20]
Osiński, S. and Weiss, Dawid. Conceptual clustering using Lingo algorithm: Evaluation on Open Directory Project data. In Proc. of the International IIS: IIPWM'04 Conference, (Zakopane, Poland, 2004), 369--378.
[21]
Palla, G., Derényi, I., Farkas, I., and Vicsek, T. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435 (Jun.), 814--818, 2005.
[22]
Slonim, N. and Tishby, N. Document clustering using word clusters via the information bottleneck method. In Proc. of the 23rd Annual Internation ACM/SIGIR Conf. on Research and Development in Information Retrieval, (Athens, Greece, 2000), ACM Press, 208--215.
[23]
Solé, R., Ferrer-Cancho, R., Montoya, J. M., and Valverde, S. Selection, tinkering, and emergence in complex networks. Complexity, 8(1), 20--33, 2003.
[24]
Sparck Jones, K. and Willett, P. Readings in information retrieval. Morgan Kaufman, San Francisco, 1997.
[25]
Watts, D. J. and Strogatz, S. H. Collective dynamics of 'small-world' networks. Nature, 393(6684), 440--442, 1998.
[26]
Willet, P. Recent trends in hierarchic document clustering: A critical review. Information Processing and Management, 24 (5), 577--597, 1988.
[27]
Wu, A. Y., Garland, M., and Han, J. Mining scale-free networks using geodesic clustering. In Proc. 10th ACM SIGKDD Int. Conf., (Seattle, WA, 2004), ACM Press,719--724.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
JCDL '07: Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
June 2007
534 pages
ISBN:9781595936448
DOI:10.1145/1255175
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 June 2007

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. community structure
  2. document clustering
  3. scale-free networks
  4. semantic clustering
  5. small worlds

Qualifiers

  • Article

Conference

JCDL07
JCDL07: Joint Conference on Digital Libraries
June 18 - 23, 2007
BC, Vancouver, Canada

Acceptance Rates

Overall Acceptance Rate 415 of 1,482 submissions, 28%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 05 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2013)Interlinking educational resources and the web of dataProgram10.1108/0033033121129631247:1(60-91)Online publication date: 8-Feb-2013
  • (2011)The Future of Healthcare InfrastructureHealthcare Infrastructure10.1007/978-0-85729-452-4_14(247-266)Online publication date: 9-Apr-2011
  • (2011)Mobile Monitors for Health SystemsHealthcare Infrastructure10.1007/978-0-85729-452-4_13(229-246)Online publication date: 9-Apr-2011
  • (2010)Analysis and Visualization of Japanese Law Networks Based on Granular Computing -Visual Law: Visualization System of Japanese Law-Journal of Advanced Computational Intelligence and Intelligent Informatics10.20965/jaciii.2010.p015014:2(150-154)Online publication date: 20-Mar-2010
  • (2009)Hierarchical structure analysis and visualization of Japanese law networks based on morphological analysis and granular computing2009 IEEE International Conference on Granular Computing10.1109/GRC.2009.5255062(539-543)Online publication date: Aug-2009
  • (2009)Clustering of document collection - A weighting approachExpert Systems with Applications: An International Journal10.1016/j.eswa.2008.11.01736:4(7904-7916)Online publication date: 1-May-2009

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media