DOI: 10.1145/1081870.1081910
Article

Cross-relational clustering with user's guidance

Published: 21 August 2005 Publication History

Abstract

Clustering is an essential data mining task with numerous applications. However, data in most real-life applications are high-dimensional, and the relevant information is often spread across multiple relations. To make high-dimensional, cross-relational clustering both effective and efficient, we propose a new approach, called CrossClus, which performs cross-relational clustering with user's guidance. We believe that user's guidance, even in very simple forms, can be essential for effective high-dimensional clustering, since the user knows the application requirements and data semantics well. CrossClus proceeds as follows: a user specifies a clustering task and selects one or a small set of features pertinent to the task. CrossClus then extracts the set of highly relevant features from multiple relations connected via linkages defined in the database schema, evaluates their effectiveness based on the user's guidance, and identifies interesting clusters that fit the user's needs. This method addresses both the quality of feature extraction and the efficiency of clustering. Our comprehensive experiments demonstrate the effectiveness and scalability of this approach.
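
The workflow described in the abstract can be illustrated with a minimal Python sketch. It is not the CrossClus algorithm itself: the abstract does not specify how feature relevance is measured, so the sketch substitutes a simple pairwise-agreement score as a stand-in. The toy schema (`students`, `enrolled`), the function names, and the 0.5 threshold are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical target relation: student id -> attributes.
# "group" plays the role of the feature the user selects as guidance.
students = {
    "s1": {"group": "db"},
    "s2": {"group": "db"},
    "s3": {"group": "ai"},
    "s4": {"group": "ai"},
}

# Hypothetical linked relation: enrolled(student_id, course, semester).
enrolled = [
    ("s1", "databases", "fall"),
    ("s2", "databases", "spring"),
    ("s3", "learning", "fall"),
    ("s4", "learning", "spring"),
]


def join_feature(rows, column):
    """Follow the student_id linkage and collect the value set per student."""
    values = defaultdict(set)
    for row in rows:
        values[row[0]].add(row[column])
    return values


def pairwise_agreement(feat_a, feat_b, ids):
    """Fraction of tuple pairs that both features treat alike
    (both put the pair together, or both keep it apart).
    This is an assumed stand-in for a relevance measure."""
    pairs = list(combinations(sorted(ids), 2))
    agree = sum(
        (feat_a[x] == feat_a[y]) == (feat_b[x] == feat_b[y]) for x, y in pairs
    )
    return agree / len(pairs)


def cluster_by(features, ids):
    """Group tuples that share identical values on every selected feature."""
    groups = defaultdict(list)
    for sid in sorted(ids):
        key = tuple(frozenset(f[sid]) for f in features)
        groups[key].append(sid)
    return sorted(groups.values())


# Step 1: the user's guidance, expressed as one feature of the target relation.
guide = {sid: {attrs["group"]} for sid, attrs in students.items()}

# Step 2: candidate features reached through the schema linkage.
candidates = {
    "course": join_feature(enrolled, 1),
    "semester": join_feature(enrolled, 2),
}

# Step 3: score each candidate against the guidance and keep the relevant ones.
scores = {
    name: pairwise_agreement(guide, feat, students)
    for name, feat in candidates.items()
}
selected = [name for name, score in scores.items() if score > 0.5]

# Step 4: cluster the target tuples using only the selected features.
clusters = cluster_by([candidates[name] for name in selected], students)

print(scores)
print(selected)
print(clusters)
```

In this toy data, `course` groups students exactly as the guiding feature does, while `semester` mostly contradicts it, so only `course` passes the threshold and drives the final grouping. A real system would need a more robust relevance measure and a proper clustering algorithm.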

    Published In

    cover image ACM Conferences
    KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
    August 2005
    844 pages
    ISBN:159593135X
    DOI:10.1145/1081870
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. clustering
    2. data mining
    3. relational databases

    Qualifiers

    • Article

    Conference

    KDD05

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 3
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 09 Jan 2025.

    Cited By

    • (2021) Incorporating domain ontology information into clustering in heterogeneous networks. WIREs Data Mining and Knowledge Discovery, 11:4. DOI: 10.1002/widm.1413. Online publication date: 10-May-2021.
    • (2019) Heterogeneous information network based clustering for precision traditional Chinese medicine. BMC Medical Informatics and Decision Making, 19:S6. DOI: 10.1186/s12911-019-0963-0. Online publication date: 19-Dec-2019.
    • (2017) Clustering multi-typed objects in extended star-structured heterogeneous data. Intelligent Data Analysis, 21:2 (225-241). DOI: 10.3233/IDA-150416. Online publication date: 2-Mar-2017.
    • (2017) Combining Structured Node Content and Topology Information for Networked Graph Clustering. ACM Transactions on Knowledge Discovery from Data, 11:3 (1-29). DOI: 10.1145/2996197. Online publication date: 21-Mar-2017.
    • (2017) Detecting Communities on Topic of Transportation With Sparse Crowd Annotations. IEEE Transactions on Intelligent Transportation Systems, 18:4 (1017-1022). DOI: 10.1109/TITS.2016.2596321. Online publication date: 1-Apr-2017.
    • (2016) Eigen-Optimization on Large Graphs by Edge Manipulation. ACM Transactions on Knowledge Discovery from Data, 10:4 (1-30). DOI: 10.1145/2903148. Online publication date: 14-Jun-2016.
    • (2016) User-Guided Large Attributed Graph Clustering with Multiple Sparse Annotations. Advances in Knowledge Discovery and Data Mining (127-138). DOI: 10.1007/978-3-319-31753-3_11. Online publication date: 12-Apr-2016.
    • (2014) Clustering on heterogeneous networks. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 4:3 (213-233). DOI: 10.1002/widm.1126. Online publication date: 1-May-2014.
    • (2012) Using trees to mine multirelational databases. Data Mining and Knowledge Discovery, 24:1 (1-39). DOI: 10.1007/s10618-011-0218-x. Online publication date: 1-Jan-2012.
    • (2012) New approach for clustering relational data based on relationship and attribute information. Proceedings of the 22nd international conference on Artificial Neural Networks and Machine Learning - Volume Part II (451-458). DOI: 10.1007/978-3-642-33266-1_56. Online publication date: 11-Sep-2012.
