
An EM-Approach for Clustering Multi-Instance Objects

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3918)

Abstract

In many data mining applications, data objects are modeled as sets of feature vectors, also called multi-instance objects. In this paper, we present an expectation maximization (EM) approach for clustering multi-instance objects. To this end, we introduce a statistical process that models multi-instance objects. Furthermore, we present M-steps and E-steps for EM clustering and a method for finding a good initial model. In our experimental evaluation, we demonstrate that the new EM algorithm increases cluster quality on three real-world data sets compared to k-medoid clustering.
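For readers unfamiliar with the alternation of E-steps and M-steps mentioned above, the following is a minimal sketch of standard EM for a one-dimensional Gaussian mixture. It is not the statistical model for multi-instance objects proposed in the paper, which the abstract does not spell out; the function names (`gaussian_pdf`, `em_gmm_1d`), the toy data, and the starting parameters are all assumptions chosen for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, n_iter=50, min_var=1e-6):
    """Standard EM for a 1-D Gaussian mixture (illustrative only).

    Returns (means, variances, weights, responsibilities).
    """
    k = len(means)
    means, variances, weights = list(means), list(variances), list(weights)
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component given each point.
        resp = []
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                     for j in range(k)]
            total = sum(joint)
            if total == 0.0:
                # Numerical underflow: fall back to a uniform assignment.
                resp.append([1.0 / k] * k)
            else:
                resp.append([p / total for p in joint])
        # M-step: re-estimate parameters from the soft assignments.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0.0:
                continue  # empty component: keep its previous parameters
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)  # floor avoids a zero variance
            weights[j] = nj / len(data)
    return means, variances, weights, resp


# Toy data with two well-separated groups (assumed values).
data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
means, variances, weights, resp = em_gmm_1d(
    data, means=[0.0, 6.0], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
```

On this toy input the two estimated means settle near the centres of the two groups. As with any EM procedure, the result depends on the starting parameters, which is why the abstract singles out a method for finding a good initial model.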




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kriegel, HP., Pryakhin, A., Schubert, M. (2006). An EM-Approach for Clustering Multi-Instance Objects. In: Ng, WK., Kitsuregawa, M., Li, J., Chang, K. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2006. Lecture Notes in Computer Science, vol 3918. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11731139_18

  • DOI: https://doi.org/10.1007/11731139_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33206-0

  • Online ISBN: 978-3-540-33207-7

  • eBook Packages: Computer Science, Computer Science (R0)
