Abstract
Multiple-instance learning (MIL) is a popular concept in the AI community for supporting supervised learning applications in situations where only incomplete knowledge is available. We propose an original reformulation of the MIL concept for the unsupervised context (UMIL), which can serve as a broader framework for clustering data objects that are adequately described by a multiple-instance representation. We derive three algorithmic solutions from conventional methods: agglomerative clustering, partition clustering, and MIL's citation-kNN approach. After relating expression data to biological annotations in a UMIL representation, we evaluated these algorithms with standard clustering quality measures within a bioinformatics framework, performing a functional profiling of two genomic data sets. Our analysis highlighted meaningful interaction patterns that relate biological processes and regulatory pathways into coherent functional modules, uncovering profound features of the biological model. These results indicate UMIL's usefulness in exploring hidden behavioral patterns in complex data.
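To make the multiple-instance representation concrete, the following is a minimal sketch of just one of the three variants, agglomerative clustering of bags. It rests on assumptions that the abstract does not state. Each data object is assumed to be a "bag" of numeric instance vectors. Bags are compared with the minimal Hausdorff distance, which is the bag distance used by the citation-kNN method of Wang and Zucker. Clusters are merged by average linkage. The function names `min_hausdorff` and `agglomerate_bags` are hypothetical, and this is not the authors' implementation.

```python
import math
from itertools import combinations
from typing import List, Sequence

Instance = Sequence[float]
Bag = List[Instance]


def min_hausdorff(a: Bag, b: Bag) -> float:
    """Minimal Hausdorff distance: the distance between the closest pair of
    instances taken from the two bags (the bag distance used by citation-kNN)."""
    return min(math.dist(x, y) for x in a for y in b)


def agglomerate_bags(bags: List[Bag], k: int) -> List[List[int]]:
    """Average-linkage agglomerative clustering of bags into k clusters.

    Assumption for illustration: the bag distance is min_hausdorff.
    Returns the clusters as sorted lists of bag indices.
    """
    if not 1 <= k <= len(bags):
        raise ValueError("k must be between 1 and the number of bags")

    # Precompute all pairwise bag distances once (keys use i < j).
    d = {(i, j): min_hausdorff(bags[i], bags[j])
         for i, j in combinations(range(len(bags)), 2)}

    def dist(i: int, j: int) -> float:
        return d[(i, j)] if i < j else d[(j, i)]

    # Start with every bag in its own cluster.
    clusters = [[i] for i in range(len(bags))]
    while len(clusters) > k:
        # Find the pair of clusters with the smallest average bag-to-bag distance.
        best = None
        for p, q in combinations(range(len(clusters)), 2):
            link = sum(dist(i, j) for i in clusters[p] for j in clusters[q]) / (
                len(clusters[p]) * len(clusters[q]))
            if best is None or link < best[0]:
                best = (link, p, q)
        _, p, q = best
        # Merge cluster q into cluster p (q > p, so deleting q leaves p in place).
        clusters[p] = sorted(clusters[p] + clusters[q])
        del clusters[q]
    return clusters


if __name__ == "__main__":
    # Toy bags: each bag holds one or more 2-D instance vectors.
    bags = [
        [(0.0, 0.0), (5.0, 5.0)],
        [(0.1, 0.0)],
        [(10.0, 10.0)],
        [(10.0, 10.2), (20.0, 20.0)],
    ]
    print(agglomerate_bags(bags, 2))  # [[0, 1], [2, 3]]
```

Because the minimal Hausdorff distance only looks at the closest pair of instances, two bags count as similar whenever they share one nearby instance. This fits the MIL intuition that a few relevant instances characterize a bag, but it also makes the distance sensitive to single outlying instances. A partition-based variant could reuse the same bag distance with a medoid-style algorithm.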
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Henegar, C., Clément, K., Zucker, JD. (2006). Unsupervised Multiple-Instance Learning for Functional Profiling of Genomic Data. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds) Machine Learning: ECML 2006. Lecture Notes in Computer Science, vol 4212. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11871842_21
DOI: https://doi.org/10.1007/11871842_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45375-8
Online ISBN: 978-3-540-46056-5
eBook Packages: Computer Science, Computer Science (R0)