Abstract
Detecting quasi-cliques in graphs is a useful tool for detecting dense clusters in graph-based data mining. Particularly in large-scale data sets that are error-prone, cliques are overly restrictive and impractical. Quasi-clique detection has been accomplished using heuristic approaches in various applications of graph-based data mining in protein interaction networks, gene co-expression networks, and telecommunication networks. Quasi-cliques are not hereditary, in the sense that every subset of a quasi-clique need not be a quasi-clique. This lack of heredity introduces interesting challenges in the development of exact algorithms to detect maximum cardinality quasi-cliques. The only exact approaches for this problem are limited to two mixed integer programming formulations that were recently proposed in the literature. The main contribution of this article is a new combinatorial branch-and-bound algorithm for the maximum quasi-clique problem.
Similar content being viewed by others
References
Abello, J., Pardalos, P. M., & Resende, M. G. C. (1999). On maximum clique problems in very large graphs. In J. Abello & J. Vitter (Eds.), DIMACS series on discrete mathematics and theoretical computer science: Vol. 50. External memory algorithms and visualization (pp. 119–130). Providence: American Mathematical Society.
Abello, J., Resende, M. G. C., & Sudarsky, S. (2002). Massive quasi-clique detection. In S. Rajsbaum (Ed.), LATIN 2002: proceedings of the 5th Latin American symposium on theoretical informatics (pp. 598–612). London: Springer.
Adamic, L., & Huberman, B. (2000). Power-law distribution of the World Wide Web. Science, 287, 2115a.
Alba, R. D. (1973). A graph-theoretic definition of a sociometric clique. The Journal of Mathematical Sociology, 3(1), 113–126.
Almaas, E., & Barabási, A. L. (2006). Power laws in biological networks. In E. Koonin, Y. I. Wolf, & G. P. Karev (Eds.), Power laws, scale-free networks and genome biology (pp. 1–11). New York: Springer.
Balasundaram, B., Butenko, S., & Trukhanov, S. (2005). Novel approaches for analyzing biological networks. Journal of Combinatorial Optimization, 10(1), 23–39.
Balasundaram, B., Butenko, S., & Hicks, I. V. (2011). Clique relaxations in social network analysis: the maximum k-plex problem. Operations Research, 59(1), 133–142.
Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509–512.
Barabási, A. L., Albert, R., & Jeong, H. (2000). Scale-free characteristics of random networks: the topology of the World Wide Web. Physica. A, 281(1–4), 69–77.
Batagelj, V., & Mrvar, A. (2006). Pajek datasets: Reuters terror news network. Online: http://vlado.fmf.uni-lj.si/pub/networks/data/CRA/terror.htm. Accessed March 2008.
Boginski, V., Butenko, S., & Pardalos, P. M. (2003). On structural properties of the market graph. In A. Nagurney (Ed.), Innovation in financial and economic networks. London: Edward Elgar.
Boginski, V., Butenko, S., & Pardalos, P. (2006). Mining market data: a network approach. Computers & Operations Research, 33, 3171–3184.
Bomze, I. M., Budinich, M., Pardalos, P. M., & Pelillo, M. (1999). The maximum clique problem. In D. Z. Du & P. M. Pardalos (Eds.), Handbook of combinatorial optimization (pp. 1–74). Dordrecht: Kluwer Academic.
Broido, A., & Claffy, K. C. (2001). Internet topology: connectivity of IP graphs. In S. Fahmy & K. Park (Eds.), Scalability and traffic control in IP networks (pp. 172–187). Bellingham: SPIE.
Brunato, M., Hoos, H., & Battiti, R. (2008). On effectively finding maximal quasi-cliques in graphs. In V. Maniezzo, R. Battiti, & J. P. Watson (Eds.), Lecture notes in computer science: Vol. 5313. Learning and intelligent optimization (pp. 41–55). Berlin: Springer.
Bu, D., Zhao, Y., Cai, L., Xue, H., Zhu, X., Lu, H., Zhang, J., Sun, S., Ling, L., Zhang, N., Li, G., & Chen, R. (2003). Topological structure analysis of the protein-protein interaction network in budding yeast. Nucleic Acids Research, 31(9), 2443–2450.
Carlson, J. M., & Doyle, J. (1999). Highly optimized tolerance: a mechanism for power laws in designed systems. Physical Review E, 60(2), 1412–1427.
Carraghan, R., & Pardalos, P. (1990). An exact algorithm for the maximum clique problem. Operations Research Letters, 9, 375–382.
Chung, F., & Lu, L. (2006). CBMS lecture series. Complex graphs and networks. Providence: American Mathematical Society.
Cook, D. J., & Holder, L. B. (2000). Graph-based data mining. IEEE Intelligent Systems, 15(2), 32–41.
Corman, S., Kuhn, T., McPhee, R., & Dooley, K. (2002). Studying complex discursive systems: centering resonance analysis of organizational communication. Human Communication Research, 28(2), 157–206.
Corneil, D., & Perl, Y. (1984). Clustering and domination in perfect graphs. Discrete Applied Mathematics, 9, 27–39.
Dimacs (1995). Cliques, coloring, and satisfiability: second Dimacs implementation challenge. Online: http://dimacs.rutgers.edu/Challenges/. Accessed March 2007.
Erdös, P., & Rényi, A. (1959). On random graphs. Publicationes Mathematicae, 6, 290–297.
Faloutsos, M., Faloutsos, P., & Faloutsos, C. (1999). On power-law relationships of the Internet topology. In Proceedings of the ACM-SIGCOMM conference on applications, technologies, architectures, and protocols for computer communication, Cambridge (pp. 251–262).
Feige, U., Kortsarz, G., & Peleg, D. (2001). The dense k-subgraph problem. Algorithmica, 29, 410–421.
Gagneur, J., Krause, R., Bouwmeester, T., & Casari, G. (2004). Modular decomposition of protein-protein interaction networks. Genome Biology, 5(8), R57.
Girvan, M., & Newman, M. E. J. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences of the United States of America, 99(12), 7821–7826.
Grossman, J., Ion, P., & Castro, R. D. (1995). The Erdös number project. Online: http://www.oakland.edu/enp/. Accessed March 2007.
IBM Corporation (2010). IBM ILOG CPLEX Optimizer 12.2. http://www.ibm.com/software/integration/optimization/cplex-optimizer/. IBM Academic Initiative. Accessed June 2011.
Ito, T., Chiba, T., Ozawa, R., Yoshida, M., Hattori, M., & Sakaki, Y. (2001). A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proceedings of the National Academy of Sciences of the United States of America, 98(8), 4569–4574.
Jiang, D., & Pei, J. (2009). Mining frequent cross-graph quasi-cliques. ACM Transactions on Knowledge Discovery from Data, 2(4), 16.
Kortsarz, G., & Peleg, D. (1993). On choosing a dense subgraph. In Proceedings of the 34th annual IEEE symposium on foundations of computer science (pp. 692–701). Piscataway: IEEE Comput. Soc.
Kreher, D. L., & Stinson, D. R. (1998). Combinatorial algorithms: generation, enumeration, and search (1st ed.). Boca Raton: CRC Press.
Leskovec, J., & Horvitz, E. (2008). Planetary-scale views on a large instant-messaging network. In Proceeding of the 17th international conference on World Wide Web. WWW ’08 (pp. 915–924). New York: ACM.
Lu, H., Zhu, X., Liu, H., Skogerb, G., Zhang, J., Zhang, Y., Cai, L., Zhao, Y., Sun, S., Xu, J., Bu, D., & Chen, R. (2004). The interactome as a tree—an attempt to visualize the protein-protein interaction network in yeast. Nucleic Acids Research, 32(16), 4804–4811.
Luce, R. D. (1950). Connectivity and generalized cliques in sociometric group structure. Psychometrika, 15(2), 169–190.
Mokken, R. J. (1979). Cliques, clubs and clans. Quality and Quantity, 13(2), 161–173.
Östergård, P. R. J. (2002). A fast algorithm for the maximum clique problem. Discrete Applied Mathematics, 120, 197–207.
Patillo, J., Veremyev, A., Butenko, S., & Boginski, V. (2012). On the maximum quasi-clique problem. Discrete Applied Mathematics. doi:10.1016/j.dam.2012.07.019.
Pei, J., Jiang, D., & Zhang, A. (2005a). Mining cross-graph quasi-cliques in gene expression and protein interaction data. In Proceedings of the 21st international conference on data engineering. ICDE 2005 (pp. 353–356).
Pei, J., Jiang, D., & Zhang, A. (2005b). On mining cross-graph quasi-cliques. In Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining. KDD ’05 (pp. 228–238). New York: ACM.
Peng, X., Langston, M. A., Saxton, A. M., Baldwin, N. E., & Snoddy, J. R. (2007). Detecting network motifs in gene co-expression networks through integration of protein domain information. In P. McConnell, S. M. Lin, & P. Hurban (Eds.), Methods of microarray data analysis V (pp. 89–102). New York: Springer.
Seidman, S. B., & Foster, B. L. (1978). A graph theoretic generalization of the clique concept. The Journal of Mathematical Sociology, 6, 139–154.
Simonite, T. (2011). Bracing for the data deluge. http://www.technologyreview.com/business/37506/. Accessed May 2011.
Spirin, V., & Mirny, L. A. (2003). Protein complexes and functional modules in molecular networks. Proceedings of the National Academy of Sciences of the United States of America, 100(21), 12123–12128.
Washio, T., & Motoda, H. (2003). State of the art of graph-based data mining. ACM SIGKDD Explorations Newsletter, 5(1), 59–68.
West, D. (2001). Introduction to graph theory. Upper Saddle River: Prentice-Hall.
Zeng, Z., Wang, J., Zhou, L., & Karypis, G. (2006). Coherent closed quasi-clique discovery from large dense graph databases. In Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining. KDD ’06 (pp. 797–802). New York: ACM.
Zeng, Z., Wang, J., Zhou, L., & Karypis, G. (2007). Out-of-core coherent closed quasi-clique mining from large dense graph databases. ACM Transactions on Database Systems, 32, 13.
Acknowledgements
The authors would like to thank the two anonymous referees for their comments that improved the content and presentation of this article, and would also like to thank Dr. Sergiy Butenko for sharing the manuscript of Patillo et al. (2012) with us. The computational experiments reported in this article were performed at the Oklahoma State University High Performance Computing Center (OSUHPCC). The authors are grateful to Dr. Dana Brunson for her support in conducting these experiments at OSUHPCC. This research was supported by the US Department of Energy Grant DE-SC0002051 and the Air Force Office of Scientific Research Grant FA9550-12-1-0103.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Mahdavi Pajouh, F., Miao, Z. & Balasundaram, B. A branch-and-bound approach for maximum quasi-cliques. Ann Oper Res 216, 145–161 (2014). https://doi.org/10.1007/s10479-012-1242-y
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10479-012-1242-y