Abstract
The detection of community structures within network data is a type of graph analysis with increasing interest across a broad range of disciplines. In a network, communities represent clusters of nodes that exhibit strong intra-connections or relationships among nodes in the cluster. Current methodology for community detection often involves an algorithmic approach, and commonly partitions a graph into node clusters in an iterative manner before some stopping criterion is met. Other statistical approaches for community detection often require model choices and prior selection in Bayesian analyses, which are difficult without some amount of data inspection and pre-processing. Because communities are often fuzzily-defined human concepts, an alternative approach is to leverage human vision to identify communities. The work presents a tool for community detection in form of a web application, called gravicom, which facilitates the detection of community structures through visualization and direct user interaction. In the process of detecting communities, the gravicom application can serve as a standalone tool or as a step to potentially initialize (and/or post-process) another community detection algorithm. In this paper we discuss the design of gravicom and demonstrate its use for community detection with several network data sets. An “Appendix” describes details in the technical formulation of this web application built on the R package Shiny and the JavaScript library D3.
Similar content being viewed by others
References
Airoldi EM, Blei DM, Fienberg SE, Xing EP (2009) Mixed membership stochastic blockmodels. In: Koller D, Schuurmans D, Bengio Y, Bottou L (eds) Advances in neural information processing systems 21. Curran Associates Inc, Red Hook, pp 33–40
Amini AA, Chen A, Bickel PJ, Levina E (2013) Pseudo-likelihood methods for community detection in large sparse networks. Ann Stat 41(4):2097–2122
Ball B, Karrer B, Newman MEJ (2011) Efficient and principled method for detecting communities in networks. Phys Rev E 84(036):103
Bastian M, Heymann S, Jacomy M (2009) Gephi: an open source software for exploring and manipulating networks. http://www.aaai.org/ocs/index.php/ICWSM/09/paper/view/154
Bickel P, Choi D, Chang X, Zhang H (2013) Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels. Ann Stat 41(4):1922–1943
Bostock M, Ogievetsky V, Heer J (2011) D3: data-driven documents. IEEE Trans Vis Computer Gr 17(12):2301–2309
Clauset A, Newman MEJ, Moore C (2004) Finding community structure in very large networks. Phys Rev E 70(066):111
Couture-Beil A (2013) rjson: JSON for R. http://CRAN.R-project.org/package=rjson, R package version 0.2.12
Csardi G, Nepusz T (2006) The igraph software package for complex network research. InterJ Complex Syst :1695, http://igraph.sf.net
Cukier J (2012) Events in the game of thrones. http://www.jeromecukier.net/projects/agot/events.html
Duch J, Arenas A (2005) Community detection in complex networks using extremal optimization. Phys Rev E 72(027):104
Dunne C, Shneiderman B (2013) Motif simplification: improving network visualization readability with fan, connector, and clique glyphs. In: Proceedings of the SIGCHI conference on human factors in computing systems, ACM, New York, NY, USA, CHI ’13, pp 3247–3256
Dwyer T, Lee B, Fisher D, Quinn KI, Isenberg P, Robertson G, North C (2009) A comparison of user-generated and automatic graph layouts. IEEE Trans Vis Computer Gr 15(6):961–968
Fortunato S (2010) Community detection in graphs. Phys Rep 486(35):75–174
Gandrud C (2014) d3Network: tools for creating D3 JavaScript network, tree, dendrogram, and Sankey graphs from R. http://CRAN.R-project.org/package=d3Network, R package version 0.5.1
Gansner ER, North SC (2000) An open graph visualization system and its applications to software engineering. Softw Pract Exp 30(11):1203–1233
Girvan M, Newman MEJ (2002) Community structure in social and biological networks. Proc Natl Acad Sci 99(12):7821–7826
Greene D, Doyle D, Cunningham P (2010) Tracking the evolution of communities in dynamic social networks. In: Advances in Social Networks Analysis and Mining (ASONAM), international conference on 2010, pp 176–183
Guo J, Wilson AG, Nordman DJ (2013) Bayesian nonparametric models for community detection. Technometrics 55(4):390–402
Hansen D, Shneiderman B, Smith MA (2010) Analyzing social media networks with NodeXL: Insights from a connected world. Morgan Kaufmann, Burlington
Hofman JM, Wiggins CH (2008) Bayesian approach to network modularity. Phys Rev Lett 100(258):701
Holland PW, Leinhardt S (1981) An exponential family of probability distributions for directed graphs. J AmStat Assoc 76(373):33–50
Karrer B, Newman MEJ (2011) Stochastic blockmodels and community structure in networks. Phys Rev E 83(016):107
Kawadia V, Sreenivasan S (2012) Sequential detection of temporal communities by estrangement confinement. Sci Rep 2, http://www.nature.com/articles/srep00794
Kemp C, Tenenbaum JB, Griffiths TL, Yamada T, Ueda N (2006) Learning systems of concepts with an infinite relational model. In: Proceedings of the 21st national conference on artificial intelligence - Vol 1, AAAI Press, AAAI’06, pp 381–388
Kling F (2014) Jsnetworkx: A javascript port of the networkx graph library. http://felix-kling.de/JSNetworkX/
Krebs V (2004a) Books about US politics. http://networkdata.ics.uci.edu/data.php?d=polbooks
Krebs V (2004b) Divided we stand... still. http://www.orgnet.com/divided2.html
Lancichinetti A, Fortunato S, Radicchi F (2008) Benchmark graphs for testing community detection algorithms. Phys Rev E 78(046):110
Leskovec J, Lang KJ, Dasgupta A, Mahoney MW (2008) Statistical properties of community structure in large social and information networks. In: Proceedings of the 17th International Conference on World Wide Web, ACM, New York, NY, USA, WWW ’08, pp 695–704
Leskovec J, Lang KJ, Mahoney M (2010) Empirical comparison of algorithms for network community detection. In: Proceedings of the 19th International Conference on World Wide Web, ACM, New York, NY, USA, WWW ’10, pp 631–640
Martin GR (1996) A game of thrones. Bantam Books, New York
Martin GR (1999) A clash of kings. Bantam Books, New York
Martin GR (2000) A storm of swords. Bantam Books, New York
Martin GR (2005) A feast for crows. Bantam Books, New York
Martin GR (2011) A dance with dragons. Bantam Books, New York
McGrath C, Blythe J, Krackhardt D (1996) Seeing groups in graph layouts. Connections 19(2):22–29
Newman MEJ (2003) The structure and function of complex networks. SIAM Rev 45(2):167–256
Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69(026):113
Nowicki K, Snijders TAB (2001) Estimation and prediction for stochastic blockstructures. J Am Stat Assoc 96(455):1077–1087
R Core Team (2014) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, http://www.R-project.org
Rosvall M, Bergstrom CT (2008) Maps of random walks on complex networks reveal community structure. Proc Natl Acad Sci 105(4):1118–1123
RStudio Inc (2013) shiny: Web application framework for R. http://CRAN.R-project.org/package=shiny, R package version 0.4.0
Wasserman S, Faust K (1994) Social network analysis: methods and applications, vol 8. Cambridge University Press, New York
Xie J, Chen M, Szymanski BK (2013) Labelrankt: Incremental community detection in dynamic networks via label propagation. In: Proceedings of the workshop on dynamic networks management and mining, ACM, New York, NY, USA, DyNetMM ’13, pp 25–32
Zachary WW (1977) An information flow model for conflict and fission in small groups. J Anthropol Res 33(4):452–473
Author information
Authors and Affiliations
Corresponding author
Technical appendix
Technical appendix
gravicom utilizes three main pieces of software to establish interactive user control of a random graph as sketched out in Fig. 15, which are Shiny, D3, and igraph. These are used, respectively, for server/client interaction management, user interface and graph layout, and data formatting, respectively. In the following subsections, we describe the purposes of these three components in more detail.
There are very minimal software requirements for a user of gravicom. The client simply needs to have a JavaScript enabled internet browser with HTML5 compatibility, something which almost any modern browser fulfills (an exception is IE8 and below).
The server side requirements are more extensive, but this does not affect the user of gravicom, only those wanting to host their own instance of the application. To host gravicom, a Linux server is required, with the following installed:
-
Node.js (0.8.16 or later)
-
R (2.15 or later)
-
Shiny R package, installed into the machine-wide site library.
-
Shiny Server
1.1 Shiny
Shiny RStudio Inc (2013) is an R package created by RStudio that enables R users to create an interactive web application that utilizes R as the background engine. Through default methods to build user interface elements in HTML and a handle to the server side code, Shiny is a simple way to turn R code into a website.
gravicom uses the Shiny functionality to create user controls, pass correctly formatted data to the client, and as a means to display summary information regarding the user’s interactions with a graph at any point in time. In this context, Shiny serves as the translator between the formatted data and what the user sees and interacts with on his screen.
1.2 D3
D3 Bostock et al. (2011) stands for “Data Driven Documents” and is a JavaScript library developed and maintained by Mike Bostock with the purpose of visualizing and interacting with data in a web-based interface. It is freely available from http://www.d3js.org. The library facilitates manipulation of HTML elements, SVG (scalable vector graphics), and CSS (cascading style sheets) with the end goal of rendering animations and providing user interactions that are tied to the underlying data. The key idea behind the library is that Document Object Model elements are completely determined by the data. The Document Object Model (DOM) is a convention for representing and interacting with objects in HTML, XHTML and XML. So, rather than adding elements to a web page to be viewed by users, D3 allows users to see and interact with graphical representations of their data in a web framework.
gravicom uses D3 to handle all graphical displays and user interactions with the graph. The data is passed to the client and able to be used through Shiny’s input bindings. It is crucial that the data has been formatted correctly at this point for the JavaScript to properly function. For this reason, we limit the file types being passed into the tool to a robust graph-specific type.
At this point in the page lifecycle, the graph nodes are tied to circles and the edges are tied to paths on the page. User manipulations such as selecting, dragging, and grouping are handled by D3 and data is passed back to the server via Shiny’s output bindings to allow for communication between user and the R engine underneath. This is illustrated in Fig. 16. What this means is that all visualization and user interaction with the graph are accomplished using JavaScript, more specifically the library D3. Shiny and R serve as the framework on which the data sits, but when the user touches the data they are doing so through the JavaScript elements.
1.3 igraph
igraph Csardi and Nepusz (2006) is a software package used for creating and manipulating undirected and directed graphs. It is a cross-language package available for C, R, python, and Ruby. igraph also supports multiple graph file formats and visualization of graph structures.
gravicom utilizes two parts of igraph, first is the conversion from a gml file to an XML file. The gml file format, short for Graph Modelling Language, is a hierarchical ASCII-based file format for describing graphs. Below is an example gml file of an undirected graph consisting of two nodes linked by a single edge. The important points to note are that node identifiers (id) have to be numeric. An edge consists only of source and target ids of the nodes it connects, while nodes can have other attributes, e.g. value in the example. For a directed graph, the parameter directed has to be set to 1, which will result in the edge information on target and source being evaluated accordingly.
For the conversion from an XML file to a JSON file we make user of the R package rjson Couture-Beil (2013). JSON is the native data format used in D3, which makes working with data in the D3 library incredibly straightforward. Here is our example in the finalized JSON format:
Our example data will yield the graph in Fig. 17.
The second use of igraph within gravicom is to compute initial x and y coordinates for the nodes of the graph using a force-driven layout. This provides the initialization for the force-layout algorithm in D3. This reduces the computational load on the clients’ side and helps minimize unnecessary movement by the nodes. This is critical as the extra movement at the loading of the pages creates an unnecessarily chaotic start to the user’s experience.
Rights and permissions
About this article
Cite this article
Kaplan, A., Hofmann, H. & Nordman, D. An interactive graphical method for community detection in network data. Comput Stat 32, 535–557 (2017). https://doi.org/10.1007/s00180-016-0663-5
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00180-016-0663-5