1 Introduction

Recent advances in non-invasive in-vivo neuroimaging technology offer new insight into the large-scale map of structural and functional connections in the whole brain at the individual level. The ensemble of macroscopic brain connections can then be described as a complex network - the ‘connectome’ [1]. Graph theory, a powerful framework for characterizing diverse properties of complex networks, has been widely used to understand the characteristics of network organization that are related to brain development or the neurobiological mechanisms of brain disorders.

In the last several decades, various computational approaches have been proposed to construct volumetric image atlases by (1) registering all individual images to a common space and (2) averaging the voxel-wise values (such as intensities and tissue probabilities) across the normalized subjects [2]. The image atlas plays a critical role in neuroimaging studies since it provides a common reference space in which to compare and discover alterations in the human brain. In this regard, the connectome atlas, a population-wise average of brain networks, is highly desirable for revealing and characterizing network differences across clinical cohorts.

However, it is not straightforward to apply the above image atlas construction method to brain networks, since a graph, the data representation of a brain network, has an irregular data geometry. Currently, most work on the brain connectome atlas focuses on modeling the nodes of the atlas, such as fine-grained region parcellations based on connectivity patterns [3, 4]. For example, a connectivity-based parcellation framework was proposed in [4] to better characterize functional connectivity information, resulting in a node-wise “brain connectome atlas” with 210 cortical and 36 subcortical regions. Although connectivities are more important and relevant than nodes for characterizing network alterations in a connectome atlas, little attention has been paid to identifying the most representative brain connectivities in a population. Among the few existing connectome atlas works, it is common to average the connectivity values at each link by treating the brain network as a data matrix. Albeit simple, such an arithmetic average may break down the network topology, since the brain network is a structured data representation with complicated geometry. It is worth noting that a matrix diffusion technique was used in [5] to propagate connectivity information from one subject to another until all networks become similar; the connectome atlas is then the average of the diffused matrices, rather than the original network matrices. However, matrix diffusion suffers from network degeneration, where each node ends up with almost equal probability of connecting to every other node.

To address these challenges, we propose a learning-based graph inference framework to unravel the connectome atlas of a population, with a focus on population-wise connectivities. To do so, we first turn each brain network into a set of graph signals via the diffusion mapping technique [6], which allows us to embed the network into Euclidean space. After that, we propose a learning-based approach to discover the common network that is prevalent in the population. Specifically, we present a graph-based inference to learn the most representative topology, i.e., the one with the largest consensus with all aligned graph signals in the common space. Since diffusion mapping integrates local connectivity into the global geometry of the whole network, our learning-based framework offers a new window for constructing a multi-scale connectome atlas with the capability to “think globally, fit locally”.

We evaluate the accuracy and robustness of our learning-based connectome atlas construction method on various group comparisons for both neurodevelopmental and neurodegenerative diseases, including Autism Spectrum Disorder (ASD), Obsessive-Compulsive Disorder (OCD), and Fronto-Temporal Dementia (FTD). Our connectome atlas shows more reasonable results than counterpart non-learning-based approaches such as network averaging and network diffusion.

2 Methods

Suppose we have a set of brain networks from \( M \) individual subjects. Each brain network \( {\mathcal{G}}_{m} = \left( {\varvec{V},\varvec{A}_{m} } \right) \) \( \left( {m = 1, \ldots ,M} \right) \) consists of \( N \) nodes \( \varvec{V} = \{ v_{i} |i = 1, \ldots ,N\} \) and an adjacency matrix \( \varvec{A}_{m} = \left[ {a_{ij}^{m} } \right]_{i,j = 1, \ldots ,N} \), where each element \( a_{ij}^{m} \) measures the strength of connectivity between two regions. In a structural network, \( a_{ij}^{m} \) is related to the number of fibers connecting two regions, while in a functional network, \( a_{ij}^{m} \) measures the synchronization of functional activities in terms of the statistical correlation between BOLD (blood oxygen-level dependent) signals. The connectome atlas \( {\mathbb{G}} = \left( {\varvec{V},\varvec{W}} \right) \) represents the center of all individual brain networks in the population, where the population-wise whole-brain connectivity is encoded in the adjacency matrix \( \varvec{W} = \left[ {w_{ij} } \right]_{i,j = 1}^{N} \). For convenience, we assume each element \( w_{ij} \) in \( \varvec{W} \) is non-negative.
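As a toy illustration of this data layout, the population can be held as a stack of symmetric, non-negative adjacency matrices (a minimal numpy sketch; the sizes \( M \), \( N \) and the random values are hypothetical stand-ins for real connectivity data):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 5, 90  # hypothetical: 5 subjects, 90 regions (as in the AAL template)

# Simulate one symmetric, non-negative adjacency matrix A_m per subject,
# standing in for fiber counts or functional correlation strengths.
A = rng.random((M, N, N))
A = (A + A.transpose(0, 2, 1)) / 2      # enforce symmetry: a_ij = a_ji
for m in range(M):
    np.fill_diagonal(A[m], 0.0)         # no self-connections
```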

2.1 Overcome Irregular Data Geometry: From Network to Signals

First, we address the challenge of irregular data structure by embedding the brain network into a diffusion space. Since the diffusion mapping procedure is the same for each network \( {\mathcal{G}}_{m} \), we omit the subject index \( m \) in Sect. 2.1 for clarity. In general, it is reasonable to assume that each adjacency matrix is symmetric (i.e., \( a_{ij} = a_{ji} \)) and non-negative \( \left( {a_{ij} \ge 0} \right) \). Here, we go one step further and normalize each row of the adjacency matrix by \( \varvec{P} = \varvec{D}^{ - 1} \varvec{A} \), resulting in a Markov matrix \( \varvec{P} \), where \( \varvec{D} \) is the diagonal matrix whose \( i^{th} \) diagonal element equals \( \sum\nolimits_{j = 1}^{N} {a_{ij} } \). Each element \( p_{ij} \) of \( \varvec{P} \) can be interpreted as the probability of a single-step transition from node \( v_{i} \) to \( v_{j} \) in the network. By taking powers of the Markov matrix, we can increase the number of steps taken. For instance, each element \( p_{ij}^{t} \) in \( \varvec{P}^{t} \) sums up all walks of length \( t \) from node \( v_{i} \) to node \( v_{j} \). As we increase the value of \( t \), we characterize the connectivity of the brain network in a local-to-global manner, as shown in the top row of Fig. 1(c).
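The row normalization and its powers can be sketched as follows (assuming every node has at least one connection, so all degrees are strictly positive):

```python
import numpy as np

def markov_matrix(A):
    """Row-normalize a non-negative adjacency matrix A into the Markov
    matrix P = D^{-1} A, where D holds the node degrees sum_j a_ij."""
    d = A.sum(axis=1)
    return A / d[:, None]

rng = np.random.default_rng(0)
A = rng.random((6, 6))
A = (A + A.T) / 2                        # symmetric, non-negative
P = markov_matrix(A)
P2 = np.linalg.matrix_power(P, 2)        # two-step transition probabilities
```

Each row of \( P \) (and of any power \( P^t \)) sums to one, so it remains a valid transition matrix at every scale \( t \).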

Fig. 1.
figure 1

Measuring node-to-node distance using Euclidean distance (a) and diffusion distance (b). (c) The power of Markov matrix (top) and the corresponding reconstructed network (bottom) using the diffusion distance via diffusion map technique.

Diffusion Distance.

Given the Markov matrix \( \varvec{P} \), the diffusion distance \( d_{t} \) between two nodes is a time-dependent metric that measures the connectivity strength at a specific scale \( t \) as \( d_{t} \left( {v_{i} ,v_{j} } \right) = \sum\nolimits_{k = 1}^{N} {\left( {p_{ik}^{t} - p_{jk}^{t} } \right)^{2} } \), i.e., the discrepancy between the \( t \)-step transition profiles of \( v_{i} \) and \( v_{j} \). As shown in Fig. 1(a), the Euclidean distance between points \( Q_{1} \) and \( Q_{2} \) is roughly the same as the distance between \( Q_{1} \) and \( Q_{3} \). However, as the diffusion process runs forward, the diffusion distance between \( Q_{1} \) and \( Q_{3} \) becomes much larger, since the pathway between \( Q_{1} \) and \( Q_{3} \) has to go through the bottleneck between the two clusters (Fig. 1(b)). Since the diffusion distance encodes the topology of the whole network, it is more robust to noise and spurious connections than the classic Euclidean distance.
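This behavior can be checked numerically on a small two-cluster graph joined by a weak bottleneck link (the graph and its weights are hypothetical):

```python
import numpy as np

def diffusion_distance(P, t, i, j):
    """Diffusion distance at scale t, following the definition above:
    d_t(v_i, v_j) = sum_k (P^t[i,k] - P^t[j,k])^2."""
    Pt = np.linalg.matrix_power(P, t)
    return np.sum((Pt[i] - Pt[j]) ** 2)

# Two 3-node clusters connected only by a weak edge between nodes 2 and 3.
A = np.zeros((6, 6))
A[:3, :3] = 1.0
A[3:, 3:] = 1.0
np.fill_diagonal(A, 0.0)
A[2, 3] = A[3, 2] = 0.1
P = A / A.sum(axis=1, keepdims=True)
```

At \( t = 2 \), nodes within one cluster are much closer to each other than to nodes across the bottleneck, mirroring the \( Q_1 \)/\( Q_3 \) situation in Fig. 1(b).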

Approximate Diffusion Distance in the Euclidean Space.

Denote the set of eigenvectors of \( \varvec{P} \) as \( \left\{ {\psi_{c} } \right\}_{c = 1}^{N} \), with corresponding eigenvalues \( \left\{ {\lambda_{c} } \right\}_{c = 1}^{N} \,\left( {\lambda_{1} \ge \lambda_{2} \ge \ldots \ge \lambda_{N} } \right) \). The diffusion map \( {\varvec{\Psi}}_{t} = \left[ {\xi_{c,t} } \right]_{c = 1}^{N} \) is formed by a set of orthogonal column vectors, where \( \xi_{c,t} = \lambda_{c}^{t} \psi_{c} \) is a time-dependent basis and \( \left\{ {\xi_{c,t} } \right\}_{c = 1}^{N} \) form the spectral bases of the underlying network. Meanwhile, each row vector of \( {\varvec{\Psi}}_{t} \) can be regarded as the diffusion embedding vector of the corresponding network node, where each element is the projection coefficient onto one spectral basis. Thus, the diffusion distance \( d_{t} \left( {v_{i} ,v_{j} } \right) \) can be approximated by the \( \ell_{2} \)-norm distance in Euclidean space as \( d_{t} \left( {v_{i} ,v_{j} } \right) \approx \mathop \sum \limits_{c = 1}^{N} \left\| {\xi_{c,t} \left( i \right) - \xi_{c,t} \left( j \right)} \right\|^{2} \). It is clear that nodes that are strongly connected in the network have similar diffusion embedding vectors. The bottom row in Fig. 1(c) displays the reconstructed networks using the approximated diffusion distance at scales \( t = 0 \), \( t = 1 \), \( t = 2 \) and \( t = 12 \). Compared to the corresponding powers of the Markov matrix in the top row, the diffusion map efficiently embeds the brain network from the irregular data domain into Euclidean space, where Euclidean distance approximates diffusion distance.
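A minimal sketch of this embedding (one assumption here: the eigenvalues of \( \varvec{P} \) are real, which holds when \( \varvec{A} \) is symmetric because \( \varvec{P} \) is then similar to a symmetric matrix):

```python
import numpy as np

def diffusion_map(P, t):
    """Eigendecompose P and scale each eigenvector psi_c by lambda_c^t.
    Column c of the result is the time-dependent basis xi_{c,t};
    row i is the diffusion embedding vector of node v_i."""
    lam, Psi = np.linalg.eig(P)
    order = np.argsort(-lam.real)        # sort lambda_1 >= ... >= lambda_N
    lam, Psi = lam.real[order], Psi.real[:, order]
    return Psi * lam ** t

rng = np.random.default_rng(1)
A = rng.random((8, 8))
A = (A + A.T) / 2
P = A / A.sum(axis=1, keepdims=True)
```

By construction \( P^{t}\psi_c = \lambda_c^{t}\psi_c \), so applying \( P^t \) to the \( t = 0 \) embedding reproduces the embedding at scale \( t \).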

Instead of considering each \( \xi_{c,t} \) as a data array, we further regard \( \xi_{c,t} \) as a graph signal [7] in the following text, where the element-to-element relationship in each \( \xi_{c,t} \) is interpreted in the context of the latent common network \( {\mathbb{G}}_{t} \) (shown in Fig. 2(c)). In other words, the graph signal \( \xi_{c,t} \) describes the characteristics of the column vector \( \xi_{c,t} \) that resides on the common network \( {\mathbb{G}}_{t} \). Next, we present the learning-based graph inference model for the connectome atlas, which learns the Laplacian matrix of the common network \( {\mathbb{G}}_{t} \) from the set of graph signals across individual networks at scale \( t \).

Fig. 2.
figure 2

Overview of our learning-based graph inference model for connectome atlas. In this example, we assume each network has seven nodes, i.e., \( N = 7 \).

2.2 Learn Graph Laplacian of Common Network: From Signals to Network

Since the graph signals \( \left\{ {\xi_{c,t}^{m} } \right\} \) are defined in Euclidean space, it is straightforward to construct the common network by averaging them across individuals. However, the graph signals \( \left\{ {\xi_{c,t}^{m} } \right\} \) are not yet aligned across subjects, because different \( \xi_{c,t}^{m} \)s correspond to different eigenvalues. A simple average might therefore introduce spurious and irrelevant connectivity information. To overcome this limitation, we propose a learning-based approach that learns the Laplacian matrix of the common network \( {\mathbb{G}}_{t} \) from the set of graph signals \( \left\{ {\xi_{c,t}^{m} } \right\} \), which mainly consists of two alternating steps (shown in Fig. 2).

  • Step 1: Modulate graph signals to the common graph spectrum space. We use \( \varvec{L}_{t} \) to denote the Laplacian matrix of the latent common network \( {\mathbb{G}}_{t} \). Then \( {\varvec{\Phi}}_{t} = \left[ {\varphi_{c,t} } \right]_{c = 1}^{N} \) is the \( N \times N \) eigenvector matrix of the Laplacian matrix \( \varvec{L}_{t} \), where \( \varphi_{c,t} \) is the column vector corresponding to the \( c^{th} \) eigenvalue of \( \varvec{L}_{t} \). Recall that each \( \xi_{c,t}^{m} \) represents an oscillation mode. At each \( c^{th} \) mode, \( \xi_{c,t}^{m} \) and \( \xi_{c,t}^{{m^{\prime}}} \) \( \left( {m \ne m^{\prime} } \right) \) may not share the same eigenvalue (frequency). Therefore, we first align each graph signal \( \xi_{c,t}^{m} \) at the \( c^{th} \) eigenmode by modulating it onto the corresponding eigenbasis \( \varphi_{c,t} \) of the latent common network \( {\mathbb{G}}_{t} \): \( \tilde{\xi }_{c,t}^{m} \left( n \right) = \sqrt N \xi_{c,t}^{m} \left( n \right)\varphi_{c,t} \left( n \right) \) [7]. Thus, all modulated graph signals \( \left\{ {\tilde{\xi }_{c,t}^{m} } \right\} \) are aligned to the frequency band that the common network holds at the \( c^{th} \) mode, as shown in Fig. 2(e).

  • Step 2: Learn the Laplacian matrix of the common network. First, we seek the representative graph signal \( y_{c,t} \) that is close to all aligned graph signals \( \left\{ {\tilde{\xi }_{c,t}^{m} } \right\}_{m = 1}^{M} \) at the corresponding \( c^{th} \) eigenmode. Second, we enforce consensus between the common network geometry and the current representative graph signal \( y_{c,t} \) by minimizing a graph smoothness term, in the sense that the \( i^{th} \) and \( j^{th} \) elements of \( y_{c,t} \) should have similar values if nodes \( v_{i} \) and \( v_{j} \) are strongly connected in the common network \( {\mathbb{G}}_{t} \). The graph smoothness with respect to \( {\mathbb{G}}_{t} \) can be measured by the quadratic form of the graph Laplacian, \( y_{c,t}^{T} \varvec{L}_{t} y_{c,t} \) [7]. Third, we apply the matrix Frobenius norm \( \left\| {\varvec{L}_{t} } \right\|_{F}^{2} \) and the trace constraint (i.e., \( tr\left( {\varvec{L}_{t} } \right) = N \)) to guarantee that \( \varvec{L}_{t} \) is a valid Laplacian matrix. Combining these terms, the overall energy function for learning the Laplacian matrix of the common network \( {\mathbb{G}}_{t} \) is:

$$ \mathop {\hbox{min} }\nolimits_{{\left\{ {y_{c,t} } \right\},\varvec{L}_{t} }} \sum\nolimits_{c = 1}^{N} {\sum\nolimits_{m = 1}^{M} {\left\| {y_{c,t} - \tilde{\xi }_{c,t}^{m} } \right\|_{2}^{2} } } + \alpha \sum\nolimits_{c = 1}^{N} {y_{c,t}^{T} } \varvec{L}_{t} y_{c,t} + \beta \left\| {\varvec{L}_{t} } \right\|_{F}^{2} ,\,{\text{s}}.{\text{t}}.\,tr\left( {\varvec{L}_{t} } \right) = N, $$
(1)

where \( \alpha \) and \( \beta \) are two scalars controlling the strength of the constraints on graph smoothness and the Laplacian matrix.
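The modulation in Step 1 is a simple element-wise operation; a minimal sketch:

```python
import numpy as np

def modulate(xi, phi):
    """Step 1: modulate a graph signal xi_{c,t}^m onto the eigenbasis
    vector phi_{c,t} of the latent common network:
    tilde_xi(n) = sqrt(N) * xi(n) * phi(n)."""
    N = xi.shape[0]
    return np.sqrt(N) * xi * phi
```

For example, modulating onto the constant eigenvector \( \varphi = \mathbf{1}/\sqrt{N} \) (the trivial first eigenmode of any Laplacian) leaves the signal unchanged, since \( \sqrt{N} \cdot 1/\sqrt{N} = 1 \).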

Optimization.

We optimize \( \left\{ {y_{c,t} } \right\} \) and \( \varvec{L}_{t} \) in an alternating manner. First, by discarding terms unrelated to \( y_{c,t} \) in Eq. (1), the optimization of each \( y_{c,t} \) becomes a classic Tikhonov regularization problem with the closed-form solution \( \widehat{y}_{c,t} = \left( {\alpha \varvec{L}_{t} + M\varvec{I}_{N \times N} } \right)^{ - 1} \left( {\sum\nolimits_{m = 1}^{M} {\tilde{\xi }_{c,t}^{m} } } \right) \), where the factor \( M \) comes from the sum over subjects.
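Setting the gradient of the first two terms of Eq. (1) with respect to \( y_{c,t} \) to zero gives this update directly; a sketch:

```python
import numpy as np

def update_y(L, xi_stack, alpha):
    """Closed-form Tikhonov update for y_{c,t}: minimize
    sum_m ||y - xi_m||_2^2 + alpha * y^T L y,
    whose stationarity condition (alpha*L + M*I) y = sum_m xi_m
    is solved directly. xi_stack has shape (M, N)."""
    M, N = xi_stack.shape
    return np.linalg.solve(alpha * L + M * np.eye(N), xi_stack.sum(axis=0))
```

With \( \alpha = 0 \) the update reduces to the plain average of the aligned signals, which makes the role of the smoothness term explicit.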

Second, by fixing \( \left\{ {y_{c,t} } \right\} \), we can optimize \( \varvec{L}_{t} \) by minimizing \( \alpha \sum\nolimits_{c = 1}^{N} {y_{c,t}^{T} \varvec{L}_{t} y_{c,t} } + \beta \left\| {\varvec{L}_{t} } \right\|_{F}^{2} \). Since each element of the adjacency matrix \( \varvec{W}_{t} \) of the common network is non-negative, it is easier to search for a valid adjacency matrix \( \varvec{W}_{t} \) than for \( \varvec{L}_{t} \). As shown in [8], minimizing \( \left\| {\varvec{L}_{t} } \right\|_{F}^{2} \) is equivalent to jointly minimizing \( \left\| {\varvec{W}_{t} } \right\|_{F}^{2} \) and the degree vector \( \varvec{W}_{t} 1 \):

$$ { \arg }\mathop {\hbox{min} }\nolimits_{{\varvec{W}_{t} }} \sum\nolimits_{c = 1}^{N} {\left[ {\sum\nolimits_{i,j = 1}^{N} {w_{t} \left( {i,j} \right)\left\| {y_{c,t} \left( i \right) - y_{c,t} \left( j \right)} \right\|_{2}^{2} } } \right]} + \gamma \left\| {\varvec{W}_{\varvec{t}} } \right\|_{F}^{2} + \gamma \left\| {\varvec{W}_{t} 1} \right\|_{2}^{2} , $$
(2)

where \( \gamma = \beta /\alpha \) controls the sparsity of \( \varvec{W}_{t} \). We first convert Eq. 2 into an augmented Lagrangian function with KKT (Karush–Kuhn–Tucker) multipliers and then optimize \( \varvec{W}_{t} \) using the classic ADMM (Alternating Direction Method of Multipliers) [8].
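As a lightweight stand-in for the ADMM solver of [8], the structure of Eq. (2) can be illustrated with projected gradient descent over symmetric non-negative \( \varvec{W}_{t} \), rescaled each step so that \( tr(\varvec{L}_{t}) = \sum_{ij} w_{ij} = N \). The solver, step size, and iteration count below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def pairwise_sq_dists(Y):
    """Z[i,j] = sum_c (y_c(i) - y_c(j))^2, with signals as rows of Y."""
    G = Y.T @ Y
    d = np.diag(G)
    return d[:, None] + d[None, :] - 2.0 * G

def update_W(Y, gamma, n_iter=300, lr=1e-2):
    """Projected gradient descent on Eq. (2): smoothness-weighted data
    fit plus Frobenius and degree penalties, projected back onto
    symmetric non-negative matrices with zero diagonal and total
    weight N (the trace constraint)."""
    Z = pairwise_sq_dists(Y)
    N = Z.shape[0]
    W = np.ones((N, N)) - np.eye(N)
    W *= N / W.sum()
    for _ in range(n_iter):
        deg = W.sum(axis=1, keepdims=True)
        # gradient of Eq. (2) w.r.t. W (up to symmetry-induced constants)
        grad = Z + 2.0 * gamma * W + 2.0 * gamma * (deg + deg.T)
        W = np.maximum(W - lr * (grad + grad.T) / 2.0, 0.0)
        np.fill_diagonal(W, 0.0)
        s = W.sum()
        if s > 0:
            W *= N / s
    return W
```

Links between nodes whose representative signals agree receive larger weights than links between disagreeing nodes, which is exactly the smoothness behavior the quadratic term encodes.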

Note that by repeating the same procedure for different scales \( t \), we obtain multi-scale connectome atlases \( \left\{ {{\mathbb{G}}_{t} |t = 1, \ldots ,T} \right\} \).

3 Experiments

As shown in Eq. 1, \( \alpha \) and \( \beta \) are the only parameters required in the optimization. We fix \( \alpha = 0.5 \) and \( \beta = 1.0 \) in all experiments, chosen by grid search. The counterpart methods are (1) matrix averaging on the original networks (“matrix averaging”) and (2) graph-based averaging with matrix diffusion [5] (“diffusive shrinking graphs”). Note that neither counterpart method is learning-based.

Description of Datasets.

Three datasets are used in the following experiments. (1) ASD dataset. We use resting-state fMRI data of 45 ASD patients and 47 healthy controls (HC) from the NYU Langone Medical Center site of the Autism Brain Imaging Data Exchange (ABIDE) database. (2) OCD dataset. The resting-state fMRI data include 65 HC subjects, 62 OCD subjects, and 73 unaffected first-degree relatives of OCD subjects. (3) FTD dataset. We select 90 FTD subjects and 101 age-matched HC subjects from the recent NIFD database, which is managed by the frontotemporal lobar degeneration neuroimaging initiative. For all imaging data, we follow the parcellation of the AAL template and construct a functional network with 90 nodes for each subject, where each region-to-region connection is measured by Pearson’s correlation coefficient.
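The network construction for one subject can be sketched as follows (clipping negative correlations is our assumption, made to satisfy the non-negativity requirement of Sect. 2; the paper does not state its handling of negative weights):

```python
import numpy as np

def functional_network(bold):
    """Build a functional connectivity matrix from regional BOLD
    time series (array of shape time x regions) via Pearson correlation."""
    A = np.corrcoef(bold.T)          # region-to-region correlations
    np.fill_diagonal(A, 0.0)         # discard self-connections
    return np.clip(A, 0.0, None)     # keep the adjacency non-negative
```

Two regions with synchronized BOLD signals thus receive a strong link, while unrelated regions receive a weight near zero.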

Evaluation of the Replicability.

In this experiment, we use the ASD dataset to evaluate the replicability of the connectome atlas. For clarity, we focus on a single scale \( \left( {t = 1} \right) \) in our learning-based method. For each connectome atlas construction method, we apply the following resampling procedure 1,000 times: (1) randomly sample 70 networks from the total of 92 networks; (2) sample another two sets of 5 networks each from the remainder; (3) form two paired cohorts by combining the networks sampled in steps 1 and 2; (4) run the underlying atlas construction method on the two cohorts independently. Since the two paired cohorts differ in only 6.7% of their subjects, we can evaluate the replicability of each connectome atlas method by examining whether the resulting connectome atlases show a significant difference at each link via a paired t-test \( \left( {p < 0.01} \right) \). Figure 3 visualizes the connectivities that show a statistically significant difference between the two largely overlapping datasets for the three connectome atlas construction methods, where the thickness and color of each edge indicate the magnitude of the difference (t score) after the paired t-test. To display the locations of these connectivities in the whole brain, we map the indices (yellow dots) of the connectivities showing a significant difference under the resampling test. Since our learning-based method leverages global information such as network topology during atlas construction, it shows better replicability than the other methods.
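The resampling procedure above can be sketched generically for any atlas constructor (simple matrix averaging stands in below; the normal-approximation threshold 2.576 for two-sided \( p < 0.01 \) replaces an exact t quantile, a reasonable approximation with 1,000 repetitions):

```python
import numpy as np

def replicability_mask(networks, build_atlas, n_rep=1000, seed=0):
    """For each repetition: draw 70 shared networks plus two disjoint
    5-network sets, build an atlas per cohort, and record the per-link
    atlas difference. A paired t-test across repetitions then flags
    links whose atlases differ significantly (two-sided p < 0.01)."""
    rng = np.random.default_rng(seed)
    M, N, _ = networks.shape
    iu = np.triu_indices(N, k=1)
    diffs = np.empty((n_rep, iu[0].size))
    for r in range(n_rep):
        perm = rng.permutation(M)
        shared, rest = perm[:70], perm[70:80]
        c1 = np.concatenate([shared, rest[:5]])
        c2 = np.concatenate([shared, rest[5:]])
        diffs[r] = (build_atlas(networks[c1]) - build_atlas(networks[c2]))[iu]
    t = diffs.mean(0) / (diffs.std(0, ddof=1) / np.sqrt(n_rep) + 1e-12)
    return np.abs(t) > 2.576            # normal approximation of p < 0.01
```

A replicable atlas constructor should flag very few links, since the two cohorts share over 93% of their subjects.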

Fig. 3.
figure 3

The replicability test for three atlas construction methods after resampling test. The thickness of each link indicates the magnitude of difference after paired t-test.

Evaluation of the Accuracy.

We apply matrix averaging, diffusive shrinking graphs, and our learning-based connectome atlas construction method to the 92 functional networks from the ABIDE database, 200 functional networks from the OCD dataset, and 191 functional networks from the NIFD database. The connectome atlases by matrix averaging, diffusive shrinking graphs, and our learning-based method \( \left( {t = 1} \right) \) are shown in Fig. 4(a)–(c), respectively. On visual inspection, the connectome atlases built by our learning-based method show a more distinct modular structure than those of the counterpart methods. Furthermore, we calculate the distance between each individual network and the connectome atlas using the Gromov–Hausdorff (GH) distance [9]. As shown in Fig. 4(d), the overall network distance with respect to our learning-based connectome atlas is much smaller than for the two counterpart methods, indicating that our connectome atlas is more reasonable: it lies closer to the population center, which is supposed to have minimal discrepancy to all subjects.

Fig. 4.
figure 4

Connectome atlases on ASD (top), OCD (middle), and FTD dataset (bottom) built by matrix averaging (a), diffusive shrinking graph (b), and our learning-based method (c). The statistics of average GH distance between the connectome atlas and individual functional brain network is shown in (d). (e) multi-scale connectome atlases for OCD population.

Multi-scale Connectome Atlas.

In Fig. 4(e), we display the multi-scale connectome atlases for 62 OCD patients (bottom), 73 OCD siblings (middle), and 65 age-matched HC subjects (top), where the scale \( t \) ranges from 0.5 to 10. It is clear that (1) high-level information about network organization, such as modules, is captured at the global scale, and (2) the difference in network organization between the OCD and HC cohorts becomes increasingly distinct as the scale \( t \) increases. It is also apparent that the multi-scale connectome atlas of the OCD siblings cohort is more similar to that of the HC cohort than to the OCD cohort, since these siblings have not been diagnosed with OCD.

4 Conclusion

The construction of a connectome atlas is much more challenging than that of an image atlas due to the irregular graph structure. To address this problem, we proposed a learning-based approach to discover the common network in the setting of graphs. Our method offers a new window for investigating population-wise whole-brain connectivity profiles in the human brain. Promising results were obtained in brain network studies of ASD, OCD, and FTD, indicating the applicability of our learning-based connectome atlas construction method in neuroimaging applications.