Multivariate information-theoretic measures reveal directed information structure and task relevant changes in fMRI connectivity

Joseph T. Lizier^{1,2},
Jakob Heinzle^{3},
Annette Horstmann^{4},
John-Dylan Haynes^{3,4,5} &
…
Mikhail Prokopenko^{2,6}

Abstract

The human brain undertakes highly sophisticated information processing facilitated by the interaction between its sub-regions. We present a novel method for interregional connectivity analysis, using multivariate extensions to the mutual information and transfer entropy. The method allows us to identify the underlying directed information structure between brain regions, and how that structure changes according to behavioral conditions. This method is distinguished in using asymmetric, multivariate, information-theoretical analysis, which captures not only directional and non-linear relationships, but also collective interactions. Importantly, the method is able to estimate multivariate information measures with only relatively little data. We demonstrate the method to analyze functional magnetic resonance imaging time series to establish the directed information structure between brain regions involved in a visuo-motor tracking task. Importantly, this results in a tiered structure, with known movement planning regions driving visual and motor control regions. Also, we examine the changes in this structure as the difficulty of the tracking task is increased. We find that task difficulty modulates the coupling strength between regions of a cortical network involved in movement planning and between motor cortex and the cerebellum which is involved in the fine-tuning of motor control. It is likely these methods will find utility in identifying interregional structure (and experimentally induced changes in this structure) in other cognitive tasks and data modalities.

Notes

1. The TE can be formed as $T_{k,l}(Y \rightarrow X)$, where $l$ past states of Y are considered as the information source $y_n^{(l)}=\{ y_n, y_{n-1}, \ldots ,y_{n-l+1} \}$.
2. Note that the TE is equivalent to the directed transinformation (DTI) measure under certain parameter settings for the DTI (specifically M = 1 and N = 0) as per Hinrichs et al. (2006). Also, note that the TE is equivalent to the specific formulation of the DTI used in Saito and Harashima (1981) if the TE parameter l (discussed in footnote 1) is set equal to k.
3. Note the TE could be computed in the style of Kraskov et al. (2004) and Kraskov (2004) but with a direct conditional MI calculation as per Frenzel and Pompe (2007).
4. For example, fMRI regions contain potentially hundreds of voxels.
5. The following explanation assumes that only one previous state $y_n$ of the source is used in the computation of $T_k(Y \rightarrow X)$; i.e. the parameter l = 1 (see Schreiber 2000).
6. We use z-tests in our experiments in Section 4 because we are comparing to very low α values after making Bonferroni corrections (see Section 2.2.2), which would render direct counting quite sensitive to statistical fluctuations.
7. We analyze the MI with separate matrices.
8. Note that testing against a binomial distribution is a conservative choice here, because it is less likely to get 6 significant results (5 with positive mean and 1 with negative mean) than to get 4 positive ones only. However, when tested over the group we consider the threshold according to the latter, which is truly binomial.
9. See Chapter 5 of the PhD thesis which can be downloaded from the German National Library: http://d-nb.info/992989221.
10. We explain in Appendix B how the number of joint voxels v = 3 was selected to balance the ability to capture multivariate interactions with the limitations of the number of available observations. Also in that appendix, we explore the effect of altering v (including conducting univariate analysis with v = 1). Furthermore, the appendix explores the effect of altering the number of subset pairs S and surrogate measurements P.
11. As described in Appendix A.3, this simple test does not mean that the right SC → right Cerebellum link is a false positive; it simply does not add evidence against the false positive.
12. Our use of 140 time steps for each C and χ combination matches the length of fMRI time series analyzed in Section 4.
13. The minimum strengths required for detection here may seem large at first glance; however, one must bear in mind the specific difficulties built into this data set: the non-linear coupling, the small number of samples, and the relatively low influence of Y on X (low $\chi/\epsilon_x$). Our correction for a large number of comparisons is also a factor here. That said, correcting for multiple comparisons provides important protection against false positives, so it must be maintained when investigating all values of C here.
14. High memory in the source Z is required for the values $z_n$ (considered by the interregional TE) to contain some information about the previous values $z_{n-1}$ which had an indirect effect on $x_{n+1}$ via $y_n$.
15. We expected that high memory in the destinations Y and X and in the common source Z would help preserve information in Y about the source Z which would be helpful to predicting X.
16. Note that the combination of undersampling and memory in our variables provides a smoothing-type effect on the data. As such, these results imply some level of robustness for the technique against temporal smoothing in the underlying data.
17. Similarly, only two interregional links were inferred at the group level by the interregional TE with univariate analysis (v = 1) and S = 3,000, P = 300.

References

Bassett, D. S., & Bullmore, E. T. (2009). Human brain networks in health and disease. Current Opinion in Neurology, 22(4), 340–347.
Bettencourt, L. M. A., Stephens, G. J., Ham, M. I., & Gross, G. W. (2007). Functional structure of cortical neuronal networks grown in vitro. Physical Review E, 75(2), 021915.
Bode, S., & Haynes, J. D. (2009). Decoding sequential stages of task preparation in the human brain. NeuroImage, 45(2), 606–613.
Bressler, S. L., Tang, W., Sylvester, C. M., Shulman, G. L., & Corbetta, M. (2008). Top-down control of human visual cortex by frontal and parietal cortex in anticipatory visual spatial attention. Journal of Neuroscience, 28(40), 10056–10061.
Büchel, C., & Friston, K. J. (1997). Modulation of connectivity in visual pathways by attention: cortical interactions evaluated with structural equation modelling and fMRI. Cerebral Cortex, 7(8), 768–778.
Bullier, J. (2001). Integrated model of visual processing. Brain Research Reviews, 36, 96–107.
Chai, B., Walther, D. B., Beck, D. M., & Fei-Fei, L. (2009). Exploring functional connectivity of the human brain using multivariate information analysis. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, & A. Culotta (Eds.), Advances in neural information processing systems (Vol. 22, pp. 270–278). NIPS Foundation.
Chávez, M., Martinerie, J., & Le Van Quyen, M. (2003). Statistical assessment of nonlinear causality: Application to epileptic EEG signals. Journal of Neuroscience Methods, 124(2), 113–128.
Frenzel, S., & Pompe, B. (2007). Partial mutual information for coupling analysis of multivariate time series. Physical Review Letters, 99(20), 204101.
Friston, K. (2002). Beyond phrenology: What can neuroimaging tell us about distributed circuitry? Annual Review of Neuroscience, 25, 221–250.
Friston, K., Ashburner, J., Kiebel, S., Nichols, T., & Penny, W. (2006). Statistical parametric mapping: The analysis of functional brain images. Elsevier, London.
Friston, K. J. (1994). Functional and effective connectivity in neuroimaging: A synthesis. Human Brain Mapping, 2, 56–78.
Friston, K. J., & Büchel, C. (2000). Attentional modulation of effective connectivity from V2 to V5/MT in humans. Proceedings of the National Academy of Sciences of the USA, 97(13), 7591–7596.
Friston, K. J., Harrison, L., & Penny, W. (2003). Dynamic causal modelling. Neuroimage, 19(4), 1273–1302.
Gong, P., & van Leeuwen, C. (2009). Distributed dynamical computation in neural circuits with propagating coherent activity patterns. PLoS Computational Biology, 5(12), e1000611.
Grosse-Wentrup, M. (2008). Understanding brain connectivity patterns during motor imagery for brain-computer interfacing. In D. Koller, D. Schuurmans, Y. Bengio, & L. Bottou (Eds.), Advances in neural information processing systems (Vol. 21, pp. 561–568). Curran Associates, Inc.
Handwerker, D. A., Ollinger, J. M., & D’Esposito, M. (2004). Variation of bold hemodynamic responses across subjects and brain regions and their effects on statistical analyses. Neuroimage, 21(4), 1639–1651.
Haynes, J. D., & Rees, G. (2006). Decoding mental states from brain activity in humans. Nature Reviews Neuroscience, 7(7), 523–534.
Haynes, J. D., Tregellas, J., & Rees, G. (2005). Attentional integration between anatomically distinct stimulus representations in early visual cortex. Proceedings of the National Academy of Sciences of the USA, 102(41), 14925–14930.
Hinrichs, H., Heinze, H. J., & Schoenfeld, M. A. (2006). Causal visual interactions as revealed by an information theoretic measure and fMRI. NeuroImage, 31(3), 1051–1060.
Honey, C. J., Kotter, R., Breakspear, M., & Sporns, O. (2007). Network structure of cerebral cortex shapes functional connectivity on multiple time scales. Proceedings of the National Academy of Sciences, 104(24), 10240–10245.
Horstmann, A. (2008). Sensorimotor integration in human eye-hand coordination: Neuronal correlates and characteristics of the system. Ph.D. thesis, Ruhr-Universität Bochum.
Johansen-Berg, H., Behrens, T. E., Robson, M. D., Drobnjak, I., Rushworth, M. F., Brady, J. M., et al. (2004). Changes in connectivity profiles define functionally distinct regions in human medial frontal cortex. Proceedings of the National Academy of Sciences of the USA, 101(36), 13335–13340.
Kantz, H., & Schreiber, T. (1997). Nonlinear time series analysis. Cambridge: Cambridge University Press.
Kraskov, A. (2004). Synchronization and interdependence measures and their applications to the electroencephalogram of epilepsy patients and clustering of data. In Publication series of the John von Neumann Institute for computing (Vol. 24). Ph.D. thesis, John von Neumann Institute for Computing, Jülich, Germany.
Kraskov, A., Stögbauer, H., & Grassberger, P. (2004). Estimating mutual information. Physical Review E, 69(6), 066138.
Liang, H., Ding, M., & Bressler, S. L. (2001). Temporal dynamics of information flow in the cerebral cortex. Neurocomputing, 38–40, 1429–1435.
Lizier, J. T., & Prokopenko, M. (2010). Differentiating information transfer and causal effect. European Physical Journal B, 73(4), 605–615.
Lizier, J. T., Prokopenko, M., & Zomaya, A. Y. (2008). Local information transfer as a spatiotemporal filter for complex systems. Physical Review E, 77(2), 026110.
Logothetis, N., Pauls, J., Augath, M., Trinath, T., & Oeltermann, A. (2001). Neurophysiological investigation of the basis of the fMRI signal. Nature, 412, 150–157.
Lunenburger, L., Kleiser, R., Stuphorn, V., Miller, L. E., & Hoffmann, K. P. (2001). A possible role of the superior colliculus in eye-hand coordination. Progress in Brain Research, 134, 109–125.
Lungarella, M., Pegors, T., Bulwinkle, D., & Sporns, O. (2005). Methods for quantifying the informational structure of sensory and motor data. Neuroinformatics, 3(3), 243–262.
MacKay, D. J. (2003). Information theory, inference, and learning algorithms. Cambridge: Cambridge University Press.
Norman, K. A., Polyn, S. M., Detre, G. J., & Haxby, J. V. (2006). Beyond mind-reading: Multi-voxel pattern analysis of fMRI data. Trends in Cognitive Sciences, 10(9), 424–430.
Penhune, V. B., & Doyon, J. (2005). Cerebellum and m1 interaction during early learning of timed motor sequences. Neuroimage, 26(3), 801–812.
Ramsey, J., Hanson, S., Hanson, C., Halchenko, Y., Poldrack, R., & Glymour, C. (2010). Six problems for causal inference from fMRI. NeuroImage, 49(2), 1545–1558.
Rubinov, M., Knock, S. A., Stam, C. J., Micheloyannis, S., Harris, A. W. F., Williams, L. M., et al. (2009). Small-world properties of nonlinear brain activity in schizophrenia. Human Brain Mapping, 30, 403–416.
Saito, Y., & Harashima, H. (1981). Tracking of information within multichannel EEG record - causal analysis in EEG. In N. Yamaguchi & K. Fujisawa (Eds.), Recent advances in EEG and EMG data processing (pp. 133–146). Amsterdam: Elsevier/North Holland Biomedical Press.
Schreiber, T. (2000). Measuring information transfer. Physical Review Letters, 85(2), 461–464.
Soon, C. S., Brass, M., Heinze, H. J., & Haynes, J. D. (2008). Unconscious determinants of free decisions in the human brain. Nature Neuroscience, 11(5), 543–545.
Tanaka, Y., Fujimura, N., Tsuji, T., Maruishi, M., Muranaka, H., & Kasai, T. (2009). Functional interactions between the cerebellum and the premotor cortex for error correction during the slow rate force production task: An fmri study. Experimental Brain Research, 193(1), 143–150.
Tung, T. Q., Ryu, T., Lee, K. H., & Lee, D. (2007). Inferring gene regulatory networks from microarray time series data using transfer entropy. In P. Kokol, V. Podgorelec, D. Mičetič-Turk, M. Zorman, & M. Verlič (Eds.), Proceedings of the twentieth IEEE international symposium on computer-based medical systems (CBMS ’07), Maribor, Slovenia (pp. 383–388). Los Alamitos: IEEE.
Verdes, P. F. (2005). Assessing causality from multivariate time series. Physical Review E, 72(2), 026222–026229.

Acknowledgements

JL and JH thank Thorsten Kahnt for discussions on the statistical analysis. JL thanks Mikail Rubinov for helpful suggestions. JL thanks the Australian Research Council Complex Open Systems Research Network (COSNet) for a travel grant that partially supported this work. JDH thanks the Max Planck Society, the Bernstein Computational Neuroscience Program of the German Federal Ministry of Education and Research (BMBF Grant 01GQ0411) and the Excellence Initiative of the German Federal Ministry of Education and Research (DFG Grant GSC86/1-2009). MP is grateful for a 2009 Research Grant from The Max Planck Institute for Mathematics in the Sciences (Leipzig, Germany) on Information-driven Self-Organization and Complexity Measures.

Author contributions: J.-D.H., J.H. and A.H. conceived the fMRI experiment. A.H. performed the fMRI experimental work. J.H. and A.H. pre-processed the data. J.L. and M.P. conceived the information-theoretical analysis. J.L. performed the information-theoretical analysis. J.H. performed the statistical analysis. J.L. and J.H. wrote the paper.

Author information

Authors and Affiliations

School of Information Technologies, The University of Sydney, NSW 2006, Sydney, Australia
Joseph T. Lizier
CSIRO, Information and Communications Technology Centre, PO Box 76, Epping, NSW, 1710, Australia
Joseph T. Lizier & Mikhail Prokopenko
Bernstein Center for Computational Neuroscience, Charité-Universitätsmedizin Berlin, Philippstraße 13, Haus 6, 10115, Berlin, Germany
Jakob Heinzle & John-Dylan Haynes
Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstraße 1A, 04103, Leipzig, Germany
Annette Horstmann & John-Dylan Haynes
Graduate School of Mind and Brain, Humboldt Universität zu Berlin, Luisenstraße 56, 10099, Berlin, Germany
John-Dylan Haynes
Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, 04103, Leipzig, Germany
Mikhail Prokopenko

Authors

Joseph T. Lizier
Jakob Heinzle
Annette Horstmann
John-Dylan Haynes
Mikhail Prokopenko

Corresponding author

Correspondence to Joseph T. Lizier.

Additional information

Action Editor: Jonathan David Victor

First two authors contributed equally to this work.

Appendices

Appendix A: Application to numerical data sets

In order to explore the properties of the technique presented in Section 2, we apply it to a number of artificial data sets in this section. In particular, we demonstrate: the efficacy of the technique when applied to small data sets with a small amount of nonlinear, collective coupling; how to use the statistical significance to guide selection of the number of joint voxels under analysis v; some robustness to undersampling, and to inference of directed links where only a logical overlap exists.

A.1 Collective, non-linear interregional coupling

The primary test of the technique involves two multivariate “regions” of 10 variables, $\mathbf{X}=\left\{ X_1,\ldots,X_{10}\right\}$ and $\mathbf{Y}=\left\{ Y_1,\ldots,Y_{10}\right\}$, in which the variables of Y influence X in a collective, non-linear fashion under a range of coupling strengths. The coupling strength is described by the number of variables C in X which are influenced by those of Y, and the level χ to which those elements in X are determined from Y. For a given C and χ, the value $x_{i,n+1}$ of variable $X_i$ at time step $n+1 = \left\{ 2 \ldots 140 \right\}$ is determined as:

$$ x_{i,n+1} = \begin{cases} \epsilon_x x_{i,n} + \chi y_{j,n} y_{l,n} + (1 - \epsilon_x - \chi) g & \text{for } i \leq C \\ \epsilon_x x_{i,n} + (1 - \epsilon_x) g & \text{for } i > C \end{cases} \tag{12} $$

where g is a zero mean white noise process with σ = 1, and j and l are indices of variables $Y_j$ and $Y_l$ in Y randomly selected to provide a joint input to $X_i$ for the duration of the time series. The initial values $x_{i,n=1}$ are determined by the zero mean white noise process g, and we have:

$$ y_{j,n+1} = \epsilon_y y_{j,n} + (1 - \epsilon_y) g. \tag{13} $$

Our test data sets thus involve one-way coupling Y → X, where the coupling is determined in a non-linear manner from multiple values within the source region. For our first experiment here, we generate time-series sets X and Y for all combinations of $C=\left\{ 1,\ldots,10\right\}$ and $\chi=\left\{ 0.00,0.05,\ldots,0.30\right\}$ with $\epsilon_x = 0.7$ and $\epsilon_y = 0.0$. With the additional factors of the relatively low influence of Y on X (low $\chi/\epsilon_x$) and a small number of observations,^{Footnote 12} this has been specifically designed to be a particularly difficult data set from which to correctly detect a directed interregional link.
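As an illustration, the data-generating process of Eqs. (12) and (13) could be simulated roughly as follows. This is a minimal sketch rather than the authors' code: the function name, the choice of distinct indices j and l, independent noise draws for every variable and time step, and initializing Y from the same noise process are all assumptions not stated in the text.

```python
import numpy as np

def simulate_coupled_regions(C, chi, eps_x=0.7, eps_y=0.0,
                             n_vars=10, n_steps=140, seed=0):
    """Generate X and Y (n_vars x n_steps) with one-way coupling Y -> X."""
    rng = np.random.default_rng(seed)
    # Fixed pair (j, l) of source variables feeding each X_i for the whole series.
    # Assumption: j and l are distinct.
    pairs = np.array([rng.choice(n_vars, size=2, replace=False)
                      for _ in range(n_vars)])
    x = np.empty((n_vars, n_steps))
    y = np.empty((n_vars, n_steps))
    x[:, 0] = rng.standard_normal(n_vars)  # initial values drawn from g
    y[:, 0] = rng.standard_normal(n_vars)  # assumption: same for the source
    coupled = np.arange(n_vars) < C        # i <= C for 1-based index i
    for n in range(n_steps - 1):
        # Assumption: independent noise per variable and per time step.
        g_x = rng.standard_normal(n_vars)
        g_y = rng.standard_normal(n_vars)
        y[:, n + 1] = eps_y * y[:, n] + (1 - eps_y) * g_y          # Eq. (13)
        drive = y[pairs[:, 0], n] * y[pairs[:, 1], n]              # y_j * y_l
        x[:, n + 1] = np.where(
            coupled,
            eps_x * x[:, n] + chi * drive + (1 - eps_x - chi) * g_x,  # i <= C
            eps_x * x[:, n] + (1 - eps_x) * g_x,                      # i > C
        )
    return x, y, pairs

# Example: one of the stronger couplings considered in the text.
x, y, pairs = simulate_coupled_regions(C=8, chi=0.25)
```

Looping over C = 1..10 and χ = 0.00, 0.05, ..., 0.30 with this function would reproduce the grid of data sets described above, up to the stated assumptions.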

We measured the interregional TEs $T_{k,v}(X \rightarrow Y)$ and $T_{k,v}(Y \rightarrow X)$ and the interregional MI $I_v(X; Y)$ with v = 2 and k = 1, using Kraskov-estimators with a window size of the two closest observations. We then computed their statistical significance using the techniques we presented in Section 2 with S = 2025 and P = 100. We corrected for multiple comparisons across the many combinations of C and χ in each direction.
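For readers unfamiliar with Kraskov-style estimation, the following sketch shows the first nearest-neighbour MI estimator of Kraskov et al. (2004) with a window of the K = 2 closest observations. It is only an illustration of plain MI between two (possibly multivariate) samples: it is not the authors' implementation, it does not include the conditional MI needed for the TE, and the function name is hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_mutual_information(x, y, k=2):
    """Kraskov-Stoegbauer-Grassberger MI estimate (algorithm 1), in nats.

    x, y: arrays of shape (n_samples,) or (n_samples, n_dims).
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = len(x)
    joint = np.hstack([x, y])
    # Max-norm distance to the k-th nearest neighbour in the joint space
    # (column 0 of the query result is the point itself).
    dist, _ = cKDTree(joint).query(joint, k=k + 1, p=np.inf)
    eps = dist[:, k]
    # Count marginal neighbours strictly closer than eps (excluding the point).
    radius = np.nextafter(eps, 0)
    tree_x, tree_y = cKDTree(x), cKDTree(y)
    n_x = np.array([len(tree_x.query_ball_point(x[i], radius[i], p=np.inf)) - 1
                    for i in range(n)])
    n_y = np.array([len(tree_y.query_ball_point(y[i], radius[i], p=np.inf)) - 1
                    for i in range(n)])
    return digamma(k) + digamma(n) - np.mean(digamma(n_x + 1) + digamma(n_y + 1))

# Example: MI between v = 2 jointly considered variables from each "region".
rng = np.random.default_rng(0)
src = rng.standard_normal((140, 2))
dst = src + 0.5 * rng.standard_normal((140, 2))
mi_estimate = ksg_mutual_information(src, dst, k=2)
```

A small K keeps the bias low at the cost of higher variance, which is one reason the significance testing against surrogates described in the text matters with only 140 observations.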

Figure 7(a) demonstrates that the interregional TE detects the interregional link Y → X fairly consistently for the data sets with larger numbers of coupled variables C and coupling strengths χ. No false positives are returned in the situation of zero coupling (χ = 0.00) or in the reverse direction X → Y (not shown).

The interregional MI does not detect the directed link at any coupling strength (results not shown). This is because the simultaneous values of X(n) and Y(n) at a given time point n are unrelated (Y(n) influences X(n + 1), but has no relationship to X(n) in this data set). Importantly, it does not produce any false positives here. We also measured the interregional MI with a 1-step time difference between X and Y; this breaks the symmetry of the measure, and makes detection of the influence of Y(n) on X(n + 1) possible. As shown in Fig. 7(b), the statistical significance of this measure detects the influence Y → X at some of the strongest couplings, and returns no false positives for X → Y. However, the measure is not as effective as the TE: it correctly infers the influence for a smaller number of data sets (C, χ), and with generally larger and less consistent p-values for these larger (C, χ). This is perhaps because it ignores the mixing of the coupling from Y with the influence of the past of each $X_i$ via the $\epsilon_x x_{i,n}$ terms in Eq. (12). The TE (which accounts for the past of the destination) is more sensitive to this mixing.

The success of our statistical inference with the interregional TE in this particularly difficult example is an important result. This type of non-linear coupling cannot be detected by linear methods (e.g. Granger causality), nor with the non-directional MI. Even when the MI has a directionality induced in it, it is not as sensitive as the TE here. Similarly, we verified that single-variate analysis (with v = 1 voxel) was much less effective: the TE could only detect the regional link at the very largest (C, χ) combination (see Appendix A.2). Finally, we note that a minimum coupling strength is required before detection by our method to ensure statistical significance, which is an important property to protect against false positives.^{Footnote 13}

A.2 Effect of multivariate analysis

Continuing with the same time series sets X and Y for various (C, χ) from Appendix A.1, we investigate the effect of altering the number of joint variables v included in the measure $T_{k,v}(Y \rightarrow X)$.

Figure 8(a) shows that inference of the directed link Y → X at the larger (C, χ) combinations is stable for v between 2 and 6, with the correct inference made for roughly the same number of (C, χ) data sets here. As a more focused example, Fig. 8(b) shows the relevant p-values versus v for the particular data set (C = 8, χ = 0.25), demonstrating that inference of the directed link could be made here for v between 2 and 7.

Certainly, one would like to maximize the number of joint variables v when using $T_{k,v}(Y \rightarrow X)$, since this provides more scope for capturing multivariate interactions. Also, increasing v even above the number of variables involved in interactions in the data can be advantageous. This is because it raises the proportion of sample sets $R_{x,i}$ of v variables in the source which include a full set of variables that interact to produce an outcome in the sample destination set $R_{y,j}$. For example, increasing v above 2 here raises the proportion of our S sample sets which include both source variables $Y_j$ and $Y_l$ that causally affect one of the selected destination variables $X_i$ (see Eq. (12)).

However, increasing v brings us closer to the limits imposed by the number of observations available to us. This is the case for whichever estimator we choose to use. For example, the Kraskov estimators in use here are known to have their error in measurement increase with the number of joint variables considered for a fixed number of observations (see Fig. 15 in Kraskov et al. 2004). Similarly, spurious relationships can appear more easily in the low-sample limit, making the distribution of measures on the surrogate data sets more spread out, and therefore raising the relevant p-value.

These plots demonstrate that the number of joint variables v can only be increased to a certain level before being limited by the number of available samples. The p-values with respect to v explicitly show where these limits are.

A.3 Overlapping data sets without direct relationships

We then investigate a number of instances where interregional data sets logically overlap in some way without having a direct relationship. These instances are known to present difficulty for inference techniques, which may infer a directed interregional link when only an indirect relationship is present (as described in “problem 4” in Ramsey et al. (2010)). We explore the conditions under which our technique may be susceptible to making these inferences.

First, we explore the pathway structure Z → Y → X. We generate data sets where the individual relationships between the directed pairs Z → Y and Y → X are each described by Eqs. (12) and (13), with C = 10, $\epsilon_x = \epsilon_y = 0.7$ and variable $(\chi, \epsilon_z)$.

The p-values from our analysis are displayed in Fig. 9. As expected, we find that inference of the actual directed link Z → Y depends on the coupling strength (as per Appendix A.1). It does not seem to have a particular dependence on the self-connection or memory $\epsilon_z$ in the source (not investigated in Appendix A.1).

Figure 9(b) shows that it is possible for our technique to infer a directed link Z → X where the real underlying relationship Z → Y → X is in fact an indirect pathway through Y. We find that inference of the indirect relationship is much less sensitive than for the direct relationship, occurring only where there is both high coupling χ and high memory $\epsilon_z$ in the source.^{Footnote 14} Importantly, we found that p-values for Z → X were always higher (i.e. weaker) than Y → X when both links were inferred.

If Y is not available, then inference of the indirect relationship may be desirable, since it still reveals structure in the available data. Where Y is available though, ideally only Z → Y and Y → X should be inferred. We suggest that extension of the complete transfer entropy (Lizier et al. 2008) to a similar interregional measure (and with similar statistical significance testing) could usefully address this issue. The complete TE conditions out the influence of other possible sources, e.g. $T_k(Y \rightarrow X \mid Z) = I(Y;X' \mid X^{(k)},Z)$. Extending the measure should still infer Y → X (since Y adds information not contained in Z) but not Z → X (since Z does not add any information not contained in Y). We leave extension of this measure and testing of the technique to future work.

Next, we explore the common cause structure with Z → Y and Z → X but no direct relationship between Y and X. We generate data sets where the individual relationships between the directed pairs Z → Y and Z → X are described by Eqs. (12) and (13) with C = 10, using $\epsilon_x = \epsilon_y = 0.7$ and variable $(\chi, \epsilon_z)$ in Fig. 10(a), and alternately $\epsilon_z = 0.7$ and variable $(\chi, \epsilon_x, \epsilon_y)$ in Fig. 10(b).

Figure 10 shows that it is possible for our technique to infer a directed link Y → X (with similar results for X → Y of course) where Y and X are only related by a common cause. We find that the inference only occurs under high coupling χ and high memory $\epsilon_x, \epsilon_y$ in the destinations of the common cause (Fig. 10(b)), with a possible but less clear dependence on high memory $\epsilon_z$ in the common cause (Fig. 10(a)).^{Footnote 15} Crucially though, this inference is much less sensitive than for the relevant direct cause from Z. (The Z → Y relationship for Fig. 10(a) is the same as for the pathway structure; see Fig. 9(a) for results on the direct cause to compare to Fig. 10(a).) As expected, the interregional MI revealed a very strong relationship between Y and X (not shown), e.g. inferring a relationship for all χ > 0 for the data sets in Fig. 10(a).

Similar to our argument regarding the pathway structure, if Z is not available then inference of Y → X and X → Y may be useful in revealing structure in the available data. When the common cause Z is available, this is undesirable though. In this case, we again suggest that extension of the complete transfer entropy to an interregional measure could be expected to eliminate inference of spurious relationships due to a common cause.

Without such an extension in place though, the fact that the false positive links here are much weaker than the relevant actual direct links suggests the use of comparisons amongst connected triplets. That is, where one finds Z → Y, Y → X and Z → X, then:

1. if Z → X is stronger than Z → Y or Y → X then it is unlikely that Z → X is a pathway type false positive;
2. if Y → X is stronger than Z → X then it is unlikely that Y → X is a common cause type false positive.

Such comparisons cannot definitively rule out the relevant false positive situation, but can add evidence against the presence of these types of false positives.
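The two triplet comparisons can be written as a small helper. This is a hypothetical sketch: it assumes link strength is compared via p-values (lower meaning stronger, as in the comparison of Z → X and Y → X above), and the function and key names are illustrative only.

```python
def triplet_false_positive_evidence(p_zy, p_yx, p_zx):
    """Apply the two triplet heuristics to p-values of Z->Y, Y->X and Z->X.

    Assumption: a lower p-value means a stronger link.
    Returns evidence flags only; neither flag rules a false positive in or out.
    """
    return {
        # Rule 1: Z->X stronger than Z->Y or Y->X.
        "zx_unlikely_pathway_false_positive": p_zx < p_zy or p_zx < p_yx,
        # Rule 2: Y->X stronger than Z->X.
        "yx_unlikely_common_cause_false_positive": p_yx < p_zx,
    }

# Example with illustrative p-values: Z->X is the weakest link of the three.
flags = triplet_false_positive_evidence(p_zy=1e-6, p_yx=1e-5, p_zx=1e-3)
```

In this example the first flag is false, so the comparison adds no evidence against Z → X being a pathway type false positive, while the second flag is true.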

A.4 Undersampling

We also test the technique against data sets which have been undersampled from the raw underlying data. Using the same relationship Y → X defined in Eqs. (12) and (13), we then define $\mathbf{X}^s = \{ X_1^s, \ldots, X_{10}^s \}$ where the constituent time series are undersampled by a factor of s as $X_i^s = \{ x_{i,1}, x_{i,1+s}, x_{i,1+2s}, \ldots \}$. $\mathbf{Y}^s$ is similarly defined, and the technique is then applied to the data sets $\mathbf{X}^s$ and $\mathbf{Y}^s$. For comparability, we generate 140 samples in the undersampled data sets. We use C = 10, χ = 0.3, and $\epsilon_x = 0.7$.
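The undersampling step itself amounts to keeping every s-th observation. A minimal sketch follows, assuming the raw series are stored as a variables-by-time array; the function name and the length check are illustrative choices rather than part of the original method.

```python
import numpy as np

def undersample(series, s, n_keep=140):
    """Keep observations 1, 1+s, 1+2s, ... along the time (last) axis."""
    series = np.asarray(series)
    sub = series[..., ::s]
    if sub.shape[-1] < n_keep:
        raise ValueError("raw series too short: need at least "
                         f"{1 + (n_keep - 1) * s} time steps")
    return sub[..., :n_keep]

# Example: for s = 3, 418 raw time steps yield exactly 140 undersampled ones.
raw = np.arange(10 * 418).reshape(10, 418)
x_s = undersample(raw, s=3)
```

To obtain 140 undersampled observations, the raw series therefore needs at least 1 + 139·s time steps.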

As shown in Fig. 11, for these parameter values we find that there is some robustness in correct inference of the interregional relationship $\mathbf{Y}^s \rightarrow \mathbf{X}^s$ using the interregional TE up to an undersampling factor of s = 3. (Again, no significant link was inferred in the reverse direction $\mathbf{X}^s \rightarrow \mathbf{Y}^s$.) The correct relationship is detected more reliably with: higher source-destination coupling χ (not shown), a smaller undersampling factor s, and higher source memory $\epsilon_y$ (see Fig. 11).^{Footnote 16} The undersampled source values including $y_{j,n}$ under consideration can have a causal effect on the destination $x_{i,n+s}$ by either: influencing $x_{i,n+m}$, or influencing $y_{j,n+m}$ for 1 ≤ m < s; and in both cases consequently influencing $x_{i,n+s}$. Higher χ and smaller s increase both effects, while $\epsilon_y$ increases the latter. Higher memory in the destination $\epsilon_x$ (with respect to noise g, given χ) should similarly increase the influence of the source under consideration and therefore the reliability of detection in the undersampled data.

Appendix B: Effect of multivariate analysis

We have reported in the main paper results from an analysis that considered interactions between sets of v = 3 voxels. We have compared the distribution of the calculated measure MI or TE from S = 3,000 samples against the mean of P = 300 surrogate measurements for each subset sample. The use of v = 3 was motivated by the trend of p-values versus v in the simulation in Section A.2 and Fig. 8(b), where we found that the p-values for our technique were minimized for v = 2 to 4. We confirmed the selection of v = 3 with S = 3,000, P = 300 by investigating the trend of p-values versus v for selected region pairs, finding that the p-values for our technique were typically minimized at the lower end of the v = 3 to 5 range (results not shown). This means that v = 3 balanced our desire to capture multivariate interactions with the need to remain within the limitations of the available number of samples.

At this point, two important questions have to be raised. First, how strongly does the resulting structure depend on the choice of parameters? Second, what is the effect of including more than one voxel (v > 1) in the subsets, and thus taking into account multivariate interactions?

To answer these questions we ran several additional analyses with the following parameters. First, we computed the MI and TE measures again as described in the main text, but for v = 1 and v = 5, leaving S = 3,000 and P = 300 unchanged. We also calculated the same MI and TE measures for the three subset sizes (v = 1, 3 and 5) using only S = 1,000 samples and P = 100 permutations. Second, we calculated the MI, TE and MI modulation structures based on the average activations across all voxels in each ROI. Note that there is only one possible sample for this average analysis and thus S = 1. We use P = 300 and the standard TE (MI) significance tests in Section 2.2.1 coupled with the group level analysis described in Section 2.2.3. The average analysis is similar to standard functional connectivity studies in fMRI that look at correlations between ROIs (Friston 1994). It is important to note that with the TE this average ROI analysis could not infer any interregional links at the group level.^{Footnote 17} We then compared the results of all 7 analyses, including the main analysis presented in the paper and the average ROI analysis, by calculating the correlation coefficients between the corresponding resulting MI, TE and MI modulation structures. Importantly, we did not compare the number of significant subjects, but directly looked at the correlation coefficients between the mean values for MI, TE and MI modulation within each subject. In Fig. 12 we show the similarity between the information structures obtained by the different analyses.
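The structure comparison described above can be illustrated with a minimal sketch. It assumes that each analysis yields, for one subject, a square matrix of mean MI or TE values indexed by (source region, destination region), and that two structures are compared with a Pearson correlation over the off-diagonal entries. The matrices, the analysis labels and the helper names `off_diagonal` and `pearson` are hypothetical; how the per-subject coefficients were combined is not specified here and is not modeled.

```python
import math

def off_diagonal(matrix):
    """Flatten a square region-by-region matrix, skipping self-links."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(n) if i != j]

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical mean-value structures for one subject (NOT data from the paper).
structures = {
    "v=1": [[0.0, 0.10, 0.02], [0.04, 0.0, 0.07], [0.01, 0.03, 0.0]],
    "v=3": [[0.0, 0.21, 0.05], [0.06, 0.0, 0.15], [0.02, 0.08, 0.0]],
    "v=5": [[0.0, 0.24, 0.06], [0.07, 0.0, 0.17], [0.03, 0.09, 0.0]],
}

# Pairwise similarity between analyses, as a dict keyed by (analysis, analysis).
names = list(structures)
similarity = {
    (a, b): pearson(off_diagonal(structures[a]), off_diagonal(structures[b]))
    for a in names
    for b in names
}

for (a, b), r in similarity.items():
    print(f"{a} vs {b}: r = {r:.3f}")
```

A similarity matrix of this kind, computed across all analyses, is the sort of quantity that a figure such as Fig. 12 would display.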

The results can be summarized by two main statements. First, although the univariate structures are correlated with the multivariate structures, the multivariate structures are correlated more strongly with each other than they are with the univariate structures. This indicates that the multivariate analysis captures some structure that is not present in the univariate analysis. Second, these multivariate interactions are captured by 3-voxel as well as 5-voxel interactions in a very similar way. Hence, the multivariate nature of the interaction does not seem to include very high dimensional interactions that are present in 5-voxel interactions but cannot be captured by 3-voxel interactions.

In a second step, we compared the statistical results obtained from the seven analyses. To do this we counted the percentage $p_s$ of stable significant connections that remain unchanged between two types of analysis. We defined $p_{s,ij}=\frac{2N_{s,ij}}{N_{i}+N_{j}}$, where $N_{s,ij}$ is the number of links that are significant in both analyses $i$ and $j$, thus called stable, and $N_i$ and $N_j$ are the numbers of significant links in each analysis, respectively. If $p_{s,ij} = 1$, all significant connections are the same in both analyses, and if $p_{s,ij} = 0$, there is no significant connection that shows up in both analyses. $p_{s,ij}$ was calculated for all possible pairs of analyses $i$ and $j$. We summarize the results by averaging over the three main types of analyses defined in Fig. 12. All results are given in percent as Mean ± SD. In the TE structure, there are no stable connections in the average over ROI analysis (avg in Fig. 12) compared to any other analysis. The percentage of stable significant connections is $33\% \pm 14\%$ for comparisons of a univariate to a multivariate analysis (u–m in Fig. 12) and $80\% \pm 5\%$ for comparisons between multivariate analyses (m–m in Fig. 12). For the MI modulation, the corresponding percentages are: $2\% \pm 2\%$ (avg), $46\% \pm 17\%$ (u–m) and $85\% \pm 3\%$ (m–m). Again, these numbers show that the multivariate information measures yield stable results and that the results are clearly different from the two kinds of univariate analysis we have compared them to.
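As an illustration of the stability measure defined above, the following sketch computes $p_{s,ij}$ for two sets of significant links. It assumes each analysis is summarized as a set of directed (source, destination) pairs. The region labels `R1` to `R4`, the example links and the function name `stable_fraction` are hypothetical and are not results from the paper.

```python
def stable_fraction(links_i, links_j):
    """Return p_s = 2 * N_s / (N_i + N_j) for two sets of significant links.

    links_i, links_j: sets of directed links, each a (source, destination) tuple.
    Returns None when neither analysis has a significant link, since the
    ratio is then undefined.
    """
    n_stable = len(links_i & links_j)  # links significant in both analyses
    total = len(links_i) + len(links_j)
    if total == 0:
        return None
    return 2 * n_stable / total

# Hypothetical significant links from two analyses (NOT data from the paper).
analysis_a = {("R1", "R2"), ("R1", "R3"), ("R2", "R4")}
analysis_b = {("R1", "R2"), ("R2", "R4"), ("R4", "R3"), ("R3", "R2")}

p_s = stable_fraction(analysis_a, analysis_b)
print(f"p_s = {100 * p_s:.0f}%")  # 2 shared links out of 3 + 4: 4/7, about 57%
```

Links are treated as directed, so `("R4", "R3")` and `("R3", "R4")` count as different connections, which matches the asymmetric nature of TE; for the symmetric MI one would instead store each pair in a canonical order.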

About this article

Cite this article

Lizier, J.T., Heinzle, J., Horstmann, A. et al. Multivariate information-theoretic measures reveal directed information structure and task relevant changes in fMRI connectivity. J Comput Neurosci 30, 85–107 (2011). https://doi.org/10.1007/s10827-010-0271-2

Received: 02 January 2010
Revised: 17 June 2010
Accepted: 12 August 2010
Published: 27 August 2010
Issue Date: February 2011
DOI: https://doi.org/10.1007/s10827-010-0271-2
