Abstract
Electromagnetic source imaging (ESI) and independent component analysis (ICA) are two popular and apparently dissimilar frameworks for M/EEG analysis. This letter shows that the two frameworks can be linked by choosing biologically inspired source sparsity priors. We demonstrate that ESI carried out by the sparse Bayesian learning (SBL) algorithm yields source configurations composed of a few active regions that are also maximally independent from one another. In addition, we extend the standard SBL approach to source imaging in two important directions. First, we augment the generative model of M/EEG to include artifactual sources. Second, we modify SBL to allow for efficient model inversion with sequential data. We refer to this new algorithm as recursive SBL (RSBL), a source estimation filter with potential for online and offline imaging applications. We use simulated data to verify that RSBL can accurately estimate and demix cortical and artifactual sources under different noise conditions. Finally, we show that on real error-related EEG data, RSBL can yield single-trial source estimates in agreement with the experimental literature. Overall, by demonstrating that ESI can produce maximally independent sources while simultaneously localizing them in cortical space, we bridge the gap between the ESI and ICA frameworks for M/EEG analysis.
1. Introduction
The magnetoencephalogram and electroencephalogram (M/EEG) are so far the only noninvasive imaging modalities that allow the study of brain function at the temporal resolution of neuronal processes. However, it is well known that both modalities have low spatial resolution because the volume conductor blurs the electromagnetic activity generated by sources inside the brain (although this distortion is less severe in MEG). Furthermore, many configurations of sources in the brain can elicit the same topography at the sensor level, making source inference an extremely ill-posed inverse problem. This reality has driven the quest for inverse algorithms that can incorporate biological and mathematical constraints to estimate the latent source-space activity from limited sensor-space measurements. Despite impressive methodological advances in this research domain in recent years, several practical and theoretical issues remain unsettled.
A plethora of algorithms can solve the inverse problem, each with its strengths, weaknesses, and use cases (Paz-Linares et al., 2017; Asadzadeh, Yousefi Rezaii, Beheshti, Delpak, & Meshgini, 2020). Here we focus on the sparse Bayesian learning (SBL) framework (Tipping, 2001) because it allows quantifying the uncertainty in our modeling assumptions in a principled manner (Lei, Wu, & Valdes-Sosa, 2015) while admitting an efficient implementation (Ojeda, Kreutz-Delgado, & Mullen, 2018). SBL algorithms are part of a larger family of empirical Bayes methods, some of which have been applied successfully to M/EEG source imaging, in short, electromagnetic source imaging (ESI) (Friston et al., 2008; Owen, Wipf, Attias, Sekihara, & Nagarajan, 2012; Henson, Wakeman, Litvak, & Friston, 2011). In this framework, the activity of sources (or groups of sources) is assumed to be sparse throughout the cortex, which is typically modeled with a gaussian prior with a covariance components structure (Wipf & Nagarajan, 2008). SBL uses the data to infer the relevance of each component (the hyperparameters of the model) such that those sources that are unnecessary to explain the measurements are automatically discarded without user intervention. Despite the sophistication of many popular source estimation algorithms, including SBL, it remains unclear to what extent they can truly undo the mixing induced by the volume conductor (Biscay, Bosch-Bayard, & Pascual-Marqui, 2018; Anzolin et al., 2019).
Source demixing, on the other hand, is usually carried out by blind source separation (BSS) methods, among which independent component analysis (ICA) is a popular choice. ICA can linearly decompose sensor data into source components that are maximally independent in the statistical sense (Cichocki & Amari, 2002). Still, since these components have unknown spatial support, additional steps need to be taken to turn ICA into a brain imaging technique. This letter aims to build a bridge between ESI and ICA to demonstrate that subject to biologically inspired sparsity constraints, the former can also yield maximally independent sources while simultaneously localizing them in the cortical space.
In a prior attempt to unify source localization and separation, Valdés-Sosa, Vega-Hernández et al. (2009) proposed an inverse procedure that used a lead field matrix¹ derived from a realistic model of the head to constrain the inverse problem while moving the signal decomposition from the sensor space to the cortical space; they called this approach "tomographic ICA." They used a combination of nonnegative, orthogonal, sparse, and smoothness constraints to decompose cortical source maps into the outer product of independent spatial and temporal signatures. A similar data-driven minimum overlap component analysis algorithm has been proposed by Nolte, Marzetti, and Valdes Sosa (2009). Although these are steps in the right direction, the interpretation of model-free spatial and temporal components can be challenging because they may not match across subjects. Other authors have proposed an approximation of source maps by estimating the individual contribution of regions of interest (ROIs) obtained from a brain parcellation (Biscay et al., 2018). While they did not prove that this approach leads to statistically independent groups of sources, they did show that it yields a reasonable demixing. Conveniently, the use of ROI constraints forces the method to extract components that have known anatomical support, thereby facilitating group-level analysis. Here we build on this idea and use ROIs to induce block-sparsity within the SBL framework.
In addition to the use of biological constraints, another key element for successful source estimation is the treatment of the measurement noise. Typically this noise is assumed to be gaussian, although, in practice, raw M/EEG samples are affected by many other noise types, such as interference from the 50/60 Hz AC line, pseudo-random muscle activity, and mechanically induced artifacts, among others. Thus, before source estimation, data are usually heavily preprocessed (Bigdely-Shamlo, Mullen, Kothe, Su, & Robbins, 2015). ICA has been shown to be a useful tool for identifying and removing many types of artifact components (Jung et al., 2000). ICA-based data cleaning is typically a manual process based on the practitioner's experience, although several automated alternatives do exist (Tamburro, Fiedler, Stone, Haueisen, & Comani, 2018; Pion-Tonachini, Makeig, & Kreutz-Delgado, 2017; Radüntz, Scouten, Hochmuth, & Meffert, 2017).
Beyond artifact cleaning, it seems remarkable that ICA can learn columns of a mixing matrix consistent with dipolar (single or bilaterally symmetric) cortical source scalp projections without using any anatomical or biophysical constraints whatsoever (Makeig, Jung, Bell, Ghahremani, & Sejnowski, 1997). It has been shown empirically that the best ICA algorithms can extract approximately 30% dipolar brain components (about 21 brain components out of 71 possible in a 71-channel montage) (Delorme, Palmer, Onton, Oostenveld, & Makeig, 2012). The remaining nonbrain components usually correspond to different stereotypical artifact scalp projections, as well as a set of residual scalp maps that are difficult to explain from a biological standpoint (Onton, Westerfield, Townsend, & Makeig, 2006). Although ICA has proven to be a useful technique for studying brain dynamics (Makeig & Onton, 2011), we must wonder whether its performance can be improved, perhaps by making BSS of M/EEG data less "blind." In other words, if we know a priori what kind of source activity we are looking for (dipolar cortical activity, EOG and EMG artifacts, and so on), why limit ourselves to a purely blind decomposition?
In this letter, we advocate using as much information as possible about brain and artifact sources to help solve the inverse problem. In that sense, the use of a prespecified lead field matrix in the generative model of the M/EEG forces inverse algorithms to explain the data in terms of dipolar sources because the lead field is precisely an overcomplete dictionary of dipolar projections of every possible source there is in a discretized model of the cortex. To furnish a more realistic generative model of the data, we propose augmenting the lead field dictionary with a dictionary of stereotypical artifact projections. This strong parameterization of brain and artifact sources within the SBL framework may render blind decomposition of M/EEG data unnecessary or at least suboptimal for brain imaging. Of theoretical relevance, we show the connections between ESI (as implemented with SBL) and Infomax ICA (Bell & Sejnowski, 1995), two approaches that are often perceived to be at odds with one another. Finally, we extend SBL to allow for continuous brain mapping of sequential M/EEG data; we call this new algorithm recursive SBL (RSBL). With this extension, we turn the traditional instantaneous source estimation obtained with SBL into an online-capable source estimation filter with potential for offline and online imaging applications.
In section 2, we briefly explain ESI within the Bayesian framework and present our augmented generative model. Then we give our main theoretical result linking SBL and Infomax ICA. Next, we explain the RSBL algorithm and how we built the artifact dictionary. In section 3, we validate the RSBL algorithm on simulated and real EEG data.
2. Methods
2.1. The Measurement Model
In ESI, the electromagnetic activity of the brain sources is defined on a dense grid of known cortical locations (the source space). Typically, a vector of voltage (EEG) or magnetic field (MEG) measurements at sample $t$, $\mathbf{y}_t \in \mathbb{R}^{N_y}$, relates to the activity of $N_s$ sources, $\mathbf{s}_t \in \mathbb{R}^{N_s}$, through the following instantaneous linear equation (Dale & Sereno, 1993),

$$\mathbf{y}_t = \mathbf{L}\mathbf{s}_t + \mathbf{v}_t, \tag{2.1}$$

where $\mathbf{s}_t$ is the vector of current source density along the three orthogonal directions and $\mathbf{v}_t$ represents the measurement noise vector. The source $\mathbf{s}_t$ is projected onto the sensor space through the electric or magnetic lead field matrix $\mathbf{L} \in \mathbb{R}^{N_y \times N_s}$, where each column $\mathbf{l}_i$ denotes the scalp projection of the $i$th unitary current dipole along a canonical axis. The lead field matrix is usually precomputed for a given model of the head derived from a subject-specific MRI (Hallez et al., 2007). Alternatively, if an individual MRI is not available, an approximated lead field matrix obtained from a high-resolution template can be used (Huang, Parra, & Haufe, 2016). The inverse problem of the M/EEG can be stated as the estimation of a source configuration $\widehat{\mathbf{s}}_t$ that is likely to produce the scalp topography $\mathbf{y}_t$. To simplify the exposition, from now on we focus on EEG. However, our results hold for MEG as well, keeping in mind that in the formulas, we swap the respective voltage by magnetic field measurements and the electric by the magnetic lead field matrix.
As we mentioned in section 1, the noise term $\mathbf{v}_t$ is usually assumed to be white and gaussian. This simplification is acceptable when the measurements are not affected by nongaussian pseudo-random artifacts generated by eye blinks, lateral eye movements, facial and neck muscle activity, and body movement, among others. With this in mind, we start building our bridge between ESI and ICA by augmenting equation 2.1 with a term that accounts for the linear contribution of artifacts as follows,

$$\mathbf{y}_t = \mathbf{L}\mathbf{s}_t + \mathbf{A}\mathbf{a}_t + \mathbf{v}_t, \tag{2.2}$$

where $\mathbf{a}_t \in \mathbb{R}^{N_a}$ is a vector of artifact sources and $\mathbf{A} \in \mathbb{R}^{N_y \times N_a}$ is a dictionary of artifact scalp projections (see Figure 1).
The columns of $\mathbf{A}$ that correspond to muscle activity may be obtained based on a detailed electromechanical model of the body (Böl, Weikert, & Weichert, 2011). Another approach is to expand the lead field matrix to model putative EMG sources located all over the scalp (Janani et al., 2017). EOG sources can be modeled by including the projections of two current dipoles situated right behind the eyes (Fujiwara et al., 2009). Our approach to obtaining $\mathbf{A}$ is rather empirical, inspired by ICA's success at identifying artifact scalp projections.
In section 2.7, we explain how we built the artifact dictionary using a set of stereotypical artifact scalp projections obtained by ICA, but for the sake of building up the theory, let us assume that $\mathbf{A}$ is known by the time the data are collected. With this in mind, we rewrite equation 2.2 in a compact manner as follows,

$$\mathbf{y}_t = \mathbf{F}\mathbf{x}_t + \mathbf{v}_t, \tag{2.3}$$

where $\mathbf{F} = [\mathbf{L}, \mathbf{A}]$ is a known observation operator and $\mathbf{x}_t = [\mathbf{s}_t^\top, \mathbf{a}_t^\top]^\top$ is the augmented vector of unknown (latent) brain and artifact sources (see Figure 1). Note that, structurally, the standard generative model in equation 2.1 and the augmented one in equation 2.3 are identical. However, they differ in that in equation 2.3, we are explicitly modeling the instantaneous spatial contribution of nonbrain sources to the scalp topography $\mathbf{y}_t$. The assumption of gaussian measurement noise yields the following likelihood function,

$$p(\mathbf{y}_t \mid \mathbf{x}_t) = \mathcal{N}\left(\mathbf{y}_t \mid \mathbf{F}\mathbf{x}_t, \boldsymbol{\Sigma}_{\mathbf{v}}\right), \tag{2.4}$$

where $\boldsymbol{\Sigma}_{\mathbf{v}}$ is the sensor noise covariance, which can be estimated from empty room measurements of MEG data, from prestimulus data (Engemann & Gramfort, 2015; Wipf, Owen, Attias, Sekihara, & Nagarajan, 2010), or learned adaptively (Ojeda et al., 2018). Here we assume that, after accounting for most artifactual sources, the common-mode sensor noise is homoskedastic and spatially uncorrelated, $\boldsymbol{\Sigma}_{\mathbf{v}} = \lambda \mathbf{I}_{N_y}$, where $\lambda$ is a scalar noise-variance hyperparameter.
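To make the augmented model concrete, the following minimal sketch simulates equation 2.3 with toy dimensions and random stand-ins for $\mathbf{L}$ and $\mathbf{A}$; all names and sizes here are illustrative, not the values used by the RSBL implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_y, n_s, n_a = 64, 600, 77          # sensors, brain sources, artifact atoms (toy sizes)

L = rng.standard_normal((n_y, n_s))  # stand-in for a real lead field matrix
A = rng.standard_normal((n_y, n_a))  # stand-in for the artifact dictionary
F = np.hstack([L, A])                # augmented observation operator F = [L, A]

x = np.zeros(n_s + n_a)              # augmented source vector x = [s; a]
x[:30] = 1.0                         # a small patch of active brain sources
x[n_s] = 5.0                         # one active artifact source (e.g., an EOG atom)

lam = 0.1                            # homoskedastic sensor-noise variance
y = F @ x + np.sqrt(lam) * rng.standard_normal(n_y)  # noisy scalp topography
```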
2.2. Block-Sparse Source Imaging
Equation 2.3 does not have a unique solution, so to obtain approximated source maps with a biological interpretation, we need to introduce constraints. Since the neural generators of the EEG are the electrical currents produced by distributed neural masses that become locally synchronized in space and time (Nunez & Srinivasan, 2006), it makes sense to constrain source maps to be globally sparse (seeking to explain the observed scalp topographies by a few spots of cortical activity) and locally correlated (so that we obtain spatially smooth maps as opposed to maps formed by scattered, isolated sources). Artifactual sources, on the other hand, can be assumed to be spatially uncorrelated with one another and with the true brain sources.
We use the assumptions above to furnish the following source prior,

$$p(\mathbf{x}_t \mid \boldsymbol{\gamma}) = \mathcal{N}\left(\mathbf{x}_t \mid \mathbf{0}, \boldsymbol{\Sigma}_{\mathbf{x}}\right), \tag{2.5}$$

where $\boldsymbol{\Sigma}_{\mathbf{x}}$ is the prior source covariance, which we assume has the following block-diagonal structure,

$$\boldsymbol{\Sigma}_{\mathbf{x}} = \begin{bmatrix} \boldsymbol{\Sigma}_{\mathbf{s}} & \mathbf{0} \\ \mathbf{0} & \boldsymbol{\Sigma}_{\mathbf{a}} \end{bmatrix},$$

where the upper block $\boldsymbol{\Sigma}_{\mathbf{s}}$ corresponds to the covariance of groups of brain sources and also has a block-diagonal structure,

$$\boldsymbol{\Sigma}_{\mathbf{s}} = \begin{bmatrix} \gamma_1 \boldsymbol{\Sigma}_{\mathbf{s}_1} & & \\ & \ddots & \\ & & \gamma_K \boldsymbol{\Sigma}_{\mathbf{s}_K} \end{bmatrix},$$

and $\boldsymbol{\Sigma}_{\mathbf{a}} = \operatorname{diag}\left(\gamma_{K+1}, \ldots, \gamma_{K+N_a}\right)$ is the covariance of the artifactual sources. $\boldsymbol{\gamma}$ denotes a nonnegative scale vector that encodes the sparsity profile of the groups of sources.
In general, the block structure of $\boldsymbol{\Sigma}_{\mathbf{x}}$ could be learned in a data-driven way (Zhang & Rao, 2013), but we do not pursue that route here and instead assume it is given by the $K = 68$ regions of interest (ROIs) of the Desikan-Killiany atlas (Desikan et al., 2006), which cover the entire cortical surface. In this setting, the matrices $\boldsymbol{\Sigma}_{\mathbf{s}_k}$ model the intra-ROI source covariances and can be precomputed based on source distances, taking into account the local folding of the cortex, as described in Ojeda et al. (2018).
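The following sketch assembles a toy version of this block-diagonal prior covariance; the ROI partition, the identity intra-ROI blocks, and all sizes are placeholders for the precomputed quantities described above.

```python
import numpy as np
from scipy.linalg import block_diag

K, sources_per_roi, n_a = 4, 5, 3             # toy sizes (the paper uses K = 68 ROIs)
gamma_s = np.array([0.0, 1.2, 0.0, 0.8])      # sparse ROI scales; zeros prune whole ROIs
gamma_a = np.array([2.0, 0.0, 0.5])           # artifact-source scales

# Placeholder intra-ROI covariances; in practice these encode source distances
# and the local cortical folding (Ojeda et al., 2018).
Sigma_k = [np.eye(sources_per_roi) for _ in range(K)]

Sigma_s = block_diag(*[g * S for g, S in zip(gamma_s, Sigma_k)])
Sigma_x = block_diag(Sigma_s, np.diag(gamma_a))   # prior covariance of x = [s; a]
```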
Next, we specify the priors on the hyperparameters $\boldsymbol{\gamma}$ and $\lambda$ (hyperpriors). Assuming that $\boldsymbol{\gamma}$ and $\lambda$ are independent yields the factorization,

$$p(\boldsymbol{\gamma}, \lambda) = p(\boldsymbol{\gamma})\, p(\lambda) = \prod_{k} p(\gamma_k)\, p(\lambda). \tag{2.6}$$

And since they are scale hyperparameters, we follow the popular choice of assuming gamma hyperpriors with noninformative scale and shape parameters on a log scale of $\boldsymbol{\gamma}$ and $\lambda$ (Tipping, 2001). This choice of hyperprior has the effect of assigning a high probability to low values of $\boldsymbol{\gamma}$, which has been shown to shrink its irrelevant components to zero (Wipf & Nagarajan, 2008), thereby leading to a sparsifying behavior known as automatic relevance determination (ARD) (Neal, 1996; MacKay, 1992).
We use Bayes' theorem to express the posterior distribution of the sources as follows,

$$p(\mathbf{x}_t \mid \mathbf{y}_t, \boldsymbol{\gamma}, \lambda) = \frac{p(\mathbf{y}_t \mid \mathbf{x}_t, \lambda)\, p(\mathbf{x}_t \mid \boldsymbol{\gamma})}{p(\mathbf{y}_t \mid \boldsymbol{\gamma}, \lambda)}. \tag{2.7}$$

It is well known that a model with gaussian likelihood (equation 2.4) and prior (equation 2.5), like ours, also has a gaussian posterior with the following mean and covariance,

$$\begin{aligned} \widehat{\mathbf{x}}_t &= \mathbf{F}^{\dagger} \mathbf{y}_t, \\ \widehat{\boldsymbol{\Sigma}}_{\mathbf{x}} &= \boldsymbol{\Sigma}_{\mathbf{x}} - \mathbf{F}^{\dagger} \mathbf{F} \boldsymbol{\Sigma}_{\mathbf{x}}, \end{aligned} \tag{2.8}$$

where $\mathbf{F}^{\dagger} \triangleq \boldsymbol{\Sigma}_{\mathbf{x}} \mathbf{F}^\top \boldsymbol{\Sigma}_{\mathbf{y}}^{-1}$ represents the pseudoinverse of the observation operator and $\boldsymbol{\Sigma}_{\mathbf{y}}$ is the model data covariance,

$$\boldsymbol{\Sigma}_{\mathbf{y}} = \mathbf{F} \boldsymbol{\Sigma}_{\mathbf{x}} \mathbf{F}^\top + \lambda \mathbf{I}. \tag{2.9}$$
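In code, equations 2.8 and 2.9 amount to a few dense linear-algebra operations; the sketch below is a direct, unoptimized transcription with our own variable names.

```python
import numpy as np

def gaussian_posterior(F, Sigma_x, lam, y):
    """Posterior mean and covariance of equations 2.8 and 2.9 (dense sketch)."""
    n_y = F.shape[0]
    Sigma_y = F @ Sigma_x @ F.T + lam * np.eye(n_y)   # model data covariance (eq. 2.9)
    F_dag = Sigma_x @ F.T @ np.linalg.inv(Sigma_y)    # pseudoinverse operator
    x_hat = F_dag @ y                                 # posterior mean
    Sigma_post = Sigma_x - F_dag @ F @ Sigma_x        # posterior covariance
    return x_hat, Sigma_post
```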
2.3. Sparse Model Learning
Once the measurements are gathered, the maximum a posteriori estimate of the sources and their covariance can be obtained analytically using equation 2.8. However, these formulas depend on the specific values taken by the hyperparameters that control our model, $\boldsymbol{\gamma}$ and $\lambda$. A comprehensive survey of the algorithms available to learn these hyperparameters is beyond the scope of this letter, so in this section, we instead focus on the general idea and assumptions that go into this task.
The denominator of equation 2.7 is called the evidence and is a probabilistic measure of how well the data match our modeling assumptions. For a linear gaussian model like ours, it is readily expressed as follows,

$$p(\mathbf{y}_t \mid \boldsymbol{\gamma}, \lambda) = \mathcal{N}\left(\mathbf{y}_t \mid \mathbf{0}, \boldsymbol{\Sigma}_{\mathbf{y}}\right). \tag{2.10}$$

Since we used noninformative hyperpriors (see section 2.2), we could optimize equation 2.10 with respect to $\boldsymbol{\gamma}$ and $\lambda$ to obtain source estimates conditioned on the optimal model. However, pointwise hyperparameter estimates tend to be noisy and computationally expensive. One way to obtain robust estimates is to learn these hyperparameters based on an ensemble of measurements (Cotter, Rao, & Kreutz-Delgado, 2005), $\mathbf{Y} = [\mathbf{y}_1, \ldots, \mathbf{y}_T]$. Furthermore, if we ignore the autocorrelation of the data and assume that each single measurement vector in $\mathbf{Y}$ is independent and identically distributed (i.i.d.), we can conveniently approximate the evidence of the ensemble as follows,

$$p(\mathbf{Y} \mid \boldsymbol{\gamma}, \lambda) \approx \prod_{t=1}^{T} p(\mathbf{y}_t \mid \boldsymbol{\gamma}, \lambda), \tag{2.11}$$

which is usually transformed into the so-called type II maximum likelihood (ML-II) cost function (Barber, 2012),

$$\mathcal{L}(\boldsymbol{\gamma}, \lambda) = T \log\left|\boldsymbol{\Sigma}_{\mathbf{y}}\right| + \sum_{t=1}^{T} \mathbf{y}_t^\top \boldsymbol{\Sigma}_{\mathbf{y}}^{-1} \mathbf{y}_t. \tag{2.12}$$

Equation 2.12 embodies a trade-off between model complexity and accuracy. The complexity term $\log\left|\boldsymbol{\Sigma}_{\mathbf{y}}\right|$ can be seen as the (log) volume of an ellipsoid defined by $\boldsymbol{\Sigma}_{\mathbf{y}}$. The volume of the ellipsoid is reduced by shrinking the axes that correspond to sources that are not needed to explain the data. The second term is a squared Mahalanobis distance that measures the accuracy of our model. Finally, the ML-II cost function is used to learn the optimal hyperparameters by solving the following optimization problem:

$$\left\{\widehat{\boldsymbol{\gamma}}, \widehat{\lambda}\right\} = \operatorname*{arg\,min}_{\boldsymbol{\gamma}, \lambda}\, \mathcal{L}(\boldsymbol{\gamma}, \lambda). \tag{2.13}$$
One of the key contributions of this letter is the realization that the augmented inverse problem of the M/EEG can be readily solved with existing off-the-shelf algorithms that tackle the optimization problem above (Faul & Tipping, 2002; Wipf & Nagarajan, 2008; Wipf et al., 2010; Zhang & Rao, 2013). In this letter, we solve equation 2.13 with the fast two-stage block-SBL algorithm proposed by Ojeda et al. (2018). Algorithm 1 summarizes SBL in pseudocode.
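As a concrete, if naive, illustration of the hyperparameter learning loop, the sketch below runs a generic EM-flavored ARD update over the block scales with identity intra-block covariances; it is not the two-stage solver of Ojeda et al. (2018), and the function and variable names are ours.

```python
import numpy as np

def sbl_ml2(F, Y, groups, lam=1e-2, n_iter=50):
    """EM-style ARD update of the block scales gamma (a sketch). Y is a
    sensors x samples block; `groups` lists the index array of each block."""
    n_y, n_x = F.shape
    gamma = np.ones(len(groups))
    for _ in range(n_iter):
        diag_Sx = np.empty(n_x)
        for g, idx in zip(gamma, groups):
            diag_Sx[idx] = g                               # current prior variances
        Sigma_y = (F * diag_Sx) @ F.T + lam * np.eye(n_y)  # equation 2.9
        X_hat = diag_Sx[:, None] * (F.T @ np.linalg.solve(Sigma_y, Y))  # eq. 2.8 means
        q = np.einsum('ij,ij->j', F, np.linalg.solve(Sigma_y, F))  # diag(F' Sy^-1 F)
        post_var = diag_Sx - diag_Sx**2 * q                # posterior variances
        for k, idx in enumerate(groups):                   # M-step: rescale each block
            gamma[k] = (np.sum(X_hat[idx] ** 2) / Y.shape[1]
                        + np.sum(post_var[idx])) / idx.size
    return gamma, X_hat
```

Blocks whose scales collapse toward zero are effectively pruned from the model, which is the ARD behavior described in section 2.2.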
2.4. Independent Component Analysis
In the analysis presented above, the matrix $\mathbf{F}$ is prespecified. In this section, we analyze the generative model of equation 2.3 from the ICA viewpoint. ICA seeks to estimate the source time series $\mathbf{x}_t$ (called activations in the ICA literature) from the data time series $\mathbf{y}_t$ without knowing the gain (mixing) matrix $\mathbf{F}$. Here we assume that the latent sources are instantaneously independent, which yields the following prior distribution,

$$p(\mathbf{x}_t) = \prod_{i=1}^{N_x} p\left(x_{i,t}\right). \tag{2.14}$$

To simplify the exposition, we assume the same number of sensors and sources, $N_y = N_x$; interested readers can find the overcomplete case ($N_x > N_y$) in Le, Karpenko, Ngiam, and Ng (2011) and Lewicki and Sejnowski (1998). From these premises, the objective of the algorithm is to learn the unmixing matrix $\mathbf{W} = \mathbf{F}^{-1}$ such that we can estimate the sources with $\widehat{\mathbf{x}}_t = \mathbf{W}\mathbf{y}_t$. The unmixing matrix can be learned only up to a permutation and rescaling factor, which has the inconvenience that the order of the learned components can change depending on the starting point of the algorithm and the data. As in section 2.3, we can use a data block $\mathbf{Y}$ to write an approximation to the block likelihood function,

$$p(\mathbf{Y} \mid \mathbf{F}) \approx \prod_{t=1}^{T} p\left(\mathbf{y}_t \mid \mathbf{F}\right) \tag{2.15}$$

under the i.i.d. data assumption. We obtain each factor in equation 2.15 by integrating out the sources as follows,

$$p\left(\mathbf{y}_t \mid \mathbf{F}\right) = \int p\left(\mathbf{y}_t \mid \mathbf{x}_t, \mathbf{F}\right) p(\mathbf{x}_t)\, d\mathbf{x}_t. \tag{2.16}$$
As noted by MacKay (2008), assuming that the data are collected in the noiseless limit, $\lambda \to 0$, transforms the gaussian likelihood into a Dirac delta function, in which case equation 2.16 leads to the Infomax algorithm of Bell and Sejnowski (1995). The learning algorithm essentially consists of computing the gradient of the log likelihood, $\log p(\mathbf{Y} \mid \mathbf{W})$, with respect to $\mathbf{W}$ and updating $\mathbf{W}$ on every iteration such that the likelihood of the data increases. As pointed out by Comon (1994), the ICA model is uniquely identifiable only if at most one component of $\mathbf{x}_t$ is gaussian. Therefore, the prior densities $p(x_{i,t})$ are usually assumed to exhibit heavier tails than the gaussian; in particular, the prior $p(x_{i,t}) \propto \operatorname{sech}(x_{i,t})$ yields the popular $\tanh$ contrast function. It is worth noting that unlike the block-sparsity prior introduced in section 2.2, this one is motivated not by a biological consideration but by a mathematical necessity.
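For reference, a minimal natural-gradient Infomax iteration under the sech prior (and hence the $-\tanh$ score function) might look as follows; the learning rate, iteration count, and names are illustrative, and real implementations add whitening, annealing, and convergence checks.

```python
import numpy as np

def infomax(Y, n_iter=500, lr=1e-3):
    """Natural-gradient maximum-likelihood ICA with score function -tanh(u),
    the score implied by the sech prior (a sketch). Y is channels x samples."""
    n, T = Y.shape
    W = np.eye(n)                                    # unmixing matrix
    for _ in range(n_iter):
        U = W @ Y                                    # source activations u = W y
        # Natural-gradient ascent step: dW ~ (I - tanh(U) U^T / T) W
        W += lr * (np.eye(n) - np.tanh(U) @ U.T / T) @ W
    return W                                         # sources: X_hat = W @ Y
```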
2.5. Link Between Block-Sparse Source Imaging and ICA
In this section we show that ESI subject to the block-sparsity constraints laid out earlier necessarily yields groups of sources that are independent (or as independent as possible). We start by rewriting the biologically motivated source prior of equation 2.5 as follows,

$$p(\mathbf{x}_t \mid \boldsymbol{\gamma}) = \prod_{i} \mathcal{N}\left(\mathbf{x}_{i,t} \mid \mathbf{0}, \gamma_i \boldsymbol{\Sigma}_{\mathbf{x}_i}\right), \tag{2.17}$$

where each factor is a gaussian pdf and $i$ indexes a group of sources or an artifact component. To write equation 2.17 as the ICA prior in equation 2.14, we need to integrate out the hyperparameter $\gamma_i$ from each factor as follows:

$$p\left(\mathbf{x}_{i,t}\right) = \int p\left(\mathbf{x}_{i,t} \mid \gamma_i\right) p(\gamma_i)\, d\gamma_i. \tag{2.18}$$
Our choice of hyperprior on $\gamma_i$ renders each marginalized prior a heavy-tailed Student-$t$ distribution (Tipping, 2001), precisely the kind of super-gaussian density that ICA requires for identifiability. In other words, if we use the biologically inspired block-sparsity prior of equation 2.5, the ICA likelihood function of equation 2.16 becomes exactly the evidence in equation 2.10.
Note that we have not said anything very profound here; we are merely pointing out that these two frameworks can be related in a principled way by choosing sparsity-inducing priors. The main difference between the two frameworks is that in SBL, we use biological information to constrain the dictionary and the source activity, while in ICA, both are learned from the data.
2.6. Recursive SBL
In this final theoretical section, we propose a simple modification to SBL that eliminates the i.i.d. data assumption while keeping the algorithm adaptive and computationally tractable. We start by introducing spatiotemporal constraints in the form of a state (source) transition,

$$\mathbf{x}_t = f\left(\mathbf{x}_{t-1}\right) + \mathbf{w}_t,$$

where the vector function $f(\cdot)$ models how the source activity evolves from one sample to the next and $\mathbf{w}_t$ is a perturbation vector. Several linear (Yang, Aminoff, Tarr, & Kass, 2016; Fukushima et al., 2012; Fukushima, Yamashita, Knösche, & Sato, 2015; Lamus, Hämäläinen, Temereanca, Brown, & Purdon, 2012; Cheung, Riedner, Tononi, & Van Veen, 2010; Galka, Yamashita, Ozaki, Biscay, & Valdés-Sosa, 2004) and nonlinear (Olier, Trujillo-Barreto, & El-Deredy, 2013; Giraldo, den Dekker, & Castellanos-Dominguez, 2010; Valdes-Sosa, Sanchez-Bornot et al., 2009; Daunizeau, Kiebel, & Friston, 2009) models have been proposed to express the neural dynamics; for simplicity, here we use the time-invariant linear model proposed by Yamashita, Galka, Ozaki, Biscay, and Valdes-Sosa (2004). Yamashita's model considers nearby source interactions only and describes the evolution of the $i$th source by the following first-order autoregressive model,

$$s_{i,t} = a\, s_{i,t-1} + \frac{b}{\left|\mathcal{N}_i\right|} \sum_{j \in \mathcal{N}_i} s_{j,t-1} + w_{i,t}, \tag{2.19}$$

where the constants $a$ and $b$ are set so that the system is observable (Galka et al., 2004), $\mathcal{N}_i$ contains the indices of the direct neighbors of the $i$th dipole, and $\left|\mathcal{N}_i\right|$ is its total number of neighbors. Dipole neighborhoods can be extracted from the tessellation of the cortical surface. In the absence of any other obvious transition model for artifact sources, we propose a simple random walk, which together with equation 2.19 yields the following linear transition function,

$$f\left(\mathbf{x}_{t-1}\right) = \mathbf{T}\mathbf{x}_{t-1}, \qquad \mathbf{T} = \begin{bmatrix} \mathbf{T}_{\mathbf{s}} & \mathbf{0} \\ \mathbf{0} & c\,\mathbf{I}_{N_a} \end{bmatrix},$$

where $\mathbf{T}_{\mathbf{s}}$ collects the autoregressive coefficients of equation 2.19 and the constant damping factor $0 < c < 1$ has the job of stabilizing the random walk.
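A sparse implementation of this transition function could be sketched as follows; the constants a, b, and c below are illustrative stand-ins for the tuned values discussed above.

```python
import numpy as np
from scipy import sparse

def transition_matrix(neighbors, n_a, a=0.9, b=0.09, c=0.99):
    """Builds T = blkdiag(T_s, c*I): equation 2.19 for brain sources plus a
    damped random walk for artifact sources (a sketch). `neighbors[i]` lists
    the direct cortical neighbors of dipole i."""
    n_s = len(neighbors)
    T_s = sparse.lil_matrix((n_s, n_s))
    for i, nbrs in enumerate(neighbors):
        T_s[i, i] = a                         # self-feedback coefficient
        for j in nbrs:
            T_s[i, j] = b / len(nbrs)         # averaged neighbor interaction
    return sparse.block_diag([T_s.tocsr(), c * sparse.eye(n_a)], format='csr')
```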
The state perturbation process $\mathbf{w}_t$ encompasses modeling errors as well as random inputs coming from distant sources, which we assume to be gaussian and serially uncorrelated, $p(\mathbf{w}_t) = \mathcal{N}\left(\mathbf{w}_t \mid \mathbf{0}, \boldsymbol{\Sigma}_{\mathbf{x}}\right)$. These assumptions yield the following conditional source prior,

$$p\left(\mathbf{x}_t \mid \mathbf{x}_{t-1}\right) = \mathcal{N}\left(\mathbf{x}_t \mid \mathbf{T}\mathbf{x}_{t-1}, \boldsymbol{\Sigma}_{\mathbf{x}}\right), \tag{2.20}$$

where $\boldsymbol{\Sigma}_{\mathbf{x}}$ is the state covariance at sample $t$ and we assume $p(\mathbf{x}_0) = \mathcal{N}\left(\mathbf{x}_0 \mid \mathbf{0}, \boldsymbol{\Sigma}_{\mathbf{x}}\right)$.
Swapping out the static prior of equation 2.5 for the one in equation 2.20 transforms our generative model into a linear gaussian dynamical system (LGDS). In this type of system, the source time series can be estimated optimally using the Kalman filter (Kalman, 1960). We first remove the dependence on the previous state, $\mathbf{x}_{t-1}$, by computing the predictive density,

$$p\left(\mathbf{x}_t \mid \mathbf{y}_{1:t-1}\right) = \int p\left(\mathbf{x}_t \mid \mathbf{x}_{t-1}\right) p\left(\mathbf{x}_{t-1} \mid \mathbf{y}_{1:t-1}\right) d\mathbf{x}_{t-1},$$

where $p\left(\mathbf{x}_{t-1} \mid \mathbf{y}_{1:t-1}\right)$ is the source posterior already determined in the previous time step, $t-1$. In an LGDS, the predictive density is also gaussian with the following mean and covariance (Ozaki, 2012),

$$\begin{aligned} \widehat{\mathbf{x}}_{t \mid t-1} &= \mathbf{T}\,\widehat{\mathbf{x}}_{t-1 \mid t-1}, \\ \mathbf{P}_{t \mid t-1} &= \mathbf{T}\,\mathbf{P}_{t-1 \mid t-1}\,\mathbf{T}^\top + \boldsymbol{\Sigma}_{\mathbf{x}}, \end{aligned} \tag{2.21}$$

where we use the notation $\widehat{\mathbf{x}}_{t \mid \tau}$ (with $\tau \leq t$) to indicate quantities estimated in the $t$th step using data up to the $\tau$th sample. Equation 2.21 is known as the time update. The posterior is also gaussian, so we readily write down its mean and covariance,

$$\begin{aligned} \widehat{\mathbf{x}}_{t \mid t} &= \widehat{\mathbf{x}}_{t \mid t-1} + \mathbf{K}_t \mathbf{e}_t, \\ \mathbf{P}_{t \mid t} &= \left(\mathbf{I} - \mathbf{K}_t \mathbf{F}\right) \mathbf{P}_{t \mid t-1}, \end{aligned} \tag{2.22}$$

where $\mathbf{K}_t = \mathbf{P}_{t \mid t-1} \mathbf{F}^\top \boldsymbol{\Sigma}_{\mathbf{y}}^{-1}$ represents the Kalman gain, the analytical data covariance is $\boldsymbol{\Sigma}_{\mathbf{y}} = \mathbf{F}\,\mathbf{P}_{t \mid t-1}\,\mathbf{F}^\top + \lambda \mathbf{I}$, and $\mathbf{e}_t = \mathbf{y}_t - \mathbf{F}\,\widehat{\mathbf{x}}_{t \mid t-1}$ is the innovation (one-step-ahead prediction error). Equation 2.22 is known as the measurement update, and plugging equation 2.21 into it yields the following recursive formula,

$$\widehat{\mathbf{x}}_{t \mid t} = \mathbf{T}\,\widehat{\mathbf{x}}_{t-1 \mid t-1} + \mathbf{K}_t \left(\mathbf{y}_t - \mathbf{F}\,\mathbf{T}\,\widehat{\mathbf{x}}_{t-1 \mid t-1}\right). \tag{2.23}$$
In our case, the source space is usually very large (thousands or tens of thousands of sources), making the direct use of the Kalman filter impractical due to the necessity of estimating the state covariance matrix $\mathbf{P}_{t \mid t}$. To make the filter tractable, Yamashita et al. (2004) proposed ignoring the contribution of the neurodynamic model to the evolution of the state covariance, $\mathbf{P}_{t \mid t-1} \approx \boldsymbol{\Sigma}_{\mathbf{x}}$, in equation 2.21. With this approximation, the Kalman gain becomes the pseudoinverse $\mathbf{F}^{\dagger}$ in equation 2.8 and the model data covariance is calculated as defined in equation 2.9, which we plug into the main recursive formula of the Kalman filter in equation 2.23. We continue estimating the hyperparameters $\boldsymbol{\gamma}$ and $\lambda$ through the SBL algorithm outlined in section 2.3, with the difference that now the evidence is a function of the innovations:

$$p\left(\mathbf{e}_t \mid \boldsymbol{\gamma}, \lambda\right) = \mathcal{N}\left(\mathbf{e}_t \mid \mathbf{0}, \boldsymbol{\Sigma}_{\mathbf{y}}\right).$$
We call this new approach recursive SBL (RSBL) and we summarize it in algorithm 2.
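Putting the pieces together, one pass of the RSBL recursion can be sketched as follows; `update_gamma` stands in for the innovation-driven SBL step of algorithm 2, and all names and the dense linear algebra are ours rather than the packaged implementation.

```python
import numpy as np

def rsbl(Y, F, T, groups, lam, update_gamma):
    """RSBL recursion sketch (equation 2.23 with the simplified gain of eq. 2.8).
    Y: sensors x samples; T: state transition; `update_gamma(e, gamma)` stands
    in for the SBL hyperparameter step driven by the innovations."""
    n_y, n_x = F.shape
    x = np.zeros(n_x)
    gamma = np.ones(len(groups))
    X = np.empty((n_x, Y.shape[1]))
    for t in range(Y.shape[1]):
        x_pred = T @ x                              # time update (equation 2.21)
        e = Y[:, t] - F @ x_pred                    # innovation
        gamma = update_gamma(e, gamma)              # SBL step on the innovations
        diag_Sx = np.empty(n_x)
        for g, idx in zip(gamma, groups):
            diag_Sx[idx] = g
        Sigma_y = (F * diag_Sx) @ F.T + lam * np.eye(n_y)    # equation 2.9
        K = diag_Sx[:, None] * F.T @ np.linalg.inv(Sigma_y)  # row-sparse gain
        x = x_pred + K @ e                          # measurement update (eq. 2.22)
        X[:, t] = x
    return X
```

Because rows of the gain corresponding to pruned blocks (zero scales) vanish, the correction is confined to the biologically meaningful regions of the state space, as discussed next.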
It is important to note that unlike most common use cases of the Kalman filter, in ESI we do not have access to a perfect description of the brain dynamics, let alone a computationally tractable one. Thus, we usually rely on approximate neurodynamic models, of which equation 2.19 is an example. We also remark that our model yields a transition function that provides a simple local spatiotemporal filtering effect reflecting only nearby source interactions. However, the actual transition may be far more complicated. Since modeling errors are unavoidable, we argue that it makes sense from a practical standpoint to downplay the role of the neurodynamic model in the update of the state covariance. To compensate for this additional approximation, we exploit the block structure of the state vector induced by the hyperparameter vector $\boldsymbol{\gamma}$ to constrain the correction of the measurement update to regions of the state space with biological significance.
Our approach differs from the one proposed by Yamashita et al. (2004) in two ways. First, they regularize by the same amount everywhere in the source space (equivalent to considering a single $\gamma$ parameter for the whole cortex), while we do so in groups of nearby sources, encouraging sparsity in the ROI domain. Second, they do not reestimate the hyperparameters, and we do.
2.7. Computation of Dictionaries L and A
As we mentioned in section 2.1, a database of stereotypical artifact scalp projections can be obtained by running ICA on a large collection of EEG data sets. The database could include data from different studies, populations, and montages so that a "universal" artifact dictionary can be compiled offline. Note that the key idea is to use the precomputed artifact dictionary to approximate the scalp projections of stereotypical artifact components of new subjects without actually running ICA on their data. Toward that end, ideally, each subject would have a companion structural MRI and digitized channel locations fully describing the anatomical support on which the EEG data were collected.
If only sensor positions are available, the procedure proposed by Darvas, Ermer, Mosher, and Leahy (2006) could be used to obtain individualized head models from a single template head. Most EEG studies, however, do not include MRI data or subject-specific sensor positions, just the sensor labels. Here we propose a co-registration procedure that requires only sensor positions, either measured with a digitizer or pulled from a standard montage file, and a template head model. We use a four-layer (scalp, outer skull, inner skull, and cortex) head model derived from the Colin27 MRI template with fiducial landmarks (nasion, inion, vertex, left and right preauricular points) and 339 sensors located on the surface of the scalp. The sensors are placed and named according to a superset of the 10/20 system. We define the brain source space as 8003 orientation-free dipoles scattered uniformly over the template's cortical layer, resulting in $N_s = 24{,}009$ current source elements (8003 dipoles and their orientations along the $x$-, $y$-, and $z$-axes).
Although several algorithms can be used to automatically identify artifact scalp projections derived from ICA (Pion-Tonachini et al., 2017; Winkler, Haufe, & Tangermann, 2011), as proof of principle, here we rely on the expertise of the authors. Since inspecting each IC in a large database is a cumbersome task, we propose the following simplification. We first co-register each subject in the database with the head model template to map ICs defined on different sensor spaces to a common space. We collect the mapped ICs in a matrix of 339 rows (the number of channels of the template) by the total number of ICs. We reduce dimensionality by applying the k-means algorithm to the columns of the IC matrix. After inspecting the cluster centroids, we label them as Brain, EOG, EMG, or Unknown (clusters of scalp maps of unknown origin). Then we store the indices of the EOG and EMG cluster centroid nearest neighbors for further use in the creation of the artifact dictionary $\mathbf{A}$.
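The clustering step can be sketched with scikit-learn's k-means; the IC matrix below is random, and the centroid treated as EOG is chosen by hand after visual inspection, exactly as in the manual labeling procedure above.

```python
import numpy as np
from sklearn.cluster import KMeans

ic_maps = np.random.randn(339, 6774)           # placeholder for the pooled IC matrix
km = KMeans(n_clusters=13, n_init=10, random_state=0).fit(ic_maps.T)

# Suppose visual inspection labels centroid 0 as EOG; keep its nearest-neighbor
# IC as a representative artifact projection for the dictionary A.
eog_centroid = km.cluster_centers_[0]
nearest = np.argmin(np.linalg.norm(ic_maps.T - eog_centroid, axis=1))
eog_atom = ic_maps[:, nearest]                 # one future column of A
```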
We use the fiducial points marked on the participant's head at the start of data collection to estimate an affine transformation from the individual space to the space of the template. If fiducial points are not available, a common set of sensors between the template and the individual montage can be used to estimate the transformation. We use the affine transformation to map the montage of the participant onto the skin surface of the template; this is what we call the "individualized head model" in Figure 2. We compute the lead field matrix $\mathbf{L}$ using the boundary element method implemented in the OpenMEEG package (Gramfort, Papadopoulo, Olivi, & Clerc, 2010). Although this procedure can incur significant errors due to differences between subject and template head shapes and errors in sensor placement, in many cases this is the best one can do given the lack of accurate spatial information. A promising alternative to explore is the co-registration procedure recently proposed by Duque-Muñoz et al. (2019) for optically pumped magnetometer sensors, in which the sensor placement and orientation are learned from the data.
To produce an individualized $\mathbf{A}$ matrix, we linearly warp the EOG and EMG cluster centroids to the space of the individualized head model so that each artifact scalp projection has the same column length as $\mathbf{L}$ and can be appended to it. Finally, we divide each column of $[\mathbf{L}, \mathbf{A}]$ by its norm so that the relative contribution of each source is determined by the amplitude of the source activation vector only.
2.8. Data
To construct the artifact dictionary, we used data from two different studies made public under the umbrella of the BNCI Horizon 2020 project (Brunner et al., 2015). The data sets were selected due to their well-curated metadata and their relatively high number of sensors.
2.8.1. Data Set 1: Error-Related Potentials
The first study, 013-2015, provided EEG data from six subjects (two independent sessions per subject and 10 blocks per session) collected using an experimental protocol designed to study error potentials (ErrP) during a BCI task (Chavarriaga & del R. Millán, 2010). EEG samples were acquired at a rate of 512 Hz using a Biosemi ActiveTwo system and a 64-channel montage placed according to the extended 10/20 system.
2.8.2. Data Set 2: Covert Shifts of Attention
The second study, 005-2015, provided EEG and EOG data from eight subjects collected using an experimental protocol designed to study the EEG correlates of attention shifts (Treder, Bahramisharif, Schmidt, Van Gerven, & Blankertz, 2011). The EEG was recorded using a Brain Products actiCAP system, digitized at a sampling rate of 1000 Hz. The montage employed had 64 channels placed according to the 10/10 system referenced to the nose. In addition, an EOG channel (labeled as EOGvu) was placed below the right eye. To measure vertical and horizontal eye movements, from the total of 64 EEG channels, two were converted into bipolar EOG channels by referencing Fp2 against EOGvu, and F10 against F9, thereby yielding a final montage of 62 EEG channels.
2.8.3. Preprocessing and IC Scalp Map Clustering
After transforming each data file to the .set format, both studies were processed using the same pipeline written in Matlab (R2017b; MathWorks, USA) using the EEGLAB toolbox. The pipeline consisted of a 0.5 Hz high-pass forward-backward FIR filter and re-referencing to the common average, followed by the Infomax ICA decomposition of the continuous data. We pooled all the preprocessed files from all subjects in both data sets and randomly assigned them to one of two groups: 80% to the training set and 20% to the test set. The training set was used to construct the EEG database and the artifact dictionary, while the test set was used to evaluate the performance of the RSBL algorithm on real data in section 3.2. Note that we use the testing set to simulate new subjects whose artifacts are not explicitly characterized in the database. We calculated the lead field of the subjects in the testing set as described in section 2.7.
After performing the co-registration to the template, the training set resulted in a matrix of 339 channels by 6774 independent scalp maps (101 session blocks yielding 64 ICs each plus 5 session blocks yielding 62 ICs each). Figure 3 shows a visualization of the IC scalp maps using the t-distributed stochastic neighbor embedding (t-SNE) algorithm (Van Der Maaten & Hinton, 2008). The t-SNE algorithm allows us to represent each 339-dimensional IC scalp map as a dot in a 2D space in such a way that, with high probability, similar and dissimilar scalp maps are modeled by nearby and distant points, respectively. We ran the k-means algorithm with several numbers of clusters, stopping at 13 after noticing that many small islands scattered at the periphery of Figure 3 started to be either mislabeled as Brain or labeled consistently as EOG, EMG, or Unknown. Unknown clusters were not used further in this letter. The gray points in the figure denote most of the scalp maps labeled as nonbrain.
After the artifact selection and co-registration processes, we obtained the following dictionary for each individualized head model in the testing set:

$$\mathbf{A} = \left[\mathbf{a}_{vEOG},\, \mathbf{a}_{hEOG},\, \mathbf{A}_{EMG},\, \mathbf{I}_{N_y}\right], \tag{2.24}$$

where $\mathbf{a}_{vEOG}$ and $\mathbf{a}_{hEOG}$ are the respective scalp projections of the vertical and horizontal EOG ICs, $\mathbf{A}_{EMG}$ collects the projections of 11 representative EMG ICs, and we modeled spike artifacts affecting each individual channel with the columns of the identity matrix $\mathbf{I}_{N_y}$.
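In code, assembling equation 2.24 and normalizing the augmented dictionary (section 2.7) reduces to stacking columns; the projections below are random placeholders for the warped centroids.

```python
import numpy as np

n_y = 64                                         # channels of the individualized montage
a_veog = np.random.randn(n_y, 1)                 # placeholder warped vEOG projection
a_heog = np.random.randn(n_y, 1)                 # placeholder warped hEOG projection
A_emg = np.random.randn(n_y, 11)                 # placeholders for 11 EMG projections

A = np.hstack([a_veog, a_heog, A_emg, np.eye(n_y)])   # equation 2.24
L = np.random.randn(n_y, 24009)                  # placeholder lead field

F = np.hstack([L, A])                            # augmented observation operator
F /= np.linalg.norm(F, axis=0, keepdims=True)    # unit-norm columns (section 2.7)
```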
3. Results
3.1. Performance on Simulated Data
The objective of this experiment was to unmix the EEG signal and recover all simulated brain and artifact sources under different common-mode noise conditions. To that end, we simulated 2 seconds of evoked EEG data contaminated by lateral and vertical EOG activity (see Figure 4). By construction, we only simulated brain activity in the anterior (ACC) and posterior (PCC) cingulate cortex and designed their time courses to mimic the distribution of scalp voltages observed during ErrP studies (Gehring, Liu, Orr, & Carp, 2011). ErrPs are characterized by a negativity (Ne component) in the frontocentral channels within 100 msec after an erroneous response, followed by a positivity (Pe component) in the centroparietal channels around 250 msec post-error. The timing of these components can be reliably identified by inspecting the EEG activity of frontocentral channels (Fz, FCz, Cz) time-locked to the erroneous responses.
We simulated the Ne and Pe spatial profiles with the column vectors $\boldsymbol{\phi}_{Ne}$ and $\boldsymbol{\phi}_{Pe}$, each with $N_s$ elements. $\boldsymbol{\phi}_{Ne}$ had 294 sources (98 dipoles $\times$ 3 orientations) in the ACC set to one and zero elsewhere, and $\boldsymbol{\phi}_{Pe}$ had 390 sources (130 dipoles $\times$ 3 orientations) in the PCC set to one and zero elsewhere. The temporal profiles, $\boldsymbol{\psi}_{Ne}$ and $\boldsymbol{\psi}_{Pe}$, were simulated using gamma functions. We simulated the cortical activity by multiplying the spatial and temporal profiles as follows,

$$\mathbf{S} = \boldsymbol{\phi}_{Ne}\,\boldsymbol{\psi}_{Ne}^\top + \boldsymbol{\phi}_{Pe}\,\boldsymbol{\psi}_{Pe}^\top.$$
The two artifact sources simulated a lateral eye movement (from −750 msec to −250 msec) and an eye blink (from 0 msec to 400 msec). All other artifact sources were set to zero. The ratios between the maximum amplitudes of the two artifacts and the cortical source activity were set to 5 and 10, respectively. The simulated source activity was generated by concatenating the cortical ($\mathbf{S}$) and artifact source matrices. We simulated the sensor space with a 64-channel montage placed according to the extended 10/20 system and computed the dictionaries $\mathbf{L}$ and $\mathbf{A}$ as described in section 2.7. Scalp data were generated by projecting the sources onto the sensor space through $\mathbf{F}$ and adding white gaussian noise. The variance of the noise was calculated with respect to the variance of the sensor data before adding the EOG artifacts.
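A reduced-scale sketch of this simulation is shown below; the gamma-function parameters, source indices, and dimensions are illustrative rather than the exact values used for Figure 4.

```python
import numpy as np
from scipy.stats import gamma

fs = 512
t = np.arange(-fs, fs) / fs                          # 2 s of samples at 512 Hz
psi_ne = -gamma.pdf(t, a=2.0, loc=0.05, scale=0.03)  # negativity peaking near 100 ms
psi_pe = gamma.pdf(t, a=3.0, loc=0.15, scale=0.05)   # positivity peaking near 250 ms

n_s = 684                                            # toy source space (294 ACC + 390 PCC)
phi_ne = np.zeros(n_s); phi_ne[:294] = 1.0           # ACC spatial profile
phi_pe = np.zeros(n_s); phi_pe[294:] = 1.0           # PCC spatial profile

S = np.outer(phi_ne, psi_ne) + np.outer(phi_pe, psi_pe)  # cortical source matrix
L = np.random.randn(64, n_s)                         # placeholder lead field
Y = L @ S + 0.1 * np.random.randn(64, len(t))        # noisy scalp data
```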
Table 1 summarizes the performance of the RSBL algorithm for different levels of noise. Column 1 shows the SNR in dB units, and column 2 shows the equivalent EEG signal-to-noise amplitude ratio (SNAR). The columns headed by ACC, PCC, vEOG, and hEOG report the correlation between their respective simulated and estimated sources; high values indicate good estimation accuracy. The columns headed by ACC* and PCC* report the mean correlation between the simulated ACC and PCC sources and every other estimated cortical source; low correlation values indicate low source leakage. We note that for extremely aggressive noise conditions (i.e., SNR values between −10 dB and 0 dB), the algorithm failed to consistently reconstruct all the simulated sources accurately. However, once the amplitude of the EEG was approximately one and a half times higher than the amplitude of the noise, we obtained correlations in the 0.63 to 0.99 range. In all cases, the leakage was low, as indicated by correlation values below 0.04.
Table 1:

| SNR (dB) | SNAR† | ACC | ACC* | PCC | PCC* | vEOG | hEOG |
|---|---|---|---|---|---|---|---|
| −10 | 0.316 | 0.0032 | 0.0200 | 0.016 | 0.0241 | 0.096 | 0.099 |
| −6 | 0.5 | 0.0200 | 0.0382 | 0.765 | 0.0327 | 0.109 | 0.111 |
| 0 | 1 | 0.1678 | 0.0189 | 0.964 | 0.0078 | 0.715 | 0.496 |
| 3 | 1.4 | 0.6325 | 0.0175 | 0.921 | 0.0175 | 0.739 | 0.803 |
| 6 | 2 | 0.8808 | 0.0129 | 0.869 | 0.0137 | 0.881 | 0.949 |
| 10 | 3.16 | 0.7211 | 0.0131 | 0.991 | 0.0098 | 0.969 | 0.983 |
| 20 | 10 | 0.8805 | 0.0121 | 0.993 | 0.0109 | 0.974 | 0.998 |

†Signal-to-noise amplitude ratio.

Notes: The columns ACC and PCC report the correlation between the simulated and the estimated sources of each respective ROI. The columns marked with an asterisk report the mean correlation between the simulated sources and all other estimated sources, excluding the respective ACC or PCC ROI; low values in these columns indicate low source leakage. The last two columns report the correlation between simulated and recovered artifact sources.
Figure 4 illustrates the data and sources for a simulation with an SNR of 6 dB. The traces in panel A represent the raw, cleaned, and simulated EEG time series for a subset of channels. The cleaned EEG traces were obtained by projecting out the estimated EOG source activity. The panels on the right show the ground truth and estimated source activity in the ACC (B) and PCC (C) areas, as well as the EOG artifacts (D). Note that the estimated source time series are sparse in space and time (see also Figure 5). In particular, panels B and C demonstrate that source values that randomly oscillate at the noise level are ignored by the algorithm. That does not mean that those cortical areas are silent, but that the postsynaptic potentials produced by the collective firing of pyramidal cells in those regions are not coherent enough to create signals that can be reliably measured by the scalp sensors.
Figure 5 shows a snapshot of the scalp and source maps at the peaks of the Ne and Pe components displayed in Figure 4. We linearly extrapolated the sensor values to the portion of the head covered by the EEG cap so that the topographic maps could be rendered on the 3D surface. In the top row, we see that although all the simulated sources within an ROI have the same amplitude, in the recovered maps (bottom row), the nontrivial source values are not identical (though correlated). The panels in the bottom row show the EEG scalp maps with the artifact activity projected out. Note that even when the scalp data are severely affected by the eye blink artifact at the peak of the Ne component (t = 44 msec), the algorithm correctly estimated the orientation and location of its generators in the ACC.
3.2. Source Reconstruction of Ongoing Real Error-Related EEG Data
In this experiment we used real data to demonstrate the performance of the RSBL algorithm. To that end, we selected a subject from the testing set that belonged to the ErrP study (data set 1 above). In the experimental protocol used to collect those data, the subjects were tasked with moving a cursor toward a target location either using a keyboard or mental commands (Chavarriaga & del R. Millán, 2010). The authors of that paper found consistent evoked potentials produced by errors induced by the computer. In other words, subjects elicited error potentials after watching the computer execute wrong moves; this is sometimes referred to as feedback error-related negativity/positivity (Ward et al., 2013). These error potentials were characterized by two frontocentral positive peaks around 200 msec and 320 msec after the feedback, a frontocentral negativity around 250 msec, and a broader frontocentral negative deflection at about 450 msec.
Figure 6 summarizes the results of applying the RSBL algorithm to one trial selected at random. We checked beforehand that the selected trial was not obviously affected by artifacts. Panel A shows the EEG signal of three frontocentral sensors (Fz, FCz, and Cz). Panel B shows the source magnitude time series averaged within the ACC and PCC regions, panel C shows the scalp and cortical maps at the latencies of the Ne and Pe components, and panel D shows the maximum intensity projection of each source map in panel C onto the sagittal plane. Panels B and C reveal that the error components observed at the scalp level are mostly generated by a complicated interplay of sources in the ACC and PCC regions. However, the sagittal maximum intensity projections indicate that other cortical sources also contribute to the observed EEG signal, although to a lesser extent. This is common in single-trial analyses, and usually the sources that are unrelated to the experimental condition being studied average out after trial averaging. We note that the finding that both ACC and PCC sources contribute to the Ne and Pe scalp topographies is in agreement with other imaging studies (Buzzell et al., 2017; Ojeda et al., 2018).
3.3. EOG Artifact Removal on Real Data
In this example, we applied the RSBL algorithm to a trial contaminated by lateral eye movement and eye blink activity. We used data from the same subject selected for the experiment in the previous section, but this time we analyzed an epoch with no error-related activity so that we could assess the artifact rejection performance of the algorithm while minimizing task-related confounds. Figure 7 summarizes the data and results.
Figure 7A shows a subset of the raw and reconstructed (cleaned) EEG traces in black and red, respectively. There is a lateral eye movement artifact between −1250 msec and −800 msec and an eye blink between 250 msec and 1000 msec. Panel B shows the estimated vertical and horizontal EOG artifact source activity in blue and orange, respectively. We note that these artifact sources were active at the latencies where the EEG is affected and mostly zero elsewhere. This is a desired feature of the algorithm because it only "corrects" the affected segments while leaving clean data unchanged. In panel C, we show the maximum intensity projection maps of the source map at the latency of the peak eye blink artifact. Each column displays a different projection. The rows display source estimates without and with artifact modeling enabled. Panel D shows the time series of the log evidence for generative models with artifact modeling on and off. The two traces in panel D differ mostly when artifacts occur, where the higher log evidence of the blue trace indicates that source estimation benefits from modeling artifacts. This is not a striking finding, but it illustrates the practical utility of the evidence metric for data modeling. Note that although in our implementation we optimize the ML-II cost function (see equation 2.12), here we convert this value back to log evidence units to facilitate its interpretation (i.e., the higher it is, the better our model captures the data).
Visual inspection of the bottom row of panel C reveals that some residual eye blink activity may have leaked into the frontal pole. In practice, it is extremely hard to totally remove artifactual activity because (1) the use of a lead field matrix derived from a template head model may misfit the anatomy of the subject, introducing errors in the dictionary, (2) errors in the sensor locations can cause the EEG topography to shift with respect to the expected brain and artifact source projections, (3) EMG scalp projections are difficult to characterize due to their variability, as opposed to EOG projections that are more stereotyped, and (4) unmodeled artifactual activity, such as muscle projections toward the back of the head that were largely ignored in this study, may be suboptimally accounted for. Despite all these issues, Figure 7 demonstrates that RSBL can yield reasonably robust source estimates in the presence of high-amplitude artifacts. Furthermore, panel D suggests that we could use dips in the log evidence to inform subsequent processing stages of artifactual events that were not successfully dealt with.
4. Discussion and Further Directions
As we pointed out in section 1, source estimation imposing only the independence constraint does not always yield interpretable results. In SBL, on the other hand, we estimate the same number of cortical source activations per subject, each of which has known and fixed anatomical support. This eliminates the complications of clustering ICs and dealing with missing components in multisubject studies (Bigdely-Shamlo, Mullen, Kreutz-Delgado, & Makeig, 2013) while allowing the use of more straightforward and widespread statistical parametric mapping techniques (Penny, Friston, Ashburner, Kiebel, & Nichols, 2007). To be sure, ICA is a powerful technique that depends solely on the data to work. However, if anatomical information is available, either as a template or an individual head model, we can upgrade our analysis using sparse source imaging to obtain localized brain components.
In this letter, we built the artifact dictionary manually, aided by the combination of k-means clustering and t-SNE visualizations. Although our experiments show that this dictionary seems to capture artifacts well, that is not guaranteed to be the case for arbitrary data sets. We acknowledge that the data from the two experiments used here most likely capture a small subset of artifact projections. Toward creating a universal artifact dictionary, we provide a tool that allows adding more scalp projections derived from new data. We encourage users of our software to submit their dictionaries to the repository so that others can benefit from them.
Future work on the RSBL algorithm should deal with the fact that we currently ignore the state covariance matrix $\mathbf{P}_{t \mid t-1}$ in the time update, equation 2.21. This simplification has three objectives: (1) reduce the computational footprint by removing dense matrix multiplications in the source space, (2) allow the direct application of the sparsity constraints via the multiplication by a row-wise sparse Kalman gain, $\mathbf{K}_t$, in the measurement update, and (3) downplay the inaccuracy of the transition model $\mathbf{T}$. Model inaccuracies can be dealt with by plugging in another model that may capture more sophisticated source dynamics (e.g., nonlinear, higher order). If the new model is linear, the time update stays the same as in equation 2.21. Otherwise, the unscented Kalman filter update can be used (Julier & Uhlmann, 2004). In our opinion, the most interesting avenue for improvement is in the estimation of the source covariance, and that is what we focus on next.
One approach worth exploring, based on the sparse covariance estimation algorithm of Bien and Tibshirani (2011), is to add a lasso penalty to the gaussian likelihood, yielding the following penalized cost function,

$$\mathcal{L}\left(\mathbf{P}_{t \mid t-1}\right) = \log\left|\mathbf{P}_{t \mid t-1}\right| + \operatorname{tr}\left(\mathbf{P}_{t \mid t-1}^{-1}\,\mathbf{S}_t\right) + \rho\left\|\mathbf{P}_{t \mid t-1}\right\|_1, \tag{4.1}$$

where $\rho$ is a regularization parameter and $\mathbf{S}_t$ denotes an empirical covariance of the predicted sources. Then we solve the following optimization problem,

$$\widehat{\mathbf{P}}_{t \mid t-1} = \operatorname*{arg\,min}_{\mathbf{P}_{t \mid t-1} \succeq 0}\, \mathcal{L}\left(\mathbf{P}_{t \mid t-1}\right), \tag{4.2}$$

using Bien and Tibshirani's majorization-minimization algorithm. Here we seek to approximate the one-step-ahead covariance, $\mathbf{P}_{t \mid t-1}$, by a reasonable sparse positive semidefinite matrix. Since the optimization is done in the covariance space, this algorithm may be very expensive. Perhaps an easier way forward is to move the sparsity constraint from the covariance matrix to the source vector as follows (Charles, Asif, Romberg, & Rozell, 2011),

$$\widehat{\mathbf{x}}_t = \operatorname*{arg\,min}_{\mathbf{x}_t}\, \left\|\mathbf{y}_t - \mathbf{F}\mathbf{x}_t\right\|_2^2 + \rho_1\left\|\mathbf{x}_t - \mathbf{T}\,\widehat{\mathbf{x}}_{t-1 \mid t-1}\right\|_2^2 + \rho_2\left\|\mathbf{x}_t\right\|_1, \tag{4.3}$$

where the first term to minimize over $\mathbf{x}_t$ is the measurement error, the second term is an $\ell_2$ penalty that enforces the temporal smoothness of the source dynamics, and the third term is a lasso penalty that enforces the sparsity of the source vector. Other approaches to explore may come from the sparse unscented Kalman filter literature (Kang & Xu, 2018; Padilla & Rowley, 2010). We should note that although these alternative approaches for sparsifying the covariance matrix or the source vector may result in increased accuracy, they can be significantly more computationally demanding than RSBL, where the optimization is done in the space of the ROIs. This trade-off between accuracy and speed may be worth characterizing as well.
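For concreteness, a proximal-gradient (ISTA-style) sketch of equation 4.3, with the customary 1/2 factors on the quadratic terms, could look like this; it is a stand-in illustration under those assumptions, not the solver of Charles et al. (2011).

```python
import numpy as np

def dynamic_lasso_step(F, y, x_prev, T, rho1, rho2, n_iter=200, step=1e-3):
    """ISTA-style minimization of 0.5||y - F x||^2 + 0.5*rho1*||x - T x_prev||^2
    + rho2*||x||_1, a proximal-gradient sketch of equation 4.3."""
    x = T @ x_prev                                     # warm start at the prediction
    for _ in range(n_iter):
        grad = F.T @ (F @ x - y) + rho1 * (x - T @ x_prev)
        x = x - step * grad                            # gradient step on smooth terms
        x = np.sign(x) * np.maximum(np.abs(x) - step * rho2, 0.0)  # soft threshold
    return x
```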
5. Conclusion
This letter contributes to closing the gap between two popular M/EEG analysis frameworks, electromagnetic source imaging (ESI) and independent component analysis (ICA). While ESI and ICA are typically used for source localization and separation, respectively, here we showed that both frameworks can be linked by choosing biologically inspired source sparsity priors. Toward building a bridge between these frameworks, we proposed extending the standard generative model of the M/EEG signal to include nonbrain artifact sources such as EOG, EMG, and single-channel spikes. We used a cortical atlas to furnish group sparsity priors that encourage brain source configurations to be sparse in the number of active cortical regions. Although our generative model can be inverted by off-the-shelf sparse Bayesian learning (SBL) solvers, we proposed an efficient extension to SBL that allows for model inversion while assimilating sequential data. We called this new algorithm recursive SBL (RSBL). On simulated EEG data, we showed that the parameterization of our augmented generative model induces the segregation of cortical activity to a few regions that are maximally independent of one another while minimally overlapping artifactual activity. Our simulations also demonstrated high resilience to sensor noise and biological artifacts. We used real error-related EEG data to show that RSBL can yield single-trial source estimates in agreement with the experimental literature. Our code is open source and packaged as a plugin for EEGLAB, which can be freely downloaded from https://github.com/aojeda/dsi.
Acknowledgments
We thank Marius Klug and Klaus Gramann from TU Berlin for their contributions to an extended preprint version of this letter. This research was supported by NIMH training fellowships in Cognitive Neuroscience T32MH020002 and Biological Psychiatry and Neuroscience T32MH18399 (A.O.), UC San Diego Chancellor's Research Excellence Scholarship (J.M., A.O.), and UC San Diego School of Medicine start-up funds (J.M.). The RSBL algorithm is copyrighted for commercial use (UC San Diego Copyright #SD2019-810) and free for research and educational purposes.
Notes

1. The lead field is a matrix that projects source activity defined on an anatomical space onto the sensor space and is obtained by solving Maxwell's equations for MEG or EEG on a head model.
References
- Anzolin, A., Presti, P., Van De Steen, F., Astolfi, L., Haufe, S., & Marinazzo, D. (2019). Quantifying the effect of demixing approaches on directed connectivity estimated between reconstructed EEG sources. Brain Topography, 32(4), 655–674. [DOI] [PubMed] [Google Scholar]
- Asadzadeh, S., Yousefi Rezaii, T., Beheshti, S., Delpak, A., & Meshgini, S. (2020). A systematic review of EEG source localization techniques and their applications on diagnosis of brain abnormalities. Journal of Neuroscience Methods, 339, 108740. [DOI] [PubMed] [Google Scholar]
- Barber, D. (2012). Bayesian reasoning and machine learning. Cambridge: Cambridge University Press. [Google Scholar]
- Bell, A. J., & Sejnowski, T. J. (1995). An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7(6), 1129–1159. [DOI] [PubMed] [Google Scholar]
- Bien, J., & Tibshirani, R. J. (2011). Sparse estimation of a covariance matrix. Biometrika, 98(4), 807–820. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bigdely-Shamlo, N., Mullen, T., Kothe, C., Su, K.-M., & Robbins, K. A. (2015). The PREP pipeline: Standardized preprocessing for large-scale EEG analysis. Frontiers in Neuroinformatics, 9, 1–19. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bigdely-Shamlo, N., Mullen, T., Kreutz-Delgado, K., & Makeig, S. (2013). Measure projection analysis: A probabilistic approach to EEG source comparison and multisubject inference. NeuroImage, 72, 287–303. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Biscay, R. J., Bosch-Bayard, J. F., & Pascual-Marqui, R. D. (2018). Unmixing EEG Inverse solutions based on brain segmentation. Frontiers in Neuroscience, 12(May). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Böl, M., Weikert, R., & Weichert, C. (2011). A coupled electromechanical model for the excitation-dependent contraction of skeletal muscle. Journal of the Mechanical Behavior of Biomedical Material, 4, 1299–1310. [DOI] [PubMed] [Google Scholar]
- Brunner, C., Birbaumer, N., Blankertz, B., Guger, C., Kübler, A., Mattia, D., … Müller-Putz, G. R. (2015). BNCI Horizon 2020: Towards a roadmap for the BCI community. Brain-Computer Interfaces, 2(1), 1–10. [Google Scholar]
- Buzzell, G. A., Richards, J. E., White, L. K., Barker, T. V., Pine, D. S., & Fox, N. A. (2017). Development of the error-monitoring system from ages 9–35: Unique insight provided by MRI-constrained source localization of EEG. NeuroImage, 157, 13–26. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Charles, A., Asif, M. S., Romberg, J., & Rozell, C. (2011). Sparsity penalties in dynamical system estimation. In Proceedings of the 2011 45th Annual Conference on Information Sciences and Systems. doi:10.1109/CISS.2011.5766179 [Google Scholar]
- Chavarriaga, R., & del R. Millán, J. (2010). Learning from EEG error-related potentials in noninvasive brain-computer interfaces. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 18(4), 381–388. [DOI] [PubMed] [Google Scholar]
- Cheung, B. L. P., Riedner, B. A., Tononi, G., & Van Veen, B. D. (2010). Estimation of cortical connectivity from EEG using state-space models. IEEE Transactions on Biomedical Engineering, 57, 2122–2134. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cichocki, A. & Amari, S.-i. (2002). Adaptive blind signal and image processing. Chichester, UK: Wiley. [Google Scholar]
- Comon, P. (1994). Independent component analysis: A new concept? Signal Processing, 36(3), 287–314. [Google Scholar]
- Cotter, S., Rao, B. D., & Kreutz-Delgado, K. (2005). Sparse solutions to linear in verse problems with multiple measurement vectors. IEEE Transactions on Signal Processing, 53(7), 2477–2488. [Google Scholar]
- Dale, A. M., & Sereno, M. I. (1993). Improved localization of cortical activity by combining EEG and MEG with MRI cortical surface reconstruction: A linear approach. Journal of Cognitive Neuroscience, 5(2), 162–176.
- Darvas, F., Ermer, J. J., Mosher, J. C., & Leahy, R. M. (2006). Generic head models for atlas-based EEG source analysis. Human Brain Mapping, 27(2), 129–143.
- Daunizeau, J., Kiebel, S. J., & Friston, K. J. (2009). Dynamic causal modelling of distributed electromagnetic responses. NeuroImage, 47(2), 590–601.
- Delorme, A., Palmer, J., Onton, J., Oostenveld, R., & Makeig, S. (2012). Independent EEG sources are dipolar. PLOS One, 7(2), e30135.
- Desikan, R. S., Ségonne, F., Fischl, B., Quinn, B. T., Dickerson, B. C., Blacker, D., … Killiany, R. J. (2006). An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage, 31(3), 968–980.
- Duque-Muñoz, L., Tierney, T. M., Meyer, S. S., Boto, E., Holmes, N., Roberts, G., … Barnes, G. R. (2019). Data-driven model optimization for optically pumped magnetometer sensor arrays. Human Brain Mapping, 40(15), 4357–4369.
- Engemann, D. A., & Gramfort, A. (2015). Automated model selection in covariance estimation and spatial whitening of MEG and EEG signals. NeuroImage, 108, 328–342.
- Faul, A. C., & Tipping, M. E. (2002). Analysis of sparse Bayesian learning. In Dietterich T. G., Becker S., & Ghahramani Z. (Eds.), Advances in neural information processing systems, 14 (pp. 383–389). Red Hook, NY: Curran.
- Friston, K., Harrison, L., Daunizeau, J., Kiebel, S., Phillips, C., Trujillo-Barreto, N., … Mattout, J. (2008). Multiple sparse priors for the M/EEG inverse problem. NeuroImage, 39(3), 1104–1120.
- Fujiwara, Y., Yamashita, O., Kawawaki, D., Doya, K., Kawato, M., Toyama, K., & Sato, M.-a. (2009). A hierarchical Bayesian method to resolve an inverse problem of MEG contaminated with eye movement artifacts. NeuroImage, 45(2), 393–409.
- Fukushima, M., Yamashita, O., Kanemura, A., Ishii, S., Kawato, M., & Sato, M. A. (2012). A state-space modeling approach for localization of focal current sources from MEG. IEEE Transactions on Biomedical Engineering, 59(6), 1561–1571.
- Fukushima, M., Yamashita, O., Knösche, T. R., & Sato, M.-a. (2015). MEG source reconstruction based on identification of directed source interactions on whole-brain anatomical networks. NeuroImage, 105, 408–427.
- Galka, A., Yamashita, O., Ozaki, T., Biscay, R., & Valdés-Sosa, P. (2004). A solution to the dynamical inverse problem of EEG generation using spatiotemporal Kalman filtering. NeuroImage, 23(2), 435–453.
- Gehring, W. J., Liu, Y., Orr, J. M., & Carp, J. (2011). The error-related negativity (ERN/Ne). New York: Oxford University Press.
- Giraldo, E., den Dekker, A. J., & Castellanos-Dominguez, G. (2010). Estimation of dynamic neural activity using a Kalman filter approach based on physiological models. In Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society (pp. 2914–2917). Piscataway, NJ: IEEE.
- Gramfort, A., Papadopoulo, T., Olivi, E., & Clerc, M. (2010). OpenMEEG: Open-source software for quasistatic bioelectromagnetics. Biomedical Engineering Online, 9, 45.
- Hallez, H., Vanrumste, B., Grech, R., Muscat, J., De Clercq, W., Vergult, A., … Lemahieu, I. (2007). Review on solving the forward problem in EEG source analysis. Journal of NeuroEngineering and Rehabilitation, 4(1), 46.
- Henson, R. N., Wakeman, D. G., Litvak, V., & Friston, K. J. (2011). A parametric empirical Bayesian framework for the EEG/MEG inverse problem: Generative models for multi-subject and multi-modal integration. Frontiers in Human Neuroscience, 5.
- Huang, Y., Parra, L. C., & Haufe, S. (2016). The New York Head—A precise standardized volume conductor model for EEG source localization and tES targeting. NeuroImage, 140, 159–162.
- Janani, A. S., Grummett, T. S., Lewis, T. W., Fitzgibbon, S. P., Whitham, E. M., DelosAngeles, D., … Pope, K. J. (2017). Evaluation of a minimum-norm based beamforming technique, sLORETA, for reducing tonic muscle contamination of EEG at sensor level. Journal of Neuroscience Methods, 288, 17–28.
- Julier, S. J., & Uhlmann, J. K. (2004). Unscented filtering and nonlinear estimation. Proceedings of the IEEE, 92(3), 401–422.
- Jung, T. P., Makeig, S., Humphries, C., Lee, T. W., McKeown, M. J., Iragui, V., & Sejnowski, T. J. (2000). Removing electroencephalographic artifacts by blind source separation. Psychophysiology, 37(2), 163–178.
- Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1), 35–45.
- Kang, W., & Xu, L. (2018). Sparsity-based Kalman filters for data assimilation. arXiv:1810.04236.
- Lamus, C., Hämäläinen, M. S., Temereanca, S., Brown, E. N., & Purdon, P. L. (2012). A spatiotemporal dynamic distributed solution to the MEG inverse problem. NeuroImage, 63, 894–909.
- Le, Q. V., Karpenko, A., Ngiam, J., & Ng, A. (2011). ICA with reconstruction cost for efficient overcomplete feature learning. In Shawe-Taylor J., Zemel R., Bartlett P., Pereira F., & Weinberger K. (Eds.), Advances in neural information processing systems, 24 (pp. 1017–1025). Red Hook, NY: Curran.
- Lei, X., Wu, T., & Valdes-Sosa, P. A. (2015). Incorporating priors for EEG source imaging and connectivity analysis. Frontiers in Neuroscience, 9, 284.
- Lewicki, M. S., & Sejnowski, T. J. (1998). Learning nonlinear overcomplete representations for efficient coding. In Jordan M., Kearns M., & Solla S. (Eds.), Advances in neural information processing systems, 10 (pp. 556–562). Cambridge, MA: MIT Press.
- MacKay, D. J. C. (1992). Bayesian interpolation. Neural Computation, 4(3), 415–447.
- MacKay, D. J. C. (2008). Information theory, inference, and learning algorithms. Cambridge: Cambridge University Press.
- Makeig, S., Jung, T. P., Bell, A. J., Ghahremani, D., & Sejnowski, T. J. (1997). Blind separation of auditory event-related brain responses into independent components. Proceedings of the National Academy of Sciences of the United States of America, 94, 10979–10984.
- Makeig, S., & Onton, J. (2011). ERP features and EEG dynamics. New York: Oxford University Press.
- Neal, R. M. (1996). Lecture notes in statistics: Vol. 118, Bayesian learning for neural networks. New York: Springer.
- Nolte, G., Marzetti, L., & Valdes Sosa, P. (2009). Minimum overlap component analysis (MOCA) of EEG/MEG data for more than two sources. Journal of Neuroscience Methods, 183, 72–76.
- Nunez, P. L., & Srinivasan, R. (2006). Electric fields of the brain. New York: Oxford University Press.
- Ojeda, A., Kreutz-Delgado, K., & Mullen, T. (2018). Fast and robust block-sparse Bayesian learning for EEG source imaging. NeuroImage, 174(8), 449–462.
- Olier, I., Trujillo-Barreto, N. J., & El-Deredy, W. (2013). A switching multi-scale dynamical network model of EEG/MEG. NeuroImage, 83, 262–287.
- Onton, J., Westerfield, M., Townsend, J., & Makeig, S. (2006). Imaging human EEG dynamics using independent component analysis. Neuroscience and Biobehavioral Reviews, 30(6), 808–822.
- Owen, J. P., Wipf, D. P., Attias, H. T., Sekihara, K., & Nagarajan, S. S. (2012). Performance evaluation of the Champagne source reconstruction algorithm on simulated and real M/EEG data. NeuroImage, 60(1), 305–323.
- Ozaki, T. (2012). Time series modeling of neuroscience data. Boca Raton, FL: CRC Press.
- Padilla, L. E., & Rowley, C. W. (2010). An adaptive-covariance-rank algorithm for the unscented Kalman filter. In Proceedings of the 49th IEEE Conference on Decision and Control (pp. 1324–1329). Piscataway, NJ: IEEE.
- Paz-Linares, D., Vega-Hernández, M., Rojas-López, P. A., Valdés-Hernández, P. A., Martínez-Montes, E., & Valdés-Sosa, P. A. (2017). Spatiotemporal EEG source imaging with the hierarchical Bayesian elastic net and elitist lasso models. Frontiers in Neuroscience, 11, 635.
- Penny, W., Friston, K., Ashburner, J., Kiebel, S., & Nichols, T. (2007). Statistical parametric mapping. Amsterdam: Elsevier.
- Pion-Tonachini, L., Makeig, S., & Kreutz-Delgado, K. (2017). Crowd labeling latent Dirichlet allocation. Knowledge and Information Systems, 53(3), 749–765.
- Radüntz, T., Scouten, J., Hochmuth, O., & Meffert, B. (2017). Automated EEG artifact elimination by applying machine learning algorithms to ICA-based features. Journal of Neural Engineering, 14, 046004.
- Tamburro, G., Fiedler, P., Stone, D., Haueisen, J., & Comani, S. (2018). A new ICA-based fingerprint method for the automatic removal of physiological artifacts from EEG recordings. PeerJ, 6, e4380.
- Tipping, M. E. (2001). Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1, 211–245.
- Treder, M. S., Bahramisharif, A., Schmidt, N. M., Van Gerven, M. A., & Blankertz, B. (2011). Brain-computer interfacing using modulations of alpha activity induced by covert shifts of attention. Journal of NeuroEngineering and Rehabilitation, 8(1), 24.
- Valdes-Sosa, P. A., Sanchez-Bornot, J. M., Sotero, R. C., Iturria-Medina, Y., Aleman-Gomez, Y., Bosch-Bayard, J., … Ozaki, T. (2009). Model driven EEG/fMRI fusion of brain oscillations. Human Brain Mapping, 30, 2701–2721.
- Valdés-Sosa, P. A., Vega-Hernández, M., Sánchez-Bornot, J. M., Martínez-Montes, E., & Bobes, M. A. (2009). EEG source imaging with spatio-temporal tomographic nonnegative independent component analysis. Human Brain Mapping, 30(6), 1898–1910.
- Van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9, 2579–2625.
- Ward, T., Bernier, R., Mukerji, C., Perszyk, D., McPartland, J. C., Johnson, E., … Perszyk, D. (2013). Feedback-related negativity. In Encyclopedia of autism spectrum disorders (pp. 1256–1257). New York: Springer.
- Winkler, I. M., Haufe, S., & Tangermann, M. (2011). Automatic classification of artifactual ICA-components for artifact removal in EEG signals. Behavioral and Brain Functions, 7(1), 30.
- Wipf, D. P., & Nagarajan, S. S. (2008). A new view of automatic relevance determination. In Platt J., Koller D., Singer Y., & Roweis S. (Eds.), Advances in neural information processing systems, 20 (pp. 1625–1632). Red Hook, NY: Curran.
- Wipf, D. P., Owen, J. P., Attias, H. T., Sekihara, K., & Nagarajan, S. S. (2010). Robust Bayesian estimation of the location, orientation, and time course of multiple correlated neural sources using MEG. NeuroImage, 49(1), 641–655.
- Yamashita, O., Galka, A., Ozaki, T., Biscay, R., & Valdes-Sosa, P. (2004). Recursive penalized least squares solution for dynamical inverse problems of EEG generation. Human Brain Mapping, 21(4), 221–235.
- Yang, Y., Aminoff, E. M., Tarr, M. J., & Kass, R. E. (2016). A state-space model of cross-region dynamic connectivity in MEG/EEG. In Lee D., Sugiyama M., Luxburg U., Guyon I., & Garnett R. (Eds.), Advances in neural information processing systems, 29. Red Hook, NY: Curran.
- Zhang, Z., & Rao, B. D. (2013). Extension of SBL algorithms for the recovery of block sparse signals with intra-block correlation. IEEE Transactions on Signal Processing, 61(8), 2009–2015.