Abstract
The FastICA algorithm is one of the most popular algorithms in the domain of independent component analysis (ICA). Despite its success, it is observed that FastICA occasionally yields outcomes that do not correspond to any true solutions (known as demixing vectors) of the ICA problem. These outcomes are commonly referred to as spurious solutions. Although FastICA is a well-studied ICA algorithm, the occurrence of spurious solutions is not yet completely understood by the ICA community. In this contribution, we aim to address this issue. In the first part of this work, we characterize the relationship between the demixing vectors, the local optimizers of the contrast function and the (attractive or unattractive) fixed points of the FastICA algorithm. We show that there exists an inclusion relationship between these sets. In the second part, we investigate the possible scenarios in which spurious solutions occur. It is shown that when certain bimodal Gaussian mixture distributions are involved, there may exist spurious solutions that are attractive fixed points of FastICA. In this case, popular nonlinearities such as “Gauss” or “tanh” tend to yield spurious solutions, whereas “kurtosis” gives much more reliable results.
Notes
In this paper, notation \(\subset\) stands for the “subset” rather than the “proper subset” inclusion. Hence, \({\mathbb {D}}\subset {\mathbb {L}}\) does not exclude \({\mathbb {D}}={\mathbb {L}}\).
Acknowledgments
The author would like to express his deepest gratitude to Prof. A. Dermoune for his invaluable guidance. The author would also like to thank the anonymous referees for carefully reading the manuscript and for their many helpful and constructive suggestions, which resulted in the present work.
Appendix
1.1 Proof of Theorem 7
Without loss of generality, we may suppose that the mixing matrix \({\mathbf {A}}\) is an identity matrix. Let \({\mathbf {v}}\) be a fixed point, and write \({\mathbf {v}}=(v_1,\ldots ,v_d)^{\mathsf {T}}\). By (14), we have for any \(i\in \{1,\ldots ,d\}\),
or equivalently, after some algebraic simplifications,
where \(\kappa _i\mathop {=}\limits ^{\mathrm {def}}{\mathbb {E}}[s_i^4]-3\ne 0\) by assumption \({\mathcal {A}}_2\) (see Sect. 3.1). Denote by \({\mathcal {I}}\) the set of indices i such that \(v_i\ne 0\). It follows from (26) that for \(i\in {\mathcal {I}}\)
Since \(\sum _{i=1}^d v_i^2=1\), we deduce from (27)
or equivalently
Then, we can rewrite (26) as
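A sketch (ours) of the missing chain of identities (26)–(28), assuming the kurtosis nonlinearity \(g(y)=y^3\); this assumption is consistent with the identities \({\mathbb {E}}[g'({\mathbf {v}}^{\mathsf {T}}{\mathbf {s}})]=3\) and \({\mathbf {M}}=3(2{\mathbf {v}}{\mathbf {v}}^{\mathsf {T}}+{\mathbf {I}}+{\mathbf {D}})\) used below, but the exact form and numbering of the original displays are not reproduced here:

```latex
% Sketch (our reconstruction, assuming g(y) = y^3, A = I, x = s, ||v|| = 1).
% Since E[(v^T s)^3 s_i] = 3 v_i + \kappa_i v_i^3 and E[g'(v^T s)] = 3,
% the fixed-point relation (14) reduces to
\kappa_i\, v_i^{3} = \lambda\, v_i \quad (i = 1,\ldots,d),
\qquad\text{hence}\qquad
\kappa_i\, v_i^{2} = \lambda \quad (i \in \mathcal{I}),
% for some scalar \lambda \neq 0. Summing v_i^2 = \lambda/\kappa_i over
% i \in \mathcal{I} and using \sum_i v_i^2 = 1 gives
\lambda = \Bigl(\sum_{j \in \mathcal{I}} \kappa_j^{-1}\Bigr)^{-1},
\qquad
v_i^{2} = \frac{\kappa_i^{-1}}{\sum_{j \in \mathcal{I}} \kappa_j^{-1}}
\quad (i \in \mathcal{I}).
% One checks that \lambda = -\alpha(v) with
% \alpha(w) := E[g'(w^T s) - g(w^T s) w^T s],
% which matches the identity D_i = \kappa_i v_i^2 = -\alpha(v) invoked below.
```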
Now let us calculate \({\mathbf {M}}\mathop {=}\limits ^{\mathrm {def}}{\mathbb {E}}[g'({\mathbf {v}}^{\mathsf {T}}{\mathbf {s}}){\mathbf {s}}{\mathbf {s}}^{\mathsf {T}}]\). We have
From this, we deduce that \({\mathbf {M}}=3(2{\mathbf {v}}{\mathbf {v}}^{\mathsf {T}}+ {\mathbf {I}}+{\mathbf {D}})\), where \({\mathbf {D}}\) is a diagonal matrix with the ith diagonal entry \({\mathbf {D}}_i=\kappa _i v_i^2\). Since
it follows from (27) that \({\mathbf {D}}_i=-\alpha ({\mathbf {v}})\) for \(i\in {\mathcal {I}}\) and \({\mathbf {D}}_i=0\) otherwise. Besides, since \({\mathbb {E}}[g'({\mathbf {v}}^{\mathsf {T}}{\mathbf {s}}){\mathbf {I}}]=3{\mathbf {I}}\), we have \({\mathbb {E}}[g'({\mathbf {v}}^{\mathsf {T}}{\mathbf {s}})({\mathbf {I}}-{\mathbf {s}}{\mathbf {s}}^{\mathsf {T}})]=3{\mathbf {I}}-{\mathbf {M}}=-3({\mathbf {D}}+ 2{\mathbf {v}}{\mathbf {v}}^{\mathsf {T}})\). From this, we deduce that
where \(\bar{\mathbf {D}}\mathop {=}\limits ^{\mathrm {def}}-{\mathbf {D}}/|\alpha ({\mathbf {v}})|\). Clearly, the diagonal entries of \(\bar{\mathbf {D}}\) satisfy \(\bar{\mathbf {D}}_i=1\) for \(i\in {\mathcal {I}}\) and \(\bar{\mathbf {D}}_i=0\) otherwise. Besides, it is easy to see that \(\bar{\mathbf {D}}{\mathbf {v}}={\mathbf {v}}\), which implies \({\mathrm {span}}({\mathbf {v}})\subset {\mathrm {range}}(\bar{\mathbf {D}})\). Denote by \(\#{\mathcal {I}}\) the cardinality of \({\mathcal {I}}\). Since \(\dim ({\mathrm {span}}({\mathbf {v}}))=1\) and \(\dim ({\mathrm {range}}(\bar{\mathbf {D}}))=\#{\mathcal {I}}\ge 1\), this inclusion becomes an equality if and only if \(\#{\mathcal {I}}=1\), i.e., there is exactly one entry \(v_i\ne 0\), or equivalently, \({\mathbf {v}}={\mathbf {e}}_i\in {\mathbb {D}}\) for some i. If this is the case, then we have immediately \(\varvec{f}'({\mathbf {v}})=0\) by (28). Otherwise, take any vector \({\mathbf {u}}=(u_1,\ldots ,u_d)^{\mathsf {T}}\) such that \(\Vert {\mathbf {u}}\Vert =1\), \({\mathbf {u}}^{\mathsf {T}}{\mathbf {v}}=0\) and \(u_i=0\) for \(i\notin {\mathcal {I}}\). We have
On the one hand, by the submultiplicativity of the spectral norm,
on the other hand, there also holds
It then follows that \(\Vert \varvec{f}'({\mathbf {v}})\Vert =3\) for any \({\mathbf {v}}\in {\mathbb {F}}\backslash {\mathbb {D}}\). This fact also implies \({\mathbb {D}}={\mathbb {L}}\).
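As a numerical illustration (ours; the choice of two uniform sources is an assumption for the example), the following Python sketch checks that a unit vector with \(v_i^2\propto \kappa _i^{-1}\) on its support satisfies the one-unit FastICA fixed-point relation, i.e., that \({\mathbb {E}}[{\mathbf {s}}\,g({\mathbf {v}}^{\mathsf {T}}{\mathbf {s}})]-{\mathbb {E}}[g'({\mathbf {v}}^{\mathsf {T}}{\mathbf {s}})]{\mathbf {v}}\) is parallel to \({\mathbf {v}}\):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent unit-variance uniform sources: excess kurtosis kappa_i = -1.2,
# of the same sign, so the spurious direction v_i^2 = kappa_i^{-1}/sum_j kappa_j^{-1} exists.
n = 2_000_000
s = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(2, n))

kappa = (s**4).mean(axis=1) - 3.0            # empirical excess kurtosis kappa_i

# Candidate spurious fixed point: v_i^2 proportional to 1/kappa_i (see the sketch above).
v = np.sqrt((1.0 / kappa) / (1.0 / kappa).sum())

g, dg = (lambda y: y**3), (lambda y: 3.0 * y**2)   # "kurtosis" nonlinearity
y = v @ s
h = (s * g(y)).mean(axis=1) - dg(y).mean() * v     # one-unit FastICA direction

print(h / v)  # the two ratios agree (~ -0.6), so h is parallel to v: a fixed point
```

By Theorem 7, such a \({\mathbf {v}}\) is a fixed point but not an attractive one (\(\Vert \varvec{f}'({\mathbf {v}})\Vert =3\)), so FastICA iterations started nearby drift away from it.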
1.2 Proof of Proposition 8
Without loss of generality, in what follows we take \({\mathbf {A}}={\mathbf {I}}\) for simplicity.
1.2.1 Case \(d=2\)
Let us first consider the simplest case \(d=2\). If \(\alpha ({\mathbf {e}}_1)\) and \(\alpha ({\mathbf {e}}_2)\) have the same sign, say, positive, then \({\mathbf {e}}_1\) and \({\mathbf {e}}_2\) are local minimizers of the contrast function \({\mathcal {J}}(\cdot )\) on \({\mathcal {S}}\). Write \(f(\theta )\mathop {=}\limits ^{\mathrm {def}}{\mathbb {E}}[G(\cos (\theta )s_1 + \sin (\theta )s_2)]\).
Then, it is easy to see that \(\theta _1=0\) and \(\theta _2=\pi /2\) are local minimizers of \(f(\theta )\) on \({\mathbb {R}}\). From this, we deduce immediately that f reaches a local maximum at some interior point \(\theta _0\in (\theta _1,\theta _2)\). The corresponding vector \({\mathbf {v}}\mathop {=}\limits ^{\mathrm {def}}\big (\cos (\theta _0), \sin (\theta _0)\big )^{\mathsf {T}}\) is then a local maximizer of \({\mathcal {J}}({\mathbf {w}})\) on \({\mathcal {S}}\).
In fact, we have proved the following result:
Lemma 10
Let \(s_1\) and \(s_2\) be two random variables such that the quantity \({\mathbb {E}}[g'(s_i) - g(s_i)s_i]\) has the same sign for \(i=1,2.\) Then, \(f(\theta )\mathop {=}\limits ^{\mathrm {def}}{\mathbb {E}}[G(\cos (\theta )s_1 + \sin (\theta )s_2)]\) has a local optimum at some \(\theta _0\in (0,2\pi )\). Moreover, the angle \(\theta _0\) satisfies
Equality (31) comes directly from (14).
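A sketch (ours) of what equality (31) plausibly states, obtained by setting \(f'(\theta _0)=0\) and writing \({\mathbf {w}}(\theta )=(\cos \theta ,\sin \theta )^{\mathsf {T}}\), \({\mathbf {s}}_{12}=(s_1,s_2)^{\mathsf {T}}\):

```latex
% Our sketch of (31): differentiating f(\theta) = E[G(w(\theta)^T s_{12})]
% and setting f'(\theta_0) = 0 gives
\mathbb{E}\!\left[ g\!\left(\mathbf{w}(\theta_0)^{\mathsf T}\mathbf{s}_{12}\right)
  \left(-\sin(\theta_0)\,s_1 + \cos(\theta_0)\,s_2\right) \right] = 0,
% or equivalently
\mathbb{E}\!\left[ g\!\left(\mathbf{w}(\theta_0)^{\mathsf T}\mathbf{s}_{12}\right)
  \mathbf{s}_{12} \right] \propto \mathbf{w}(\theta_0),
% which is the fixed-point condition (14) restricted to the (s_1, s_2)-plane.
```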
1.2.2 Case \(d>2\)
Suppose \({\mathbf {e}}_i\) and \({\mathbf {e}}_j\) are two demixing vectors such that both \(\alpha ({\mathbf {e}}_i)\) and \(\alpha ({\mathbf {e}}_j)\) are positive. Write \(f^{(i,j)}(\theta )\mathop {=}\limits ^{\mathrm {def}}{\mathbb {E}}[G(\cos (\theta )s_i + \sin (\theta )s_j)]\).
By Lemma 10, there exists \(\theta '\in (0,2\pi )\) such that \(\theta '\) maximizes \(f^{(i,j)}(\theta )\) and satisfies
where \({\mathbf {s}}_{ij}=(s_i,s_j)^{\mathsf {T}}\). Consider the vector \({\mathbf {u}}=(u_1,\ldots ,u_d)^{\mathsf {T}}\) with \(u_i=\cos (\theta ')\), \(u_j=\sin (\theta ')\) and \(u_k=0\) for \(k\ne i,j\). Clearly, \({\mathbf {u}}^{\mathsf {T}}{\mathbf {s}}=\cos (\theta ')s_i + \sin (\theta ')s_j={\mathbf {w}}(\theta ')^{\mathsf {T}}{\mathbf {s}}_{ij}\). It then follows from (32) that
which implies \({\mathbf {u}}\in {\mathbb {F}}\) by (14).
In the particular case where \(s_i\) and \(s_j\) have the same distribution, we must have \(\cos (\theta ')=\sin (\theta ')=1/\sqrt{2}\) by symmetry. This means \(\theta '=\pi /4\).
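The symmetry argument can be checked numerically. The following Python sketch (ours; \(\mu =0.95\) and the sample size are arbitrary choices) draws two i.i.d. Bimod\((\mu )\) sources (see Sect. 1.4.2) and verifies that the derivative \(f'(\theta )={\mathbb {E}}[g({\mathbf {w}}(\theta )^{\mathsf {T}}{\mathbf {s}}_{12})(-\sin (\theta )s_1+\cos (\theta )s_2)]\) vanishes at \(\theta '=\pi /4\) up to sampling error, here with the “tanh” nonlinearity:

```python
import numpy as np

rng = np.random.default_rng(1)

def bimod(mu, size, rng):
    """Sample Bimod(mu) of Sect. 1.4.2: an equal-weight mixture of
    N(+mu, 1 - mu^2) and N(-mu, 1 - mu^2); zero mean, unit variance."""
    signs = np.where(rng.random(size) < 0.5, mu, -mu)
    return rng.normal(signs, np.sqrt(1.0 - mu**2))

n, mu = 1_000_000, 0.95
s1, s2 = bimod(mu, n, rng), bimod(mu, n, rng)

g = np.tanh                                  # "tanh" nonlinearity, g = G'
theta = np.pi / 4                            # candidate critical angle
y = np.cos(theta) * s1 + np.sin(theta) * s2

# f'(theta) = E[g(y)(-sin(theta) s1 + cos(theta) s2)]; ~0 at theta = pi/4 by symmetry.
print(g(y) @ (-np.sin(theta) * s1 + np.cos(theta) * s2) / n)
```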
1.3 Proof of Proposition 9
Suppose that \({\mathbf {u}}\in {\mathbb {R}}^2\) is a spurious attractive fixed point in the case \(d=2\) with \(s_1{\sim } {\mathcal {D}}_1\) and \(s_2{\sim } {\mathcal {D}}_2\). Then \(\Vert \varvec{f}'({\mathbf {u}})\Vert <1\) and \({\mathbf {u}}^{\mathsf {T}}{\mathbf {x}}=cs_1 + \sqrt{1-c^2}s_2\)
for some real scalar \(c\in (0,1)\). Now let us consider the case \(d=n>2\) with \(s_i{\sim } {\mathcal {D}}_1\) and \(s_j{\sim }{\mathcal {D}}_2\) for some indices \(i\ne j\). In the sequel, we assume \(i=1\), \(j=2\) for simplicity. Take \({\mathbf {v}}=(c,\sqrt{1-c^2},0,\ldots ,0)^{\mathsf {T}}\in {\mathbb {R}}^n\).
It is easy to see that \({\mathbf {v}}^{\mathsf {T}}{\mathbf {x}}=cs_1 + \sqrt{1-c^2}s_2\) and \({\mathbf {v}}\in {\mathbb {F}}\). Next, we show \(\Vert \varvec{f}'({\mathbf {v}})\Vert =\Vert \varvec{f}'({\mathbf {u}})\Vert <1\), where \(\varvec{f}'({\mathbf {v}})\) and \(\varvec{f}'({\mathbf {u}})\) are, respectively, \(n\times n\) and \(2\times 2\) matrices. Note that these two “\(\varvec{f}'(\cdot )\)” are different mappings, for they are determined by different ICA models. Denote respectively by \({\mathbf {A}}_{\mathbf {u}}\) and \({\mathbf {A}}_{\mathbf {v}}\) the mixing matrices of the two models. Since the mixing matrix \({\mathbf {A}}_{\mathbf {u}}\) is orthogonal, for any \({\mathbf {w}}\in {\mathbb {R}}^2\) we have
A similar equality also holds for \({\mathbf {A}}_{\mathbf {v}}\) and \({\mathbf {w}}\in {\mathbb {R}}^n\). Denote
Notice that
by the construction of \({\mathbf {u}}\) and \({\mathbf {v}}\). This implies \({\mathbf {u}}^{\mathsf {T}}{\mathbf {x}}\) and \({\mathbf {v}}^{\mathsf {T}}{\mathbf {x}}\) have the same distribution and therefore \(\alpha ({\mathbf {u}})=\alpha ({\mathbf {v}})\). Besides, it is easily seen that
From (36), we can deduce that \(\Vert {\mathbf {B}}_{\mathbf {u}}\Vert =\Vert {\mathbf {B}}_{\mathbf {v}}\Vert\). Finally, combining this result with (35) gives \(\Vert \varvec{f}'({\mathbf {v}})\Vert <1\).
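Proposition 9 thus reduces the existence of spurious attractive fixed points in dimension \(n>2\) to the two-dimensional case. As an exploratory illustration (ours; \(\mu =0.95\), the sample size and the initial angle are arbitrary choices), the following Python sketch iterates the one-unit FastICA update with the “tanh” nonlinearity on two i.i.d. Bimod\((\mu )\) sources; for sufficiently bimodal sources the iterates may settle near the spurious angle \(\pi /4\) instead of a demixing direction:

```python
import numpy as np

rng = np.random.default_rng(2)

def bimod(mu, size, rng):
    """Equal-weight mixture of N(+/-mu, 1 - mu^2): the Bimod(mu) of Sect. 1.4.2."""
    signs = np.where(rng.random(size) < 0.5, mu, -mu)
    return rng.normal(signs, np.sqrt(1.0 - mu**2))

n, mu = 1_000_000, 0.95
x = np.vstack([bimod(mu, n, rng), bimod(mu, n, rng)])   # A = I, so x = s

g  = np.tanh                                            # "tanh" nonlinearity
dg = lambda y: 1.0 - np.tanh(y)**2

w = np.array([np.cos(0.55), np.sin(0.55)])              # start between e1 and pi/4
for _ in range(100):
    y = w @ x
    w_new = (x * g(y)).mean(axis=1) - dg(y).mean() * w  # one-unit FastICA update
    w_new /= np.linalg.norm(w_new)
    if w_new @ w < 0:        # fixed points are defined up to sign
        w_new = -w_new
    w = w_new

# Compare the limiting angle with 0 or pi/2 (demixing) and pi/4 (spurious).
print(np.arctan2(w[1], w[0]), np.pi / 4)
```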
1.4 Probability distributions used in Table 1
1.4.1 Generalized Gaussian distribution
The generalized Gaussian density function with parameter \(\alpha\), zero mean and unit variance is given by
where \(\alpha\) is a positive parameter that controls the distribution's exponential rate of decay, \(\Gamma\) is the Gamma function and
This generalized Gaussian family encompasses the standard normal distribution for \(\alpha =2\), the Laplace distribution for \(\alpha =1\) and the uniform distribution in the limit \(\alpha \rightarrow \infty\).
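A sketch (ours) of the density and of the scale constant in a standard zero-mean, unit-variance parametrization consistent with the limiting cases just stated:

```latex
% Our sketch of a standard zero-mean, unit-variance generalized Gaussian density:
f_\alpha(x) \;=\; \frac{\alpha}{2\lambda\,\Gamma(1/\alpha)}
  \exp\!\left( -\left( \frac{|x|}{\lambda} \right)^{\alpha} \right),
\qquad
\lambda \;=\; \sqrt{\frac{\Gamma(1/\alpha)}{\Gamma(3/\alpha)}} .
% Checks: \alpha = 2 gives \lambda = \sqrt{2} and the standard normal density;
% \alpha = 1 gives \lambda = 1/\sqrt{2} and the unit-variance Laplace density.
```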
1.4.2 Bimodal distribution with Gaussian mixture
The bimodal distribution used in this paper consists of a mixture of two Gaussian distributions. Define the random variable \(X=ZY_1+(1-Z)Y_2\),
where \(Y_i{\sim }{\mathcal {N}}(\mu _i,\sigma _i^2)\) and \(Z{\sim }{\mathcal {B}}(p)\) are mutually independent random variables. Here, \({\mathcal {B}}(p)\) denotes the Bernoulli distribution with parameter p, i.e., \({\mathbb {P}}(Z=1)=p\) and \({\mathbb {P}}(Z=0)=1-p\). It is easy to see that the probability density function (PDF) of X is given by \(f_X=pf_{Y_1}+(1-p)f_{Y_2}\),
where \(f_{Y_i}\) is the PDF of \(Y_i\) for \(i=1,2\).
Now take any \(\mu _1,\mu _2\) such that \(\mu _1\mu _2<0\) and \(|\mu _1\mu _2|\le 1\), then let \(\sigma ^2_1=\sigma ^2_2=1-|\mu _1\mu _2|\) and \(p=-\mu _2/(\mu _1-\mu _2)\).
Defined in such a way, X is a random variable with zero mean, unit variance and two modes at \(\mu _1\) and \(\mu _2\). Since the PDF of X is completely determined by \(\mu _1,\mu _2\), we use them as controlling parameters and denote by “Bimod\((\mu _1,\mu _2)\)” the distribution of X. Note that if \(\mu _1=-\mu _2\), then \(p=1/2\) and the distribution of X is symmetric. In this case, we write simply “Bimod\((\mu )\)” with \(\mu =|\mu _1|\). Note also that the “bpsk” distribution is actually Bimod(1).
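As a quick sanity check (ours; the parameter values are arbitrary), the following Python snippet samples Bimod\((\mu _1,\mu _2)\) with the mixture weight \(p=-\mu _2/(\mu _1-\mu _2)\) used above and verifies the zero-mean, unit-variance normalization empirically:

```python
import numpy as np

rng = np.random.default_rng(3)

def bimod2(mu1, mu2, size, rng):
    """Sample Bimod(mu1, mu2): X = Z*Y1 + (1-Z)*Y2 with Z ~ B(p),
    Y_i ~ N(mu_i, 1 - |mu1*mu2|) and p = -mu2/(mu1 - mu2)."""
    assert mu1 * mu2 < 0 and abs(mu1 * mu2) <= 1
    p = -mu2 / (mu1 - mu2)
    sd = np.sqrt(1.0 - abs(mu1 * mu2))
    z = rng.random(size) < p
    return np.where(z, rng.normal(mu1, sd, size), rng.normal(mu2, sd, size))

x = bimod2(1.2, -0.5, 1_000_000, rng)
print(x.mean(), x.var())   # approximately 0 and 1
```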