SWGMM: a semi-wrapped Gaussian mixture model for clustering of circular–linear data

  • Theoretical Advances
  • Published in Pattern Analysis and Applications

Abstract

Finite mixture models are widely used to perform model-based clustering of multivariate data sets. Most existing mixture models work with linear data, whereas real-life applications may involve multivariate data having both circular and linear characteristics. No existing mixture model can accommodate such correlated circular–linear data. In this paper, we design a mixture model for multivariate data having one circular variable. In order to construct a circular–linear joint distribution with proper inclusion of correlation terms, we use the semi-wrapped Gaussian distribution. Further, we construct a mixture model (termed SWGMM) of such joint distributions. This mixture model is capable of approximating the distribution of multi-modal circular–linear data. An unsupervised learning scheme for the mixture parameters is proposed based on the expectation–maximization (EM) method. Clustering is performed using the maximum a posteriori criterion. To evaluate the performance of SWGMM, we choose the task of color image segmentation in LCH space. We present comprehensive results and compare SWGMM with existing methods. Our study reveals that the proposed mixture model outperforms the other methods in most cases.

Author information

Correspondence to Anandarup Roy.

Appendices

Appendix A: MLE for wrapped Gaussian distribution

Let us assume that a wrapped Gaussian distribution has the pdf given in Eq. 6. First, we write the log likelihood for \(n\) given samples \(\varvec{\Upsilon } = \{\theta _1, \ldots , \theta _n\}\). The log likelihood takes the following form.

$$\begin{aligned} \Lambda (\varvec{\Upsilon },\mu ^{c},\sigma ^{c})=\sum \limits ^n_{i=1}\ln \left( \sum \limits _{w\in \mathbb {Z}} \mathcal {N}_{\mu ,\sigma }(\theta _i+2w\pi )\right) . \end{aligned}$$
(20)

In order to make Eq. 20 more tractable, we use Jensen’s inequality. For each sample \(\theta _i\), we compute the following.

$$\begin{aligned} \alpha (w|\theta _i) = \frac{{\mathcal {N}}_{\mu ,\sigma }(\theta _i + 2w\pi )}{\sum \limits _{w^{\prime }\in {\mathbb {Z}}}{{\mathcal {N}}_{\mu ,\sigma }}(\theta _i + 2w^{\prime }\pi )},\ \forall w,i. \end{aligned}$$
(21)
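
As a concrete illustration (ours, not part of the original derivation), the posterior over wrapping indices in Eq. 21 can be evaluated numerically by truncating the infinite sum over \(w\) to a small window; the Python/NumPy sketch below assumes a hypothetical truncation \(w \in \{-3, \ldots , 3\}\), which is adequate when \(\sigma\) is small compared with \(2\pi\).

```python
import numpy as np

# Minimal sketch (not the authors' code): the posterior over wrapping indices
# alpha(w | theta_i) of Eq. 21, with the sum over w in Z truncated to the
# window {-w_max, ..., w_max}.

def normal_pdf(x, mu, sigma):
    """Univariate Gaussian density N_{mu, sigma}(x)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

def wrap_posteriors(theta, mu, sigma, w_max=3):
    """Return alpha[i, j] = alpha(w_j | theta_i) for w_j in {-w_max, ..., w_max}."""
    theta = np.asarray(theta, dtype=float)          # angles, e.g. in [0, 2*pi)
    wraps = np.arange(-w_max, w_max + 1)            # truncated wrapping indices
    dens = normal_pdf(theta[:, None] + 2.0 * np.pi * wraps[None, :], mu, sigma)
    return dens / dens.sum(axis=1, keepdims=True)   # normalisation of Eq. 21

# Example: each row of alpha sums to 1 by construction.
alpha = wrap_posteriors([0.1, 3.0, 6.2], mu=0.0, sigma=0.5)
print(alpha.sum(axis=1))
```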

Next, we include \(\alpha (w|\theta _i)\) inside the log likelihood to obtain the following form.

$$\begin{aligned} \Lambda ({\varvec{\Upsilon}},\mu ^{c},\sigma ^{c})=\sum \limits ^n_{i=1}\ln \left( \sum \limits _{w\in {\mathbb {Z}}} \alpha (w|\theta _i)\frac{{\mathcal {N}}_{\mu ,\sigma }(\theta _i+2w\pi )}{\alpha (w|\theta _i)}\right) . \end{aligned}$$
(22)

Now we may apply Jensen’s inequality for concave functions. This yields the following lower bound on \(\Lambda\).

$$\begin{aligned} \Lambda (\varvec{\Upsilon },\mu ^{c},\sigma ^{c})&\ge \sum \limits ^n_{i=1}\sum \limits _{w\in \mathbb {Z}} \alpha (w|\theta _i)\ln \left( \frac{{\mathcal {N}}_{\mu ,\sigma }(\theta _i+2w\pi )}{\alpha (w|\theta _i)}\right) \nonumber \\&= \sum \limits ^n_{i=1}\sum \limits _{w\in \mathbb {Z}} \alpha (w|\theta _i)\left[ \ln \left( {\mathcal {N}}_{\mu ,\sigma }(\theta _i+2w\pi )\right) -\ln {\alpha (w|\theta _i)}\right] \end{aligned}$$
(23)

Only the first term of Eq. 23 is relevant when maximizing \(\Lambda (\varvec{\Upsilon },\mu ^{c},\sigma ^{c})\). The remaining steps are straightforward: maximizing Eq. 23 with respect to \(\mu ^{c}\) and \(\sigma ^{c}\) yields the following estimates.

$$\begin{aligned} \mu ^{c}&= \frac{\sum \limits ^{n}_{i=1}\sum \limits _{w\in \mathbb {Z}}\alpha (w|\theta _i)(\theta _i + 2w\pi )}{n}, \end{aligned}$$
(24)
$$\begin{aligned} \left( \sigma ^{c}\right) ^2&= \frac{\sum \limits ^{n}_{i=1}\sum \limits _{w\in \mathbb {Z}}\alpha (w|\theta _i)(\theta _i + 2w\pi -\mu ^{c})^2}{n}. \end{aligned}$$
(25)

The above procedure can easily be extended to semi-wrapped Gaussian distributions involving one circular variable.
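
As a hedged illustration of how Eqs. 21, 24 and 25 fit together in practice, the NumPy sketch below alternates the computation of \(\alpha (w|\theta _i)\) with the two updates, again truncating the sum over \(w\); the initial values and the synthetic data are hypothetical and only serve to show the shape of the iteration.

```python
import numpy as np

# Illustrative sketch (not the authors' implementation) of the MLE iteration
# for a wrapped Gaussian: alternate Eq. 21 with the updates of Eqs. 24-25.

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

def fit_wrapped_gaussian(theta, mu0=0.0, sigma0=1.0, w_max=3, n_iter=50):
    theta = np.asarray(theta, dtype=float)
    wraps = np.arange(-w_max, w_max + 1)
    mu, sigma = mu0, sigma0
    for _ in range(n_iter):
        # Eq. 21 (truncated sum over w): responsibilities of the wrapping indices
        unwrapped = theta[:, None] + 2.0 * np.pi * wraps[None, :]
        dens = normal_pdf(unwrapped, mu, sigma)
        alpha = dens / dens.sum(axis=1, keepdims=True)
        # Eqs. 24 and 25: updated mean and variance
        mu = np.sum(alpha * unwrapped) / len(theta)
        sigma = np.sqrt(np.sum(alpha * (unwrapped - mu) ** 2) / len(theta))
    return mu, sigma

# Hypothetical usage: angles concentrated near 0 (hence also near 2*pi).
rng = np.random.default_rng(0)
angles = np.mod(rng.normal(loc=0.0, scale=0.4, size=500), 2.0 * np.pi)
print(fit_wrapped_gaussian(angles))   # mean close to 0 (mod 2*pi), std close to 0.4
```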

Appendix B: EM for wrapped Gaussian mixture model

The wrapped Gaussian mixture model (WGMM) for a random variable \(\Theta\) has the following form:

$$\begin{aligned} f(\theta ) = \sum \limits _{h=1}^K P_h\mathcal {N}_{\mu ^c_h, \sigma ^c_h}^{w}(\theta ), \end{aligned}$$
(26)

where \(P_h\) is, as before, the mixing proportion of the \(h\)th mixture component.

Let \(\varvec{\Upsilon } = \{\theta _1, \ldots , \theta _n\}\) be a set of \(n\) samples drawn from WGMM. Then the log likelihood for the mixture distribution is given as follows.

$$\begin{aligned} \Phi (\varvec{\Upsilon })=\sum \limits ^n_{i=1}\ln \sum \limits ^K_{h=1} \left( P_h\mathcal {N}_{\mu ^c_h,\sigma ^c_h}^{w}(\theta _i)\right) +\lambda \left( 1-\sum \limits ^K_{h=1}P_h\right) , \end{aligned}$$
(27)

where \(\lambda\) is the Lagrange multiplier.

In order to estimate the mixture parameters, we may apply the EM methodology. Here, the cluster information for a pixel is taken as the hidden variable. Then the distribution of hidden variables can be written as follows:

$$\begin{aligned} \beta (h|\theta _i) = \frac{P_h\mathcal {N}_{\mu ^c_h,\sigma ^c_h}^{w}(\theta _i)}{\sum \limits ^K_{l=1}P_l\mathcal {N}_{\mu ^c_l,\sigma ^c_l}^{w}(\theta _i)}. \end{aligned}$$
(28)

With the insertion of \(\beta (h|\theta _i)\) and after applying Jensen’s inequality, we obtain the following lower bound on the log likelihood.

$$\begin{aligned} \Phi (\varvec{\Upsilon }) \ge \sum \limits ^n_{i=1}\sum \limits ^K_{h=1} \beta (h|\theta _i) \ln \left[ P_h\mathcal {N}_{\mu ^c_h,\sigma ^c_h}^{w}(\theta _i)\right] +\lambda \left( 1-\sum \limits ^K_{h=1}P_h\right) . \end{aligned}$$
(29)

The mixing proportion \(P_h\) is estimated by Eq. 30.

$$\begin{aligned} P_h = \frac{1}{n}\sum \limits ^n_{i=1}\beta (h|\theta _i). \end{aligned}$$
(30)

Let us now turn to estimating the component parameters. This is done by maximizing the log likelihood w.r.t. \(\mu ^c_h\) and \(\sigma ^c_h\). Since the mixing proportions are no longer involved, the part of the objective relevant for estimating \(\mu ^c_h\) and \(\sigma ^c_h\) is the following.

$$\begin{aligned} \Phi _{\mu {^c_h},\sigma ^c_h}(\varvec{\Upsilon }) = \sum \limits ^n_{i=1}\sum \limits ^K_{h=1} \beta (h|\theta _i) \ln \sum \limits _{w\in \mathbb {Z}}\mathcal {N}_{\mu _h,\sigma _h}(\theta _i+2w\pi ). \end{aligned}$$
(31)

This expression can be simplified with the help of \(\alpha (w|\theta _i)\) defined in Eq. 21. In particular, inserting \(\alpha (w|\theta _i)\) (computed with \(\mu _h\) and \(\sigma _h\)) and applying Jensen’s inequality again, we can bound \(\Phi _{\mu ^c_h,\sigma ^c_h}(\varvec{\Upsilon })\) from below by the following “sum of logarithms” form:

$$\begin{aligned} \Phi _{\mu ^c_h,\sigma ^c_h}(\varvec{\Upsilon }) \ge \sum \limits ^n_{i=1}\sum \limits ^K_{h=1} \beta (h|\theta _i) \sum \limits _{w\in \mathbb {Z}}\alpha (w|\theta _i)\ln \left( \mathcal {N}_{\mu _h,\sigma _h}(\theta _i+2w\pi )\right) . \end{aligned}$$
(32)

Now, since \(\beta (h|\theta _i)\) does not depend on \(w\), we can write \(q(h,w|\theta _i) = \beta (h|\theta _i)\,\alpha (w|\theta _i)\) inside the summation over \(w\). In other words, we take both the cluster label and the wrapping index as hidden information, so the hidden variable distribution becomes \(q(h,w|\theta _i)\). Finally, we have the following log likelihood to maximize with respect to the parameters.

$$\begin{aligned} \Phi _{\mu ^c_h,\sigma ^c_h}(\varvec{\Upsilon }) = \sum \limits ^n_{i=1}\sum \limits ^K_{h=1} \sum \limits _{w\in \mathbb {Z}}q(h,w|\theta _i) \ln \left( \mathcal {N}_{\mu _h,\sigma _h}(\theta _i+2w\pi )\right) . \end{aligned}$$
(33)

On the basis of this, the estimates of \(\mu ^c_h\) and \(\sigma ^c_h\) are obtained as follows:

$$\begin{aligned} \mu ^c_h&= \frac{\sum \limits ^{n}_{i=1}\sum \limits _{w\in \mathbb {Z}}q(h,w|\theta _i)(\theta _i + 2w\pi )}{\sum \limits ^{n}_{i=1}\sum \limits _{w\in \mathbb {Z}}q(h,w|\theta _i)}, \end{aligned}$$
(34)
$$\begin{aligned} \left( \sigma ^c_h\right) ^2&= \frac{\sum \limits ^{n}_{i=1}\sum \limits _{w\in \mathbb {Z}}q(h,w|\theta _i)(\theta _i + 2w\pi -\mu ^c_h)^2}{\sum \limits ^{n}_{i=1}\sum \limits _{w\in \mathbb {Z}}q(h,w|\theta _i)}. \end{aligned}$$
(35)

This completes the estimation for WGMM. Now, since only one circular variable is involved in SWGMM, Eqs. 34 and 35 can be extended to SWGMM in a straightforward way.
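
To make the role of \(q(h,w|\theta _i)\) concrete, the following NumPy sketch (our illustration, not the authors' code) performs one EM iteration for the WGMM of Eq. 26: the E-step builds \(q(h,w|\theta _i) = \beta (h|\theta _i)\,\alpha (w|\theta _i)\) jointly, and the M-step applies Eqs. 30, 34 and 35. The sum over \(w\) is truncated, and the synthetic data at the end are hypothetical.

```python
import numpy as np

# One EM iteration for the wrapped Gaussian mixture model (illustration only).

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

def wgmm_em_step(theta, P, mu, sigma, w_max=3):
    theta = np.asarray(theta, dtype=float)                # shape (n,)
    P, mu, sigma = (np.asarray(a, dtype=float) for a in (P, mu, sigma))  # shape (K,)
    wraps = np.arange(-w_max, w_max + 1)                  # shape (W,)

    # Unwrapped angles theta_i + 2*w*pi, shape (n, W)
    unwrapped = theta[:, None] + 2.0 * np.pi * wraps[None, :]
    # N_{mu_h, sigma_h}(theta_i + 2*w*pi), shape (n, K, W)
    dens = normal_pdf(unwrapped[:, None, :], mu[None, :, None], sigma[None, :, None])

    # E-step: q(h, w | theta_i) = beta(h | theta_i) * alpha(w | theta_i)
    joint = P[None, :, None] * dens
    q = joint / joint.sum(axis=(1, 2), keepdims=True)

    # M-step
    denom = q.sum(axis=(0, 2))
    P_new = denom / len(theta)                                           # Eq. 30
    mu_new = (q * unwrapped[:, None, :]).sum(axis=(0, 2)) / denom        # Eq. 34
    diff = unwrapped[:, None, :] - mu_new[None, :, None]
    sigma_new = np.sqrt((q * diff ** 2).sum(axis=(0, 2)) / denom)        # Eq. 35
    return P_new, mu_new, sigma_new

# Hypothetical usage: a few EM sweeps on a two-component synthetic sample.
rng = np.random.default_rng(1)
data = np.concatenate([np.mod(rng.normal(0.2, 0.3, 300), 2.0 * np.pi),
                       np.mod(rng.normal(3.0, 0.4, 300), 2.0 * np.pi)])
P, mu, sigma = np.array([0.5, 0.5]), np.array([1.0, 4.0]), np.array([1.0, 1.0])
for _ in range(50):
    P, mu, sigma = wgmm_em_step(data, P, mu, sigma)
print(P, mu, sigma)
```

For SWGMM, the same \(q(h,w|\theta _i)\) bookkeeping carries over, with the univariate densities above replaced by the semi-wrapped multivariate densities of Eq. 7.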

Appendix C: computations for SWGMM

For notational simplicity, we assume that the mean vector \(\mu ^{\rm cl}\) (in Eq. 7) is a zero vector. Then, expanding Eq. 7, the semi-wrapped Gaussian density becomes:

$$\begin{aligned} \mathcal {N}_{\varvec{0}, \Sigma ^{\rm cl}}^{\rm sw}(\varvec{p})&= \frac{1}{(2\pi )^{k/2}|\Sigma |^{1/2}}\times \left[ \exp \left( -\frac{1}{2}\varvec{p}^T \Sigma ^{-1}\varvec{p}\right) + \right. \nonumber \\&\exp \left( -\frac{1}{2}(\varvec{p}-2\pi )^T \Sigma ^{-1}(\varvec{p}-2\pi )\right) + \nonumber \\&\left. \exp \left( -\frac{1}{2}(\varvec{p}+2\pi )^T \Sigma ^{-1}(\varvec{p}+2\pi )\right) \right] \end{aligned}$$
(36)

Let the \((i,j)\)th element of the matrix \(\Sigma ^{-1}\) be denoted by \(a_{i,j}\). The inner products \((\varvec{p}\pm 2\pi )^T \Sigma ^{-1}(\varvec{p}\pm 2\pi )\) can be expressed as follows:

$$\begin{aligned} (\varvec{p}-2\pi )^T \Sigma ^{-1}(\varvec{p}-2\pi )&= \varvec{p}^T \Sigma ^{-1}\varvec{p} + 4\pi ^2 a_{1,1} - 4\pi C \\ (\varvec{p}+2\pi )^T \Sigma ^{-1}(\varvec{p}+2\pi )&= \varvec{p}^T \Sigma ^{-1}\varvec{p} + 4\pi ^2 a_{1,1} + 4\pi C \end{aligned}$$

where \(C = \varvec{p}^T C_1\) and \(C_1\) is the first column of \(\Sigma ^{-1}\). Hence, we compute \(\varvec{p}^T \Sigma ^{-1}\varvec{p}\) once and obtain the other two quadratic forms by simple addition and subtraction. Also note that \(C\) is a byproduct of computing \(\varvec{p}^T \Sigma ^{-1}\varvec{p}\) and hence needs no extra computation.

Finally, we can express \((\varvec{p}_i \pm 2\pi )(\varvec{p}_i \pm 2\pi )^T\) in terms of \(\varvec{p}_i\varvec{p}_i^T\) by changing only the first row and the first column of \(\varvec{p}_i\varvec{p}_i^T\). This simplifies the estimation of the covariance matrix (Eq. 14).
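
The savings described above are easy to verify numerically. The short NumPy check below (our illustration; it assumes the circular variable is the first coordinate of \(\varvec{p}\), so the \(2\pi\) shift applies to that coordinate only) confirms the two closed forms for the shifted quadratic forms and the fact that the shifted outer product differs from \(\varvec{p}\varvec{p}^T\) only in its first row and column.

```python
import numpy as np

# Numerical check (illustration only) of the Appendix C shortcuts.

rng = np.random.default_rng(2)
k = 3
A = rng.normal(size=(k, k))
Sigma = A @ A.T + k * np.eye(k)            # a random SPD covariance matrix
Sinv = np.linalg.inv(Sigma)
p = rng.normal(size=k)

e1 = np.zeros(k); e1[0] = 1.0              # shift acts on the circular coordinate
base = p @ Sinv @ p                        # p^T Sigma^{-1} p, computed once
C = p @ Sinv[:, 0]                         # p^T times the first column of Sigma^{-1}
a11 = Sinv[0, 0]

# Closed forms from the text, reusing base, a11 and C
minus = base + 4.0 * np.pi**2 * a11 - 4.0 * np.pi * C
plus = base + 4.0 * np.pi**2 * a11 + 4.0 * np.pi * C
assert np.allclose(minus, (p - 2.0 * np.pi * e1) @ Sinv @ (p - 2.0 * np.pi * e1))
assert np.allclose(plus, (p + 2.0 * np.pi * e1) @ Sinv @ (p + 2.0 * np.pi * e1))

# The shifted outer product differs from p p^T only in its first row/column.
diff = np.outer(p - 2.0 * np.pi * e1, p - 2.0 * np.pi * e1) - np.outer(p, p)
assert np.allclose(diff[1:, 1:], 0.0)
print("Appendix C identities verified.")
```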

About this article

Cite this article

Roy, A., Parui, S.K. & Roy, U. SWGMM: a semi-wrapped Gaussian mixture model for clustering of circular–linear data. Pattern Anal Applic 19, 631–645 (2016). https://doi.org/10.1007/s10044-014-0418-2
