doi: 10.5555/3327757.3327774

Representation learning of compositional data

Published: 03 December 2018

Abstract

We consider the problem of learning a low-dimensional representation for compositional data. Compositional data consist of nonnegative parts that sum to a constant value. Because the parts are statistically dependent, many standard tools cannot be applied directly; instead, compositional data must be transformed before analysis. Focusing on principal component analysis (PCA), we propose an approach that allows low-dimensional representation learning directly from the original data. Our approach combines the benefits of the log-ratio transformation from compositional data analysis and exponential family PCA. A key tool in its derivation is a generalization of the scaled Bregman theorem, which relates the perspective transform of a Bregman divergence to the Bregman divergence of a perspective transform and a remainder conformal divergence. Our proposed approach includes a convenient surrogate (upper bound) loss of the exponential family PCA that has an easy-to-optimize form. We also derive the corresponding form for nonlinear autoencoders. Experiments on simulated data and microbiome data show the promise of our method.
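
For orientation, the conventional transform-then-analyze pipeline that the abstract contrasts with can be sketched as follows. This is not the method proposed in the paper; it is a minimal illustration of the classical baseline, assuming the centered log-ratio (clr) transform followed by ordinary PCA. The function names, the pseudocount used to avoid log(0), and the simulated Poisson counts are all assumptions made for this example.

```python
import numpy as np


def closure(x):
    """Rescale each row so its parts sum to 1 (a composition)."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=1, keepdims=True)


def clr(x, pseudocount=1e-6):
    """Centered log-ratio transform: log of each part minus the row mean of the logs.

    The pseudocount is an assumption of this sketch; it only guards against log(0).
    """
    comp = closure(np.asarray(x, dtype=float) + pseudocount)
    log_comp = np.log(comp)
    return log_comp - log_comp.mean(axis=1, keepdims=True)


def pca(z, n_components):
    """Ordinary PCA via SVD of the column-centered data matrix."""
    centered = z - z.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    return scores, vt[:n_components]


# Simulated count data (hypothetical): 50 samples, 6 parts.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=20.0, size=(50, 6))

z = clr(counts)                      # transform first ...
scores, components = pca(z, 2)       # ... then apply a standard tool
```

Each clr-transformed row sums to zero, so the transformed data lie in a subspace one dimension smaller than the number of parts, which is why this transform is usually paired with a dimension-reducing step such as PCA.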

Published In

NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems
December 2018
11021 pages

Publisher

Curran Associates Inc.

Red Hook, NY, United States
