Correspondence
Published: 28 March 2014

A fair comparison

Paul I Costea¹,
Georg Zeller¹,
Shinichi Sunagawa¹ &
…
Peer Bork¹

Nature Methods volume 11, page 359 (2014)

To the Editor:

Recently, Paulson et al.¹ introduced a normalization method, reporting that it improves clustering of metagenomic abundance data, which is very important for many applications in the fast-growing area of microbiome research. However, in our view, the perceived improvement is due to a postprocessing procedure that is preferentially combined with some, but not all, normalizations included in their method comparison, rather than to the proposed normalization itself.

Paulson et al.¹ compared their normalization method to three existing ones using a data set from a study of microbial communities in the mouse gut and concluded that their method, called cumulative-sum scaling (CSS), “substantially improved” the separation between two known clusters present in the data¹. As the authors kindly provided us with the source code, we were able to reproduce their first figure (Supplementary Fig. 1). However, this was possible only when we applied a logarithm transformation to the data normalized with their CSS method but not to the data normalized by the other methods. Combining the log transformation with each of the normalizations shows that differences in cluster separation are due mainly to this additional transformation and not to the normalization itself (Fig. 1). Thus, conceptually simpler methods, such as relative-abundance normalization (also called total-sum scaling (TSS)), should not be dismissed on these grounds.
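As an illustration of what equal treatment could look like in practice, the following minimal sketch applies one shared log transformation to the output of every normalization. It is not the code used by either group. The function names, the toy counts, the fixed quantile of 0.5 and the scaling constant of 1,000 are assumptions made for this example; in particular, `css_fixed_quantile` is only a simplified stand-in for CSS, which chooses its quantile from the data.

```python
import math


def tss(counts):
    """Total-sum scaling: divide each count by the sample total."""
    total = sum(counts)
    return [c / total for c in counts]


def css_fixed_quantile(counts, quantile=0.5, scale=1000.0):
    """Simplified CSS-like scaling (assumption: a fixed quantile).

    Each count is divided by the sum of the nonzero counts that lie at or
    below the chosen quantile of the nonzero counts, then multiplied by a
    constant.
    """
    positive = sorted(c for c in counts if c > 0)
    # Nearest-rank quantile of the nonzero counts.
    idx = max(0, math.ceil(quantile * len(positive)) - 1)
    threshold = positive[idx]
    scaling_factor = sum(c for c in positive if c <= threshold)
    return [c / scaling_factor * scale for c in counts]


def log_transform(values, pseudocount):
    """Shared postprocessing step: log2 after adding a pseudocount."""
    return [math.log2(v + pseudocount) for v in values]


NORMALIZATIONS = {"TSS": tss, "CSS-like": css_fixed_quantile}


def normalize_all(counts, pseudocount=1.0, use_log=True):
    """Apply every normalization, then the same optional log transform."""
    results = {}
    for name, normalize in NORMALIZATIONS.items():
        values = normalize(counts)
        results[name] = log_transform(values, pseudocount) if use_log else values
    return results


sample = [0, 1, 3, 6, 90]  # toy counts for one sample
for name, values in normalize_all(sample).items():
    print(name, [round(v, 3) for v in values])
```

Because the transformation is switched on or off for all methods at once, any remaining difference in downstream clustering can be attributed to the normalization rather than to the postprocessing.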

**Figure 1: Clustering analysis of different normalization methods.**

To understand the large effect of the log transformation on this comparison, it is important to note that it is nonlinear, a feature that can fundamentally change the distribution of the data (for example, by reducing skew). Because the transformation is undefined for input values ≤0, one typically adds a small value (pseudocount) to non-negative input data to avoid log(0). However, owing to the nonlinearity of the log, this value also affects the transformation result (Supplementary Fig. 2). Paulson et al.¹ set the pseudocount to 1 as a way to preserve zero counts. However, as the four normalizations compared produce output values whose ranges differ by several orders of magnitude, the same pseudocount may not be optimal for all of them. It should instead be chosen to ensure a consistent treatment: for instance, by setting it to a value smaller than the minimum abundance value before transformation (Supplementary Fig. 2 and Supplementary Note).
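The effect of the pseudocount can be made concrete with a small sketch. The relative abundances below are toy values, and the rule of taking half of the smallest nonzero value is an assumption chosen only to satisfy "smaller than the minimum abundance value"; the Supplementary Note may specify a different choice. The helper names `choose_pseudocount` and `log_spread` are hypothetical.

```python
import math


def choose_pseudocount(values, fraction=0.5):
    """Return a pseudocount below the smallest nonzero value.

    Assumption: a fixed fraction of the minimum nonzero abundance.
    """
    nonzero = [v for v in values if v > 0]
    if not nonzero:
        raise ValueError("at least one nonzero value is required")
    return fraction * min(nonzero)


def log_spread(values, pseudocount):
    """Range (max minus min) of the log10-transformed values."""
    logged = [math.log10(v + pseudocount) for v in values]
    return max(logged) - min(logged)


# Toy relative abundances (TSS output), all between 0 and 1.
tss_values = [0.0, 0.01, 0.03, 0.06, 0.90]

fixed_spread = log_spread(tss_values, pseudocount=1.0)
adaptive_pseudocount = choose_pseudocount(tss_values)
adaptive_spread = log_spread(tss_values, adaptive_pseudocount)

print("pseudocount 1:", round(fixed_spread, 3))
print("pseudocount", adaptive_pseudocount, ":", round(adaptive_spread, 3))
```

With a pseudocount of 1, values that lie between 0 and 1 are compressed into a narrow band after the transformation, whereas a pseudocount below the smallest nonzero abundance keeps differences among low-abundance features visible. For normalizations whose output is in the hundreds or thousands, a pseudocount of 1 is already small relative to the data, which is why a single fixed value does not treat all methods alike.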

Methodological improvements are crucial in highly complex fields such as metagenomics. We feel, however, that in a comparison of different approaches, it is important to minimize potential sources of confounding by ensuring equal treatment of all methods under study.

References

1. Paulson, J.N., Stine, O.C., Bravo, H.C. & Pop, M. Nat. Methods 10, 1200–1202 (2013).

Author information

Authors and Affiliations

European Molecular Biology Laboratory, Heidelberg, Germany
Paul I Costea, Georg Zeller, Shinichi Sunagawa & Peer Bork

Corresponding author

Correspondence to Peer Bork.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Figures 1 and 2, and Supplementary Note

About this article

Cite this article

Costea, P., Zeller, G., Sunagawa, S. et al. A fair comparison. Nat Methods 11, 359 (2014). https://doi.org/10.1038/nmeth.2897

Issue Date: April 2014
DOI: https://doi.org/10.1038/nmeth.2897
