To the Editor:
Recently, Paulson et al.1 introduced a normalization method and reported that it improves the clustering of metagenomic abundance data, a task that matters for many applications in the fast-growing field of microbiome research. However, in our view, the perceived improvement is due not to the proposed normalization itself but to a postprocessing step that was combined with some, but not all, of the normalizations included in their method comparison.
Paulson et al.1 compared their normalization method to three existing ones using a data set from a study of microbial communities in the mouse gut and concluded that their method, called cumulative-sum scaling (CSS), “substantially improved” the separation between two known clusters present in the data1. As the authors kindly provided us with the source code, we were able to reproduce their first figure (Supplementary Fig. 1). However, this was possible only when we applied a logarithm transformation to the data normalized with their CSS method but not to the data normalized by the other methods. Combining the log transformation with each of the normalizations shows that differences in cluster separation are due mainly to this additional transformation and not to the normalization itself (Fig. 1). Thus, conceptually simpler methods, such as relative-abundance normalization (also called total-sum scaling (TSS)), should not be dismissed on these grounds.
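The balanced design argued for here, in which every normalization is evaluated both with and without the log transformation, can be sketched as follows. This is a minimal illustration and not the code used for Fig. 1: the toy count matrix, the function names, the choice of log base 2 and the median-depth scaling (a simple stand-in for a second method, not an implementation of CSS) are all assumptions made for the example.

```python
import numpy as np


def tss(counts):
    """Total-sum scaling: divide each sample (column) by its total count."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=0, keepdims=True)


def median_depth_scaling(counts):
    """Hypothetical stand-in for a second normalization (this is NOT CSS):
    rescale every sample to the median sequencing depth."""
    counts = np.asarray(counts, dtype=float)
    depth = counts.sum(axis=0, keepdims=True)
    return counts / depth * np.median(depth)


def all_combinations(counts, normalizers, transform):
    """Return each normalization both untransformed and transformed,
    so that no method receives a postprocessing step the others lack."""
    out = {}
    for name, normalize in normalizers.items():
        normalized = normalize(counts)
        out[(name, "raw")] = normalized
        out[(name, "log")] = transform(normalized)
    return out


# Toy data (assumed): rows are taxa, columns are samples.
counts = np.array([[10, 200, 30],
                   [0, 50, 5],
                   [90, 750, 65]])

normalizers = {"tss": tss, "median_depth": median_depth_scaling}

# Pseudocount of 1 and log base 2 are assumptions for illustration;
# whether one pseudocount suits every method is a separate question.
grid = all_combinations(counts, normalizers, lambda x: np.log2(x + 1.0))

for key in sorted(grid):
    print(key, grid[key].round(3).tolist())
```

Any clustering or ordination would then be run on each entry of `grid`, so that a difference between methods can be attributed to the normalization or to the transformation rather than to an unmatched combination of the two.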
To understand the large effect of the log transformation on this comparison, it is important to note that the transformation is nonlinear and can therefore fundamentally change the distribution of the data (for example, by reducing skew). Because the logarithm is undefined for input values ≤0, one typically adds a small value (a pseudocount) to non-negative input data to avoid log(0). However, owing to the nonlinearity of the log, the choice of pseudocount also affects the transformation result (Supplementary Fig. 2). Paulson et al.1 set the pseudocount to 1 as a way to preserve zero counts. However, as the four normalizations compared produce output values whose ranges differ by several orders of magnitude, the same pseudocount may not be optimal for all of them. It should instead be chosen to ensure consistent treatment: for instance, by setting it to a value smaller than the minimum abundance value before transformation (Supplementary Fig. 2 and Supplementary Note).
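The dependence on the pseudocount can be made concrete with a small numerical sketch. The toy counts, the helper name `pseudocount_below_minimum`, the use of half the smallest positive value and the choice of log base 10 are assumptions for illustration; the procedure actually used is described in the Supplementary Note.

```python
import numpy as np


def pseudocount_below_minimum(values, fraction=0.5):
    """Return a pseudocount smaller than the smallest positive value.
    The default fraction of 0.5 is an arbitrary, assumed choice."""
    values = np.asarray(values, dtype=float)
    positive = values[values > 0]
    if positive.size == 0:
        raise ValueError("need at least one positive value")
    return fraction * positive.min()


# Toy sample (assumed): five taxa, 1,000 reads in total.
counts = np.array([0, 1, 10, 100, 889], dtype=float)
relative = counts / counts.sum()  # relative abundances, all within [0, 1]

# A pseudocount of 1 dwarfs every relative abundance, so the transformed
# values are squeezed into a narrow interval just above zero.
fixed = np.log10(relative + 1.0)

# A pseudocount below the minimum positive abundance keeps the
# orders-of-magnitude spread between rare and dominant taxa.
adaptive_pc = pseudocount_below_minimum(relative)
adaptive = np.log10(relative + adaptive_pc)

print("pseudocount 1:       ", fixed.round(4).tolist())
print("adaptive pseudocount:", adaptive.round(4).tolist())
```

With a pseudocount of 1, the relative abundances are compressed into a span of less than 0.3 log units, whereas the adaptive pseudocount spreads the same values over more than three log units. The same pseudocount of 1 applied to raw or rescaled counts would have a much smaller effect, which is why a single fixed value does not treat all normalizations alike.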
Methodological improvements are crucial in highly complex fields such as metagenomics. We feel, however, that in a comparison of different approaches, it is important to minimize potential sources of confounding by ensuring equal treatment of all methods under study.
References
1. Paulson, J.N., Stine, O.C., Bravo, H.C. & Pop, M. Nat. Methods 10, 1200–1202 (2013).
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Figures 1 and 2, and Supplementary Note (PDF 593 kb)
Cite this article
Costea, P., Zeller, G., Sunagawa, S. et al. A fair comparison. Nat Methods 11, 359 (2014). https://doi.org/10.1038/nmeth.2897