research-article

Distance-based linkage of personal microbiome records for identification and its privacy implications

Published: 01 February 2024

Abstract

Due to its high potential for analysis in clinical settings, research on the human microbiome has been flourishing for several years. As an increasing amount of microbiome data is gathered and stored, analysing the temporal and individual stability of microbiome readings, and the resulting privacy risks, has gained importance. In 2015, Franzosa et al. demonstrated that individuals can be matched and linked across microbiome datasets from the Human Microbiome Project. Such linkage could lead to the re-identification of individuals and therefore has privacy implications for the design of microbiome studies. Their technique is based on the construction of body-site-specific metagenomic codes that remain stable over time to a certain degree.
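As a simplified illustration of what a metagenomic code is, the sketch below assumes each sample has already been reduced to a set of present features (for example, taxa above some abundance threshold) and greedily picks features of one individual until no other individual in the reference set carries all of them. A code then counts as a hit on a later sample if all of its features are present there. This is not Franzosa et al.'s exact procedure: the greedy scoring rule, the alphabetical tie-breaking, the `max_len` cap and the function names `build_code` and `code_matches` are hypothetical choices for illustration.

```python
from typing import Dict, List, Optional, Set


def build_code(target: str, profiles: Dict[str, Set[str]], max_len: int = 10) -> Optional[List[str]]:
    """Greedily choose features of `target` until no other individual carries them all.

    `profiles` maps an individual ID to the set of features (e.g. taxa above a
    hypothetical abundance threshold) seen in that individual's baseline sample.
    Returns None if `target` cannot be told apart within `max_len` features.
    """
    code: List[str] = []
    # Individuals that still carry every feature chosen so far ("collisions").
    remaining = {pid for pid in profiles if pid != target}
    candidates = set(profiles[target])
    while remaining and candidates and len(code) < max_len:
        # Score each candidate by how many remaining individuals lack it.
        scores = {f: sum(f not in profiles[p] for p in remaining) for f in candidates}
        # Highest score wins; ties are broken alphabetically for reproducibility.
        best = min(candidates, key=lambda f: (-scores[f], f))
        if scores[best] == 0:
            break  # no remaining feature separates target from the others
        code.append(best)
        candidates.remove(best)
        remaining = {p for p in remaining if best in profiles[p]}
    return code if not remaining else None


def code_matches(code: List[str], sample: Set[str]) -> bool:
    """A code 'hits' a later sample if every feature in the code is present."""
    return all(f in sample for f in code)


if __name__ == "__main__":
    profiles = {"A": {"t1", "t2", "t3"}, "B": {"t1", "t2"}, "C": {"t2", "t3"}}
    print(build_code("A", profiles))  # ['t1', 't3']
    print(build_code("B", profiles))  # None: B's features are a subset of A's
```

Requiring every code feature to be present is strict: a later sample that has lost a single code feature no longer matches, which is why the stability of the chosen features over time matters for this kind of code.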
In this paper, we establish a distance-based technique for personal microbiome identification, combined with a solution for avoiding spurious, false-positive matches. In a direct comparison with the approach of Franzosa et al., assuming that information is available as microbial records rather than at the more detailed (but less likely to be shared) nucleic acid level, our method improves on the identification results for most of the considered datasets. Our main finding is an increase of 30% in the average percentage of true positive identifications on the widely studied microbiome of the gastrointestinal tract. While we particularly recommend our method for the gut microbiome, we also observed substantial identification success on other body sites. We also show that the method is robust to various hyper-parameter settings. Our results demonstrate the potential for privacy threats in microbiome data gathering, storage, sharing, and analysis, and thus underline the need for solutions that protect the microbiome as personal and sensitive medical data.
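The abstract does not state which distance measure or which rule for rejecting spurious matches the authors use. As a minimal sketch under those gaps, the code below assumes Bray-Curtis dissimilarity between relative-abundance vectors and a hypothetical ratio test: a query record is linked to its nearest reference record only if that record is clearly closer than the runner-up; otherwise the linker abstains. The names `link_records` and `true_positive_rate`, the `ratio` parameter and its 0.8 default are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, List, Optional


def bray_curtis(u: List[float], v: List[float]) -> float:
    """Bray-Curtis dissimilarity between two non-negative abundance vectors (0 = identical)."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den > 0 else 0.0


def link_records(
    queries: Dict[str, List[float]],
    reference: Dict[str, List[float]],
    ratio: float = 0.8,
) -> Dict[str, Optional[str]]:
    """Link each query record to its nearest reference record, or to None (abstain).

    A link is accepted only if the nearest distance is strictly below `ratio` times
    the second-nearest distance; this ratio test is a hypothetical stand-in for the
    paper's false-positive filter.
    """
    links: Dict[str, Optional[str]] = {}
    for qid, q in queries.items():
        dists = sorted((bray_curtis(q, r), rid) for rid, r in reference.items())
        if not dists:
            links[qid] = None
        elif len(dists) == 1 or dists[0][0] < ratio * dists[1][0]:
            links[qid] = dists[0][1]
        else:
            links[qid] = None  # too close to call: abstain instead of risking a false positive
    return links


def true_positive_rate(links: Dict[str, Optional[str]], truth: Dict[str, str]) -> float:
    """Share of queries linked to their true reference record; abstentions count as misses."""
    if not truth:
        return 0.0
    return sum(links.get(q) == t for q, t in truth.items()) / len(truth)


if __name__ == "__main__":
    reference = {"p1": [0.7, 0.2, 0.1], "p2": [0.1, 0.2, 0.7], "p3": [0.4, 0.2, 0.4]}
    queries = {"p1": [0.6, 0.3, 0.1], "p2": [0.25, 0.2, 0.55]}
    links = link_records(queries, reference)
    print(links)  # {'p1': 'p1', 'p2': None}
    print(true_positive_rate(links, {"p1": "p1", "p2": "p2"}))  # 0.5
```

Abstaining trades some true positives for fewer false positives; moving `ratio` towards 1 makes this linker more willing to commit. How closely this mirrors the authors' actual filter cannot be inferred from the abstract.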
Based on our observations, we further identify challenges in personal microbiome identification research, specifically the scarcity of benchmark data and of associated data analysis tasks. Drawing on this experience, we propose solutions for a more systematic and comparable evaluation that also take into account the costs of applying privacy-preserving methods.

References

[1]
E.K. Costello, C.L. Lauber, M. Hamady, N. Fierer, J.I. Gordon, R. Knight, Bacterial community variation in human body habitats across space and time, Science 326 (2009) 1694–1697. https://www.science.org/doi/10.1126/science.1177486.
[2]
E. Distrutti, L. Monaldi, P. Ricci, S. Fiorucci, Gut microbiota role in irritable bowel syndrome: new therapeutic strategies, World J. Gastroenterol. 22 (2016) 2219–2241. http://www.wjgnet.com/1007-9327/full/v22/i7/2219.htm.
[3]
J. Domingo-Ferrer, J. Mateo-Sanz, Practical data-oriented microaggregation for statistical disclosure control, IEEE Trans. Knowl. Data Eng. 14 (2002) 189–201. http://ieeexplore.ieee.org/document/979982/.
[4]
H.L. Dunn, Record linkage, Am. J. Public Health Nation's Health 36 (1946) 1412–1416. https://ajph.aphapublications.org/doi/full/10.2105/AJPH.36.12.1412.
[5]
A.K. Elmagarmid, P.G. Ipeirotis, V.S. Verykios, Duplicate record detection: a survey, IEEE Trans. Knowl. Data Eng. 19 (2007) 1–16. http://ieeexplore.ieee.org/document/4016511/.
[6]
I.P. Fellegi, A.B. Sunter, A theory for record linkage, J. Am. Stat. Assoc. 64 (1969) 1183–1210. http://www.tandfonline.com/doi/abs/10.1080/01621459.1969.10501049.
[7]
N. Fierer, M. Hamady, C.L. Lauber, R. Knight, The influence of sex, handedness, and washing on the diversity of hand surface bacteria, Proc. Natl. Acad. Sci. 105 (2008) 17994–17999. https://pnas.org/doi/full/10.1073/pnas.0807920105.
[8]
E.A. Franzosa, K. Huang, J.F. Meadow, D. Gevers, K.P. Lemon, B.J.M. Bohannan, C. Huttenhower, Identifying personal microbiomes using metagenomic codes, Proc. Natl. Acad. Sci. 112 (2015). https://pnas.org/doi/full/10.1073/pnas.1423854112.
[9]
B.C.M. Fung, K. Wang, R. Chen, P.S. Yu, Privacy-preserving data publishing: a survey of recent developments, ACM Comput. Surv. 42 (2010) 1–53. https://dl.acm.org/doi/10.1145/1749603.1749605.
[10]
E.A. Grice, H.H. Kong, S. Conlan, C.B. Deming, J. Davis, A.C. Young, Comparative Sequencing Program, G.G. Bouffard, R.W. Blakesley, P.R. Murray, E.D. Green, M.L. Turner, J.A. Segre, Topographical and temporal diversity of the human skin microbiome, Science 324 (2009) 1190–1192. https://www.science.org/doi/10.1126/science.1171700.
[11]
Q. He, X. Niu, R.Q. Qi, M. Liu, Advances in microbial metagenomics and artificial intelligence analysis in forensic identification, Front. Microbiol. 13 (2022). https://www.frontiersin.org/articles/10.3389/fmicb.2022.1046733/full.
[12]
M. Hittmeir, R. Mayer, A. Ekelhart, A baseline for attribute disclosure risk in synthetic data, in: ACM Conference on Data and Application Security and Privacy, ACM, New Orleans, LA, USA, 2020, pp. 133–143.
[13]
M. Hittmeir, R. Mayer, A. Ekelhart, Distance-based techniques for personal microbiome identification, in: Proceedings of the 17th International Conference on Availability, Reliability and Security, ACM, Vienna, Austria, 2022, pp. 1–13. https://dl.acm.org/doi/10.1145/3538969.3538985.
[14]
M. Hittmeir, R. Mayer, A. Ekelhart, Utility and privacy assessment of synthetic microbiome data, in: Data and Applications Security and Privacy XXXVI, Springer International Publishing, Cham, 2022, pp. 15–27.
[15]
R.E. Ley, P.J. Turnbaugh, S. Klein, J.I. Gordon, Human gut microbes associated with obesity, Nature 444 (2006) 1022–1023. https://www.nature.com/articles/4441022a.
[16]
G. Li, Y. Wang, X. Su, Improvements on a privacy-protection algorithm for DNA sequences with generalization lattices, Comput. Methods Programs Biomed. 108 (2012) 1–9. https://linkinghub.elsevier.com/retrieve/pii/S0169260711000459.
[17]
Z. Lin, A.B. Owen, R.B. Altman, Genomic research and human subject privacy, Science 305 (2004) 183. https://www.science.org/doi/10.1126/science.1095019.
[18]
M. Llugiqi, R. Mayer, An empirical analysis of synthetic-data-based anomaly detection, in: International Cross-Domain Conference for Machine Learning and Knowledge Extraction (CD-MAKE), Springer International Publishing, Vienna, Austria, 2022.
[19]
W.W. Lowrance, F.S. Collins, Identifiability in genomic research, Science 317 (2007) 600–602. https://www.science.org/doi/10.1126/science.1147699.
[20]
B.A. Malin, Protecting genomic sequence anonymity with generalization lattices, Methods Inf. Med. 44 (2005) 687–692.
[21]
R. Mayer, A. Karlowicz, M. Hittmeir, K-anonymity on metagenomic features in microbiome databases, in: Proceedings of the 18th International Conference on Availability, Reliability and Security, ACM, Benevento, Italy, 2023, pp. 1–11. https://dl.acm.org/doi/10.1145/3600160.3600178.
[22]
Q. McNemar, Note on the sampling error of the difference between correlated proportions or percentages, Psychometrika 12 (1947) 153–157. http://link.springer.com/10.1007/BF02295996.
[23]
MetaHIT Consortium (additional members), M. Arumugam, J. Raes, E. Pelletier, D. Le Paslier, T. Yamada, D.R. Mende, G.R. Fernandes, J. Tap, T. Bruls, J.M. Batto, M. Bertalan, N. Borruel, F. Casellas, L. Fernandez, L. Gautier, T. Hansen, M. Hattori, T. Hayashi, M. Kleerebezem, K. Kurokawa, M. Leclerc, F. Levenez, C. Manichanh, H.B. Nielsen, T. Nielsen, N. Pons, J. Poulain, J. Qin, T. Sicheritz-Ponten, S. Tims, D. Torrents, E. Ugarte, E.G. Zoetendal, J. Wang, F. Guarner, O. Pedersen, W.M. De Vos, S. Brunak, J. Doré, J. Weissenbach, S.D. Ehrlich, P. Bork, Enterotypes of the human gut microbiome, Nature 473 (2011) 174–180. https://www.nature.com/articles/nature09944.
[24]
A. Meyerson, R. Williams, On the complexity of optimal K-anonymity, in: Proceedings of the Twenty-Third ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems - PODS '04, ACM Press, Paris, France, 2004, p. 223. http://portal.acm.org/citation.cfm?doid=1055558.1055591.
[25]
G. Musso, R. Gambino, M. Cassader, Obesity, diabetes, and gut microbiota, Diabetes Care 33 (2010) 2277–2284. https://diabetesjournals.org/care/article/33/10/2277/28104/Obesity-Diabetes-and-Gut-MicrobiotaThe-hygiene.
[26]
A. Narayanan, V. Shmatikov, Robust de-anonymization of large sparse datasets, in: 2008 IEEE Symposium on Security and Privacy (SP 2008), IEEE, Oakland, CA, USA, 2008, pp. 111–125. http://ieeexplore.ieee.org/document/4531148/.
[27]
N. Patki, R. Wedge, K. Veeramachaneni, The synthetic data vault, in: IEEE International Conference on Data Science and Advanced Analytics, IEEE, Montreal, QC, Canada, 2016, pp. 399–410.
[28]
H. Ping, J. Stoyanovich, B. Howe, DataSynthesizer: privacy-preserving synthetic datasets, in: International Conference on Scientific and Statistical Database Management, ACM, Chicago, IL, USA, 2017, pp. 1–5.
[29]
F. Prasser, F. Kohlmayer, R. Lautenschläger, K.A. Kuhn, ARX - a comprehensive tool for anonymizing biomedical data, in: AMIA Annual Symposium Proceedings 2014, 2014, pp. 984–993.
[30]
G.B. Rogers, D.J. Keating, R.L. Young, M.L. Wong, J. Licinio, S. Wesselingh, From gut dysbiosis to altered brain function and mental illness: mechanisms and pathways, Mol. Psychiatry 21 (2016) 738–748. https://www.nature.com/articles/mp201650.
[31]
S.E. Schmedes, A.E. Woerner, N.M. Novroski, F.R. Wendt, J.L. King, K.M. Stephens, B. Budowle, Targeted sequencing of clade-specific markers from skin microbiomes for forensic human identification, Forensic Sci. Int. Genet. 32 (2018) 50–61. https://linkinghub.elsevier.com/retrieve/pii/S1872497317302107.
[32]
A.J. Sherier, A.E. Woerner, B. Budowle, Population informative markers selected using Wright's fixation index and machine learning improves human identification using the skin microbiome, Appl. Environ. Microbiol. 87 (2021). https://journals.asm.org/doi/10.1128/AEM.01208-21.
[33]
A.J. Sherier, A.E. Woerner, B. Budowle, Determining informative microbial single nucleotide polymorphisms for human identification, Appl. Environ. Microbiol. 88 (2022). https://journals.asm.org/doi/10.1128/aem.00052-22.
[34]
L. Sweeney, Achieving k-anonymity privacy protection using generalization and suppression, Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 10 (2002) 571–588.
[35]
L. Sweeney, A. Abu, J. Winn, Identifying participants in the personal genome project by name (a re-identification experiment), arXiv:1304.7605 [cs], 2013.
[36]
The Human Microbiome Project Consortium, Structure, function and diversity of the healthy human microbiome, Nature 486 (2012) 207–214. https://www.nature.com/articles/nature11234.
[37]
The NIH HMP Working Group, J. Peterson, S. Garges, M. Giovanni, P. McInnes, L. Wang, J.A. Schloss, V. Bonazzi, J.E. McEwen, K.A. Wetterstrand, C. Deal, C.C. Baker, V. Di Francesco, T.K. Howcroft, R.W. Karp, R.D. Lunsford, C.R. Wellington, T. Belachew, M. Wright, C. Giblin, H. David, M. Mills, R. Salomon, C. Mullins, B. Akolkar, L. Begg, C. Davis, L. Grandison, M. Humble, J. Khalsa, A.R. Little, H. Peavy, C. Pontzer, M. Portnoy, M.H. Sayre, P. Starke-Reed, S. Zakhari, J. Read, B. Watson, M. Guyer, The NIH human microbiome project, Genome Res. 19 (2009) 2317–2323. http://genome.cshlp.org/lookup/doi/10.1101/gr.096651.109.
[38]
P. Vangay, B.M. Hillmann, D. Knights, Microbiome learning repo (ML repo): a public repository of microbiome regression and classification tasks, GigaScience 8 (2019). https://academic.oup.com/gigascience/article/doi/10.1093/gigascience/giz042/5481665.
[39]
J. Wagner, J.N. Paulson, X. Wang, B. Bhattacharjee, H. Corrada Bravo, Privacy-preserving microbiome analysis using secure computation, Bioinformatics 32 (2016) 1873–1879. https://academic.oup.com/bioinformatics/article/32/12/1873/1743948.
[40]
Z. Wang, H. Lou, Y. Wang, R. Shamir, R. Jiang, T. Chen, GePMI: a statistical model for personal intestinal microbiome identification, NPJ Biofilms Microbiomes 4 (2018) 20. https://www.nature.com/articles/s41522-018-0065-2.
[41]
H. Watanabe, I. Nakamura, S. Mizutani, Y. Kurokawa, H. Mori, K. Kurokawa, T. Yamada, Minor taxa in human skin microbiome contribute to the personal identification, PLoS ONE 13 (2018). https://dx.plos.org/10.1371/journal.pone.0199947.
[42]
D.R. Wilson, Beyond probabilistic record linkage: using neural networks and complex features to improve genealogical record linkage, in: The 2011 International Joint Conference on Neural Networks, IEEE, San Jose, CA, USA, 2011, pp. 9–14. http://ieeexplore.ieee.org/document/6033192/.
[43]
A.E. Woerner, N.M. Novroski, F.R. Wendt, A. Ambers, R. Wiley, S.E. Schmedes, B. Budowle, Forensic human identification with targeted microbiome markers using nearest neighbor classification, Forensic Sci. Int. Genet. 38 (2019) 130–139. https://linkinghub.elsevier.com/retrieve/pii/S1872497318303739.
[44]
J. Yang, T. Tsukimi, M. Yoshikawa, K. Suzuki, T. Takeda, M. Tomita, S. Fukuda, Cutibacterium acnes (propionibacterium acnes) 16S rRNA genotyping of microbial samples from possessions contributes to owner identification, mSystems 4 (2019). https://journals.asm.org/doi/10.1128/mSystems.00594-19.
[45]
T. Yatsunenko, F.E. Rey, M.J. Manary, I. Trehan, M.G. Dominguez-Bello, M. Contreras, M. Magris, G. Hidalgo, R.N. Baldassano, A.P. Anokhin, A.C. Heath, B. Warner, J. Reeder, J. Kuczynski, J.G. Caporaso, C.A. Lozupone, C. Lauber, J.C. Clemente, D. Knights, R. Knight, J.I. Gordon, Human gut microbiome viewed across age and geography, Nature 486 (2012) 222–227. https://www.nature.com/articles/nature11053.


Published In

Computers and Security  Volume 136, Issue C
Jan 2024
668 pages

Publisher

Elsevier Advanced Technology Publications

United Kingdom


Author Tags

  1. Human microbiome
  2. Data privacy
  3. Re-identification
  4. Record linkage
  5. Mitigation

