tutorial

Bacterial pan-genomes: data representation and analysis

Published: 22 September 2013

Abstract

Bacterial genomes at NCBI represent a large collection of strains that differ in sequence quality, assembly quality, and sampling density. Among these are densely sampled sets of related genomes, usually of human pathogens, whose organization and protein content can be analyzed directly within the pan-genome concept. Even in groups of closely related genomes, protein families occur at very different frequencies, with "core proteins" at one end, "dispensable proteins" at the other, and "accessory proteins" in between.
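The frequency-based distinction between core, accessory, and dispensable proteins can be illustrated with a minimal sketch. The function name `classify_families` and the cutoffs `core_min` and `dispensable_max` are assumptions made for illustration; the abstract does not state the thresholds actually used.

```python
from collections import defaultdict


def classify_families(genome_to_families, core_min=0.95, dispensable_max=0.25):
    """Label each protein family by the fraction of genomes that carry it.

    genome_to_families maps a genome name to an iterable of family ids.
    The cutoffs are hypothetical, not values taken from the tutorial.
    """
    n_genomes = len(genome_to_families)
    if n_genomes == 0:
        return {}

    # Count each family at most once per genome.
    counts = defaultdict(int)
    for families in genome_to_families.values():
        for family in set(families):
            counts[family] += 1

    labels = {}
    for family, count in counts.items():
        frequency = count / n_genomes
        if frequency >= core_min:
            labels[family] = "core"
        elif frequency <= dispensable_max:
            labels[family] = "dispensable"
        else:
            labels[family] = "accessory"
    return labels


example = {
    "g1": ["famA", "famB", "famC"],
    "g2": ["famA", "famB"],
    "g3": ["famA"],
    "g4": ["famA"],
}
labels = classify_families(example)
```

Here `famA` is present in every genome, `famB` in half of them, and `famC` in only one, so the three families fall into the three frequency classes under the assumed cutoffs.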
To organize the genomes available in the NCBI repositories into groups of related genomes (species-level clades), we use a method based on a robust distance between sets of ribosomal proteins. The threshold is selected so that most clades contain a single species, with some clades containing genomes from a few species. Within each clade, we then build trees based on similarity of protein content, using hierarchical clustering with tight parameters.
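Grouping genomes into clades by a distance threshold can be sketched as follows. The abstract does not define the robust distance or the grouping rule, so both choices here are assumptions: the median of per-protein distances stands in for the robust distance, and single-linkage grouping (union-find over pairs under the threshold) stands in for the clade construction. All names and numbers are hypothetical.

```python
from statistics import median


def robust_distance(per_protein_distances):
    """Assumed robust summary: the median of per-ribosomal-protein distances."""
    return median(per_protein_distances)


def group_into_clades(genomes, distance, threshold):
    """Single-linkage grouping: genomes closer than threshold share a clade."""
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the root, compressing the path on the way.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for i, a in enumerate(genomes):
        for b in genomes[i + 1:]:
            if distance(a, b) <= threshold:
                parent[find(a)] = find(b)

    clades = {}
    for g in genomes:
        clades.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clades.values())


# Hypothetical per-protein distances for three genomes.
per_protein = {
    frozenset(("a", "b")): [0.01, 0.02, 0.90],  # one outlier protein
    frozenset(("a", "c")): [0.30, 0.40, 0.50],
    frozenset(("b", "c")): [0.35, 0.40, 0.45],
}


def pair_distance(x, y):
    return robust_distance(per_protein[frozenset((x, y))])


clades = group_into_clades(["a", "b", "c"], pair_distance, threshold=0.05)
```

The outlier value for the pair `a`/`b` does not affect the median, which is the sense in which this stand-in distance is robust; a mean would have pushed the pair above the threshold.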
To identify protein families for genomes within a clade accurately and reliably, we use a combined approach that takes into account both sequence similarity and genome context. First, proteins are grouped into tentative clusters using inclusive parameters. Then, within each tentative cluster, local genome context and the protein phylogenetic tree are used to separate paralogs. The combined approach defines core and conservative clusters for the pan-genome more accurately than sequence-based clustering alone. For computational efficiency, redundant and near-redundant proteins are eliminated, with one representative sequence retained from each near-redundant group.
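The second step, separating paralogs inside a tentative cluster, can be illustrated with a simplified sketch that uses genome context only. It links proteins from different genomes when the families of their neighboring genes overlap, and treats each connected group as one refined cluster. The Jaccard cutoff `min_jaccard`, the input layout, and the function name are assumptions; the phylogenetic-tree evidence mentioned in the abstract is not modeled here.

```python
def split_by_context(proteins, min_jaccard=0.5):
    """Split one tentative cluster into context-consistent groups.

    proteins maps a protein id to (genome, set of neighboring family ids).
    Proteins from the same genome are never linked directly, since they
    are paralog candidates. The cutoff is a hypothetical value.
    """
    ids = list(proteins)
    parent = {p: p for p in ids}

    def find(p):
        # Follow parent links to the root, compressing the path on the way.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for i, a in enumerate(ids):
        genome_a, neighbors_a = proteins[a]
        for b in ids[i + 1:]:
            genome_b, neighbors_b = proteins[b]
            if genome_a == genome_b:
                continue
            union = neighbors_a | neighbors_b
            if union and len(neighbors_a & neighbors_b) / len(union) >= min_jaccard:
                parent[find(a)] = find(b)

    groups = {}
    for p in ids:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(members) for members in groups.values())


tentative_cluster = {
    "g1_p1": ("g1", {"famX", "famY"}),
    "g1_p2": ("g1", {"famM", "famN"}),
    "g2_p1": ("g2", {"famX", "famY"}),
    "g2_p2": ("g2", {"famM", "famN"}),
}
refined = split_by_context(tentative_cluster)
```

In this toy input each genome carries two copies of the family in different neighborhoods, and the sketch pairs up the copies that share a neighborhood. Because linking is transitive, two proteins from the same genome could still end up in one group through a third protein; a fuller treatment would need the additional evidence the abstract describes.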


      Published In

      BCB'13: Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics
      September 2013
      987 pages
      ISBN:9781450324342
      DOI:10.1145/2506583
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. Bacterial
      2. clustering
      3. computational
      4. core clusters
      5. genomics
      6. indexing
      7. infrastructure
      8. orthologs
      9. pangenome
      10. paralogs
      11. pathogens
      12. protein clusters

      Qualifiers

      • Tutorial
      • Research
      • Refereed limited

      Conference

      BCB'13
      Sponsor:
      BCB'13: ACM-BCB2013
      September 22 - 25, 2013
      Washington DC, USA

      Acceptance Rates

      BCB'13 Paper Acceptance Rate 43 of 148 submissions, 29%;
      Overall Acceptance Rate 254 of 885 submissions, 29%

