[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
Skip header Section
The Burrows-Wheeler Transform: Data Compression, Suffix Arrays, and Pattern MatchingJuly 2008
Publisher:
  • Springer Publishing Company, Incorporated
ISBN:978-0-387-78908-8
Published:25 July 2008
Pages:
352
Skip Bibliometrics Section
Reflects downloads up to 31 Dec 2024Bibliometrics
Skip Abstract Section
Abstract

The Burrows-Wheeler Transform is a text transformation scheme that has found applications in different aspects of the data explosion problem, from data compression to index structures and search. The BWT belongs to a new class of compression algorithms, distinguished by its ability to perform compression by sorted contexts. More recently, the BWT has also found various applications in addition to text data compression, such as in lossless and lossy image compression, tree-source identification, bioinformatics, machine translation, shape matching, and test data compression. This book will serve as a reference for seasoned professionals or researchers in the area, while providing a gentle introduction, making it accessible for senior undergraduate students or first year graduate students embarking upon research in compression, pattern matching, full text retrieval, compressed index structures, or other areas related to the BWT. Key Features Comprehensive resource for information related to different aspects of the Burrows-Wheeler Transform including: Gentle introduction to the BWT History of the development of the BWT Detailed theoretical analysis of algorithmic issues and performance limits Searching on BWT compressed data Hardware architectures for the BWT Explores non-traditional applications of the BWT in areas such as: Bioinformatics Joint source-channel coding Modern information retrieval Machine translation Test data compression for systems-on-chip Teaching materials ideal for classroom use on courses in: Data Compression and Source Coding Modern Information Retrieval Information Science Digital Libraries

Cited By

  1. ACM
    Clare A, Daykin J, Mills T and Zarges C Evolutionary search techniques for the Lyndon factorization of biosequences Proceedings of the Genetic and Evolutionary Computation Conference Companion, (1543-1550)
  2. Mondal U (2019). Achieving lossless compression of audio by encoding its constituted components (LCAEC), Innovations in Systems and Software Engineering, 15:1, (75-85), Online publication date: 1-Mar-2019.
  3. ACM
    Adjeroh D, Allaga M, Tan J, Lin J, Jiang Y, Abbasi A and Zhou X String-Based Models for Predicting RNA-Protein Interaction Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology,and Health Informatics, (661-666)
  4. Daykin J, Groult R, Guesnet Y, Lecroq T, Lefebvre A, Léonard M and Prieur-Gaston É (2016). Binary block order Rouen Transform, Theoretical Computer Science, 656:PB, (118-134), Online publication date: 20-Dec-2016.
  5. Beal R and Adjeroh D (2016). Compressed parameterized pattern matching, Theoretical Computer Science, 609:P1, (129-142), Online publication date: 4-Jan-2016.
  6. Martinez H, Tarraga J, Medina I, Barrachina S, Castillo M, Dopazo J and Quintana-Orti E (2015). Concurrent and Accurate Short Read Mapping on Multicore Processors, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 12:5, (995-1007), Online publication date: 1-Sep-2015.
  7. Crochemore M, Grossi R, Kärkkäinen J and Landau G (2015). Computing the Burrows-Wheeler transform in place and in small space, Journal of Discrete Algorithms, 32:C, (44-52), Online publication date: 1-May-2015.
  8. Kalajdzic K, Ali S and Patel A (2015). Rapid lossless compression of short text messages, Computer Standards & Interfaces, 37:C, (53-59), Online publication date: 1-Jan-2015.
  9. Beal R and Adjeroh D (2015). Efficient pattern matching for RNA secondary structures, Theoretical Computer Science, 592:C, (59-71), Online publication date: 9-Aug-2015.
  10. Chikhi R, Limasset A, Jackman S, Simpson J and Medvedev P On the Representation of de Bruijn Graphs Proceedings of the 18th Annual International Conference on Research in Computational Molecular Biology - Volume 8394, (35-55)
  11. Smyth W (2013). Computing regularities in strings, European Journal of Combinatorics, 34:1, (3-14), Online publication date: 1-Jan-2013.
  12. ACM
    Tan J and Adjeroh D Predicting Protein Families using Protein Shape Context Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics, (733-734)
  13. ACM
    Beal R, Adjeroh D and Abbasi A The Forward Stem Matrix Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics, (575-584)
  14. ACM
    Martínez H, Tárraga J, Medina I, Barrachina S, Castillo M, Dopazo J and Quintana-Ortí E A dynamic pipeline for RNA sequencing on multicore processors Proceedings of the 20th European MPI Users' Group Meeting, (235-240)
  15. ACM
    Alatabbi A, Crochemore M, Daykin J and Mouchard L Lyndon fountains and the Burrows-Wheeler transform Proceedings of the CUBE International Information Technology Conference, (441-446)
  16. Kärkkäinen J, Mikkola P and Kempa D Grammar precompression speeds up burrows---wheeler compression Proceedings of the 19th international conference on String Processing and Information Retrieval, (330-335)
  17. Cox A, Jakobi T, Rosone G and Schulz-Trieglaff O Comparing DNA sequence collections by direct comparison of compressed text indexes Proceedings of the 12th international conference on Algorithms in Bioinformatics, (214-224)
  18. Kimura K, Koike A and Nakai K Seed-set construction by equi-entropy partitioning for efficient and sensitive short-read mapping Proceedings of the 11th international conference on Algorithms in bioinformatics, (151-162)
  19. Bauer M, Cox A and Rosone G Lightweight BWT construction for very large string collections Proceedings of the 22nd annual conference on Combinatorial pattern matching, (219-231)
  20. ACM
    Nong G, Zhang S and Chan W (2011). Computing the inverse sort transform in linear time, ACM Transactions on Algorithms, 7:2, (1-16), Online publication date: 1-Mar-2011.
  21. Beal R and Adjeroh D p-Suffix sorting as arithmetic coding Proceedings of the 22nd international conference on Combinatorial Algorithms, (44-56)
  22. Beal R and Adjeroh D Parameterized longest previous factor Proceedings of the 22nd international conference on Combinatorial Algorithms, (31-43)
  23. Kärkkäinen J and Puglisi S Medium-space algorithms for inverse BWT Proceedings of the 18th annual European conference on Algorithms: Part I, (451-462)
  24. Ferragina P Data structures Proceedings of the 18th annual European conference on Algorithms: Part II, (1-16)
  25. Yokoo H Extension and faster implementation of the GRP transform for lossless compression Proceedings of the 21st annual conference on Combinatorial pattern matching, (338-347)
  26. Abel J (2010). Post BWT stages of the Burrows–Wheeler compression algorithm, Software—Practice & Experience, 40:9, (751-777), Online publication date: 1-Aug-2010.
Contributors
  • West Virginia University
  • University of Canterbury
  • University of Central Florida

Reviews

Stefan A Robila

The focus of the book, the Burrows-Wheeler transform (BWT), is an algorithm that was initially published in 1994, and used in data compression and pattern matching. This lossless block transform was originally introduced for text processing. It doesn't change any of the characters; it only permutes their order. BWT is a sequence of two steps: first, it generates all possible circular shifts of the block; next, it sorts all these rotations and forms the transform result from the last symbol of each block. The strength of this approach is that if the original data has several substrings that occur often, then the BWT result will have several places where a single character is repeated multiple times in a row, increasing its appeal for data compression. At first sight, writing such a large book on quite a narrow topic seems excessive. However, the authors claim that the text constitutes a thorough discussion of the BWT transform, providing a reference volume for "seasoned professionals or researchers," as well as a learning tool for students conducting research in the field. Indeed, apart from classic and current results in the field, the book includes ample background sections that allow the reader a comprehensive understanding. The authors come from three distinct institutions, but share a strong background in text compression that allows for a seamless integration of their respective contributions to this work. Major strengths of the book are the clear writing style and the easy-to-follow flow of the material. In a deep-learning approach, the text starts with a trivial but extremely illustrative example of the transform, allowing the reader to understand the basic principles after only a few pages. The first chapter also includes an explanation of how to reconstruct the original string, biographies of Burrows and Wheeler (the transform authors), and a concise description of the book's organization. Chapter 2 provides a detailed explanation of how the BWT transform works and how it can be efficiently implemented. In the third chapter, coding principles are explained. Since it is lossless, the BWT does not reduce the size of the text. In order to be employed for compression, it needs to be followed by a second phase that codes the repetitive text that is being generated. Various other approaches are discussed, such as entropy coding, move-to-front lists, inversion frequencies, distance coding, and wavelet trees. The next three chapters focus on the theoretical analysis of the transform and its variants. Chapter 4 discusses suffix trees and suffix arrays that, as precursors of the BWT transform, allow the authors to discuss the technique in similar terms. Chapter 5 provides the justification for the similarity between BWT and suffix trees, and discusses the complexity of the algorithm from the point of view of space and time, as well as its performance for compression use. Although relatively recently introduced, BWT variants have already been designed; the authors discuss them in the sixth chapter. Chapters 7 and 8 focus on beyond-compression use of the technique. Text pattern matching is one such area, discussed in detail in a separate chapter, followed by various other applications, such as full-text indexing, DNA sequence compression, genome annotation, test data compression, image compression and analysis, and channel coding. Most refreshing is the unbiased analysis of how efficient the BWT use is in each case-sometimes indicating that the experimental results suggest that the BWT is not appropriate. The book ends with a concluding chapter. The value of this book resides in its thoroughness and its cross-disciplinary appeal. On one hand, a student or faculty reader will discover theoretical topics related to information theory and pattern recognition. On the other, computational scientists will be attracted to the many applications of the BWT transform. All in all, it is a worthwhile lecture. Online Computing Reviews Service

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Please enable JavaScript to view thecomments powered by Disqus.

Recommendations