Takagi et al., 2017 - Google Patents
Linear-size CDAWG: New repetition-aware indexing and grammar compressionTakagi et al., 2017
View PDF- Document ID
- 8925712751487582936
- Author
- Takagi T
- Goto K
- Fujishige Y
- Inenaga S
- Arimura H
- Publication year
- Publication venue
- International Symposium on String Processing and Information Retrieval
External Links
Snippet
In this paper, we propose a novel approach to combine compact directed acyclic word graphs (CDAWGs) and grammar-based compression. This leads us to an efficient self-index, called Linear-size CDAWGs (L-CDAWGs), which can be represented with O (̃ e _T\log n) bits …
- 238000007906 compression 0 title abstract description 10
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30613—Indexing
- G06F17/30619—Indexing indexing structures
- G06F17/30625—Trees
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30943—Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type
- G06F17/30946—Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30943—Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type
- G06F17/30964—Querying
- G06F17/30979—Query processing
- G06F17/30985—Query processing by using string matching techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30312—Storage and indexing structures; Management thereof
- G06F17/30321—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30908—Information retrieval; Database structures therefor; File system structures therefor of semistructured data, the undelying structure being taken into account, e.g. mark-up language structure data
- G06F17/30914—Mapping or conversion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30861—Retrieval from the Internet, e.g. browsers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30067—File systems; File servers
-
- H—ELECTRICITY
- H03—BASIC ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same information or similar information or a subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Belazzougui et al. | Composite repetition-aware data structures | |
Policriti et al. | LZ77 computation based on the run-length encoded BWT | |
JP4805315B2 (en) | Computer representation by data structure and related encoding / decoding method | |
Takagi et al. | Linear-size CDAWG: New repetition-aware indexing and grammar compression | |
Beller et al. | Computing the longest common prefix array based on the Burrows–Wheeler transform | |
Claude et al. | Universal indexes for highly repetitive document collections | |
Giuliani et al. | Novel results on the number of runs of the Burrows-Wheeler-Transform | |
Belazzougui et al. | Fully dynamic de Bruijn graphs | |
Belazzougui et al. | Fast label extraction in the CDAWG | |
Navarro et al. | On stricter reachable repetitiveness measures | |
Mieno et al. | Minimal unique substrings and minimal absent words in a sliding window | |
Jansson et al. | Linked dynamic tries with applications to LZ-compression in sublinear time and space | |
Frosini et al. | Logarithmic equal-letter runs for BWT of purely morphic words | |
Bille et al. | Compressed subsequence matching and packed tree coloring | |
Manzini | XBWT tricks | |
Akagi et al. | Grammar index by induced suffix sorting | |
Mishra et al. | Fast pattern matching in compressed text using wavelet tree | |
JP5169456B2 (en) | Document search system, document search method, and document search program | |
Prezza et al. | Faster online computation of the succinct longest previous factor array | |
Badkobeh et al. | On two LZ78-style grammars: Compression bounds and compressed-space computation | |
Inenaga et al. | Computing minimal absent words and extended bispecial factors with CDAWG space | |
Bille et al. | Finger search in grammar-compressed strings | |
Bonizzoni et al. | Divide and conquer computation of the multi-string BWT and LCP array | |
Inenaga | Suffix trees, DAWGs and CDAWGs for forward and backward tries | |
Belazzougui et al. | Flexible indexing of repetitive collections |