[go: up one dir, main page]
More Web Proxy on the site http://driver.im/

Takagi et al., 2017 - Google Patents

Linear-size CDAWG: New repetition-aware indexing and grammar compression

Takagi et al., 2017

View PDF
Document ID
8925712751487582936
Author
Takagi T
Goto K
Fujishige Y
Inenaga S
Arimura H
Publication year
Publication venue
International Symposium on String Processing and Information Retrieval

External Links

Snippet

In this paper, we propose a novel approach to combine compact directed acyclic word graphs (CDAWGs) and grammar-based compression. This leads us to an efficient self-index, called Linear-size CDAWGs (L-CDAWGs), which can be represented with O (̃ e _T\log n) bits …
Continue reading at arxiv.org (PDF) (other versions)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F17/30613Indexing
    • G06F17/30619Indexing indexing structures
    • G06F17/30625Trees
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30943Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type
    • G06F17/30946Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type indexing structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30943Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type
    • G06F17/30964Querying
    • G06F17/30979Query processing
    • G06F17/30985Query processing by using string matching techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F17/30634Querying
    • G06F17/30657Query processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20Handling natural language data
    • G06F17/21Text processing
    • G06F17/22Manipulating or registering by use of codes, e.g. in sequence of text characters
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286Information retrieval; Database structures therefor; File system structures therefor in structured data stores
    • G06F17/30312Storage and indexing structures; Management thereof
    • G06F17/30321Indexing structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30908Information retrieval; Database structures therefor; File system structures therefor of semistructured data, the undelying structure being taken into account, e.g. mark-up language structure data
    • G06F17/30914Mapping or conversion
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20Handling natural language data
    • G06F17/27Automatic analysis, e.g. parsing
    • G06F17/2705Parsing
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30861Retrieval from the Internet, e.g. browsers
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30067File systems; File servers
    • HELECTRICITY
    • H03BASIC ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00Conversion of a code where information is represented by a given sequence or number of digits to a code where the same information or similar information or a subset of information is represented by a different sequence or number of digits
    • H03M7/30Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction

Similar Documents

Publication Publication Date Title
Belazzougui et al. Composite repetition-aware data structures
Policriti et al. LZ77 computation based on the run-length encoded BWT
JP4805315B2 (en) Computer representation by data structure and related encoding / decoding method
Takagi et al. Linear-size CDAWG: New repetition-aware indexing and grammar compression
Beller et al. Computing the longest common prefix array based on the Burrows–Wheeler transform
Claude et al. Universal indexes for highly repetitive document collections
Giuliani et al. Novel results on the number of runs of the Burrows-Wheeler-Transform
Belazzougui et al. Fully dynamic de Bruijn graphs
Belazzougui et al. Fast label extraction in the CDAWG
Navarro et al. On stricter reachable repetitiveness measures
Mieno et al. Minimal unique substrings and minimal absent words in a sliding window
Jansson et al. Linked dynamic tries with applications to LZ-compression in sublinear time and space
Frosini et al. Logarithmic equal-letter runs for BWT of purely morphic words
Bille et al. Compressed subsequence matching and packed tree coloring
Manzini XBWT tricks
Akagi et al. Grammar index by induced suffix sorting
Mishra et al. Fast pattern matching in compressed text using wavelet tree
JP5169456B2 (en) Document search system, document search method, and document search program
Prezza et al. Faster online computation of the succinct longest previous factor array
Badkobeh et al. On two LZ78-style grammars: Compression bounds and compressed-space computation
Inenaga et al. Computing minimal absent words and extended bispecial factors with CDAWG space
Bille et al. Finger search in grammar-compressed strings
Bonizzoni et al. Divide and conquer computation of the multi-string BWT and LCP array
Inenaga Suffix trees, DAWGs and CDAWGs for forward and backward tries
Belazzougui et al. Flexible indexing of repetitive collections