The BWT (Burrows Wheeler Transform) has long fascinated people for its ability to capture complex correlations with a very simple inverse transform. Unfortunately despite that inverse transform being very simple, it is also slow. I will briefly review the inverse BWT (i-BWT) and then look at ways to speed it up. Jump to the end for the punch line : speed results and source code. Let's briefly revi
I'm very interested in what types of interesting data structures are out there HN. Totally your preference.I'll start: bloom filters. Lets you test if a value is definitely NOT in a list of pre-stored values (or POSSIBLY in a list - with adjustable probability that influences storage of the values.) Good use-case: routing. Say you have a list of 1 million IPs that are black listed. A trivial algor
Consider a situation where we have a set of intervals and we need following operations to be implemented efficiently. 1) Add an interval 2) Remove an interval 3) Given an interval x, find if x overlaps with any of the existing intervals. Interval Tree: The idea is to augment a self-balancing Binary Search Tree (BST) like Red Black Tree, AVL Tree, etc to maintain set of intervals so that all operat
A wavelet tree on the string "abracadabra". At each node the symbols of the string are projected onto two partitions of the alphabet, and a bitvector denotes to which partition each symbol belongs. Note that only the bitvectors are stored; the strings in the nodes are only for illustratory purposes. The Wavelet Tree is a succinct data structure to store strings in compressed space. It generalizes
TL;DR; We are changing std::sort in LLVM’s libcxx. That’s a long story of what it took us to get there and all possible consequences, bugs you might encounter with examples from open source. We provide some benchmarks, perspective, why we did this in the first place and what it cost us with exciting ideas from Hyrum’s Law to reinforcement learning. All changes went into open source and thus I can
This is an upcoming high performance computing book titled “Algorithms for Modern Hardware” by Sergey Slotin. Its intended audience is everyone from performance engineers and practical algorithm researchers to undergraduate computer science students who have just finished an advanced algorithms course and want to learn more practical ways to speed up a program than by going from $O(n \log n)$ to $
What the research is: The Ribbon filter is a new data structure that is more space-efficient than the popular Bloom filters that are widely used for optimizing data retrieval. One of the ways that Bloom, and now Ribbon, filters solve real engineering problems is by providing smooth configurability unmatched by other filters. Bloom filters work by overapproximating a set of keys associated with som
Posted by Sergiu Ciumac on June 12, 2020 · 23 mins read Audio Fingerprinting I have been developing the SoundFingerprinting open source project for the last ten years. One of the questions I often receive is “how does music recognition works?” For the library users, it is somewhat similar to a one-way hash function. You provide a file at the input, and after a certain number of conversions, you ge
ScaNN (Scalable Nearest Neighbors) is a method for efficient vector similarity search at scale. This code implements [1, 2], which includes search space pruning and quantization for Maximum Inner Product Search and also supports other distance functions such as Euclidean distance. The implementation is designed for x86 processors with AVX2 support. ScaNN achieves state-of-the-art performance on an
はじめに 本書は,筆者が長年書き溜めた様々な実務的な最適化問題についてまとめたものである. 本書は,Jupyter Laboで記述されたものを自動的に変換したものであり,以下のサポートページで公開している. コードも一部公開しているが,ソースコードを保管した Github 自体はプライベートである. 本を購入した人は,サポートページで公開していないプログラムを 圧縮ファイル でダウンロードすることができる. ダウンロードしたファイルの解凍パスワードは<本に記述>である. 作者のページ My HP 本書のサポートページ Support Page 出版社のページ Pythonによる実務で役立つ最適化問題100+ (1) ―グラフ理論と組合せ最適化への招待― Pythonによる実務で役立つ最適化問題100+ (2) ―割当・施設配置・在庫最適化・巡回セールスマン― Pythonによる実務で役立つ
crcany is a suite of programs that generalize CRC calculations, and that generate C code to compute and combine CRCs efficiently. Any CRC can be computed given the set of parameters that describe it. Those parameters are provided in the form as used by Greg Cook's catalog of over one-hundred CRCs, found at https://reveng.sourceforge.io/crc-catalogue/all.htm . That set of parameters were first defi
It turns out that finite state machines are useful for things other than expressing computation. Finite state machines can also be used to compactly represent ordered sets or maps of strings that can be searched very quickly. In this article, I will teach you about finite state machines as a data structure for representing ordered sets and maps. This includes introducing an implementation written
Filter data structures over-approximate a set of hashable keys, i.e. set membership queries may incorrectly come out positive. A filter with false positive rate $f \in (0,1]$ is known to require $\ge \log_2(1/f)$ bits per key. At least for larger $f \ge 2^{-4}$, existing practical filters require a space overhead of at least 20% with respect to this information-theoretic bound. We introduce the Ri
A software library of stochastic streaming algorithms "A truly excellent example of theoretically-informed algorithm engineering" -- Graham Cormode The Business Challenge: Analyzing Big Data Quickly. In the analysis of big data there are often problem queries that don’t scale because they require huge compute resources and time to generate exact results. Examples include count distinct, quantiles,
リリース、障害情報などのサービスのお知らせ
最新の人気エントリーの配信
処理を実行中です
j次のブックマーク
k前のブックマーク
lあとで読む
eコメント一覧を開く
oページを開く