(p,q)-biclique counting and enumeration for large sparse bipartite graphs
In this paper, we study the problem of (p, q)-biclique counting and enumeration for large sparse bipartite graphs. Given a bipartite graph G = (U, V, E) and two integer parameters p and q, we aim to efficiently count and enumerate all (p, q)-bicliques in G, ...
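To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the paper's algorithm: it only illustrates what a (p, q)-biclique count is, under the assumption that a (p, q)-biclique consists of p vertices from U and q vertices from V with every cross pair connected. The function name `count_bicliques` and the edge-list input format are hypothetical, and the sketch assumes p >= 1.

```python
from itertools import combinations
from math import comb


def count_bicliques(U, edges, p, q):
    """Count (p, q)-bicliques by brute force (illustrative only, assumes p >= 1).

    U     : iterable of vertices on the U side
    edges : iterable of (u, v) pairs with u in U and v in V
    """
    U = list(U)
    # Neighbors in V of every vertex in U.
    adj = {u: set() for u in U}
    for u, v in edges:
        adj[u].add(v)

    total = 0
    # For each choice of p vertices from U, any q of their common
    # neighbors in V completes a (p, q)-biclique.
    for subset in combinations(U, p):
        common = set.intersection(*(adj[u] for u in subset))
        total += comb(len(common), q)
    return total


if __name__ == "__main__":
    U = ["u1", "u2", "u3"]
    edges = [("u1", "v1"), ("u1", "v2"), ("u2", "v1"), ("u2", "v2"), ("u3", "v2")]
    print(count_bicliques(U, edges, 2, 2))
```

This enumeration is exponential in p and is only practical on tiny graphs, which is why efficient methods for large sparse bipartite graphs are a research topic in their own right.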
Evaluating query languages and systems for high-energy physics data
In the domain of high-energy physics (HEP), query languages in general and SQL in particular have found limited acceptance. This is surprising since HEP data analysis matches the SQL model well: the data is fully structured and queried using mostly ...
Distributed hop-constrained s-t simple path enumeration at billion scale
Hop-constrained s-t simple path (HC-s-t path) enumeration is a fundamental problem in graph analysis and has received considerable attention recently. Straightforward distributed solutions are inefficient and suffer from poor scalability when addressing ...
ETO: accelerating optimization of DNN operators by high-performance tensor program reuse
Recently, deep neural networks (DNNs) have achieved great success in various applications, where low inference latency is important. Existing solutions either manually tune the kernel library or utilize search-based compilation to reduce the operator ...
Babelfish: efficient execution of polyglot queries
Today's users of data processing systems come from different domains, have different levels of expertise, and prefer different programming languages. As a result, analytical workload requirements shifted from relational to polyglot queries involving ...
Butterfly counting on uncertain bipartite graphs
When considering uncertain bipartite networks, the number of instances of the popular graphlet structure the butterfly may be used as an important metric to quickly gauge information about the network. This Uncertain Butterfly Count has practical usages ...
METRO: a generic graph neural network framework for multivariate time series forecasting
Multivariate time series forecasting has been drawing increasing attention due to its prevalent applications. It has been commonly assumed that leveraging latent dependencies between pairs of variables can enhance prediction accuracy. However, most ...
LargeEA: aligning entities for large-scale knowledge graphs
Entity alignment (EA) aims to find equivalent entities in different knowledge graphs (KGs). Current EA approaches suffer from scalability issues, limiting their usage in real-world EA scenarios. To tackle this challenge, we propose LargeEA to align ...
HVS: hierarchical graph structure based on Voronoi diagrams for solving approximate nearest neighbor search
Approximate nearest neighbor search (ANNS) is a fundamental problem that has a wide range of applications in information retrieval and data mining. Among state-of-the-art in-memory ANNS methods, graph-based methods have attracted particular interest ...
Origami: a high-performance mergesort framework
Mergesort is a popular algorithm for sorting real-world workloads as it is immune to data skewness, suitable for parallelization using vectorized intrinsics, and relatively simple to multi-thread. In this paper, we introduce Origami, an in-memory merge-...
Learning to be a statistician: learned estimator for number of distinct values
Estimating the number of distinct values (NDV) in a column is useful for many tasks in database systems, such as columnstore compression and data profiling. In this work, we focus on how to derive accurate NDV estimations from random (online/offline) ...
ParChain: a framework for parallel hierarchical agglomerative clustering using nearest-neighbor chain
This paper studies the hierarchical clustering problem, where the goal is to produce a dendrogram that represents clusters at varying scales of a data set. We propose the ParChain framework for designing parallel hierarchical agglomerative clustering (...
Answering regular path queries through exemplars
Regular simple path query (RPQ) is one of the fundamental operators in graph analytics. In an RPQ, the input is a graph, a source node and a regular expression. The goal is to identify all nodes that are connected to the source through a simple path ...
HET: scaling out huge embedding model training via cache-enabled distributed framework
Embedding models have been an effective learning paradigm for high-dimensional data. However, one open issue of embedding models is that their representations (latent factors) often result in large parameter space. We observe that existing distributed ...
FINEdex: a fine-grained learned index scheme for scalable and concurrent memory systems
Index structures in memory systems have become important for improving overall system performance. The promising learned indexes leverage deep-learning models to complement existing index structures and obtain significant performance improvements. Existing ...
TaGSim: type-aware graph similarity learning and computation
Computing similarity between graphs is a fundamental and critical problem in graph-based applications, and one of the most commonly used graph similarity measures is graph edit distance (GED), defined as the minimum number of graph edit operations that ...
Analysis of influence contribution in social advertising
Online Social Network (OSN) providers usually conduct advertising campaigns by inserting social ads into promoted posts. Whenever a user engages in a promoted ad, she may further propagate the promoted ad to her followers recursively and the propagation ...
Scabbard: single-node fault-tolerant stream processing
Single-node multi-core stream processing engines (SPEs) can process hundreds of millions of tuples per second. Yet making them fault-tolerant with exactly-once semantics while retaining this performance is an open challenge: due to the limited I/O ...
Enabling personal consent in databases
Users have the right to consent to the use of their data, but current methods are limited to very coarse-grained expressions of consent, as "opt-in/opt-out" choices for certain uses. In this paper we identify the need for fine-grained consent management ...