Volume 16, Issue 11, July 2023
A Two-Level Signature Scheme for Stable Set Similarity Joins

We study the set similarity join problem, which retrieves all pairs of similar sets from two collections of sets for a given distance function. Existing exact solutions employ a signature-based filter-verification framework: If two sets are similar, ...

Scalable Reasoning on Document Stores via Instance-Aware Query Rewriting

Data trees, typically encoded in JSON, are ubiquitous in data-driven applications. This ubiquity makes urgent the development of novel techniques for querying heterogeneous JSON data in a flexible manner. We propose a rule language for JSON, called ...

EQUI-VOCAL: Synthesizing Queries for Compositional Video Events from Limited User Interactions

We introduce EQUI-VOCAL: a new system that automatically synthesizes queries over videos from limited user interactions. The user only provides a handful of positive and negative examples of what they are looking for. EQUI-VOCAL utilizes these initial ...

Lotan: Bridging the Gap between GNNs and Scalable Graph Analytics Engines

Recent advances in Graph Neural Networks (GNNs) have changed the landscape of modern graph analytics. The complexity of GNN training and the scalability challenges have also sparked interest from the systems community, with efforts to build systems that ...

Epoxy: ACID Transactions across Diverse Data Stores

Developers are increasingly building applications that incorporate multiple data stores, for example to manage heterogeneous data. Often, these require transactional safety for operations across stores, but few systems support such guarantees. To solve ...

Analyzing Vectorized Hash Tables across CPU Architectures

Data processing systems often leverage vector instructions to achieve higher performance. When applying vector instructions, an often overlooked data structure is the hash table, even though it is fundamental in data processing systems for operations ...

Exploiting Cloud Object Storage for High-Performance Analytics

Elasticity of compute and storage is crucial for analytical cloud database systems. All cloud vendors provide disaggregated object stores, which can be used as storage backend for analytical query engines. Until recently, local storage was unavoidable ...

A Randomized Blocking Structure for Streaming Record Linkage

A huge amount of data, in terms of streams, are collected nowadays via a variety of sources, such as sensors, mobile devices, or even raw log files. The unprecedented rate at which these data are generated and collected calls for novel record linkage ...

REmatch: A Novel Regex Engine for Finding All Matches

In this paper, we present the REmatch system for information extraction. REmatch is based on a recently proposed enumeration algorithm for evaluating regular expressions with capture variables supporting the all-match semantics. It tells a story of what ...

ADOPT: Adaptively Optimizing Attribute Orders for Worst-Case Optimal Join Algorithms via Reinforcement Learning

The performance of worst-case optimal join algorithms depends on the order in which the join attributes are processed. Selecting good orders before query execution is hard, due to the large space of possible orders and unreliable execution cost ...

Triangular Stability Maximization by Influence Spread over Social Networks

In many real-world applications such as social network analysis and online advertising/marketing, one of the most important and popular problems is called influence maximization (IM), which finds a set of k seed users that maximize the expected number ...

CORE-Sketch: On Exact Computation of Median Absolute Deviation with Limited Space

Median absolute deviation (MAD), the median of the absolute deviations from the median, has been found useful in various applications such as outlier detection. Together with median, MAD is more robust to abnormal data than mean and standard deviation (...

Fast Search-by-Classification for Large-Scale Databases Using Index-Aware Decision Trees and Random Forests

The vast amounts of data collected in various domains pose great challenges to modern data exploration and analysis. To find "interesting" objects in large databases, users typically define a query using positive and negative example objects and train a ...

Semi-Oblivious Chase Termination for Linear Existential Rules: An Experimental Study

The chase procedure is a fundamental algorithmic tool in databases that allows us to reason with constraints, such as existential rules, with a plethora of applications. It takes as input a database and a set of constraints, and iteratively completes ...

Analyzing the Impact of Cardinality Estimation on Execution Plans in Microsoft SQL Server

Cardinality estimation is widely believed to be one of the most important causes of poor query plans. Prior studies evaluate the impact of cardinality estimation on plan quality on a set of Select-Project-Join queries on PostgreSQL DBMS. Our empirical ...

WALTZ: Leveraging Zone Append to Tighten the Tail Latency of LSM Tree on ZNS SSD

We propose WALTZ, an LSM tree-based key-value store on the emerging Zoned Namespace (ZNS) SSD. The key contribution of WALTZ is to leverage the zone append command, which is a recent addition to ZNS SSD specifications, to provide tight tail latency. The ...

Accelerating Aggregation Queries on Unstructured Streams of Data

Analysts and scientists are interested in querying streams of video, audio, and text to extract quantitative insights. For example, an urban planner may wish to measure congestion by querying the live feed from a traffic camera. Prior work has used deep ...

QueryBooster: Improving SQL Performance Using Middleware Services for Human-Centered Query Rewriting

SQL query performance is critical in database applications, and query rewriting is a technique that transforms an original query into an equivalent query with a better performance. In a wide range of database-supported systems, there is a unique problem ...

Consistent Range Approximation for Fair Predictive Modeling

This paper proposes a novel framework for certifying the fairness of predictive models trained on biased data. It draws from query answering for incomplete and inconsistent databases to formulate the problem of consistent range approximation (CRA) of ...

SUREL+: Moving from Walks to Sets for Scalable Subgraph-Based Graph Representation Learning

Subgraph-based graph representation learning (SGRL) has recently emerged as a powerful tool in many prediction tasks on graphs due to its advantages in model expressiveness and generalization ability. Most previous SGRL models face computational issues ...

Estimating Single-Node PageRank in Õ(min{d_t, √m}) Time

PageRank is a famous measure of graph centrality that has numerous applications in practice. The problem of computing a single node's PageRank has been the subject of extensive research over a decade. However, existing methods still incur large time ...

Simple Adaptive Query Processing vs. Learned Query Optimizers: Observations and Analysis

There have been many decades of work on optimizing query processing in database management systems. Recently, modern machine learning (ML), and specifically reinforcement learning (RL), has gained increased attention as a means to develop a query ...

BP-Tree: Overcoming the Point-Range Operation Tradeoff for In-Memory B-Trees

B-trees are the go-to data structure for in-memory indexes in databases and storage systems. B-trees support both point operations (i.e., inserts and finds) and range operations (i.e., iterators and maps). However, there is an inherent tradeoff between ...

HENCE-X: Toward Heterogeneity-Agnostic Multi-Level Explainability for Deep Graph Networks

Deep graph networks (DGNs) have demonstrated their outstanding effectiveness on both heterogeneous and homogeneous graphs. However their black-box nature does not allow human users to understand their working mechanisms. Recently, extensive efforts have ...

Automatic Road Extraction with Multi-Source Data Revisited: Completeness, Smoothness and Discrimination

Extracting roads from multi-source data, such as aerial images and vehicle trajectories, is an important way to maintain road networks in the field of urban computing. In this paper, we revisit the problem of road extraction and aim to boost its ...

Asymptotically Better Query Optimization Using Indexed Algebra

Query optimization is essential for the efficient execution of queries. The necessary analysis, if we can and should apply optimizations and transform the query plan, is already challenging. Traditional techniques focus on the availability of columns at ...

Normalizing Property Graphs

Normalization aims at minimizing sources of potential data inconsistency and costs of update maintenance incurred by data redundancy. For relational databases, different classes of dependencies cause data redundancy and have resulted in proposals such ...

A Deep Dive into Common Open Formats for Analytical DBMSs

This paper evaluates the suitability of Apache Arrow, Parquet, and ORC as formats for subsumption in an analytical DBMS. We systematically identify and explore the high-level features that are important to support efficient querying in modern OLAP DBMSs ...

Saibot: A Differentially Private Data Search Platform

Recent data search platforms use ML task-based utility measures rather than metadata-based keywords, to search large dataset corpora. Requesters submit a training dataset, and these platforms search for augmentations---join or union-compatible datasets--...

JoinBoost: Grow Trees over Normalized Data Using Only SQL

Although dominant for tabular data, ML libraries that train tree models over normalized databases (e.g., LightGBM, XGBoost) require the data to be denormalized as a single table, materialized, and exported. This process is not scalable, slow, and poses ...
