PVLDB: Vol 16, No 11

Volume 16, Issue 11, July 2023

Editors: Georgia Koutrika (Athena Research Center), Jun Yang (Duke University)

Publisher: VLDB Endowment

ISSN: 2150-8097

Issue Downloads

Front matter (Cover, Contents, Organization, Letter from the editors in chief)

research-article

A Two-Level Signature Scheme for Stable Set Similarity Joins

Pages 2686–2698, https://doi.org/10.14778/3611479.3611480

We study the set similarity join problem, which retrieves all pairs of similar sets from two collections of sets for a given distance function. Existing exact solutions employ a signature-based filter-verification framework: If two sets are similar, ...

research-article

Scalable Reasoning on Document Stores via Instance-Aware Query Rewriting

Pages 2699–2713, https://doi.org/10.14778/3611479.3611481

Data trees, typically encoded in JSON, are ubiquitous in data-driven applications. This ubiquity makes urgent the development of novel techniques for querying heterogeneous JSON data in a flexible manner. We propose a rule language for JSON, called ...

research-article

EQUI-VOCAL: Synthesizing Queries for Compositional Video Events from Limited User Interactions

Pages 2714–2727, https://doi.org/10.14778/3611479.3611482

We introduce EQUI-VOCAL: a new system that automatically synthesizes queries over videos from limited user interactions. The user only provides a handful of positive and negative examples of what they are looking for. EQUI-VOCAL utilizes these initial ...

research-article

Lotan: Bridging the Gap between GNNs and Scalable Graph Analytics Engines

Pages 2728–2741, https://doi.org/10.14778/3611479.3611483

Recent advances in Graph Neural Networks (GNNs) have changed the landscape of modern graph analytics. The complexity of GNN training and the scalability challenges have also sparked interest from the systems community, with efforts to build systems that ...

research-article

Epoxy: ACID Transactions across Diverse Data Stores

Pages 2742–2754, https://doi.org/10.14778/3611479.3611484

Developers are increasingly building applications that incorporate multiple data stores, for example to manage heterogeneous data. Often, these require transactional safety for operations across stores, but few systems support such guarantees. To solve ...

research-article

Analyzing Vectorized Hash Tables across CPU Architectures

Pages 2755–2768, https://doi.org/10.14778/3611479.3611485

Data processing systems often leverage vector instructions to achieve higher performance. When applying vector instructions, an often overlooked data structure is the hash table, even though it is fundamental in data processing systems for operations ...

research-article

Exploiting Cloud Object Storage for High-Performance Analytics

Pages 2769–2782, https://doi.org/10.14778/3611479.3611486

Elasticity of compute and storage is crucial for analytical cloud database systems. All cloud vendors provide disaggregated object stores, which can be used as storage backend for analytical query engines. Until recently, local storage was unavoidable ...

research-article

A Randomized Blocking Structure for Streaming Record Linkage

Pages 2783–2791, https://doi.org/10.14778/3611479.3611487

A huge amount of data, in terms of streams, are collected nowadays via a variety of sources, such as sensors, mobile devices, or even raw log files. The unprecedented rate at which these data are generated and collected calls for novel record linkage ...

research-article

REmatch: A Novel Regex Engine for Finding All Matches

Pages 2792–2804, https://doi.org/10.14778/3611479.3611488

In this paper, we present the REmatch system for information extraction. REmatch is based on a recently proposed enumeration algorithm for evaluating regular expressions with capture variables supporting the all-match semantics. It tells a story of what ...

research-article

ADOPT: Adaptively Optimizing Attribute Orders for Worst-Case Optimal Join Algorithms via Reinforcement Learning

Pages 2805–2817, https://doi.org/10.14778/3611479.3611489

The performance of worst-case optimal join algorithms depends on the order in which the join attributes are processed. Selecting good orders before query execution is hard, due to the large space of possible orders and unreliable execution cost ...

research-article

Triangular Stability Maximization by Influence Spread over Social Networks

Pages 2818–2831, https://doi.org/10.14778/3611479.3611490

In many real-world applications such as social network analysis and online advertising/marketing, one of the most important and popular problems is called influence maximization (IM), which finds a set of k seed users that maximize the expected number ...

research-article

CORE-Sketch: On Exact Computation of Median Absolute Deviation with Limited Space

Pages 2832–2844, https://doi.org/10.14778/3611479.3611491

Median absolute deviation (MAD), the median of the absolute deviations from the median, has been found useful in various applications such as outlier detection. Together with median, MAD is more robust to abnormal data than mean and standard deviation (...

research-article

Fast Search-by-Classification for Large-Scale Databases Using Index-Aware Decision Trees and Random Forests

Pages 2845–2857, https://doi.org/10.14778/3611479.3611492

The vast amounts of data collected in various domains pose great challenges to modern data exploration and analysis. To find "interesting" objects in large databases, users typically define a query using positive and negative example objects and train a ...

research-article

Semi-Oblivious Chase Termination for Linear Existential Rules: An Experimental Study

Pages 2858–2870, https://doi.org/10.14778/3611479.3611493

The chase procedure is a fundamental algorithmic tool in databases that allows us to reason with constraints, such as existential rules, with a plethora of applications. It takes as input a database and a set of constraints, and iteratively completes ...

research-article

Analyzing the Impact of Cardinality Estimation on Execution Plans in Microsoft SQL Server

Pages 2871–2883, https://doi.org/10.14778/3611479.3611494

Cardinality estimation is widely believed to be one of the most important causes of poor query plans. Prior studies evaluate the impact of cardinality estimation on plan quality on a set of Select-Project-Join queries on PostgreSQL DBMS. Our empirical ...

research-article

WALTZ: Leveraging Zone Append to Tighten the Tail Latency of LSM Tree on ZNS SSD

Pages 2884–2896, https://doi.org/10.14778/3611479.3611495

We propose WALTZ, an LSM tree-based key-value store on the emerging Zoned Namespace (ZNS) SSD. The key contribution of WALTZ is to leverage the zone append command, which is a recent addition to ZNS SSD specifications, to provide tight tail latency. The ...

research-article

Accelerating Aggregation Queries on Unstructured Streams of Data

Pages 2897–2910, https://doi.org/10.14778/3611479.3611496

Analysts and scientists are interested in querying streams of video, audio, and text to extract quantitative insights. For example, an urban planner may wish to measure congestion by querying the live feed from a traffic camera. Prior work has used deep ...

research-article

QueryBooster: Improving SQL Performance Using Middleware Services for Human-Centered Query Rewriting

Pages 2911–2924, https://doi.org/10.14778/3611479.3611497

SQL query performance is critical in database applications, and query rewriting is a technique that transforms an original query into an equivalent query with a better performance. In a wide range of database-supported systems, there is a unique problem ...

research-article

Consistent Range Approximation for Fair Predictive Modeling

Pages 2925–2938, https://doi.org/10.14778/3611479.3611498

This paper proposes a novel framework for certifying the fairness of predictive models trained on biased data. It draws from query answering for incomplete and inconsistent databases to formulate the problem of consistent range approximation (CRA) of ...

research-article

SUREL+: Moving from Walks to Sets for Scalable Subgraph-Based Graph Representation Learning

Pages 2939–2948, https://doi.org/10.14778/3611479.3611499

Subgraph-based graph representation learning (SGRL) has recently emerged as a powerful tool in many prediction tasks on graphs due to its advantages in model expressiveness and generalization ability. Most previous SGRL models face computational issues ...

research-article

Estimating Single-Node PageRank in Õ(min{d_t, √m}) Time

Pages 2949–2961, https://doi.org/10.14778/3611479.3611500

PageRank is a famous measure of graph centrality that has numerous applications in practice. The problem of computing a single node's PageRank has been the subject of extensive research over a decade. However, existing methods still incur large time ...

research-article

Simple Adaptive Query Processing vs. Learned Query Optimizers: Observations and Analysis

Pages 2962–2975, https://doi.org/10.14778/3611479.3611501

There have been many decades of work on optimizing query processing in database management systems. Recently, modern machine learning (ML), and specifically reinforcement learning (RL), has gained increased attention as a means to develop a query ...

research-article

BP-Tree: Overcoming the Point-Range Operation Tradeoff for In-Memory B-Trees

Pages 2976–2989, https://doi.org/10.14778/3611479.3611502

B-trees are the go-to data structure for in-memory indexes in databases and storage systems. B-trees support both point operations (i.e., inserts and finds) and range operations (i.e., iterators and maps). However, there is an inherent tradeoff between ...

research-article

HENCE-X: Toward Heterogeneity-Agnostic Multi-Level Explainability for Deep Graph Networks

Pages 2990–3003, https://doi.org/10.14778/3611479.3611503

Deep graph networks (DGNs) have demonstrated their outstanding effectiveness on both heterogeneous and homogeneous graphs. However their black-box nature does not allow human users to understand their working mechanisms. Recently, extensive efforts have ...

research-article

Automatic Road Extraction with Multi-Source Data Revisited: Completeness, Smoothness and Discrimination

Pages 3004–3017, https://doi.org/10.14778/3611479.3611504

Extracting roads from multi-source data, such as aerial images and vehicle trajectories, is an important way to maintain road networks in the field of urban computing. In this paper, we revisit the problem of road extraction and aim to boost its ...

research-article

Asymptotically Better Query Optimization Using Indexed Algebra

Pages 3018–3030, https://doi.org/10.14778/3611479.3611505

Query optimization is essential for the efficient execution of queries. The necessary analysis, if we can and should apply optimizations and transform the query plan, is already challenging. Traditional techniques focus on the availability of columns at ...

research-article

Normalizing Property Graphs

Pages 3031–3043, https://doi.org/10.14778/3611479.3611506

Normalization aims at minimizing sources of potential data inconsistency and costs of update maintenance incurred by data redundancy. For relational databases, different classes of dependencies cause data redundancy and have resulted in proposals such ...

research-article

A Deep Dive into Common Open Formats for Analytical DBMSs

Pages 3044–3056, https://doi.org/10.14778/3611479.3611507

This paper evaluates the suitability of Apache Arrow, Parquet, and ORC as formats for subsumption in an analytical DBMS. We systematically identify and explore the high-level features that are important to support efficient querying in modern OLAP DBMSs ...

research-article

Saibot: A Differentially Private Data Search Platform

Pages 3057–3070, https://doi.org/10.14778/3611479.3611508

Recent data search platforms use ML task-based utility measures rather than metadata-based keywords, to search large dataset corpora. Requesters submit a training dataset, and these platforms search for augmentations---join or union-compatible datasets--...

research-article

JoinBoost: Grow Trees over Normalized Data Using Only SQL

Pages 3071–3084, https://doi.org/10.14778/3611479.3611509

Although dominant for tabular data, ML libraries that train tree models over normalized databases (e.g., LightGBM, XGBoost) require the data to be denormalized as a single table, materialized, and exported. This process is not scalable, slow, and poses ...

Proceedings of the VLDB Endowment
