Author: Sun, Jianling

research-article

B4: Towards Optimal Assessment of Plausible Code Solutions with Plausible Tests

ASE '24: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, Pages 1693–1705, https://doi.org/10.1145/3691620.3695536

Selecting the best code solution from multiple generated ones is an essential task in code generation, which can be achieved by using some reliable validators (e.g., developer-written test cases) for assistance. Since reliable test cases are not always ...

Article

Unlocking the Power of Diversity in Index Tuning for Cluster Databases

Database and Expert Systems Applications, Pages 185–200, https://doi.org/10.1007/978-3-031-68312-1_15


Index tuning is crucial for database performance, but existing algorithms are often tailored for single instances, posing challenges in cluster architectures. While uniform tuning simplifies matters by applying identical configurations across ...

research-article

Calibration of Time-Series Forecasting: Detecting and Adapting Context-Driven Distribution Shift

KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages 341–352, https://doi.org/10.1145/3637528.3671926

Recent years have witnessed the success of introducing deep learning models to time series forecasting. From a data generation perspective, we illustrate that existing models are susceptible to distribution shifts driven by temporal contexts, whether ...

research-article

Open Access

Are Human Rules Necessary? Generating Reusable APIs with CoT Reasoning and In-Context Learning

Proceedings of the ACM on Software Engineering (PACMSE), Volume 1, Issue FSE, Article No.: 104, Pages 2355–2377, https://doi.org/10.1145/3660811

Inspired by the great potential of Large Language Models (LLMs) for solving complex coding tasks, in this paper, we propose a novel approach, named Code2API, to automatically perform APIzation for Stack Overflow code snippets. Code2API does not require ...

research-article

Open Access

Full-Attention Driven Graph Contrastive Learning: with Effective Mutual Information Insight

WWW '24: Proceedings of the ACM Web Conference 2024, Pages 1069–1080, https://doi.org/10.1145/3589334.3645717

Graph contrastive learning often faces challenges when data augmentations compromise the graph's critical attributes, introducing the risk of generating noise-positive pairs. Although recent methods have attempted to address these issues, they either ...

research-article

Large-Scale Graph Label Propagation on GPUs

IEEE Transactions on Knowledge and Data Engineering (IEEECS_TKDE), Volume 36, Issue 10, Pages 5234–5248, https://doi.org/10.1109/TKDE.2023.3336329

Graph label propagation (LP) is a core component in many downstream applications such as fraud detection, recommendation and image segmentation. In this paper, we propose GLP, a GPU-based framework to enable ...

Article

Enhancing Online Index Tuning with a Learned Tuning Diagnostic

Database and Expert Systems Applications, Pages 197–212, https://doi.org/10.1007/978-3-031-39847-6_14


Indexes are vital for data retrieval performance. For online scenarios with dynamic workloads, index tuning is challenging. A commonly used strategy is to launch tuning requests periodically, yet resource-intensive tuning sessions can obstruct it, ...

research-article

Open Access

PolarDB-IMCI: A Cloud-Native HTAP Database System at Alibaba

Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 2, Article No.: 199, Pages 1–25, https://doi.org/10.1145/3589785

Cloud-native databases have become the de-facto choice for mission-critical applications on the cloud due to the need for high availability, resource elasticity, and cost efficiency. Meanwhile, driven by the increasing connectivity between data ...

research-article

HINormer: Representation Learning On Heterogeneous Information Networks with Graph Transformer

WWW '23: Proceedings of the ACM Web Conference 2023, Pages 599–610, https://doi.org/10.1145/3543507.3583493

Recent studies have highlighted the limitations of message-passing based graph neural networks (GNNs), e.g., limited model expressiveness, over-smoothing, over-squashing, etc. To alleviate these issues, Graph Transformers (GTs) have been proposed which ...

research-article

CatSQL: Towards Real World Natural Language to SQL Applications

Proceedings of the VLDB Endowment (PVLDB), Volume 16, Issue 6, Pages 1534–1547, https://doi.org/10.14778/3583140.3583165

Natural language to SQL (NL2SQL) techniques provide a convenient interface to access databases, especially for non-expert users, to conduct various data analytics. Existing methods often employ either a rule-based approach or a deep learning based ...

research-article

LBD: decouple relevance and observation for individual-level unbiased learning to rank

NIPS '22: Proceedings of the 36th International Conference on Neural Information Processing Systems, Article No.: 2420, Pages 33400–33413

Using Unbiased Learning to Rank (ULTR) to train the ranking model with biased click logs has attracted increased research interest. The key idea is to explicitly model the user's observation behavior when building the ranker with a large number of click ...

research-article

Scalar is Not Enough: Vectorization-based Unbiased Learning to Rank

KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages 136–145, https://doi.org/10.1145/3534678.3539468

Unbiased learning to rank (ULTR) aims to train an unbiased ranking model from biased user click logs. Most of the current ULTR methods are based on the examination hypothesis (EH), which assumes that the click probability can be factorized into two ...

research-article

SA-LSM: optimize data layout for LSM-tree based storage using survival analysis

Proceedings of the VLDB Endowment (PVLDB), Volume 15, Issue 10, Pages 2161–2174, https://doi.org/10.14778/3547305.3547320

A significant fraction of data in cloud storage is rarely accessed, referred to as cold data. Accurately identifying and efficiently managing cold data on cost-effective storages is one of the major challenges for cloud providers, which balances between ...

research-article

On Size-Oriented Long-Tailed Graph Classification of Graph Neural Networks

WWW '22: Proceedings of the ACM Web Conference 2022, Pages 1506–1516, https://doi.org/10.1145/3485447.3512197

The prevalence of graph structures has attracted a surge of research on graph data, enabling several downstream tasks such as multi-graph classification. However, in the multi-graph setting, graphs usually follow a long-tailed distribution in terms of ...

short-paper

KGAMD: an API-misuse detector driven by fine-grained API-constraint knowledge graph

ESEC/FSE 2021: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Pages 1515–1519, https://doi.org/10.1145/3468264.3473112

Application Programming Interfaces (APIs) typically come with usage constraints. The violations of these constraints (i.e., API misuses) can cause significant problems in software development. Existing methods mine frequent API usage patterns from ...

research-article

Adapting Interactional Observation Embedding for Counterfactual Learning to Rank

SIGIR '21: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Pages 285–294, https://doi.org/10.1145/3404835.3462901

Counterfactual Learning to Rank (CLTR) has become an attractive research topic due to its capability of training rankers with click logs. However, CLTR inherently suffers from a large amount of bias caused by confounders, variables that affect both the ...

research-article

GPU-Accelerated Graph Label Propagation for Real-Time Fraud Detection

SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data, Pages 2348–2356, https://doi.org/10.1145/3448016.3452774

Fraud detection is a pressing challenge for most financial and commercial platforms. In this paper, we study the processing pipeline of fraud detection in TaoBao, a large e-commerce platform. Graph label propagation (LP) is a core component in this ...

research-article

API-misuse detection driven by fine-grained API-constraint knowledge graph

ASE '20: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, Pages 461–472, https://doi.org/10.1145/3324884.3416551

API misuses cause significant problems in software development. Existing methods detect API misuses against frequent API usage patterns mined from codebases. They make a naive assumption that API usage that deviates from the most-frequent API usage is a ...

research-article

Learning Transferrable Parameters for Long-tailed Sequential User Behavior Modeling

KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Pages 359–367, https://doi.org/10.1145/3394486.3403078

Sequential user behavior modeling plays a crucial role in online user-oriented services, such as product purchasing, news feed consumption, and online advertising. The performance of sequential modeling heavily depends on the scale and quality of ...

research-article

Demystify official API usage directives with crowdsourced API misuse scenarios, erroneous code examples and patches

ICSE '20: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, Pages 925–936, https://doi.org/10.1145/3377811.3380430

API usage directives in official API documentation describe the contracts, constraints and guidelines for using APIs in natural language. Through the investigation of API misuse scenarios on Stack Overflow, we identify three barriers that hinder the ...
