- research-article, October 2024
B4: Towards Optimal Assessment of Plausible Code Solutions with Plausible Tests
ASE '24: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, Pages 1693–1705, https://doi.org/10.1145/3691620.3695536
Selecting the best code solution from multiple generated ones is an essential task in code generation, which can be achieved by using some reliable validators (e.g., developer-written test cases) for assistance. Since reliable test cases are not always ...
- Article, August 2024
Unlocking the Power of Diversity in Index Tuning for Cluster Databases
Abstract: Index tuning is crucial for database performance, but existing algorithms are often tailored for single instances, posing challenges in cluster architectures. While uniform tuning simplifies matters by applying identical configurations across ...
- research-article, August 2024
Calibration of Time-Series Forecasting: Detecting and Adapting Context-Driven Distribution Shift
KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages 341–352, https://doi.org/10.1145/3637528.3671926
Recent years have witnessed the success of introducing deep learning models to time series forecasting. From a data generation perspective, we illustrate that existing models are susceptible to distribution shifts driven by temporal contexts, whether ...
- research-article, July 2024
Are Human Rules Necessary? Generating Reusable APIs with CoT Reasoning and In-Context Learning
Proceedings of the ACM on Software Engineering (PACMSE), Volume 1, Issue FSE, Article No.: 104, Pages 2355–2377, https://doi.org/10.1145/3660811
Inspired by the great potential of Large Language Models (LLMs) for solving complex coding tasks, in this paper, we propose a novel approach, named Code2API, to automatically perform APIzation for Stack Overflow code snippets. Code2API does not require ...
- research-article, May 2024
Full-Attention Driven Graph Contrastive Learning: with Effective Mutual Information Insight
WWW '24: Proceedings of the ACM Web Conference 2024, Pages 1069–1080, https://doi.org/10.1145/3589334.3645717
Graph contrastive learning often faces challenges when data augmentations compromise the graph's critical attributes, introducing the risk of generating noise-positive pairs. Although recent methods have attempted to address these issues, they either ...
- research-article, November 2023
Large-Scale Graph Label Propagation on GPUs
IEEE Transactions on Knowledge and Data Engineering (IEEECS_TKDE), Volume 36, Issue 10, Pages 5234–5248, https://doi.org/10.1109/TKDE.2023.3336329
Graph label propagation (LP) is a core component in many downstream applications such as fraud detection, recommendation and image segmentation. In this paper, we propose GLP, a GPU-based framework to enable ...
- Article, August 2023
Enhancing Online Index Tuning with a Learned Tuning Diagnostic
Abstract: Indexes are vital for data retrieval performance. For online scenarios with dynamic workloads, index tuning is challenging. A commonly used strategy is to launch tuning requests periodically, yet resource-intensive tuning sessions can obstruct it, ...
- research-article, June 2023
PolarDB-IMCI: A Cloud-Native HTAP Database System at Alibaba
- Jianying Wang,
- Tongliang Li,
- Haoze Song,
- Xinjun Yang,
- Wenchao Zhou,
- Feifei Li,
- Baoyue Yan,
- Qianqian Wu,
- Yukun Liang,
- ChengJun Ying,
- Yujie Wang,
- Baokai Chen,
- Chang Cai,
- Yubin Ruan,
- Xiaoyi Weng,
- Shibin Chen,
- Liang Yin,
- Chengzhong Yang,
- Xin Cai,
- Hongyan Xing,
- Nanlong Yu,
- Xiaofei Chen,
- Dapeng Huang,
- Jianling Sun
Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 2, Article No.: 199, Pages 1–25, https://doi.org/10.1145/3589785
Cloud-native databases have become the de-facto choice for mission-critical applications on the cloud due to the need for high availability, resource elasticity, and cost efficiency. Meanwhile, driven by the increasing connectivity between data ...
- research-article, April 2023
HINormer: Representation Learning On Heterogeneous Information Networks with Graph Transformer
WWW '23: Proceedings of the ACM Web Conference 2023, Pages 599–610, https://doi.org/10.1145/3543507.3583493
Recent studies have highlighted the limitations of message-passing based graph neural networks (GNNs), e.g., limited model expressiveness, over-smoothing, over-squashing, etc. To alleviate these issues, Graph Transformers (GTs) have been proposed which ...
CatSQL: Towards Real World Natural Language to SQL Applications
Proceedings of the VLDB Endowment (PVLDB), Volume 16, Issue 6, Pages 1534–1547, https://doi.org/10.14778/3583140.3583165
Natural language to SQL (NL2SQL) techniques provide a convenient interface to access databases, especially for non-expert users, to conduct various data analytics. Existing methods often employ either a rule-based approach or a deep learning based ...
- research-article, April 2024
LBD: decouple relevance and observation for individual-level unbiased learning to rank
NIPS '22: Proceedings of the 36th International Conference on Neural Information Processing Systems, Article No.: 2420, Pages 33400–33413
Using Unbiased Learning to Rank (ULTR) to train the ranking model with biased click logs has attracted increased research interest. The key idea is to explicitly model the user's observation behavior when building the ranker with a large number of click ...
- research-article, August 2022
Scalar is Not Enough: Vectorization-based Unbiased Learning to Rank
KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages 136–145, https://doi.org/10.1145/3534678.3539468
Unbiased learning to rank (ULTR) aims to train an unbiased ranking model from biased user click logs. Most of the current ULTR methods are based on the examination hypothesis (EH), which assumes that the click probability can be factorized into two ...
- research-article, June 2022
SA-LSM: optimize data layout for LSM-tree based storage using survival analysis
Proceedings of the VLDB Endowment (PVLDB), Volume 15, Issue 10, Pages 2161–2174, https://doi.org/10.14778/3547305.3547320
A significant fraction of data in cloud storage is rarely accessed, referred to as cold data. Accurately identifying and efficiently managing cold data on cost-effective storages is one of the major challenges for cloud providers, which balances between ...
- research-article, April 2022
On Size-Oriented Long-Tailed Graph Classification of Graph Neural Networks
WWW '22: Proceedings of the ACM Web Conference 2022, Pages 1506–1516, https://doi.org/10.1145/3485447.3512197
The prevalence of graph structures attracts a surge of investigation on graph data, enabling several downstream tasks such as multi-graph classification. However, in the multi-graph setting, graphs usually follow a long-tailed distribution in terms of ...
- short-paper, August 2021
KGAMD: an API-misuse detector driven by fine-grained API-constraint knowledge graph
ESEC/FSE 2021: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Pages 1515–1519, https://doi.org/10.1145/3468264.3473112
Application Programming Interfaces (APIs) typically come with usage constraints. The violations of these constraints (i.e. API misuses) can cause significant problems in software development. Existing methods mine frequent API usage patterns from ...
- research-article, July 2021
Adapting Interactional Observation Embedding for Counterfactual Learning to Rank
SIGIR '21: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Pages 285–294, https://doi.org/10.1145/3404835.3462901
Counterfactual Learning to Rank (CLTR) becomes an attractive research topic due to its capability of training ranker with click logs. However, CLTR inherently suffers from a large amount of bias caused by confounders, variables that affect both the ...
- research-article, June 2021
GPU-Accelerated Graph Label Propagation for Real-Time Fraud Detection
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data, Pages 2348–2356, https://doi.org/10.1145/3448016.3452774
Fraud detection is a pressing challenge for most financial and commercial platforms. In this paper, we study the processing pipeline of fraud detection in a large e-commerce platform of TaoBao. Graph label propagation (LP) is a core component in this ...
- research-article, January 2021
API-misuse detection driven by fine-grained API-constraint knowledge graph
ASE '20: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, Pages 461–472, https://doi.org/10.1145/3324884.3416551
API misuses cause significant problems in software development. Existing methods detect API misuses against frequent API usage patterns mined from codebase. They make a naive assumption that API usage that deviates from the most-frequent API usage is a ...
- research-article, August 2020
Learning Transferrable Parameters for Long-tailed Sequential User Behavior Modeling
KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Pages 359–367, https://doi.org/10.1145/3394486.3403078
Sequential user behavior modeling plays a crucial role in online user-oriented services, such as product purchasing, news feed consumption, and online advertising. The performance of sequential modeling heavily depends on the scale and quality of ...
- research-article, October 2020
Demystify official API usage directives with crowdsourced API misuse scenarios, erroneous code examples and patches
ICSE '20: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, Pages 925–936, https://doi.org/10.1145/3377811.3380430
API usage directives in official API documentation describe the contracts, constraints and guidelines for using APIs in natural language. Through the investigation of API misuse scenarios on Stack Overflow, we identify three barriers that hinder the ...