Towards unified ad-hoc data processing

X Shi, B Cui, G Dobbie, BC Ooi - Proceedings of the 2014 ACM SIGMOD …, 2014 - dl.acm.org
It is important to provide efficient execution for ad-hoc data processing programs. Rather than constructing complex declarative queries, many users prefer to write their programs as procedural code with simple queries. Because many of these users are not expert programmers, their programs usually perform poorly in practice, and it is a challenge to optimize and execute such programs efficiently and automatically. In this paper, we present UniAD, a system designed to simplify the programming of data processing tasks and provide efficient execution for user programs. We propose a novel intermediate representation named UniQL, which utilizes HOQs to describe the operations performed in programs. By combining procedural and declarative logic, we can perform various optimizations across the boundary between procedural and declarative code. We describe these optimizations and conduct extensive empirical studies using UniAD. The experimental results on four benchmarks demonstrate that our techniques can significantly improve the performance of a wide range of data processing programs.
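To make the contrast between "procedural code with simple queries" and a single declarative query concrete, here is a minimal sketch in Python with the standard sqlite3 module. It is not UniAD or UniQL, and it does not reproduce any rewrite from the paper; the schema (customers, orders) and all function names are hypothetical, chosen only for illustration. The first function issues one simple query per customer inside a loop, and the second computes the same total with one join, which is the kind of equivalence an optimizer working across the procedural/declarative boundary could plausibly exploit.

```python
import sqlite3


def make_db():
    # Hypothetical in-memory schema and data, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER,
            amount REAL
        );
        INSERT INTO customers VALUES (1, 'EU'), (2, 'US'), (3, 'EU');
        INSERT INTO orders VALUES
            (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.0), (4, 3, 2.5);
        """
    )
    return conn


def total_procedural(conn, region):
    # Procedural style: a loop in host code that runs one simple query
    # per customer and accumulates the result by hand.
    total = 0.0
    customers = conn.execute(
        "SELECT id FROM customers WHERE region = ?", (region,)
    ).fetchall()
    for (customer_id,) in customers:
        rows = conn.execute(
            "SELECT amount FROM orders WHERE customer_id = ?", (customer_id,)
        )
        for (amount,) in rows:
            total += amount
    return total


def total_declarative(conn, region):
    # Declarative style: the same computation as a single join with an
    # aggregate, leaving the evaluation strategy to the database engine.
    row = conn.execute(
        """
        SELECT COALESCE(SUM(o.amount), 0.0)
        FROM orders AS o
        JOIN customers AS c ON c.id = o.customer_id
        WHERE c.region = ?
        """,
        (region,),
    ).fetchone()
    return row[0]
```

Both functions return the same total for a given region; the procedural version issues one query per matching customer plus one to list them, while the declarative version issues a single query regardless of how many customers match.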