research-article

Open access

HADAD: A Lightweight Approach for Optimizing Hybrid Complex Analytics Queries

Authors:

Rana Alotaibi,

Bogdan Cautis,

Alin Deutsch,

Ioana ManolescuAuthors Info & Claims

SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data

Pages 23 - 35

https://doi.org/10.1145/3448016.3457311

Published: 18 June 2021 Publication History

PDF eReader

Abstract

Hybrid complex analytics workloads typically include (i) data management tasks (joins, selections, etc. ), easily expressed using relational algebra (RA)-based languages, and (ii) complex analytics tasks (regressions, matrix decompositions, etc.), mostly expressed in linear algebra (LA) expressions. Such workloads are common in many application areas, including scientific computing, web analytics, and business recommendation. Existing solutions for evaluating hybrid analytical tasks - ranging from LA-oriented systems, to relational systems (extended to handle LA operations), to hybrid systems - either optimize data management and complex tasks separately, exploit RA properties only while leaving LA-specific optimization opportunities unexploited, or focus heavily on physical optimization, leaving semantic query optimization opportunities unexplored. Additionally, they are not able to exploit precomputed (materialized) results to avoid recomputing (part of) a given mixed (RA and/or LA) computation.

In this paper, we take a major step towards filling this gap by proposing HADAD, an extensible lightweight approach for optimizing hybrid complex analytics queries, based on a common abstraction that facilitates unified reasoning: a relational model endowed with integrity constraints. Our solution can be naturally and portably applied on top of pure LA and hybrid RA-LA platforms without modifying their internals. An extensive empirical evaluation shows that HADAD yields significant performance gains on diverse workloads, ranging from LA-centered to hybrid.

Supplementary Material

Source Code (3448016.3457311_source_code.zip)

Download
113.25 MB

Read me (3448016.3457311_readme.pdf)

Download
85.60 KB

MP4 File (3448016.3457311.mp4)

Hybrid complex analytics workloads typically include (i) data management tasks (joins, filters, etc. ), easily expressed using relational algebra (RA)-based languages, and (ii) complex analytics tasks (regressions, matrix decompositions, etc.), mostly expressed in linear algebra (LA) expressions. Such workloads are common in a number of areas, including scientific computing, web analytics, business recommendation, natural language processing, speech recognition. Existing solutions for evaluating hybrid complex analytics queries ranging from LA-oriented systems, to relational systems (extended to handle LA operations), to hybrid systems fail to provide a unified optimization framework for such a hybrid setting. These systems either optimize data management and complex analytics tasks separately, or exploit RA properties only while leaving LA-specific optimization opportunities unexplored. Finally, they are not able to exploit precomputed (materialized) results to avoid computing again(part of) a given mixed (LA and RA) computation. We describe HADAD, an extensible lightweight approach for optimizinghybrid complex analytics queries, based on a common abstraction that facilitates unified reasoning: a relational model endowed with integrity constraints, which can be used to express the properties of the two computation formalisms. Our approach enables full exploration of LA properties and rewrites, as well as semantic query optimization. Importantly, our approach does not require modifying the internals of the existing systems. Our experimental evaluation shows significant performance gains on diverse workloads, from LA-centered ones to hybrid ones.

Download
147.90 MB

References

[1]

Amazon Review Data. https://nijianmo.github.io/amazon/index.html, Accessed August, 2020.

Abstract

Supplementary Material

References

Cited By

Index Terms

Recommendations

Complete yet practical search for minimal query reformulations under constraints

Determinacy and query rewriting for conjunctive queries and views

Optimizing complex queries based on similarities of subqueries

Comments

Information

Published In

Sponsors

Publisher

Publication History

Permissions

Check for updates

Badges

Author Tags

Qualifiers

Conference

Acceptance Rates

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Cited By

View options

PDF

eReader

Login options

Full Access

Figures

Other

Share

Share this Publication link

Share on social media

Affiliations