Demonstrating TabEE: Tabular Embedding Explanations
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4285–4288, https://doi.org/10.14778/3685800.3685856
We present TabEE, Tabular Embedding Explanations, a framework designed to generate explanations for interpreting tabular embedding models. Our framework aims to furnish both local and global explanations for the original data, facilitating the detection ...
- research-article, July 2024
- research-article, June 2024
Cost-Effective LLM Utilization for Machine Learning Tasks over Tabular Data
GUIDE-AI '24: Proceedings of the Conference on Governance, Understanding and Integration of Data for Effective and Responsible AI, Pages 45–49, https://doi.org/10.1145/3665601.3669848
Classic machine learning (ML) models excel in modeling tabular datasets but lack broader world knowledge due to the absence of pre-training, an area where Large Language Models (LLMs) stand out. This paper presents an effective method that bridges the ...
- short-paper, June 2024
ASQP-RL Demo: Learning Approximation Sets for Exploratory Queries
SIGMOD/PODS '24: Companion of the 2024 International Conference on Management of Data, Pages 452–455, https://doi.org/10.1145/3626246.3654741
We demonstrate the Approximate Selection Query Processing (ASQP-RL) system, which uses Reinforcement Learning to select a subset of a large external dataset to process locally in a notebook during data exploration. Given a query workload over an external ...
- research-article, March 2024
TabEE: Tabular Embeddings Explanations
Proceedings of the ACM on Management of Data (PACMMOD), Volume 2, Issue 1, Article No.: 72, Pages 1–26, https://doi.org/10.1145/3639329
Tabular embedding methods have become increasingly popular due to their effectiveness in improving the results of various tasks, including classic databases tasks and machine learning predictions. However, most current methods treat these embedding ...
- short-paper, June 2023
ATENA-PRO: Generating Personalized Exploration Notebooks with Constrained Reinforcement Learning
SIGMOD '23: Companion of the 2023 International Conference on Management of Data, Pages 167–170, https://doi.org/10.1145/3555041.3589727
One of the most common, helpful practices of data scientists, when starting the exploration of a given dataset, is to examine existing data exploration notebooks prepared by other data analysts or scientists. These notebooks contain curated sessions of ...
FEDEX: An Explainability Framework for Data Exploration Steps
Proceedings of the VLDB Endowment (PVLDB), Volume 15, Issue 13, Pages 3854–3868, https://doi.org/10.14778/3565838.3565841
When exploring a new dataset, Data Scientists often apply analysis queries, look for insights in the resulting dataframe, and repeat to apply further queries. We propose in this paper a novel solution that assists data scientists in this laborious ...
- research-article, August 2022
PHOcus: efficiently archiving photos
Proceedings of the VLDB Endowment (PVLDB), Volume 15, Issue 12, Pages 3630–3633, https://doi.org/10.14778/3554821.3554861
Our ability to collect data is rapidly outstripping our ability to effectively store and use it. Organizations are therefore facing tough decisions of what data to archive (or dispose of) to effectively meet their business goals. PHOcus addresses this ...
- research-article, August 2022
OREO: detection of cherry-picked generalizations
Proceedings of the VLDB Endowment (PVLDB), Volume 15, Issue 12, Pages 3570–3573, https://doi.org/10.14778/3554821.3554846
Data analytics often make sense of large data sets by generalization: aggregating from the detailed data to a more general context. Given a dataset, misleading generalizations can sometimes be drawn from a cherry-picked level of aggregation to obscure ...
- research-article, July 2022
The Seattle report on database research
- Daniel Abadi,
- Anastasia Ailamaki,
- David Andersen,
- Peter Bailis,
- Magdalena Balazinska,
- Philip A. Bernstein,
- Peter Boncz,
- Surajit Chaudhuri,
- Alvin Cheung,
- Anhai Doan,
- Luna Dong,
- Michael J. Franklin,
- Juliana Freire,
- Alon Halevy,
- Joseph M. Hellerstein,
- Stratos Idreos,
- Donald Kossmann,
- Tim Kraska,
- Sailesh Krishnamurthy,
- Volker Markl,
- Sergey Melnik,
- Tova Milo,
- C. Mohan,
- Thomas Neumann,
- Beng Chin Ooi,
- Fatma Ozcan,
- Jignesh Patel,
- Andrew Pavlo,
- Raluca Popa,
- Raghu Ramakrishnan,
- Christopher Re,
- Michael Stonebraker,
- Dan Suciu
Every five years, a group of the leading database researchers meet to reflect on their community's impact on the computing industry as well as examine current research challenges.
- research-article, June 2022
Automated Category Tree Construction in E-Commerce
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data, Pages 1770–1783, https://doi.org/10.1145/3514221.3526124
Category trees play a central role in many web applications, enabling browsing-style information access. Building trees that reflect users' dynamic interests is, however, a challenging task, carried out by taxonomists. This manual construction leads to ...
- short-paper, June 2022
SubTab: Data Exploration with Informative Sub-Tables
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data, Pages 2369–2372, https://doi.org/10.1145/3514221.3520154
We demonstrate SubTab, a framework for creating small, informative sub-tables of large data tables to speed up data exploration. Given a table with n rows and m columns where n and m are large, SubTab creates a sub-table T_sub with k<n rows and l<m ...
- research-article, June 2022
Classifier Construction Under Budget Constraints
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data, Pages 1160–1174, https://doi.org/10.1145/3514221.3517863
Search mechanisms over large assortments of items are central to the operation of many platforms. As users commonly express filtering conditions based on item properties that are not initially stored, companies must derive the missing information by ...
On detecting cherry-picked generalizations
Proceedings of the VLDB Endowment (PVLDB), Volume 15, Issue 1, Pages 59–71, https://doi.org/10.14778/3485450.3485457
Generalizing from detailed data to statements in a broader context is often critical for users to make sense of large data sets. Correspondingly, poorly constructed generalizations might convey misleading information even if the statements are ...
- research-article, June 2021
Exploring Ratings in Subjective Databases
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data, Pages 62–75, https://doi.org/10.1145/3448016.3457259
Subjective data links people to content items and reflects who likes or dislikes what. The valuable information this data contains is virtually infinite and satisfies various information needs. Yet, as of today, dedicated tools to explore this data are ...
- research-article, August 2020
ExplainED: explanations for EDA notebooks
Proceedings of the VLDB Endowment (PVLDB), Volume 13, Issue 12, Pages 2917–2920, https://doi.org/10.14778/3415478.3415508
Exploratory Data Analysis (EDA) is an essential yet highly demanding task. To get a head start before exploring a new dataset, data scientists often prefer to view existing EDA notebooks - illustrative exploratory sessions that were created by fellow ...
- research-article, August 2020
CONCIERGE: improving constrained search results by data melioration
Proceedings of the VLDB Endowment (PVLDB), Volume 13, Issue 12, Pages 2865–2868, https://doi.org/10.14778/3415478.3415495
The problem of finding an item-set of maximal aggregated utility that satisfies a set of constraints is at the cornerstone of many e-commerce applications. Its classical definition assumes that all the information needed to verify the constraints is ...
- short-paper, May 2020
Automatically Generating Data Exploration Sessions Using Deep Reinforcement Learning
SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, Pages 1527–1537, https://doi.org/10.1145/3318464.3389779
Exploratory Data Analysis (EDA) is an essential yet highly demanding task. To get a head start before exploring a new dataset, data scientists often prefer to view existing EDA notebooks -- illustrative, curated exploratory sessions, on the same dataset,...
- research-article, May 2020
Minimization of Classifier Construction Cost for Search Queries
SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, Pages 1351–1365, https://doi.org/10.1145/3318464.3389755
Search over massive sets of items is the cornerstone of many modern applications. Users express a set of properties and expect the system to retrieve qualifying items. A common difficulty, however, is that the information on whether an item satisfies ...