Computer Science > Machine Learning

arXiv:2406.08311 (cs)

[Submitted on 12 Jun 2024 (v1), last revised 5 Jul 2024 (this version, v2)]

Title: Causality for Tabular Data Synthesis: A High-Order Structure Causal Benchmark Framework

Authors: Ruibo Tu, Zineb Senane, Lele Cao, Cheng Zhang, Hedvig Kjellström, Gustav Eje Henter

Abstract: Tabular synthesis models remain ineffective at capturing complex dependencies, and the quality of synthetic data is still insufficient for comprehensive downstream tasks, such as prediction under distribution shifts, automated decision-making, and cross-table understanding. A major challenge is the lack of prior knowledge about underlying structures and high-order relationships in tabular data. We argue that a systematic evaluation of high-order structural information for tabular data synthesis is the first step towards solving the problem. In this paper, we introduce high-order structural causal information as natural prior knowledge and provide a benchmark framework for the evaluation of tabular synthesis models. The framework allows us to generate benchmark datasets with a flexible range of data generation processes and to train tabular synthesis models on these datasets for further evaluation. We propose multiple benchmark tasks, high-order metrics, and causal inference tasks as downstream tasks for evaluating the quality of synthetic data generated by the trained models. Our experiments demonstrate how to leverage the benchmark framework to evaluate a model's capability to capture high-order structural causal information. Furthermore, our benchmarking results provide an initial assessment of state-of-the-art tabular synthesis models. They reveal significant gaps between ideal and actual performance, and show how baseline methods differ. Our benchmark framework is available at this https URL.

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2406.08311 [cs.LG]
	(or arXiv:2406.08311v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2406.08311

Submission history

From: Ruibo Tu
[v1] Wed, 12 Jun 2024 15:12:49 UTC (330 KB)
[v2] Fri, 5 Jul 2024 06:44:33 UTC (330 KB)
