DOI: 10.1145/3589335.3641297

DCAI: Data-centric Artificial Intelligence

Published: 13 May 2024

Abstract

Data-centric AI (DCAI) marks a shift in AI development from refining models to improving data quality. Earlier model-centric approaches often overlooked imperfections in the data, which raises the question of how much performance model improvements alone can deliver. DCAI advocates the systematic engineering of data, complementing model-centric efforts and playing a vital role in AI success. This shift has spurred innovation in machine learning and data mining algorithms and their applications on the Web. We therefore propose the DCAI Workshop at WWW'24, a platform for academic researchers and industry practitioners to showcase the latest advances in DCAI research and their practical applications in the real world.

      Published In

      WWW '24: Companion Proceedings of the ACM Web Conference 2024
      May 2024
      1928 pages
      ISBN:9798400701726
      DOI:10.1145/3589335
      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. data augmentation
      2. data evaluation
      3. data optimization
      4. data reduction
      5. data selection
      6. data-centric ai

      Qualifiers

      • Introduction

      Conference

      WWW '24
      WWW '24: The ACM Web Conference 2024
      May 13 - 17, 2024
      Singapore, Singapore

      Acceptance Rates

      Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
