DOI: 10.1145/3459637.3482197
short-paper

Tabular Data Concept Type Detection Using Star-Transformers

Published: 30 October 2021

Abstract

Tabular data is an invaluable information resource for search, information extraction and question answering about the world. It is critical to understand the semantic concept types of table columns in order to fully exploit the information in tabular data. In this paper, we focus on learning-based approaches for column concept type detection that do not rely on any metadata or on queries to existing knowledge bases. We propose a model that employs both statistical and semantic features of table columns, and use Star-Transformers to gather and scatter information across the whole table to boost the performance on individual columns. We apply distant supervision to construct a tabular dataset with columns annotated with DBpedia classes. Our experimental results show that our model achieves 93.57% accuracy on the dataset, exceeding that of the state-of-the-art baselines.
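
The gather-and-scatter idea can be illustrated with a minimal sketch. It is not the paper's model: it assumes each column is already encoded as a small feature vector, treats columns as satellite nodes around a single relay node, uses plain dot-product attention with no learned weights, and omits the ring connections between neighbouring satellites found in the original Star-Transformer. The names `attend` and `star_rounds` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys):
    """Scaled dot-product attention where the keys also serve as values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    return [
        sum(w * key[i] for w, key in zip(weights, keys))
        for i in range(len(query))
    ]


def star_rounds(columns, rounds=2):
    """Run a few gather/scatter rounds over per-column feature vectors.

    columns: list of equal-length feature vectors, one per table column
             (assumed input; how the features are built is not shown here).
    Returns the updated column vectors and the final relay vector.
    """
    dim = len(columns[0])
    # The relay node starts as the mean of all column vectors.
    relay = [sum(col[i] for col in columns) / len(columns) for i in range(dim)]
    satellites = [list(col) for col in columns]
    for _ in range(rounds):
        # Scatter: each column attends over itself, its original
        # features, and the relay, so table-level context reaches it.
        satellites = [
            attend(sat, [sat, col, relay])
            for sat, col in zip(satellites, columns)
        ]
        # Gather: the relay attends over itself and every updated column.
        relay = attend(relay, [relay] + satellites)
    return satellites, relay
```

In a trained model the attention would use learned query, key and value projections, and a classifier would map each updated column vector to a concept type; this sketch only shows how information flows between the columns and the relay.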

Supplementary Material

MP4 File (CIKM_tabular_data_talk.mp4)


Cited By

  • (2023) RECA: Related Tables Enhanced Column Semantic Type Annotation Framework. Proceedings of the VLDB Endowment 16:6 (1319-1331). DOI: 10.14778/3583140.3583149. Online publication date: 1-Feb-2023

      Published In

      CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
      October 2021
      4966 pages
      ISBN:9781450384469
      DOI:10.1145/3459637
      This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. column classification
      2. concept type detection
      3. neural networks
      4. tabular data

      Qualifiers

      • Short-paper

      Conference

      CIKM '21
      Acceptance Rates

      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
