Abstract
Table retrieval aims to rank candidate tables for answering a natural language query, where the most critical problem is how to learn informative representations for structured tables. Most previous methods simply flatten the table and feed it into a sequence encoder, ignoring the structural information of tables and the semantic interaction between table cells and contexts. In this paper, we propose a dual-graph-based method to perceive the semantics and structure of tables, so as to better support the downstream table retrieval task. Inspired by human cognition, we first decouple a table into a row view and a column view, then build dual graphs from these two views, taking table contexts into account. Afterward, intra-graph and inter-graph interactions are iteratively performed to aggregate and exchange local row- and column-oriented features, respectively, and an adaptive fusion strategy is finally tailored to produce sophisticated table representations. In this way, both the table structure and the semantic information are captured by the dual-graph modeling. Consequently, the input query can be matched against the target tables based on their full-fledged table representations, yielding more accurate final ranking results. Extensive experiments verify the superiority of our dual graphs over strong baselines on two table retrieval datasets, WikiTables and WebQueryTable. Further analyses also confirm the adaptability to row-/column-oriented tables, and show the rationality and generalization of dual graphs. The source code is available at https://github.com/ty33123/DualG.
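The row-view/column-view decoupling described above can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that), and every name in it is hypothetical: build_dual_graphs, aggregate, and fuse are illustrative helpers, cells in the same row (or column) are assumed to be fully connected, the table context is assumed to be a single node linked to every cell, features are scalars, and a fixed weight alpha stands in for the adaptive fusion.

```python
from itertools import combinations


def build_dual_graphs(table, context):
    """Build a row-view and a column-view edge set over the same nodes.

    Assumption: node 0 is a single context node (e.g. the caption) and
    cell (i, j) gets node id 1 + i * n_cols + j.
    """
    n_rows, n_cols = len(table), len(table[0])
    nodes = [context] + [cell for row in table for cell in row]

    def cell_id(i, j):
        return 1 + i * n_cols + j

    # The context node is linked to every cell in both views.
    row_edges = {(0, k) for k in range(1, len(nodes))}
    col_edges = set(row_edges)
    # Row view: connect cells that share a row.
    for i in range(n_rows):
        for a, b in combinations(range(n_cols), 2):
            row_edges.add((cell_id(i, a), cell_id(i, b)))
    # Column view: connect cells that share a column.
    for j in range(n_cols):
        for a, b in combinations(range(n_rows), 2):
            col_edges.add((cell_id(a, j), cell_id(b, j)))
    return nodes, row_edges, col_edges


def aggregate(features, edges):
    """One round of mean aggregation over undirected edges (self included)."""
    neighbours = {k: [k] for k in range(len(features))}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    return [
        sum(features[n] for n in neighbours[k]) / len(neighbours[k])
        for k in range(len(features))
    ]


def fuse(row_feats, col_feats, alpha=0.5):
    """Fixed-weight fusion; a stand-in for a learned, adaptive weight."""
    return [alpha * r + (1 - alpha) * c for r, c in zip(row_feats, col_feats)]


table = [["Year", "City"], ["2008", "Beijing"], ["2012", "London"]]
nodes, row_edges, col_edges = build_dual_graphs(table, "Olympic host cities")
```

In this toy table, "Year" and "City" are neighbours only in the row graph, while "Year" and "2008" are neighbours only in the column graph, so aggregating over each edge set yields row-oriented and column-oriented features for the same cell before fusion.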
Acknowledgment
This work is supported by the National Key Research and Development Program of China (Grant No. 2021YFB3100600), the Strategic Priority Research Program of Chinese Academy of Sciences (Grant No. XDC02040400) and the Youth Innovation Promotion Association of CAS (Grant No. 2021153).
Ethics declarations
Ethics Statement
I understand that using technology can have ethical implications, especially in the collection, processing, and privacy of table retrieval data. I recognize the importance of complying with ethical standards and the hazards of potential risks.
For data collection and processing, my training data comes from two publicly available table search datasets. Although we do not collect or store any sensitive information, we should strictly restrict users' retrieval text and ensure that it does not contain any dangerous information.
In addition, when the model is used in police- or military-related applications, we should pay special attention to its use in these areas, which must be conducted in a more responsible manner. To prevent models from providing inaccurate search results for police or military personnel, users are responsible for ensuring that they comply with ethical principles, laws, and regulations when using model outputs, and for screening search results.
In summary, I strive to ensure that the model outputs search results in an ethical and responsible manner, and I urge my users to do the same. I will continue to adhere to ethical standards and stay abreast of emerging ethical issues in the fields of machine learning and data mining.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, T. et al. (2023). Enhancing Table Retrieval with Dual Graph Representations. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023. Lecture Notes in Computer Science, vol 14172. Springer, Cham. https://doi.org/10.1007/978-3-031-43421-1_7
DOI: https://doi.org/10.1007/978-3-031-43421-1_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43420-4
Online ISBN: 978-3-031-43421-1
eBook Packages: Computer Science, Computer Science (R0)