research-article

Content-based table retrieval for web queries

Published: 15 July 2019

Abstract

Understanding the connections between unstructured text and semi-structured tables is an important yet neglected problem in natural language processing. In this work, we focus on content-based table retrieval: given a query, the task is to find the most relevant table in a collection of tables. Further progress in this area requires powerful models of semantic matching as well as richer training and evaluation resources. To address this, we present a ranking-based approach and implement both carefully designed features and neural network architectures to measure the relevance between a query and the content of a table. Furthermore, we release an open-domain dataset that includes 21,113 web queries for 273,816 tables. We conduct comprehensive experiments on both real-world and synthetic datasets. Results under various evaluation criteria demonstrate that the proposed approach performs comparably to or better than a carefully designed feature-based system. We show what depth of table and language understanding is required to do well on this task, and we hope to encourage further interest from the community in exploring deeper connections between tables and text.
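The abstract frames the task as ranking: score every table in a collection against a query and return the highest-scoring one. The sketch below illustrates only that setup, using a swapped-in lexical baseline (Okapi BM25 over flattened table text: caption, headers and cells). It is not the paper's feature set or neural model. The table dictionary layout, the function names `table_to_tokens` and `bm25_rank`, the defaults `k1=1.5` and `b=0.75`, and the two example tables are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def table_to_tokens(table):
    # Hypothetical table layout: {"caption": str, "headers": [str], "rows": [[cell]]}.
    # Caption, header and cell text are concatenated and whitespace-tokenized.
    parts = [table.get("caption", "")]
    parts.extend(table.get("headers", []))
    for row in table.get("rows", []):
        parts.extend(str(cell) for cell in row)
    return " ".join(parts).lower().split()


def bm25_rank(query, tables, k1=1.5, b=0.75):
    """Rank tables for a query by Okapi BM25; returns (score, index), best first.

    Assumes at least one table is given.
    """
    docs = [table_to_tokens(t) for t in tables]
    n = len(docs)
    avgdl = (sum(len(d) for d in docs) / n) or 1.0  # guard against all-empty tables
    df = Counter()  # document frequency: number of tables containing each term
    for d in docs:
        df.update(set(d))
    scored = []
    for i, d in enumerate(docs):
        tf = Counter(d)  # term frequency within this table
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # BM25 idf variant with +1 inside the log so it stays positive
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = k1 * (1 - b + b * len(d) / avgdl)  # document-length normalization
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scored.append((score, i))
    # Highest score first; ties broken by original position
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored


# Usage with two hypothetical tables
tables = [
    {"caption": "List of tallest buildings in the world",
     "headers": ["Rank", "Building", "City", "Height (m)"],
     "rows": [["1", "Burj Khalifa", "Dubai", "828"]]},
    {"caption": "Olympic medal table",
     "headers": ["Nation", "Gold", "Silver", "Bronze"],
     "rows": [["USA", "46", "37", "38"]]},
]
ranking = bm25_rank("tallest buildings in Dubai", tables)
print(ranking[0][1])  # → 0
```

A purely lexical score like this only matches surface terms. The abstract's point is that stronger semantic matching between the query and table content is needed to do well, which is what learned features and neural architectures aim to provide.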


Cited By

  • (2024) STaR: Space and Time-aware Statistic Query Answering. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, pp. 5190–5194. doi:10.1145/3627673.3679209. Online publication date: 21-Oct-2024
  • (2024) COTER: Conditional Optimal Transport meets Table Retrieval. Proceedings of the 17th ACM International Conference on Web Search and Data Mining, pp. 911–919. doi:10.1145/3616855.3635796. Online publication date: 4-Mar-2024
  • (2023) Solo: Data Discovery Using Natural Language Questions Via A Self-Supervised Approach. Proceedings of the ACM on Management of Data 1(4), pp. 1–27. doi:10.1145/3626756. Online publication date: 12-Dec-2023



Published In

Neurocomputing, Volume 349, Issue C
Jul 2019
327 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands


Author Tags

  1. Content-based table retrieval
  2. Semi-structured information processing
  3. Representation learning
  4. Information retrieval
  5. Natural language processing




Citations

  • (2022) Matching news articles and Wikipedia tables for news augmentation. Knowledge and Information Systems 65(4), pp. 1713–1734. doi:10.1007/s10115-022-01815-0. Online publication date: 27-Dec-2022
  • (2021) From Tables to Knowledge. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 4060–4061. doi:10.1145/3447548.3470809. Online publication date: 14-Aug-2021
  • (2021) Collocating News Articles with Structured Web Tables. Companion Proceedings of the Web Conference 2021, pp. 393–401. doi:10.1145/3442442.3452326. Online publication date: 19-Apr-2021
  • (2021) WTR: A Test Collection for Web Table Retrieval. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2514–2520. doi:10.1145/3404835.3463260. Online publication date: 11-Jul-2021
  • (2021) Retrieving Complex Tables with Multi-Granular Graph Representation Learning. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1472–1482. doi:10.1145/3404835.3462909. Online publication date: 11-Jul-2021
  • (2020) Web Table Retrieval using Multimodal Deep Learning. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1399–1408. doi:10.1145/3397271.3401120. Online publication date: 25-Jul-2020
  • (2020) Table Search Using a Deep Contextualized Language Model. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 589–598. doi:10.1145/3397271.3401044. Online publication date: 25-Jul-2020