Learning to Extract Attribute Values from a Search Engine with Few Examples

Xingxing Zhang, Tao Ge & Zhifang Sui

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8202)

Abstract

We propose an attribute value extraction method based on analysing snippets from a search engine. First, a pattern-based detector is applied to locate candidate attribute values in snippets. Then a classifier predicts whether each candidate value is correct. Training such a classifier requires only a few annotated <entity, attribute, value> triples, and sufficient training data can be generated automatically by matching these triples back to snippets and titles. Finally, because a correct value may appear in multiple snippets, the individual predictions are combined by voting to exploit this redundancy. Experiments on both Chinese and English corpora in the celebrity domain demonstrate the effectiveness of our method: with only 15 annotated <entity, attribute, value> triples, precision exceeds 85% for 7 of 12 attributes, and 11 of 12 attributes improve over a state-of-the-art method.
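
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the year regex, the function names (detect_year, label_candidates, vote) and the sample snippets are all hypothetical, and the classifier is left out entirely, so the voting step here simply counts per-snippet decisions that would normally come from that classifier.

```python
import re
from collections import Counter

# Hypothetical pattern-based detector: any four-digit year is a candidate value.
YEAR = re.compile(r"\b(?:1[89]|20)\d{2}\b")

def detect_year(attribute, snippet):
    """Return candidate values found in one snippet (the attribute is ignored in this toy detector)."""
    return YEAR.findall(snippet)

def label_candidates(seed_triples, snippets_by_entity, detect):
    """Generate training examples by matching seed triples back to snippets.

    A candidate is labelled positive when it equals the annotated value.
    """
    examples = []
    for entity, attribute, value in seed_triples:
        for snippet in snippets_by_entity.get(entity, []):
            for candidate in detect(attribute, snippet):
                examples.append((entity, attribute, snippet, candidate, candidate == value))
    return examples

def vote(predictions):
    """Combine per-snippet predictions: pick the candidate most often predicted correct."""
    counts = Counter(candidate for candidate, is_correct in predictions if is_correct)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Illustrative data only (assumed for this sketch, not taken from the paper).
seeds = [("Ada Lovelace", "birth_year", "1815")]
snippets = {
    "Ada Lovelace": [
        "Ada Lovelace (1815-1852) was an English mathematician.",
        "Born in 1815, Lovelace worked with Charles Babbage.",
    ]
}
training = label_candidates(seeds, snippets, detect_year)
# Stand-in for classifier output: reuse the automatic labels as predictions.
winner = vote((candidate, label) for _, _, _, candidate, label in training)
```

In a fuller version, the labelled examples would be turned into features and used to train a classifier, whose predictions on unseen entities would then be passed to vote.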

This paper is supported by NSFC Project 61075067 and National Key Technology R&D Program (No: 2011BAH10B04-03).

Author information

Authors and Affiliations

Key Laboratory of Computational Linguistics, Ministry of Education, Peking University, China
Xingxing Zhang, Tao Ge & Zhifang Sui

Editor information

Editors and Affiliations

Department of Computer Science and Technology, Tsinghua University, 100084, Beijing, China
Maosong Sun
Horizon Doctoral Training Centre, School of Computer Science, University of Nottingham, NG8 1BB, Nottingham, UK
Min Zhang
Google Inc., Mountain View, CA, USA
Dekang Lin
Baidu Inc., Beijing, China
Haifeng Wang

About this paper

Cite this paper

Zhang, X., Ge, T., Sui, Z. (2013). Learning to Extract Attribute Values from a Search Engine with Few Examples. In: Sun, M., Zhang, M., Lin, D., Wang, H. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD CCL 2013. Lecture Notes in Computer Science (LNAI), vol 8202. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41491-6_15

DOI: https://doi.org/10.1007/978-3-642-41491-6_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41490-9
Online ISBN: 978-3-642-41491-6
eBook Packages: Computer Science, Computer Science (R0)