short-paper

Finding additional semantic entity information for search engines

Published: 05 December 2012

Abstract

Entity-oriented search has become an essential component of modern search engines. It focuses on retrieving a list of entities, or information about specific entities, instead of documents. In this paper, we study the problem of finding entity-related information, referred to as attribute-value pairs, which plays a significant role in searching for target entities. We propose a novel decomposition framework that combines reduced relations with a discriminative model, the Conditional Random Field (CRF), to automatically find entity-related attribute-value pairs in free-text documents. This framework allows us to locate candidate text fragments and identify their hidden semantics, in the form of attribute-value pairs, for user queries. Empirical analysis shows that the decomposition framework outperforms pattern-based approaches because it effectively integrates syntactic and semantic features.
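
As a rough illustration of what "attribute-value pairs" extracted from free text can look like, the sketch below decodes the output of a token-level sequence labeler (such as a CRF) into pairs. It is not the paper's framework: the BIO tag set (`B-ATTR`, `I-ATTR`, `B-VAL`, `I-VAL`, `O`), the function name `decode_pairs`, and the rule that pairs each value with the most recent attribute are all assumptions made for this example, and the tags are supplied by hand rather than predicted by a trained model.

```python
from typing import List, Optional, Tuple


def decode_pairs(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (attribute, value) pairs.

    Hypothetical tag set: B-ATTR/I-ATTR mark attribute spans,
    B-VAL/I-VAL mark value spans, and O marks everything else.
    Each value span is paired with the most recent attribute span.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    # Step 1: merge consecutive tokens of the same kind into spans.
    spans: List[Tuple[str, str]] = []  # (kind, text)
    current_kind: Optional[str] = None
    current_words: List[str] = []
    for token, tag in zip(tokens, tags):
        prefix, _, kind = tag.partition("-")
        if prefix == "I" and kind == current_kind:
            current_words.append(token)  # continue the open span
            continue
        if current_kind is not None:
            spans.append((current_kind, " ".join(current_words)))
        if prefix in ("B", "I"):  # a stray I- tag also opens a span
            current_kind, current_words = kind, [token]
        else:  # an O tag closes any open span
            current_kind, current_words = None, []
    if current_kind is not None:
        spans.append((current_kind, " ".join(current_words)))

    # Step 2: pair each value span with the latest attribute span.
    pairs: List[Tuple[str, str]] = []
    last_attr: Optional[str] = None
    for kind, text in spans:
        if kind == "ATTR":
            last_attr = text
        elif kind == "VAL" and last_attr is not None:
            pairs.append((last_attr, text))
    return pairs


# Hand-labeled toy sentence; a real system would predict these tags.
tokens = ["The", "screen", "size", "is", "6.1", "inches"]
tags = ["O", "B-ATTR", "I-ATTR", "O", "B-VAL", "I-VAL"]
print(decode_pairs(tokens, tags))
```

In this toy setup the decoding step is deliberately separate from the labeling step, so any sequence labeler that emits the assumed tags could be plugged in ahead of it.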


    Published In

    ADCS '12: Proceedings of the Seventeenth Australasian Document Computing Symposium
    December 2012
    142 pages
    ISBN:9781450314114
    DOI:10.1145/2407085

    Sponsors

    • Dept. of Information Science, Univ. of Otago: Department of Information Science, University of Otago, Dunedin, New Zealand

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. conditional random field (CRF)
    2. decomposition framework
    3. entity retrieval

    Acceptance Rates

    Overall Acceptance Rate 30 of 57 submissions, 53%
