research-article

Graph-based concept identification and disambiguation for enterprise search

Authors:

Gregor Hackenbroich,

Wojciech M. Barczynski

WWW '10: Proceedings of the 19th international conference on World wide web

Pages 171 - 180

https://doi.org/10.1145/1772690.1772709

Published: 26 April 2010

Abstract

Enterprise Search (ES) differs from traditional IR for a number of reasons. The most important are the high level of ambiguity of terms in queries and documents, and the existence of graph-structured enterprise data (ontologies) that describe the concepts of interest and their relationships to each other.

Our method identifies concepts from the enterprise ontology in the query and corpus. We propose a ranking scheme for ontology sub-graphs on top of approximately matched token q-grams. The ranking leverages the graph structure of the ontology to incorporate concepts that are not explicitly mentioned. It improves on previous solutions by using a fine-grained ranking function that is specifically designed to cope with high levels of ambiguity. The method captures much more of the semantics of queries and documents than previous techniques. We support this claim with an evaluation of our method in three real-life scenarios from two different domains, in which it was consistently superior in terms of both precision and recall.
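The general idea of combining approximate q-gram matching with graph structure can be illustrated with a small sketch. This is not the paper's ranking function, which the abstract does not specify. It assumes character q-grams per token, Jaccard similarity as the match score, and a fixed bonus for each ontology edge that links two matched concepts. The toy ontology (`LABELS`, `EDGES`), the function names, and the `threshold` and `bonus` parameters are all hypothetical.

```python
# Illustrative sketch only: q-gram matching plus a graph-connectivity bonus.
# The scoring formula and the toy ontology are assumptions, not the paper's method.

def qgrams(token, q=2):
    """Return the set of character q-grams of a token, padded with '#'."""
    padded = "#" * (q - 1) + token.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}

def similarity(a, b, q=2):
    """Jaccard similarity between the q-gram sets of two tokens."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    return len(ga & gb) / len(ga | gb)

# Hypothetical ontology: two concepts share the ambiguous label "jaguar".
LABELS = {
    "c1": "jaguar",      # the car brand
    "c2": "jaguar",      # the animal
    "c3": "engine",
    "c4": "rainforest",
}
EDGES = {("c1", "c3"), ("c2", "c4")}  # undirected relationships

def match_concepts(tokens, threshold=0.3):
    """Map each approximately matched concept to its best similarity score."""
    best = {}
    for token in tokens:
        for concept, label in LABELS.items():
            sim = similarity(token, label)
            if sim >= threshold:
                best[concept] = max(best.get(concept, 0.0), sim)
    return best

def rank_concepts(tokens, threshold=0.3, bonus=0.5):
    """Rank matched concepts by similarity plus a bonus per linked matched concept."""
    matched = match_concepts(tokens, threshold)
    scores = {}
    for concept, sim in matched.items():
        linked = sum(
            1 for a, b in EDGES
            if (a == concept and b in matched) or (b == concept and a in matched)
        )
        scores[concept] = sim + bonus * linked
    return sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    # The misspelled "jagaur" matches both "jaguar" concepts equally well.
    print(rank_concepts(["jagaur", "engine"]))
```

In this toy setting the misspelled token `jagaur` matches both `c1` and `c2` with the same q-gram similarity. The edge between `c1` and the matched concept `c3` ("engine") ranks the car-brand reading above the animal reading, which is the kind of disambiguation that graph structure makes possible.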

Index Terms

Graph-based concept identification and disambiguation for enterprise search
1. Information systems
  1. Information retrieval
    1. Document representation
      1. Thesauri
    2. Search engine architectures and scalability
      1. Search engine indexing
2. Mathematics of computing
  1. Discrete mathematics
    1. Graph theory

Published In

WWW '10: Proceedings of the 19th international conference on World wide web

April 2010

1407 pages

ISBN: 9781605587998

DOI: 10.1145/1772690

General Chairs: Michael Rappa (North Carolina State University, USA), Paul Jones (University of North Carolina at Chapel Hill, USA)

Program Chairs: Juliana Freire (University of Utah, USA), Soumen Chakrabarti (Indian Institute of Technology, India)

Copyright © 2010 International World Wide Web Conference Committee (IW3C2).

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

WWW '10

WWW '10: The 19th International World Wide Web Conference

April 26 - 30, 2010

Raleigh, North Carolina, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Bibliometrics

Total Citations: 10
Total Downloads: 867
Downloads (Last 12 months): 5
Downloads (Last 6 weeks): 1

Reflects downloads up to 13 Dec 2024

Cited By

He M, Fang T, Wang W, Song Y (2024). Acquiring and Modeling Abstract Commonsense Knowledge via Conceptualization. Artificial Intelligence (104149). DOI: 10.1016/j.artint.2024.104149. Online publication date: May-2024.
https://doi.org/10.1016/j.artint.2024.104149
Li J, Yang J, Liu C, Zhao Y, Liu B, Shi Y (2014). Exploiting semantic linkages among multiple sources for semantic information retrieval. Enterprise Information Systems 8:4 (464-489). DOI: 10.1080/17517575.2013.879923. Online publication date: 1-Jul-2014.
https://dl.acm.org/doi/10.1080/17517575.2013.879923
Kuchmann-Beauger N, Brauer F, Aufaure M (2013). QUASL: A framework for question answering and its Application to business intelligence. IEEE 7th International Conference on Research Challenges in Information Science (RCIS) (1-12). DOI: 10.1109/RCIS.2013.6577686. Online publication date: May-2013.
https://doi.org/10.1109/RCIS.2013.6577686
Li J, Liu C, Liu B (2013). Large Scale Sequential Learning from Partially Labeled Data. Proceedings of the 2013 IEEE Seventh International Conference on Semantic Computing (176-183). DOI: 10.1109/ICSC.2013.39. Online publication date: 16-Sep-2013.
https://dl.acm.org/doi/10.1109/ICSC.2013.39
Dong Y, Li J (2013). Organization oriented web search management. 2013 6th International Conference on Information Management, Innovation Management and Industrial Engineering (274-277). DOI: 10.1109/ICIII.2013.6703569. Online publication date: Nov-2013.
https://doi.org/10.1109/ICIII.2013.6703569
Roy M, Weber I, Benatallah B (2013). Entity-Centric Search for Enterprise Services. Proceedings of the 11th International Conference on Service-Oriented Computing - Volume 8274 (404-412). DOI: 10.1007/978-3-642-45005-1_28. Online publication date: 2-Dec-2013.
https://dl.acm.org/doi/10.1007/978-3-642-45005-1_28
Murayama T, Sakai R, Iiduka K, Morita D (2012). Leveraging Semantic Web Technologies for Enterprise Information Integration. NTT Technical Review 10:8 (29-35). DOI: 10.53829/ntr201208ra1. Online publication date: Aug-2012.
https://doi.org/10.53829/ntr201208ra1
Li J, Liu C (2012). A Cooperative Co-learning Approach for Concept Detection in Documents. Proceedings of the 2012 IEEE Sixth International Conference on Semantic Computing (310-317). DOI: 10.1109/ICSC.2012.32. Online publication date: 19-Sep-2012.
https://dl.acm.org/doi/10.1109/ICSC.2012.32
Liu C, Li J (2012). Semantic-Based Composite Document Ranking. Proceedings of the 2012 IEEE Sixth International Conference on Semantic Computing (126-129). DOI: 10.1109/ICSC.2012.28. Online publication date: 19-Sep-2012.
https://dl.acm.org/doi/10.1109/ICSC.2012.28
Trißl S, Hussels P, Leser U (2012). InterOnto – Ranking Inter-Ontology Links. Data Integration in the Life Sciences (5-20). DOI: 10.1007/978-3-642-31040-9_2. Online publication date: 2012.
https://doi.org/10.1007/978-3-642-31040-9_2
