DOI: 10.1145/1772690.1772709
research-article

Graph-based concept identification and disambiguation for enterprise search

Published: 26 April 2010

Abstract

Enterprise Search (ES) differs from traditional IR for a number of reasons. The two most important are the high level of ambiguity of terms in queries and documents, and the existence of graph-structured enterprise data (ontologies) that describe the concepts of interest and their relationships to each other.
Our method identifies concepts from the enterprise ontology in the query and the corpus. We propose a ranking scheme for ontology sub-graphs on top of approximately matched token q-grams. The ranking leverages the graph structure of the ontology to incorporate concepts that are not explicitly mentioned. It improves on previous solutions by using a fine-grained ranking function that is specifically designed to cope with high levels of ambiguity. This method is able to capture much more of the semantics of queries and documents than previous techniques. We substantiate this claim by evaluating our method in three real-life scenarios from two different domains, where we found it to be consistently superior in terms of both precision and recall.
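
To make the general idea concrete, the following is a minimal sketch of q-gram-based approximate matching followed by a graph-aware re-ranking step. It is not the ranking function from the paper. The toy ontology, the use of character q-grams with Jaccard similarity, the threshold, and the `boost` weight are all assumptions chosen for illustration.

```python
def qgrams(text, q=3):
    """Return the set of character q-grams of a lowercased, padded string."""
    padded = "#" * (q - 1) + text.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical toy ontology: two products that each have a "Pricing" component.
LABELS = {
    "crm": "Customer Relationship Management",
    "scm": "Supply Chain Management",
    "crm_pricing": "Pricing",
    "scm_pricing": "Pricing",
}
EDGES = {("crm", "crm_pricing"), ("scm", "scm_pricing")}


def neighbors(concept):
    """Concepts directly connected to `concept` (edges treated as undirected)."""
    return {b if a == concept else a for a, b in EDGES if concept in (a, b)}


def candidates(mention, threshold=0.4):
    """Concepts whose label approximately matches the mention."""
    grams = qgrams(mention)
    scored = {cid: jaccard(grams, qgrams(label)) for cid, label in LABELS.items()}
    return {cid: s for cid, s in scored.items() if s >= threshold}


def rank(mentions, boost=0.5):
    """Score candidates, rewarding those whose graph neighbors also matched."""
    base = {}
    for mention in mentions:
        for cid, score in candidates(mention).items():
            base[cid] = max(base.get(cid, 0.0), score)
    final = {}
    for cid, score in base.items():
        support = sum(base[n] for n in neighbors(cid) if n in base)
        final[cid] = score + boost * support
    return sorted(final.items(), key=lambda kv: kv[1], reverse=True)


# The misspelled mention "pricng" is ambiguous between two concepts; the
# co-occurring product mention shifts the ranking towards the CRM component.
ranking = rank(["pricng", "customer relationship managment"])
```

In this sketch the string-level match alone cannot separate the two "Pricing" concepts, since both labels are identical. Only the neighborhood term, which looks at what else was matched in the same text, breaks the tie.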

Information & Contributors

Information

Published In

WWW '10: Proceedings of the 19th international conference on World wide web
April 2010
1407 pages
ISBN: 9781605587998
DOI: 10.1145/1772690

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. enterprise search
  2. graph-based disambiguation
  3. information extraction

Qualifiers

  • Research-article

Conference

WWW '10
WWW '10: The 19th International World Wide Web Conference
April 26 - 30, 2010
Raleigh, North Carolina, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Cited By

  • (2024) Acquiring and Modeling Abstract Commonsense Knowledge via Conceptualization. Artificial Intelligence. DOI: 10.1016/j.artint.2024.104149 (104149). Online publication date: May-2024
  • (2014) Exploiting semantic linkages among multiple sources for semantic information retrieval. Enterprise Information Systems 8:4. DOI: 10.1080/17517575.2013.879923 (464-489). Online publication date: 1-Jul-2014
  • (2013) QUASL: A framework for question answering and its Application to business intelligence. IEEE 7th International Conference on Research Challenges in Information Science (RCIS). DOI: 10.1109/RCIS.2013.6577686 (1-12). Online publication date: May-2013
  • (2013) Large Scale Sequential Learning from Partially Labeled Data. Proceedings of the 2013 IEEE Seventh International Conference on Semantic Computing. DOI: 10.1109/ICSC.2013.39 (176-183). Online publication date: 16-Sep-2013
  • (2013) Organization oriented web search management. 2013 6th International Conference on Information Management, Innovation Management and Industrial Engineering. DOI: 10.1109/ICIII.2013.6703569 (274-277). Online publication date: Nov-2013
  • (2013) Entity-Centric Search for Enterprise Services. Proceedings of the 11th International Conference on Service-Oriented Computing - Volume 8274. DOI: 10.1007/978-3-642-45005-1_28 (404-412). Online publication date: 2-Dec-2013
  • (2012) Leveraging Semantic Web Technologies for Enterprise Information Integration. NTT Technical Review 10:8. DOI: 10.53829/ntr201208ra1 (29-35). Online publication date: Aug-2012
  • (2012) A Cooperative Co-learning Approach for Concept Detection in Documents. Proceedings of the 2012 IEEE Sixth International Conference on Semantic Computing. DOI: 10.1109/ICSC.2012.32 (310-317). Online publication date: 19-Sep-2012
  • (2012) Semantic-Based Composite Document Ranking. Proceedings of the 2012 IEEE Sixth International Conference on Semantic Computing. DOI: 10.1109/ICSC.2012.28 (126-129). Online publication date: 19-Sep-2012
  • (2012) InterOnto – Ranking Inter-Ontology Links. Data Integration in the Life Sciences. DOI: 10.1007/978-3-642-31040-9_2 (5-20). Online publication date: 2012
