Interactive thesaurus assessment for automatic document annotation

Published: 28 October 2007

Abstract

Thesaurus-based indexing is a common approach for improving the performance of document retrieval. With the growing number of documents available, manual indexing is no longer feasible, and statistical methods for automated document indexing are an attractive alternative. We argue that the quality of the thesaurus used as the basis for indexing, in particular its ability to adequately cover the contents to be indexed, is of crucial importance in automatic indexing, because there is no human in the loop who can spot and avoid indexing errors. We propose a method for thesaurus evaluation that combines statistical measures with appropriate visualization techniques to support the detection of potential problems in a thesaurus. We describe this method and show its application in the context of two automatic indexing tasks. The examples show that the method indeed eases the detection and correction of errors, leading to a better indexing result. Please refer to http://www.kaiec.org for high-resolution media of all figures used in this paper, as well as an animated presentation of the interactive tool.

Published In

K-CAP '07: Proceedings of the 4th international conference on Knowledge capture
October 2007
216 pages
ISBN: 9781595936431
DOI: 10.1145/1298406

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. suitability
  2. thesaurus-based retrieval
  3. visualization

Qualifiers

  • Article

Conference

K-CAP07: International Conference on Knowledge Capture 2007
October 28-31, 2007
Whistler, BC, Canada

Acceptance Rates

Overall Acceptance Rate 55 of 198 submissions, 28%

Article Metrics

  • Downloads (last 12 months): 5
  • Downloads (last 6 weeks): 0
Reflects downloads up to 11 Dec 2024.

Cited By

  • (2018) Augmenting and structuring user queries to support efficient free-form code search. Empirical Software Engineering 23:5 (2622-2654). DOI: 10.1007/s10664-017-9544-y. Online publication date: 1-Oct-2018
  • (2015) Keyword Extraction from Company Websites for the Development of Regional Knowledge Maps. Knowledge Discovery, Knowledge Engineering and Knowledge Management (96-111). DOI: 10.1007/978-3-662-46549-3_7. Online publication date: 25-Apr-2015
  • (2014) A new similarity measure for subject hierarchical structures. Journal of Documentation 70:3 (364-391). DOI: 10.1108/JD-12-2012-0160. Online publication date: 6-May-2014
  • (2013) An experiment in automatic indexing using the HASSET thesaurus. 2013 5th Computer Science and Electronic Engineering Conference (CEEC) (13-18). DOI: 10.1109/CEEC.2013.6659437. Online publication date: Sep-2013
  • (2011) User-Centered Maintenance of Concept Hierarchies. Ontology Learning and Knowledge Discovery Using the Web (105-128). DOI: 10.4018/978-1-60960-625-1.ch006. Online publication date: 2011
  • (2009) Tagging and automation: challenges and opportunities for academic libraries. Library Hi Tech 27:4 (557-569). DOI: 10.1108/07378830911007664. Online publication date: 20-Nov-2009
  • (2008) Semtinel. Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries (425-425). DOI: 10.1145/1378889.1378972. Online publication date: 16-Jun-2008
  • (2008) Visual Analysis of Classification Systems and Library Collections. Research and Advanced Technology for Digital Libraries (436-439). DOI: 10.1007/978-3-540-87599-4_57. Online publication date: 2008
