Abstract
This research project contributes to the broad field of information discovery in digital documents. We aim to give users a tool for flexible access to the contents of digital documents: a text browsing facility inspired by traditional “back-of-the-book” indexes. It shows at a glance the main topics discussed in a document and presents certain kinds of relationships between these topics. These are captured automatically by exploiting certain lexical classes. We review previous research on this and similar topics, then describe the main characteristics of a research prototype, which relies on modeling professionally produced indexes. Experimental results are presented, along with remaining hurdles and potential applications.
This research is funded by a grant from the Natural Sciences and Engineering Research Council of Canada.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Da Sylva, L., Doll, F. (2005). A Document Browsing Tool: Using Lexical Classes to Convey Information. In: Kégl, B., Lapalme, G. (eds) Advances in Artificial Intelligence. Canadian AI 2005. Lecture Notes in Computer Science, vol 3501. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11424918_33
DOI: https://doi.org/10.1007/11424918_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25864-3
Online ISBN: 978-3-540-31952-8
eBook Packages: Computer Science, Computer Science (R0)