Abstract
This paper presents a knowledge-based approach to managing and retrieving personal documents. Its dual document model consists of a document type hierarchy and a folder organization. The document type hierarchy captures the layout, logical, and conceptual structures of documents. The folder organization mimics the user's real-world filing system for organizing and storing documents in an office environment. A predicate-based representation of documents is formalized for specifying knowledge about documents, and both document filing and retrieval are predicate-driven: the filing criteria of the folders, specified as predicates, govern how frame instances are grouped, regardless of their document types. We incorporate the notions of document type hierarchy and folder organization into a multilevel architecture for document storage, which supports various text-based information retrieval techniques as well as content-based multimedia information retrieval techniques. The paper also proposes a knowledge-based query-preprocessing algorithm that reduces the search space. To automate document filing and retrieval, a predicate evaluation engine with a knowledge base is proposed, together with a learning agent responsible for acquiring the knowledge the evaluation engine needs.
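To make the idea of predicate-driven filing and search-space reduction more concrete, here is a minimal Python sketch. It rests on several simplifying assumptions that do not come from the paper. Frame instances are flat dictionaries of slots. Filing criteria are restricted to conjunctions of `slot == value` tests. Query preprocessing is reduced to discarding folders whose criteria contradict the query. The names `FrameInstance`, `Folder`, `file_document`, `candidate_folders`, and `retrieve` are hypothetical. The paper's own predicate language, evaluation engine, and knowledge-based preprocessing are considerably richer than this.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FrameInstance:
    """A filed document: its document type plus attribute slots (hypothetical schema)."""
    doc_id: str
    doc_type: str
    slots: Dict[str, Any]


def satisfies(doc: FrameInstance, conds: Dict[str, Any]) -> bool:
    """Evaluate a conjunction of slot == value predicates against a frame instance."""
    return all(doc.slots.get(k) == v for k, v in conds.items())


@dataclass
class Folder:
    name: str
    # Assumption: filing criteria are simple equality predicates over slots.
    criteria: Dict[str, Any]
    members: List[FrameInstance] = field(default_factory=list)


def file_document(doc: FrameInstance, folders: List[Folder]) -> List[str]:
    """File doc into every folder whose criteria hold, ignoring doc.doc_type."""
    placed = []
    for folder in folders:
        if satisfies(doc, folder.criteria):
            folder.members.append(doc)
            placed.append(folder.name)
    return placed


def candidate_folders(query: Dict[str, Any], folders: List[Folder]) -> List[Folder]:
    """Drop folders whose criteria fix a queried slot to a different value."""
    return [
        f for f in folders
        if all(query[k] == v for k, v in f.criteria.items() if k in query)
    ]


def retrieve(query: Dict[str, Any], folders: List[Folder]) -> List[str]:
    """Search only the candidate folders; deduplicate documents filed in several folders."""
    found: Dict[str, FrameInstance] = {}
    for folder in candidate_folders(query, folders):
        for doc in folder.members:
            if satisfies(doc, query):
                found[doc.doc_id] = doc
    return sorted(found)


if __name__ == "__main__":
    folders = [
        Folder("From Alice", {"sender": "Alice"}),
        Folder("From Bob", {"sender": "Bob"}),
        Folder("1997", {"year": 1997}),
    ]
    memo = FrameInstance("d1", "Memo", {"sender": "Alice", "year": 1997})
    letter = FrameInstance("d2", "Letter", {"sender": "Bob", "year": 1997})
    print(file_document(memo, folders))    # ['From Alice', '1997']
    print(file_document(letter, folders))  # ['From Bob', '1997']
    print(retrieve({"sender": "Alice", "year": 1997}, folders))  # ['d1']
```

The pruning step is safe under these assumptions. Every member of a folder satisfies that folder's criteria, so a folder that fixes a queried slot to a different value cannot contain a match. Note one limitation of the sketch: a document that matches no folder's filing criteria is never retrievable here.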
Cite this article
Fan, X., Ng, P.A. Personal Document Management and Retrieval: A Knowledge-Based Approach. Journal of Systems Integration 8, 287–312 (1998). https://doi.org/10.1023/A:1026461329174