Abstract
The saying ‘a picture is worth a thousand words’ captures the value of pictures in describing a scenario. Pictures convey spatial information in a compact form, allowing textual descriptions to concentrate on the non-spatial (henceforth, contextual) properties of objects. The difficult task in integrating text and diagrammatic input to a system is establishing coreference, that is, matching object references in the text to objects in the diagram.
We show that the coreference problem can be greatly simplified if limited contextual information is provided directly in diagrams. We present a methodology, the Picture Semantics description language, for associating contextual information with objects drawn through graphical editors. We then describe our implemented research tool, the Figure Understander, which uses this methodology to integrate the differing information in text and graphically drawn diagrammatic input into a single unified knowledge base description. We illustrate the utility of our methods through examples from two independent domains.
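To make the coreference idea concrete, the following is a minimal sketch under stated assumptions, not the Picture Semantics description language or the Figure Understander itself. It assumes each drawn object carries a small dictionary of contextual annotations including a `name`, and that the text has already been reduced to (name, property, value) triples. The names `DiagramObject`, `integrate`, and the example objects and properties are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DiagramObject:
    """A shape drawn in a graphical editor (hypothetical structure)."""
    shape: str   # kind of shape, e.g. "rectangle"
    x: float     # spatial information taken from the drawing
    y: float
    # Contextual annotations attached in the diagram, e.g. {"name": "block-A"}
    context: dict = field(default_factory=dict)


def integrate(diagram, text_facts):
    """Merge text facts into diagram objects by matching on the annotated name.

    text_facts is a list of (name, property, value) triples (an assumed format).
    Returns (kb, unresolved): kb maps each annotated name to its merged
    properties; unresolved lists text facts with no matching diagram object.
    """
    kb = {}
    for obj in diagram:
        name = obj.context.get("name")
        if name is None:
            continue  # unannotated objects cannot be resolved by this simple scheme
        entry = {"shape": obj.shape, "x": obj.x, "y": obj.y}
        entry.update({k: v for k, v in obj.context.items() if k != "name"})
        kb[name] = entry

    unresolved = []
    for name, prop, value in text_facts:
        if name in kb:
            kb[name][prop] = value  # coreference resolved via the shared name
        else:
            unresolved.append((name, prop, value))
    return kb, unresolved


# Hypothetical usage: two annotated shapes and two facts from the text.
diagram = [
    DiagramObject("rectangle", 0.0, 0.0, {"name": "block-A"}),
    DiagramObject("circle", 5.0, 2.0, {"name": "pulley-1"}),
]
text_facts = [("block-A", "mass_kg", 2.0), ("block-B", "mass_kg", 1.0)]
kb, unresolved = integrate(diagram, text_facts)
```

In this sketch the annotation in the diagram does the work that would otherwise require inferring which drawn shape a textual reference denotes; facts about names that appear nowhere in the diagram are reported rather than guessed.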
Rajagopalan, R. Picture semantics for integrating text and diagram input. Artif Intell Rev 10, 321–344 (1996). https://doi.org/10.1007/BF00127685