Picture semantics for integrating text and diagram input

Raman Rajagopalan

Abstract

The saying ‘a picture is worth a thousand words’ exemplifies the great value of pictures in describing a scenario. Pictures convey spatial information in a compact form, allowing textual descriptions to concentrate on the non-spatial (henceforth, contextual) properties of objects. The difficult task in integrating text and diagrammatic input to a system is to establish coreference — matching object references in the text to objects in the diagram.

We show that the coreference problem can be greatly simplified if limited contextual information can be provided directly in diagrams. We present a methodology, the Picture Semantics description language, for associating contextual information with objects drawn through graphical editors. Then, we describe our implemented research tool, the Figure Understander, which uses this methodology to integrate the differing information in text and graphically-drawn diagrammatic input into a single unified knowledge base description. We illustrate the utility of our methods through examples from two independent domains.

Author information

Raman Rajagopalan
Present address: Intel Corporation, MS EY3-06, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497, USA

Authors and Affiliations

Department of Computer Sciences, University of Texas at Austin, Austin, Texas 78712
Raman Rajagopalan

About this article

Cite this article

Rajagopalan, R. Picture semantics for integrating text and diagram input. Artif Intell Rev 10, 321–344 (1996). https://doi.org/10.1007/BF00127685

Issue Date: August 1996
DOI: https://doi.org/10.1007/BF00127685
