Picture semantics for integrating text and diagram input

Artificial Intelligence Review

Abstract

The saying ‘a picture is worth a thousand words’ exemplifies the great value of pictures in describing a scenario. Pictures convey spatial information in a compact form, allowing textual descriptions to concentrate on the non-spatial (henceforth, contextual) properties of objects. The difficult task in integrating text and diagrammatic input to a system is to establish coreference — matching object references in the text to objects in the diagram.

We show that the coreference problem can be greatly simplified if limited contextual information can be provided directly in diagrams. We present a methodology, the Picture Semantics description language, for associating contextual information with objects drawn through graphical editors. Then, we describe our implemented research tool, the Figure Understander, which uses this methodology to integrate the differing information in text and graphically drawn diagrammatic input into a single unified knowledge base description. We illustrate the utility of our methods through examples from two independent domains.

Cite this article

Rajagopalan, R. Picture semantics for integrating text and diagram input. Artif Intell Rev 10, 321–344 (1996). https://doi.org/10.1007/BF00127685

  • DOI: https://doi.org/10.1007/BF00127685
