Key Points
Visual objects in the real world are seen in contextual scenes. These contexts are usually coherent in terms of their physical and semantic content, and they usually occur in typical configurations. Objects can be used to make predictions about probable contexts and about other objects that might be found in the same scene, and contexts can be used to inform the identification of individual objects. A full understanding of object recognition must include a consideration of contextual and associative influences.
'Context frames' might be used as structures of prototypical contexts that represent information about the identity of, and relationships between, objects that are likely to be present in each context (for example, a prototypical bathroom would contain a sink and a mirror, with the mirror typically set above the sink).
These context frames can be viewed as sets of expectations that are derived from exposure to real-world scenes. During recognition, a single object can activate appropriate context frames, and context frames can activate representations of expected objects. Scenes and individual objects can facilitate identification of each other and of other objects that are expected to occur in the same context.
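A toy data structure can make the idea of a context frame as a set of expectations more concrete. The sketch below is illustrative only and is not the article's implementation. The `ContextFrame` class, the `frames_for_object` and `expected_objects` helpers, and the contents of each frame are hypothetical. Activation is also simplified to all-or-none set membership.

```python
from dataclasses import dataclass, field


@dataclass
class ContextFrame:
    """Hypothetical prototypical context: the objects it is expected to contain
    and typical spatial relations between them, stored as (object, relation, object)."""
    name: str
    objects: set[str]
    relations: list[tuple[str, str, str]] = field(default_factory=list)


FRAMES = [
    ContextFrame("bathroom", {"sink", "mirror", "towel", "toilet"},
                 [("mirror", "above", "sink")]),
    ContextFrame("kitchen", {"sink", "stove", "refrigerator", "kettle"},
                 [("kettle", "on", "stove")]),
]


def frames_for_object(obj, frames=FRAMES):
    """A single object activates every context frame that contains it."""
    return [f for f in frames if obj in f.objects]


def expected_objects(obj, frames=FRAMES):
    """The activated frames in turn pre-activate the other objects they contain."""
    expected = set()
    for frame in frames_for_object(obj, frames):
        expected |= frame.objects
    expected.discard(obj)
    return expected
```

Here, `expected_objects("mirror")` returns only the other bathroom objects. Because `"sink"` appears in both frames, it activates both of them and pre-activates objects from each. This mirrors how a single object can bring several candidate contexts to mind.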
To be useful for facilitating object recognition, the gist of a scene must be extracted and processed rapidly. This rapid extraction might rely on global cues conveyed by the low spatial frequencies in an image, with the higher spatial frequencies supplying the details more slowly.
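One way to see what a low-spatial-frequency version of an image contains is to apply a low-pass filter in the Fourier domain. The sketch below assumes an ideal (sharp) frequency cutoff, chosen for clarity. The visual system's spatial-frequency channels are not ideal filters, and both the cutoff value and the `low_pass` and `high_pass` names are hypothetical.

```python
import numpy as np


def low_pass(image, cutoff):
    """Ideal low-pass filter: keep only spatial frequencies whose magnitude is
    at most `cutoff` cycles per image, giving a blurred, global version."""
    h, w = image.shape
    fy = np.fft.fftfreq(h) * h           # vertical frequencies, cycles/image
    fx = np.fft.fftfreq(w) * w           # horizontal frequencies, cycles/image
    radius = np.hypot(fy[:, None], fx[None, :])
    spectrum = np.fft.fft2(image)
    spectrum[radius > cutoff] = 0        # discard the fine detail
    return np.real(np.fft.ifft2(spectrum))


def high_pass(image, cutoff):
    """The complementary detail: everything the low-pass version leaves out."""
    return image - low_pass(image, cutoff)


# Synthetic 64 x 64 "scene": coarse global structure plus pixel-level texture.
y, x = np.mgrid[0:64, 0:64]
coarse = np.cos(2 * np.pi * x / 64)      # one cycle across the image
fine = 0.5 * (-1.0) ** (x + y)           # checkerboard at the highest frequency
scene = coarse + fine
gist = low_pass(scene, cutoff=8)         # recovers the coarse structure only
```

The low-pass output keeps the layout-level structure and drops the texture. Progressively raising `cutoff` would add detail back step by step, which loosely mimics a coarse-to-fine time course.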
Structures within the medial temporal lobe are thought to be important for associative processing. The prefrontal and retrosplenial cortex also seem to be important for processing contextual information. I propose that the parahippocampal cortex serves as a switchboard-like multiplexer that connects the representations of individual objects in the inferior temporal cortex, according to typical associations represented in context frames.
In the proposed model, a blurred, low-frequency representation of a scene is projected rapidly from the visual cortex to the parahippocampal areas, and a context frame is activated on the basis of an experience-based guess. This context frame activates associated representations of objects in the inferior temporal cortex. Simultaneously, the low-frequency image of a fixated object in the scene is also projected rapidly to the prefrontal cortex, which sensitizes the representations of objects that resemble the fixated object. In the inferior temporal cortex, these two sets of objects intersect and the object can be identified.
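The final step of the model, where the two sets of candidate objects intersect in the inferior temporal cortex, can be sketched as a small function. The article does not specify a combination rule. Multiplying the two activations (a "soft" intersection) is an assumption made here, and the function name, objects and activation values are hypothetical.

```python
def identify(context_activation, shape_activation):
    """Soft intersection of two top-down candidate sets.

    context_activation: objects pre-activated by the guessed context frame
        (the parahippocampal route in the proposed model).
    shape_activation: objects whose coarse shape resembles the blurred,
        low-frequency image of the fixated object (the prefrontal route).
    Only objects supported by both routes survive; the product of the two
    activations ranks the survivors.
    """
    shared = context_activation.keys() & shape_activation.keys()
    combined = {obj: context_activation[obj] * shape_activation[obj] for obj in shared}
    best = max(combined, key=combined.get) if combined else None
    return best, combined


# Hypothetical example: a blurred, elongated hand-held object seen in a bathroom.
bathroom_frame = {"sink": 0.9, "mirror": 0.8, "towel": 0.7,
                  "hair dryer": 0.6, "toothbrush": 0.6}
blurred_shape = {"hair dryer": 0.5, "drill": 0.5, "toothbrush": 0.3}

best, scores = identify(bathroom_frame, blurred_shape)
print(best)  # hair dryer
```

Neither route settles the question on its own. Shape alone cannot tell the hair dryer from the drill, and context alone favours the sink. Only their combination singles out one object.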
The proposed model accounts for many existing findings, and produces testable predictions about the contextual facilitation of object recognition.
We see the world in scenes, where visual objects occur in rich surroundings, often embedded in a typical context with other related objects. How does the human brain analyse and use these common associations? This article reviews the knowledge that is available, proposes specific mechanisms for the contextual facilitation of object recognition, and highlights important open questions. Although much has already been revealed about the cognitive and cortical mechanisms that subserve recognition of individual objects, surprisingly little is known about the neural underpinnings of contextual analysis and scene perception. Building on previous findings, we now have the means to address the question of how the brain integrates individual elements to construct the visual experience.
Supported by the National Institute of Neurological Disorders and Stroke, the James S. McDonnell Foundation (21st Century Science Research Award in Bridging Brain, Mind and Behavior) and the MIND Institute.
Basic level: The level of abstraction that carries the most information and at which objects are typically named most readily. For example, subjects recognize an Australian Shepherd more easily as a dog (basic level) than as an animal (superordinate level) or as an Australian Shepherd (subordinate level).
Priming: An experience-based facilitation in perceiving a physical stimulus. In a typical object priming experiment, subjects are first presented with stimuli (the primes) and their object-naming performance is recorded. Subsequently, they are presented with either the same stimuli or stimuli that have some defined relationship to the primes. Any stimulus-specific difference in performance is taken as a measure of priming.
Magnetoencephalography (MEG): A non-invasive technique for functional brain mapping that provides millisecond temporal resolution. It measures the magnetic fields generated by electric currents in active neurons, and by localizing the sources of these currents it can be used to reveal cortical function.
N400: A negative deflection in the event-related potential waveform, originally described as occurring approximately 400 ms after the onset of contextually incongruent words in a sentence. It has consistently been linked to semantic processing. Although it is probably one of the best neural signatures of contextual processing, its exact functional significance has yet to be elucidated.
Bayesian methods: Methods that use a priori probability distributions, derived from experience, to infer optimal expectations. They are based on Bayes' theorem, which can be seen as a rule for taking prior information into account to produce a number representing the probability that a certain hypothesis is true.
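A small worked example shows how a context-derived prior can change which hypothesis wins, using Bayes' theorem over a discrete set of hypotheses. The hypotheses, priors and likelihoods below are invented for illustration, and the `posterior` helper is hypothetical.

```python
def posterior(prior, likelihood):
    """Bayes' theorem over a discrete set of hypotheses h:
    P(h | data) = P(data | h) * P(h) / sum over h' of P(data | h') * P(h').
    """
    joint = {h: likelihood[h] * prior[h] for h in prior}
    evidence = sum(joint.values())          # P(data), the normalizing constant
    return {h: p / evidence for h, p in joint.items()}


# Hypothetical numbers: in a bathroom, experience makes a hair dryer more
# probable than a drill (the prior), while the blurred image on its own
# slightly favours the drill (the likelihood).
prior = {"hair dryer": 0.7, "drill": 0.3}
likelihood = {"hair dryer": 0.4, "drill": 0.6}
post = posterior(prior, likelihood)
# post["hair dryer"] = 0.28 / 0.46, about 0.61; post["drill"] = 0.18 / 0.46, about 0.39
```

The ambiguous image evidence alone would tip the decision towards the drill. The context-based prior reverses that outcome.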
Hebbian learning: A learning rule that builds on Hebb's postulate that the connection between two neurons strengthens if the neurons fire simultaneously. The original Hebbian rule has serious limitations, but it is used as the basis for more powerful learning rules. From a neurophysiological perspective, Hebbian learning can be described as a mechanism that increases synaptic efficacy as a function of synchrony between pre- and postsynaptic activity.
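A well-known limitation of the original Hebbian rule is that, with correlated inputs, the weights grow without bound. The sketch below contrasts the plain rule with Oja's rule. Oja's rule is one standard normalized variant, picked here as an example because the article does not name a specific successor rule. The single linear neuron, the learning rate and the input statistics are all assumptions.

```python
import numpy as np


def hebb_step(w, x, eta):
    """Plain Hebbian update: dw = eta * (presynaptic x) * (postsynaptic y)."""
    y = w @ x                        # postsynaptic activity of a linear neuron
    return w + eta * y * x


def oja_step(w, x, eta):
    """Oja's rule: the Hebbian term minus a decay proportional to y**2,
    which keeps the weight vector bounded (near unit length)."""
    y = w @ x
    return w + eta * y * (x - y * w)


def train(step, n_steps=2000, eta=0.01, seed=0):
    """Present correlated two-dimensional inputs to one neuron, one at a time."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, 0.8], [0.8, 1.0]])   # two correlated input channels
    xs = rng.multivariate_normal(np.zeros(2), cov, size=n_steps)
    w = np.array([0.5, -0.2])
    for x in xs:
        w = step(w, x, eta)
    return w


w_hebb = train(hebb_step)
w_oja = train(oja_step)
print(np.linalg.norm(w_hebb))  # very large: the plain rule never stops growing
print(np.linalg.norm(w_oja))   # close to 1
```

Oja's extra term also aligns the weight vector with the direction of strongest input correlation, here roughly (1, 1)/√2 up to sign. This is why the rule is often cited as a simple model of principal-component extraction.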
Bar, M. Visual objects in context. Nat Rev Neurosci 5, 617–629 (2004).
