LZ conceived and designed the experiments, built the model, and wrote the paper. LJ performed the experiments. LZ and LJ analyzed the data.
The authors have declared that no competing interests exist.
Visual object recognition and sensitivity to image features are largely influenced by contextual inputs. We study influences by contextual bars on the bias to perceive or infer the presence of a target bar, rather than on the sensitivity to image features. Human observers judged from a briefly presented stimulus whether a target bar of a known orientation and shape is present at the center of a display, given a weak or missing input contrast at the target location with or without a context of other bars. Observers are more likely to perceive a target when the context has a weaker rather than stronger contrast. When the context can perceptually group well with the would-be target, weak contrast contextual bars bias the observers to perceive a target relative to the condition without contexts, as if to fill in the target. Meanwhile, high-contrast contextual bars, regardless of whether they group well with the target, bias the observers to perceive no target. A Bayesian model of visual inference is shown to account for the data well, illustrating that the context influences the perception in two ways: (1) biasing observers' prior belief that a target should be present according to visual grouping principles, and (2) biasing observers' internal model of the likely input contrasts caused by a target bar. According to this model, our data suggest that the context does not influence the perceived target contrast despite its influence on the bias to perceive the target's presence, thereby suggesting that cortical areas beyond the primary visual cortex are responsible for the visual inferences.
We study how visual perception of a target bar can be biased by contextual bars in the image, and how a Bayesian model of object inference can account for the data. Human observers are more likely to perceive a target bar when the contextual contrast, i.e., the luminance difference between the contextual bars and background, is weaker rather than stronger. Relative to the situation without the context, they are biased to perceive the target in a context of weak contrast when the target can perceptually group well with the context, as if the context fills in the target. Meanwhile, they are biased not to perceive the target in a context of strong contrast, as if the context suppresses the perception, regardless of whether it could perceptually group well with the would-be target. The Bayesian model illustrates that the context influences the perception by biasing (1) observers' prior belief that a target should be present and (2) observers' internal model of the likely input contrasts from a target bar. Our data suggest that brain areas beyond the primary visual cortex along the visual pathway are responsible for inferring object causes for input images.
Visual inputs are first represented in early visual stages such as retina and the primary visual cortex (V1), such that input features such as local color, orientation, luminance contrast, and spatial scale of image patches are encoded by the activities of retinal and V1 neurons with various input sensitivities. The neural representation of inputs is then used by the brain to infer the possible objects in the 3-D scene causing the 2-D input images. For instance, from V1's responses to the luminance edges in
(A) and (B) show two images containing the same white patch, and (C) and (D) show the two possible inferred objects in the scene causing this white patch. The inferred cause for any particular input image patch is not unique, although some inferences are more likely than others. The difference in the most likely inferred object for the same image patch in (A) and (B) demonstrates that inference can be greatly influenced by the image context.
Visual inference from any part of the input is often influenced by the contextual input. For instance, the more likely cause for the white patch in
We are interested in contextual influences in inference of objects from images, focusing in this paper on the perception in the spatial context of other inputs. Most previous studies on influences by spatial context used quite complex inputs such as photographs of everyday scenes [
The previous studies used bar stimuli to probe input sensitivities with the two-alternative forced-choice (2AFC) design. In contrast, we probe perceptual biases with a yes–no design. In each trial of the 2AFC design, two brief stimulus intervals are presented: both intervals contain the same contextual input but only one contains the target, and the observer has to report which interval contains the target. The input sensitivity is inversely related to the minimum target input (contrast) necessary for about 80% of the observers' responses to be correct. It has long been known [
We report in this paper that our study, using the bar stimuli and the yes–no task, revealed how visual contexts influence the perception of the target bar through a Bayesian inference and decision process. In particular, quite unexpectedly given the colinear facilitation of input sensitivities revealed neurophysiologically and behaviorally (by the 2AFC task), we found that weaker colinear contexts induce stronger biases to fill in the missing target. In the framework of a model of the Bayesian process, our data suggest that contextual facilitation or suppression of input sensitivities plays no role in the inference probed by our task, and hence the neural substrate responsible for this inference is more likely beyond V1. In the rest of the Introduction, we formulate the Bayesian model applied to our yes–no task. The Results section then presents our experiments probing the contextual influences in human inference behavior and the fit of our data by the Bayesian model. The Discussion section summarizes the findings and discusses their implications.
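To make the idea of a Bayesian inference about target presence concrete, the following is a minimal sketch that combines a prior belief with two likelihoods of the observed input contrast via Bayes' rule. The Gaussian likelihoods, the function name `posterior_target_present`, and all parameter values are assumptions chosen for illustration only; they are not the model or the parameters fitted to our data.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_target_present(contrast, prior, target_mean, noise_mean=0.0, sd=0.01):
    """Posterior probability that a target is present, given the input contrast.

    Assumption (hypothetical, for illustration): Gaussian likelihoods with a
    shared standard deviation for the target-present and target-absent cases.
    """
    like_present = gaussian_pdf(contrast, target_mean, sd)
    like_absent = gaussian_pdf(contrast, noise_mean, sd)
    evidence = prior * like_present + (1.0 - prior) * like_absent
    return prior * like_present / evidence


# Same weak input contrast under two hypothetical priors
# (e.g., a context that groups poorly vs. well with the target).
weak_input = 0.005
low_prior = posterior_target_present(weak_input, prior=0.3, target_mean=0.01)
high_prior = posterior_target_present(weak_input, prior=0.7, target_mean=0.01)
print(low_prior, high_prior)
```

In this example the input contrast lies midway between the two likelihood means, so the likelihoods are equal and the posterior is determined by the prior alone. This is the sense in which a context that raises the prior can bias perception toward the target without changing the evidence from the target location itself.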
The Bayesian inference and decision process applied to our task is formulated as follows [
In the Bayesian model, the inferred probability
Note that
Both
(A) Schematics of perceiving a weak vertical target bar in three different contexts. Colinear contexts give a higher prior belief
(B) the probability
(C,D) Effects of the contextual contrast
Without the context
One may think of
Filling-in, which occurs when
The Bayesian inference described above predicts in particular: (1) a weak context encourages filling-in of the visual target object when it is consistent or easily grouped with the target, i.e.,
In the experiments, human observers were asked to report, by pressing a button, whether or not they perceived the target. They were informed that the target, when present, was a nearly visible vertical bar at the center of the fixation array, and that they should make their judgments according to the target alone regardless of the context. We used only naive observers to minimize any systematic bias not related to the contextual stimuli. In each trial, the particular target and contextual (contrast and configuration) condition was chosen unpredictably among all conditions within an experiment.
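The unpredictable choice of condition in each trial can be sketched as a shuffled schedule over all combinations of target and contextual conditions. This is only an illustrative sketch: the function name `build_trial_schedule`, the condition labels, the contrast values, and the number of repeats are hypothetical and are not the ones used in the experiments.

```python
import itertools
import random


def build_trial_schedule(target_contrasts, context_conditions, repeats, seed=None):
    """Return a shuffled list of (target_contrast, context_condition) trials."""
    rng = random.Random(seed)
    # Every combination of target contrast and contextual condition.
    conditions = list(itertools.product(target_contrasts, context_conditions))
    # Repeat each combination equally often, then randomize the order.
    trials = conditions * repeats
    rng.shuffle(trials)
    return trials


# Hypothetical condition values, not those of the actual experiments.
schedule = build_trial_schedule(
    target_contrasts=[0.0, 0.005, 0.01],
    context_conditions=["none", "weak", "strong"],
    repeats=4,
    seed=1,
)
print(len(schedule), schedule[:3])
```

Shuffling a balanced list, rather than drawing a condition independently in each trial, keeps the number of trials per condition equal while leaving the order unpredictable.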
In experiment 1, the context has 10 colinear bars on each side of the target bar (
We found that (
The data points are the mean over six observers, and the error bars indicate the standard errors of the means (SEMs). On average and relative to the no-context condition, the weaker colinear contexts
The mean yes rates with the context are 86% ± 4%, 63% ± 6%, and 32% ± 5%, respectively, for
The adequacy of the Bayesian model is demonstrated by its reasonable fit to the data from the three non-zero contextual contrast conditions, using only three parameters
Note that fitting the yes rate data for the no context condition by the Bayesian model would require two additional parameters,
The higher yes rates under weaker contextual contrasts
Experiment 2 was based on
(A,B) Yes rates under colinear and orthogonal context (schematically like
(C,D) Yes rates under different contextual contrast
(E) Prior
The data can be fitted by the Bayesian model for the four yes rate curves (two configurations × two contextual contrasts) using only four parameters:
Experiment 3 shows that even subtle differences in contextual configuration can manifest as different biases in inference, in ways consistent with the Bayesian account. It is like Experiment 2, but with three colinear contexts: the 2-sided context is the one used in Experiment 1; removing the contextual bars from one end of the target gives the 1-sided context; and removing every alternate contextual bar gives the sparse context, see
(A) The schematics of the stimuli.
(B–D) Yes rates (with SEM error bars) averaged over seven observers. The 2-sided context gives higher yes rates than other contexts for
(A–C) The red, magenta, and blue curves and data points indicate respective quantities associated with different contextual contrasts
(D) The prior
Using simple visual stimuli of bars familiar in psychophysical and physiological studies of
We show additionally that these findings, unexpected from previous findings of contextual facilitation on input sensitivities, can be accounted for by a Bayesian inference and decision model. The model assumes that the perception results from an inference of the posterior probability
The filling-in and suppression of the target in our study are not unlike, respectively, the visual assimilation and contrast in the perception of brightness [
The findings in higher level vision [
Compared with most of the previous studies on the influences by the spatial context, our study uses simpler stimuli that can be more easily or quantitatively manipulated and described. Consequently, we not only model our data using a simple Bayesian inference and decision model, but also use this model to deduce that, at least in inference, the underlying neural mechanisms do not cause contextual facilitation or suppression of input sensitivities observed at the visual encoding stage [
Context can change sensitivity to input bars (or bar-like elements such as Gabors) as manifested behaviorally in 2AFC tasks for target detection [
In previous studies of contextual influence on visual inferences, researchers probed perception by asking the observers to report the appearance, e.g., color and motion direction, of the stimuli. Our study may seem different by asking for reports of whether the target is perceived or not, rather than the appearance, e.g., apparent contrast. However, in essence, the question of “whether you perceive the target or not” is not unlike a question “whether the luminance profile at this location appears as if it is caused by a target or by noise,” which probes the appearance of the perception evoked by the input at the image location concerned. If we had instead asked for reports of apparent contrast, these reports may or may not directly reflect the process of
It is in principle possible that the bias in the observers' reports did not arise from the inference stage (which gives
One may wonder whether the sensitivities in the 2AFC task could be derived as the derivatives of the psychometric function (the yes rate) observed in our yes–no task using the same stimuli [
In our model, the parameters
Our observers seemed unconsciously to use prior beliefs induced by context, despite our instructions informing them that the context was irrelevant to the task. Furthermore, they could quickly switch from one prior to another as the context changes from one trial to another. However, these different priors are only different from the perspective of the target alone. When combining target and context as a whole, the joint prior probability of the visual input in principle arises from the same underlying probability distribution [
The stimuli were shown on a gamma-corrected 21-inch Sony GDM-F520 monitor with 14-bit luminance resolution. The viewing distance was 67.6 cm, and the screen width was 40 cm. All stimulus (target or contextual) bars were rectangles of 0.9° × 0.165° in visual angle, with a luminance
Each observer was 18–40 years old, had normal or corrected-to-normal vision, and participated in only one experiment. The experiments were carried out in a dimly lit room. Each trial began with the fixation display for 500 ms, followed by the test stimulus display for 80 ms together with an auditory beep, and then by the fixation display, which stayed on until the observer pressed a button to indicate whether or not they perceived the target in the trial. No feedback was given regarding whether the responses were correct. The next trial started 800 ms after the button press. Before data collection in each session, each observer performed 20 randomly selected trials. Each experimental session randomly interleaved different stimulus conditions, such that the observers could not predict beyond chance the target contrast
Here we formulate our Bayesian inference and decision model in more detail. In a single trial,
If the observer responds “yes” or “no,” the probability of error is 1 −
The posterior probability
While an Ansatz is typically justified by its suitability in accounting for the data, as demonstrated in the main text for the Ansatz above, here we provide some motivations behind this Ansatz. For ease of presentation, we abbreviate the integration
The model
Other additional variabilities, such as the perceived locations of the stimulus, would behave analogously to the internal variables
Since
We thank Joshua A. Solomon and Mike Morgan for very helpful discussions and help on references, two anonymous reviewers for their constructive comments, and Peter Dayan for reading the manuscript and comments.