Journal section: Behavioral/Systems/Cognitive

Do object-category selective regions in the ventral visual stream represent perceived distance information?

Elinor Amit1, Eyal Mehoudar2, Yaacov Trope3, and Galit Yovel2

1Harvard University
2Tel-Aviv University
3New York University

Address for correspondence:
Elinor Amit
Department of Psychology
Harvard University
William James Hall 1484
33 Kirkland Street
Cambridge, MA 02138
Tel. 617-496-9636
Email: [email protected]

Acknowledgments: We would like to thank Yonantan Douek and Vadim Axelrod for valuable help with fMRI data analysis, Peter Huy Wong for assisting with data collection and stimulus preparation of the psychophysical studies, Jonathan Oron for help with collection of eye tracking data, and Tejaswinhi Srinivas for proofreading. We would also like to thank Russell Epstein for valuable comments. This work was funded by a grant from the Adams Super Center for Brain Research to G.Y.

Abstract

It is well established that scenes and objects elicit a highly selective response in specific brain regions in the ventral visual cortex. An inherent difference between these categories that has not yet been explored is their perceived distance from the observer (i.e., scenes are distal whereas objects are proximal). The current study aimed to test the extent to which scene- and object-selective areas are sensitive to perceived distance information independently of their category selectivity and retinotopic location. We conducted two studies that used a distance illusion (i.e., the Ponzo lines) and showed that scene regions (the parahippocampal place area, PPA, and transverse occipital sulcus, TOS) are biased toward perceived distal stimuli, whereas the lateral occipital (LO) object region is biased toward perceived proximal stimuli. These results suggest that the ventral visual cortex plays a role in representing distance information, extending recent findings on the sensitivity of these regions to location information.
More broadly, our findings imply that distance information is inherent to object recognition.

Key words: ventral visual stream, distance, objects, scenes

Running title: distance information in the ventral visual cortex

1. Introduction

It is well established that scenes and objects elicit a highly selective response in specific brain regions in the ventral visual cortex (e.g., Kanwisher et al., 1996; Epstein and Kanwisher, 1998). Notably, these categories differ in their perceived distance from the observer (i.e., scenes are distal whereas objects are proximal). The goal of the current study is to test to what extent category-selective areas in the ventral visual stream play a role in representing distance information, such that scene regions are biased toward distal information whereas object regions are biased toward proximal information, independently of their category selectivity and retinotopic location.

1.1. Past Research on the Representation of Object Categories in the Ventral Visual Stream

Neuroimaging studies in the past 15 years have established that certain object categories elicit a highly selective response in specific brain regions in the ventral visual stream (e.g., Kanwisher et al., 1996; Epstein and Kanwisher, 1998; Aguirre et al., 1998; for a recent review, see Op de Beeck et al., 2008). Such selective neural responses have been reported for faces (Kanwisher & Yovel, 2006), places (Epstein and Kanwisher, 1998), body parts (Downing et al., 2001), and words or letter strings (Cohen et al., 2000).

The underlying assumption of this line of research is that responses of category-selective regions in the ventral visual stream can be explained by higher-level object representations. For example, a “scene” contains information about the spatial structure of the local environment, and this triggers the response in scene regions (the parahippocampal place area, PPA, and transverse occipital sulcus, TOS; Epstein, 2005; Epstein et al., 2007).
With respect to objects, Epstein (2005) suggested that objects are spatially compact entities that one acts upon, in contrast to scenes, which are spatially distributed entities that one acts within. Likewise, Mazer and Gallant (2000) suggested that in order to recognize a single object, constituent patches must be segmented from the background and grouped together into a coherent whole. In other words, in both Epstein’s and Mazer and Gallant’s definitions, objects are defined by their inverse relationship with the background.

Consistent with the research on the ventral stream, the ventral/dorsal stream theory (Ungerleider and Mishkin, 1982) suggests that as visual information exits the occipital lobe, it follows two main paths, or "streams." The dorsal stream terminates in the parietal lobe and represents the location of an object (the "where pathway") or an “action” toward the object (Goodale et al., 1991). The ventral stream in the temporal lobe is involved in object identification (the "what pathway"). Lesions to this region of the brain often produce difficulties in recognizing, identifying, and naming different categories of objects (Habib & Sirigu, 1987; McNeil & Warrington, 1993; Moscovitch et al., 1997).

1.2. Biases in the Ventral Visual Stream Beyond Shape

Recent studies have shown that at least some of the object category areas in the ventral “what” stream are sensitive not only to shape but also to retinotopic location, that is, the location of a target relative to fixation (center vs. periphery). Specifically, it was found that scene-selective regions are biased toward stimuli presented in the upper visual field, whereas object-selective regions are biased toward stimuli presented in the lower visual field (Schwarzlose et al., 2008). Furthermore, object areas were found to be biased toward foveal information, whereas scene areas were biased toward peripheral information (Hasson, Levy, Behrmann, Hendler, & Malach, 2002).
In addition to these retinotopic biases, recent studies also suggest that these category-selective regions are sensitive to the size of the stimuli. Specifically, it was shown that scene regions (PPA and TOS) are biased toward stimuli that are generally classified as large (henceforth, “large stimuli”), whereas the object region (lateral occipital, LO) is biased toward small stimuli (Cate, Goodale, and Kohler, 2011; Konkle & Oliva, 2010).

In sum, the findings above suggest that the ventral stream carries information not only about the category of the target (e.g., face, object, or place) but also about its size and location. However, a unified account of these location and size biases and the selectivity for objects and scenes is still lacking.

1.3. Perceived Distance as an Integrating Variable

One factor that may be common to all four types of biases (i.e., object category, location, retinotopy, and size) is the perceived distance between the observer and the visual stimuli, that is, egocentric distance. In particular, scenes are typically farther away whereas objects are more proximal. Furthermore, distal stimuli usually occupy the upper part of the visual field (e.g., Gibson, 1950, p. 180; Yonas, Elieff, & Arterberry, 2002; Bruno & Cutting, 1988; Cutting & Vishton, 1995; Epstein, 1966) and do not require foveal vision for recognition, whereas proximal stimuli usually occupy the lower part of the visual field and do require foveal vision for recognition (Levy, Hasson, Avidan, Hendler, & Malach, 2001). Thus, the location and eccentricity biases in scene and object regions may reflect an egocentric distance bias.

Finally, size and distance are inherently related to each other in visual perception. Specifically, when presented with two items identical in size, the distal item is perceived to be larger than the proximal item (e.g., Murray et al., 2005; Smith, 1958).
Thus, it is possible that effects that were attributed to size could actually be explained by distance. Size and distance are typically confounded. However, object- and scene-selective areas also showed biases to location (down/up) and eccentricity, both of which are associated with distance but not with size. We therefore believe that it is distance, and not size, that may underlie all reported biases in object and scene areas.

In sum, perceived distance can account for the location, eccentricity, and size biases reported above. Consistent with the view of the ventral/dorsal stream theory (Ungerleider and Mishkin, 1982), it is possible that this distance bias in the “what” stream reflects the ideal distance of a given stimulus from the observer to allow for intact identification.

The current study tested whether category-selective areas in the ventral visual stream represent egocentric distance information. To test this hypothesis, in both Experiments 1 and 2 we took advantage of a well-known distance illusion: the Ponzo illusion (e.g., Yi Li & Guo, 1995). This illusion is composed of two lines that converge at the upper end (see Figure 1 for an illustration). As a result, the upper end is perceived as more distal than the lower end. The evidence for the existence of such an illusion comes from the estimation of a “distal” (upper) item as larger than the “proximal” (lower) item, when in reality they are equal in size. We first pretested the effectiveness of the distance illusion using a psychophysical size estimation task and then examined the response of object and scene category regions to the distance manipulation.

Our main hypothesis concerns the representation of perceived distance in the ventral visual stream.
Although there are several candidate regions for testing the distance hypothesis, we chose, based on the literature reviewed above, four representative regions: scene regions (the parahippocampal place area, PPA, Epstein & Kanwisher, 1998; and transverse occipital sulcus, TOS, Epstein et al., 2007) and object regions (LO and the posterior fusiform gyrus, pFs, which together comprise the LOC, the Lateral Occipital Cortex; e.g., Malach et al., 1995). We hypothesized that object regions (LO and pFs) would be biased toward perceived proximal stimuli, whereas scene regions (PPA and TOS) would be biased toward perceived distal stimuli.

In Experiment 1 we tested this hypothesis by presenting buildings and objects within the Ponzo lines, in a “distal” (upper) or “proximal” (lower) location. Objects and buildings were used because they were found to elicit a strong response in the LOC object area and the scene areas, respectively (e.g., Malach et al., 1995; Epstein et al., 1998). From a theoretical point of view, the PPA response to buildings may be due to the fact that buildings are stable objects that often play a role in defining the spatial structure of the environment (Epstein, 2005).

In Experiment 2 we tested the effect of distance independently of the location of the target item, by presenting the stimuli in a fixed location at the center of the screen while the Ponzo lines were presented in either the upper half or the lower half of the screen. Importantly, in both experiments, our perceived distance manipulation entailed egocentric and not retinotopic distance. In other words, stimuli were always presented foveally, and perceived distance was defined as a function of the location of the stimulus in the Ponzo lines rather than its retinotopic location.
Finally, we collected eye position data outside of the scanner with the exact same design and stimuli as the experiment in the scanner, to confirm that subjects fixated on the stimuli and to rule out the possibility that retinotopic, rather than spatiotopic, effects account for our results.

2. Materials and Methods

Our study included two fMRI experiments. Each fMRI experiment was preceded by a psychophysical experiment that took place outside the scanner, in which we assessed whether the particular Ponzo display that we used in the fMRI experiment indeed generates a distance illusion. Finally, we collected eye position data outside of the scanner.

2.1. Methods for Psychophysical Experiments

To assess whether the Ponzo lines generated a distance illusion, we took advantage of the well-established effect of distance on size perception (e.g., Murray, Boyaci, & Kersten, 2005; Smith, 1958) (see Figure 1 for an illustration). Participants were asked to compare the size of two same-identity stimuli, which appeared simultaneously in the upper and lower parts of two lines that converged toward the upper end (i.e., the Ponzo lines). The participants were asked to judge which stimulus was larger. A distance illusion is evident if the stimulus at the upper end of the Ponzo lines is judged to be larger than the stimulus at the lower end of the Ponzo lines, despite the fact that they are identical in size.

2.1.1. Participants

Ten New York University undergraduates participated in the two psychophysical manipulation-check experiments, in partial fulfillment of course credit. The study was approved by the University Committee on Activities Involving Human Subjects (UCAIHS).

2.1.2. Stimuli

The stimuli consisted of black and white photographs of 20 objects and 20 buildings. Images of buildings were taken from the site http://www.zoomap.co.il/default.asp and were selected randomly (see Figure 2 for an illustration). Objects were adapted from Epstein & Kanwisher (1998).
A pair of vertically oriented converging straight lines, the Ponzo illusion, was used to create the illusion of depth. In Experiment 1, the Ponzo lines traversed the entire computer screen. In Experiment 2, the Ponzo lines appeared either at the upper half or the lower half of the screen. In both experiments, two same-identity stimuli appeared simultaneously at the upper end and at the lower end of the Ponzo lines.

The stimuli appeared in five different sizes. In Experiment 1 (full screen Ponzo), the standard image that was used in the fMRI experiment, labeled here “100%,” was 4.5 cm in width and 4.5 cm in length. In Experiment 2 (half screen Ponzo), the standard image was 2.6 cm in width and 2.6 cm in length. We created four additional stimuli that were 98%, 99%, 101%, and 102% of the size of the standard stimuli.

2.1.3. Procedure

Stimuli were presented using Superlab software. Trials of objects and buildings were presented in separate blocks, with a rest period in the middle of each block. Each trial consisted of two stimuli that appeared simultaneously at the upper end and at the lower end of the Ponzo lines. The two stimuli included the standard stimulus (size 100%) and a same-identity stimulus in one of the five different sizes (98%, 99%, 100%, 101%, or 102%). The standard stimulus was located at the upper part of the screen on half of the trials and at the lower part on the other half.

In Experiment 1, the design was as follows: 2 Categories (Buildings, Objects) x 5 Sizes (98%, 99%, 100%, 101%, 102%) x 2 Locations of the standard stimulus (Upper, Lower). Each of the combinations was presented twice for each of the 20 Building and the 20 Object stimuli, making a total of 800 presentations. On each trial, the Ponzo lines with the two stimuli embedded in them appeared for 800 ms.
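The trial count for Experiment 1 can be checked by enumerating the factorial design. The sketch below is illustrative only and is not the software used in the study (the experiments were run in Superlab); the function name `build_trials`, the constant names, and the tuple layout are assumptions introduced here.

```python
from itertools import product

# Factor levels as stated in the Experiment 1 design.
CATEGORIES = ("Buildings", "Objects")
SIZES = (98, 99, 100, 101, 102)           # comparison size, in % of the standard
STANDARD_LOCATIONS = ("Upper", "Lower")   # where the 100% standard appears
STIMULI_PER_CATEGORY = 20                 # 20 buildings, 20 objects
REPETITIONS = 2                           # each combination shown twice


def build_trials(categories, sizes, locations, n_stimuli, repetitions):
    """Enumerate every trial of a fully crossed design (hypothetical helper).

    Each trial is a tuple: (category, stimulus_index, size, standard_location).
    """
    trials = []
    for category, size, location in product(categories, sizes, locations):
        for stimulus_index in range(n_stimuli):
            for _ in range(repetitions):
                trials.append((category, stimulus_index, size, location))
    return trials


experiment1_trials = build_trials(
    CATEGORIES, SIZES, STANDARD_LOCATIONS, STIMULI_PER_CATEGORY, REPETITIONS
)

# 2 categories x 5 sizes x 2 locations x 20 stimuli x 2 repetitions = 800
print(len(experiment1_trials))
```

The same enumeration extends to a design with an additional crossed factor by adding one more sequence to the `product` call and adjusting the number of repetitions.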
In Experiment 2, the design was as follows: 2 Categories (Buildings, Objects) x 5 Sizes (98%, 99%, 100%, 101%, 102%) x 2 Locations of the standard stimulus (Upper, Lower) x 2 Locations of the Ponzo lines (Upper screen, Lower screen). Because this experiment included an additional condition (the location of the Ponzo lines), to keep the length of this experiment similar to Experiment 1, each of the unique combinations was presented once for each of the 20 Building and 20 Object stimuli, making a total of 800 presentations. On each trial, the Ponzo lines with the two stimuli embedded in them appeared for 800 ms.

In both experiments, after viewing the stimulus, the following question was presented: “Please indicate which of the two stimuli seemed larger: Press (A) if it was the upper, (H) if the lower or (L) if they seemed equal.”

2.2. Methods of the fMRI Experiments

The fMRI experiments tested the prediction that object regions (the two parts of the LOC: LO and pFs) would be biased toward perceived proximal information, whereas scene regions (PPA and TOS) would be biased toward perceived distal information. We therefore ran a localizer task to identify object and scene regions in each subject (see Figure 3 for representative participants). Subsequently, we examined the response of these areas to the different conditions of the main task (i.e., perceived distance).

2.2.1. Participants

Twenty-six healthy volunteers participated in Experiment 1 and 14 participated in Experiment 2. All participants had normal or corrected-to-normal vision and signed a consent form that was approved by the Helsinki committee of the Sourasky Medical Center, Israel.

2.2.2. Stimuli

Localizer. In order to define functional regions of interest (fROI), we ran a localizer experiment. The stimuli consisted of three categories of items: Scenes, Objects, and Scrambled objects. Scrambled objects were designed as Fourier power-spectrum controls for the object pictures.
They were constructed by performing a two-dimensional (2D) Fourier transformation on high-pass filtered pictures of the same objects (Malach et al., 1995). Each category included 80 different images. The images were digitized grayscale photographs (300x300 pixels) that were adapted from previous studies (Epstein and Kanwisher, 1998; Epstein et al., 1999).

Main Task. The stimuli (buildings and objects) were the “standard stimuli” used in the psychophysical experiments. Thus, the stimuli in the fMRI experiments came in one size. In Experiment 1, the stimuli were 4.5 cm in width and 4.5 cm in length. In Experiment 2, the stimuli were 2.6 cm in width and 2.6 cm in length. We used the same Ponzo illusion that was used in the psychophysical experiments. In Experiment 1 (full screen Ponzo), the stimuli (buildings or objects) were presented either at the upper (“distal”) or the lower (“proximal”) part of the Ponzo lines. In Experiment 2 (half screen Ponzo), the stimuli (buildings and objects) were placed in a fixed location at the center of the screen, thus appearing either at the upper end of the lower Ponzo lines (“distal”) or at the lower end of the upper Ponzo lines (“proximal”).

2.2.3. fMRI Data Acquisition

Scanning was performed with a GE 3.0-T scanner at the Tel Aviv Sourasky Medical Center, Israel. For the functional scans, we collected 30 slices aligned parallel to the temporal lobe (TR = 2000 ms, TE = 35 ms, flip angle = 90°). We also collected high-resolution anatomical images (SPGR) including 170 sagittal slices (TR = 6.5 ms, TE = 2 ms, FOV = 256).

2.2.4. Procedure

Stimuli were presented using Psychtoolbox implemented in MATLAB (Brainard, 1997). Each fMRI experiment consisted of two runs of the localizer and two runs of the experimental task, which were presented in an interleaved manner, starting with the localizer.

Localizer Scan.
The localizer was designed to identify scene and object regions and included two scans, each consisting of 17 consecutive 16 s epochs: four for each category (Scenes, Objects, and Scrambled objects) and five fixation blocks (total time 272 s). Each block included 20 stimuli. The serial position