Preface

This book is based on a set of beliefs about what the important issues are in the study of human language comprehension and production. These beliefs situate the mental lexicon as the central link in language processing. The lexicon serves, on the one hand, to relate the speech signal to mental representations of lexical form, and, on the other, to relate lexical contents to the syntactic and semantic interpretation of the message being communicated. This means that the proper psycholinguistic study of the lexicon requires us to combine theories of the form and content of lexical representation with theories of lexical processing, that is, of how lexically stored knowledge is brought to bear during the on-line processes of comprehension and production.

These themes are realized in four different ways in the four parts of the book. Part I presents some general, process-oriented accounts of lexical processing. Marslen-Wilson (chapter 1), Seidenberg (chapter 2), and Forster (chapter 3) look at the structure of the comprehension process; Butterworth (chapter 4) and Dell (chapter 5) examine the production process. In each case, the emphasis is on the properties of the lexical system viewed from a primarily psychological process perspective, a perspective that varies radically from the distributed connectionist approach of Seidenberg to the more traditional symbolic approaches of Forster and Butterworth.

Part II, which still focuses more on process than on representation, looks at the nature of the input to the lexical processing system. The questions asked here are questions about the nature of the processing relationship that holds between lexical form representations and the sensory input. Klatt (chapter 6) and Elman (chapter 7) look at the immediate processing of speech, asking how the speech signal, in all its variability, is mapped onto mental representations of lexical form.
Rayner and Balota (chapter 8) and Besner and Johnston (chapter 9) look at the somewhat different issues raised by access processes in the visual domain.

Part III looks much more explicitly at theories of lexical representation and at their consequences for theories of lexical process. Frauenfelder and Lahiri (chapter 10) and Cutler (chapter 11) work out some of the intriguing consequences of modern phonological theory, both for psycholinguistic theories of lexical representation and for models of the processes that map from the speech input onto these representations. The next two chapters focus on another aspect of linguistic form (the morphological structure of complex word forms) and on its consequences for readers and listeners. Henderson (chapter 12) surveys current psychological approaches to morphology in the visual domain; Hankamer (chapter 13) examines some of the psycholinguistic implications of a morphological parser for Turkish (which is vastly different in its morphological properties from English, to which most research in morphology has been restricted). Schreuder and Flores D'Arcais (chapter 14) turn their attention to semantic representations in the lexicon and to current psycholinguistic views of this domain.

Part IV focuses on the role of lexical representations in the processes of syntactic parsing and semantic interpretation. What kinds of structural information are coded in the lexicon, and how is this information deployed over time as the speech is heard or the text is read? This is a richly controversial area, and the controversy is reflected in the four chapters here. Frazier (chapter 17) assigns a somewhat different and more limited role to lexical information in structuring the immediate analysis process than Tyler (chapter 15), Steedman (chapter 16), or Tanenhaus and Carlson (chapter 18).
Tyler focuses primarily on the process aspects of the relationship between lexical representations and higher-level processes, whereas Steedman examines the role of the lexicon from the perspective of an incremental parser based on categorial grammar. Tanenhaus and Carlson argue for the value of the linguistic concept of thematic role, stored lexically, in illuminating the on-line process of parsing and interpretation.

The overall structure of the book reflects, then, the basic structure of the lexical processing system, with a central core of lexical representations of sounds and meanings looking both outward toward the signal and inward toward the message. But the book also reflects a field in the throes of transition. On the one hand, linguistic concepts have reappeared as a crucial input to psycholinguistic models; on the other, the radical alternative of a reborn associationism offers quite different ways of representing regularity and structure in lexical representations and processes. We live in interesting times.

Acknowledgments

This book grew out of a conference on Lexical Representation and Process held in Nijmegen, The Netherlands, under the joint sponsorship of the Max-Planck-Institut für Psycholinguistik and the Interfacultaire Werkgroep Taal en Spraakgedrag of the University of Nijmegen. I thank my colleagues in the MPI Language Comprehension Group, in particular Uli Frauenfelder and Aditi Lahiri, for their help in organizing the conference, which we did in collaboration with Rob Schreuder of the University of Nijmegen, whose help I also gratefully acknowledge. I am especially grateful to Edith Sjoerdsma for the cheerful efficiency with which she handled the vast amounts of paperwork and organizational detail involved in running an international meeting. I also thank Mr.
Koenig, head of the MPI Administration, for the effective and flexible support that he provided throughout, not forgetting the excellent party that he and his staff organized at the end of the conference. My fellow directors, Pim Levelt and Wolfgang Klein, were generous in their advice and support right from the beginning of this project. I am most grateful for the further efforts of Edith Sjoerdsma and for the help and support of the MRC Applied Psychology Unit in the preparation of the manuscript. I also thank Robert Bolick of The MIT Press for his advice and encouragement. And best thanks, finally, to Lolly Tyler, who bore up pretty well through all of this.

William Marslen-Wilson

Chapter 1
Access and Integration: Projecting Sound onto Meaning
William Marslen-Wilson

The role of the mental lexicon in human speech comprehension is to mediate between two fundamentally distinct representational and computational domains: the acoustic-phonetic analysis of the incoming speech signal, and the syntactic and semantic interpretation of the message being communicated. My purpose in this chapter is to describe the consequences of this duality of representation and function for the organization of the mental lexicon as an information-processing system, that is, for the way the system is organized to manage the on-line projection of sound onto meaning. I will do so in terms of the two major processing functions that the lexicon needs to fulfill, which correspond to the two processing domains in which it participates.

The first of these, the access function, concerns the relationship of the lexical processing system to the sensory input, which I will refer to as the domain of form-based functions and processes. The system must provide the basis for a mapping of the speech signal onto the representations of word forms in the mental lexicon.
If one assumes some form of acoustic-phonetic analysis of the speech input, it is a representation of the input in these terms that is projected onto the mental lexicon.

The integration function, conversely, concerns the relationship of the lexical processing system to the higher-level representation of the utterance. In order to complete the recognition process, the system must provide the basis for the integration, into this higher level of representation, of the syntactic and semantic information associated with the word that is being recognized. This is the domain of content-based functions and processes.

The general problem for a processing theory of the mental lexicon is to understand the nature of the process that links these two functional domains. The specific and immediate puzzle is to understand how the system is able to solve this problem (to project sound onto meaning) with the speed and the seamless continuity that is evidenced by our subjective experience and by experimental research.

Our phenomenological experience of speech is that we understand it as we hear it. We do not, under normal conditions of language use, hear speech as episodes of uninterpreted sensory input followed at some delay by bursts of meaningful interpretation. Instead, our immediate experience is of a process of unbroken and continuous interpretation.¹ This subjective impression is borne out in many different experimental studies conducted over the past 15 years. These studies confirm that the speech signal is continuously and immediately projected not only onto the lexical level but also onto levels of semantic and pragmatic interpretation. In fact, the projection from signal to message seems to be carried out just about as fast as is either neurally plausible or informationally possible.
If we look, for example, at the performance of close shadowers, we see a direct illustration of the quantitative and qualitative properties of the basic transfer function of the system (Marslen-Wilson 1973, 1975, 1985; Marslen-Wilson and Welsh 1978). Close shadowers are individuals who are able to repeat back connected discourse at repetition delays averaging around 250 msec (measured from the onset of a word as they hear it to the onset of the same word as they repeat it). At delays as short as this, the shadowers seem to be operating near the limits of the ability of the speech signal to deliver information. Nonetheless, in a wide variety of tests, it is clear that even at these extreme temporal limits they are repeating back the speech input in the context of its ongoing interpretation, not just at the lexical level but also in terms of the syntactic and semantic constraints provided by the current utterance context (Marslen-Wilson 1975).

The picture of human speech processing revealed in close-shadowing performance is corroborated in many other studies, which confirm (on the one hand) the speed with which the speech signal is projected onto the lexicon and (on the other) the earliness with which it starts to have consequences for higher-level interpretation. (For detailed discussions of this research, see Marslen-Wilson 1984, 1987 and Marslen-Wilson and Tyler 1980, 1981, 1987.) I take these basic performance characteristics of the lexical processing system as the starting point for the present discussion.

1 Speed and Parallelism in Access and Integration

The problem of lexical processing is viewed, classically, as a problem of selection. How, given a particular sensory input, does the listener select the word form that best matches this input? How far, furthermore, is this selection process purely a form-based access process?
Is the outcome of the recognition process determined solely by goodness of fit in the form domain, or does it also involve goodness of fit in the content domain?

Cast in the context of the speed and immediacy of lexical processing, the answer to these questions reveals a close interdependence between the domains of form and content in the on-line processing of spoken language. This is because the evidence for the speed of speech comprehension is also evidence for what I have called early selection (Marslen-Wilson 1987): that is, the identification of spoken words, in normal utterance contexts, before sufficient acoustic-phonetic information has accumulated to allow the identification decision to be made on this basis alone. Numerous studies, using shadowing (Marslen-Wilson 1973, 1975, 1985), monitoring (Marslen-Wilson and Tyler 1975, 1980; Marslen-Wilson, Brown, and Tyler 1988), and gating techniques (Tyler 1984; Tyler and Wessels 1983, 1985), show not only that words are, on average, recognized in context about 200 msec from word onset, but also that the sensory information available at that point is normally quite insufficient by itself to allow the correct identification of the word being heard (Marslen-Wilson 1984, 1987).²

This means that the on-line selection process cannot be just a matter of form-based processing. Discrimination between lexical forms cannot be treated purely in terms of information derived from the sensory input. Instead, the perceptual process involves the intersection of two sets of constraints, sensory and contextual, involving not just access but also integration. The sensory constraints derive from the goodness of fit of different word forms to the incoming acoustic-phonetic analysis of the speech signal; the contextual constraints derive from the goodness of fit of different lexical contents to the current utterance and discourse context.
Neither of these kinds of constraints is, by itself, adequate to uniquely specify the correct word candidate at the observed moment of successful selection in normal processing. Together, however, they converge to define a unique intercept, that is, to define the single correct path between sound and meaning.

This on-line processing dependency between form-based and content-based processing domains closely determines the basic functional structure of the lexical processing system. Specifically, it requires that the system be able to combine multiple access of lexical forms with multiple assessment of the contextual appropriateness of the lexical contents associated with these forms.

Multiple access is the accessing of multiple candidates in the mapping of the acoustic-phonetic input onto the mental representations of lexical form. The sensory input defines a class of potential word forms, all of which (in principle) must be made available for assessment against the current utterance and discourse context.

Multiple assessment is the corollary of multiple access. If the input generates multiple candidates, then it must be possible for the system to assess all these candidates for their contextual appropriateness.

In postulating a system capable of realizing these capacities, we need, finally, to bear in mind the constraints imposed by the observed real-time performance of the human selection process, that is, by the fact that sensory and contextual constraints converge on their target, on average, within about 200 msec from word onset. The effect of this is to rule out strictly serial models of access and selection. It requires, instead, as I have argued in detail elsewhere, a processing model that embodies some form of functional parallelism (Marslen-Wilson 1987; Marslen-Wilson and Tyler 1981; Marslen-Wilson and Welsh 1978).
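
As a purely illustrative aid, the intersection of sensory and contextual constraints described above can be sketched in a few lines of Python. Everything in the sketch is an assumption made for the illustration and not part of the chapter: the toy lexicon, the coarse content labels, the use of letters as stand-ins for speech segments, and the function names `multiple_access`, `multiple_assessment`, and `select`. The sketch captures only the logic of the two constraint sets; it says nothing about timing, and the set comprehensions merely stand in for whatever parallel mechanism assesses the candidates.

```python
# Toy lexicon (hypothetical): word form -> coarse content label.
# Letters stand in for speech segments; labels stand in for lexical content.
LEXICON = {
    "captain": "person",
    "capital": "place",
    "captive": "person",
    "cabin": "place",
    "candle": "object",
}


def multiple_access(heard_so_far, lexicon):
    """Form-based access: all word forms still compatible with the input."""
    return {word for word in lexicon if word.startswith(heard_so_far)}


def multiple_assessment(candidates, lexicon, context_fits):
    """Content-based assessment: candidates whose content suits the context."""
    return {word for word in candidates if context_fits(lexicon[word])}


def select(heard_so_far, lexicon, context_fits):
    """Return the single word picked out by both constraint sets, else None."""
    survivors = multiple_assessment(
        multiple_access(heard_so_far, lexicon), lexicon, context_fits
    )
    return next(iter(survivors)) if len(survivors) == 1 else None


if __name__ == "__main__":
    wants_place = lambda content: content == "place"
    anything_goes = lambda content: True
    print(select("cap", LEXICON, anything_goes))  # sensory constraints alone
    print(select("", LEXICON, wants_place))       # contextual constraints alone
    print(select("cap", LEXICON, wants_place))    # both sets together
```

In this toy case the partial input "cap" leaves three candidates and the context alone leaves two, so neither constraint set identifies a word by itself; only their intersection does.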
The account so far gives a basic outline of the processing solution that the mental lexicon finds to the problem of how to relate information in the form domain to information in the content domain. What I am concerned with in the rest of this chapter is the additional constraints that can be placed on the model of this solution. I will begin by considering how the performance of the system in the access domain, as it projects the sensory input onto mental representations of lexical form, can be further specified. It is convenient to do this in the context of a more concrete model of the form-based access process, and I will use for this purpose the cohort model of spoken-word recognition.

2 The Cohort Model and the Contingency of Perceptual Choice

With the details of any specific implementation left aside, the cohort model rather straightforwardly captures the requirements for functional parallelism in the process of lexical access. For present purposes, its general properties can be laid out as follows:

• It assumes discrete, computationally independent recognition elements for each lexical unit, where each such unit represents the functional coordination of the bundle of phonological, morphological, syntactic, and semantic properties defining a given lexical entry.
• Each recognition element can be directly and independently activated by the appropriate patterns in the sensory input.
• The level of activation of each element increases as a function of the goodness of fit of the input pattern to the form specifications for each element. When the input pattern fails to match, the level of activation immediately starts to decay.³

These assumptions, taken together, lead to the characteristic cohort view of the form-based access and selection process, as specified for words heard in isolation. The process begins with the multiple access of word candidates as the first one or two segments of the word are heard.
All the words in the listener's mental lexicon that share this onset sequence are assumed to be activated. This initial pool of active word candidates constitutes the word-initial cohort, which represents the primary decision space within which the subsequent process of selection will take place.

The selection decision itself is based on a process of successive reduction of the active membership of the cohort of competitors. As more of the word is heard, the accumulating input pattern will diverge from the form specifications of an increasingly high proportion of the cohort's membership. This process of reduction continues until there remains only one candidate that still matches the sensory input or, in activation terms, until the level of activation of one recognition element is criterially discriminable from the level of activation of its competitors. At this point the form-based selection process is complete, and the word form that best matches the speech input can be identified.

This is a very approximate and underspecified view of the process of access and selection, but it does make explicit one very important claim about the nature of perceptual processing in the speech domain: the claim for what I have labeled the contingency of perceptual choice (Marslen-Wilson 1987). The identification of any given word does not depend simply on the information that this word is present. It also depends on the information that other words are not present, since it is only at this point that the unique candidate emerges from among its competitors. This means that the outcome and the timing of the selection process are determined by the properties of the complete ensemble of perceptual possibilities open to the listener. These claims for the contingency of perceptual choice are intimately bound up with the notion of recognition point.
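
The successive reduction of the cohort described above lends itself to a small worked sketch. The Python below is a toy illustration under loudly stated assumptions, not an implementation of the cohort model: letters stand in for speech segments, matching is all-or-none instead of graded activation, and the word list and the names `cohort_trace` and `selection_point` are invented for the example.

```python
# Hypothetical mini-lexicon; letters stand in for speech segments.
LEXICON = ["trespass", "tress", "trestle", "trend", "trek", "tread"]


def cohort_trace(target, lexicon):
    """List the surviving cohort after each successive segment of target."""
    trace = []
    for n in range(1, len(target) + 1):
        prefix = target[:n]
        cohort = sorted(word for word in lexicon if word.startswith(prefix))
        trace.append((prefix, cohort))
    return trace


def selection_point(target, lexicon):
    """Number of segments heard when only the target remains, else None."""
    for prefix, cohort in cohort_trace(target, lexicon):
        if cohort == [target]:
            return len(prefix)
    return None


if __name__ == "__main__":
    for prefix, cohort in cohort_trace("trespass", LEXICON):
        print(prefix, cohort)
    print(selection_point("trend", LEXICON))               # "trend" alone
    print(selection_point("trend", LEXICON + ["trench"]))  # with a new competitor
```

In this toy lexicon "trend" is the only candidate left after four segments, but adding the competitor "trench" delays that point to the fifth segment. The form of "trend" has not changed; only the ensemble of alternatives has, which is the sense in which the choice is contingent.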
The recognition point embodies the claim, deriving from the cohort analysis of the recognition process, that the point at which a word can be recognized is a function not just of the word itself but also of its nearest competitors. The concept of recognition point is central to the cohort model, and it is this that makes the model empirically powerful. It gives us the ability to take any word and to predict when it can be recognized. By looking at the word and at its cohort of competitors, we can predict the point at which the word will become unique, and, therefore, the point at which it can be recognized. It is difficult to take seriously theories of spoken-word recognition that are not able to make word-specific predictions of this kind, especially in view of the accumulating evidence for the psycholinguistic validity in on-line processing of the notion of recognition point (Marslen-Wilson 1984, 1987; Tyler and Wessels 1983).

The notion of contingent perceptual choice is also central to the new generation of parallel distributed processing models of the lexicon (Elman, this volume; Rumelhart et al. 1986; Seidenberg, this volume). Whether one is looking at localist models (such as TRACE; see Elman in this volume and McClelland and Elman 1986) or at genuinely distributed models derived by using back-propagation learning algorithms, in each case the system's response to a given input is contingent on the properties of the ensemble of simultaneous perceptual possibilities in the context of which the input is being analyzed. This holds true whether the effects of the ensemble are pre-compiled into the system as a result of its learning experience (as in the Seidenberg-McClelland pronunciation model; see Seidenberg in this volume) or whether these effects emerge on-line through the competition between different candidates (as in models such as TRACE, or as in the early interactive activation models of visual-word recognition [see, e.g., McClelland and Rumelhart 1981]).
TRACE, in fact, was originally designed as a realization of the performance characteristics embodied in the cohort model (Elman and McClelland 1984), and can be taken as a demonstration of the computational feasibility of this type of functionally parallel processing system, though this is not to say (as Elman points out in his chapter below) that this is the only type of model that can exhibit cohort-like behavior.

3 Continuity of Information Uptake in Form-Based Access

Not only does the cohort approach to form-based processing bring out the contingent nature of perceptual analysis; it also brings into focus its continuous and sequential nature. The cohort model has standardly taken a strong position on the continuity and the sequentiality of access and selection, assuming that the system takes maximal advantage of stimulus information as it becomes available over time. This position was supported by earlier findings (Marslen-Wilson 1978, 1980, 1984) that projection onto the lexicon was continuous at least down to the segment. There was no evidence (for example, from non-word detection tasks) that access to the