Introduction to Transformational Grammar

Kyle Johnson
University of Massachusetts at Amherst
Fall 2004

Contents

Preface
1 The Subject Matter
  1.1 Linguistics as learning theory
  1.2 The evidential basis of syntactic theory
2 Phrase Structure
  2.1 Substitution Classes
  2.2 Phrases
  2.3 X̄ phrases
  2.4 Arguments and Modifiers
3 Positioning Arguments
  3.1 Expletives and the Extended Projection Principle
  3.2 Case Theory and ordering complements
  3.3 Small Clauses and the Derived Subjects Hypothesis
  3.4 PRO and Control Infinitives
  3.5 Evidence for Argument Movement from Quantifier Float
  3.6 Towards a typology of infinitive types
  3.7 Constraints on Argument Movement and the typology of verbs
4 Verb Movement
  4.1 The “Classic” Verb Movement account
  4.2 Head Movement’s role in “Verb Second” word order
  4.3 The Pollockian revolution: exploded IPs
  4.4 Features and covert movement
5 Determiner Phrases and Noun Movement
  5.1 The DP Hypothesis
  5.2 Noun Movement
6 Complement Structure
  6.1 Nouns and the θ-roles they assign
  6.2 Double Object constructions and Larsonian shells
  6.3 Complement structure and Object Shift
7 Subjects and Complex Predicates
  7.1 Getting into the right position
  7.2 Subject Arguments
    7.2.1 Argument Structure
    7.2.2 The syntactic benefits of ν
  7.3 The relative positions of µP and νP: Evidence from ‘again’
  7.4 The Minimal Link Condition and Romance causatives
  7.5 Remaining Problems
    7.5.1 The main verb in English is too high
    7.5.2 Incompatible with Quantifier Float
    7.5.3 PRO, Case Theory and the typology of Infinitives
8 Rethinking Things
  8.1 Review
  8.2 Towards Deriving X̄ Theory
    8.2.1 Kayne’s “Antisymmetry” hypothesis
    8.2.2 The “Bare Phrase Structure” proposals
  8.3 Embedding the theory in a framework without X̄ Skeleta
Bibliography

Preface

These are the, always evolving, notes from an introductory course on syntactic theory taught at the University of Massachusetts. Its target audience is first-year graduate students, but no background exposure to syntax is presupposed. These notes augment a set of readings, which are:

• Chomsky, Noam. Aspects of the Theory of Syntax. Cambridge, Massachusetts: M.I.T. Press, 1965. Chapter 1.
• Stowell, T. “Origins of Phrase Structure.” Doctoral Dissertation, Massachusetts Institute of Technology, 1981. Chapters 1 & 2.
• Sportiche, D. “A Theory of Floating Quantifiers and Its Corollaries for Constituent Structure.” Linguistic Inquiry 19 (1988): 425-49.
• Pollock, J.-Y. “Verb Movement, UG and the Structure of IP.” Linguistic Inquiry 20 (1989): 365-424.
• Vikner, S. Verb Movement and Expletive Subjects in the Germanic Languages. Oxford: Oxford University Press, 1995. Chapter 3.
• Chomsky, N. “Some Notes on Economy of Derivation and Representation.” In Principles and Parameters in Comparative Grammar, ed. Freidin, R. 417-54. Cambridge, Massachusetts: MIT Press, 1991.
• Kayne, R. “Unambiguous Paths.” In Connectedness and Binary Branching, 129-64. Dordrecht: Foris Publications, 1984.
• Abney, S. “The English Noun Phrase in Its Sentential Aspect.” Doctoral Dissertation, Massachusetts Institute of Technology, 1987. Chapters 1 & 2.
• Bernstein, J. B. “Topics in the Syntax of Nominal Structure Across Romance.” Doctoral Dissertation, City University of New York, 1993. Chapter 3.
• Larson, R. “On the Double Object Construction.” Linguistic Inquiry 19 (1988): 335-92.
• Johnson, K. “Object Positions.” Natural Language and Linguistic Theory 9 (1991): 577-636.
• Kratzer, A. “Severing the external argument from its verb.” In Phrase Structure and the Lexicon, eds. Rooryck, Johan and Zaring, Laurie. 109-137. Kluwer Academic Publishers, 1996.
• von Stechow, A. “The Different Readings of wieder “again”: A Structural Account.” Journal of Semantics 13 (1996): 87-138.
• Kayne, R. S. The Antisymmetry of Syntax. Cambridge, Massachusetts: M.I.T. Press, 1994. Chapters 1-5.
• Chomsky, N. “Bare Phrase Structure.” In Government binding theory and the minimalist program, ed. Webelhuth, G. 383-439. Oxford: Oxford University Press, 1995.

1 The Subject Matter

Linguistic theory, and so syntactic theory, has been very heavily influenced by learnability considerations, thanks largely to the writings of Noam Chomsky.
If we decide that syntactic theory is charged with the duty of modeling part of our knowledge of language, that is, that it concerns cognition, physiology or whatever “knowledge” ends up belonging to, then one of the questions that arises is how this knowledge is acquired by the child. A number of considerations combine to make this task look very difficult indeed: the complexity of the acquired grammar, for example, as well as the anemic nature of the data available to the child. In addition, the fact that children appear to learn any particular language with relative ease in a very short period of time, and that the course of acquisition goes through a set schedule of stages, makes the acquisition of linguistic knowledge look quite different from the acquisition of more familiar domains of knowledge – elementary geometry, for example, or, as you shall see, syntactic theory. How is it that something this complex can, on the basis of such limited information, be acquired with such ease and speed?

1.1 Linguistics as learning theory

Chomsky proposed that linguistic theory itself should contribute to solving this puzzle. The classical formulation of his idea (see Aspects of the Theory of Syntax and The Sound Pattern of English) characterizes the situation as follows. Think of a grammar of L (GL) (this is what Chomsky (1986b) calls “I-Language”) as a set of rules that generates structural descriptions of the strings of the language L (Chomsky’s E-language). Our model of this grammar is descriptively adequate if it assigns the same structural descriptions to the strings of L that GL does. We can think of the learning process as being the selection, from the Universe of GLs, of the very one that generates the strings of the L to be acquired. The learning problem can now be stated in the following terms: how is it that the learning procedure is able to find GL when the universe of Gs is so huge and the evidence steering the device so meager?
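The selection picture just described can be made concrete with a toy sketch. Everything in it is an illustrative assumption rather than part of the theory: the six "grammars" are hypothetical, each one is reduced to a bare membership test over strings of a's and b's (so structural descriptions are ignored entirely), and the "learner" does nothing more than discard candidates that fail to generate an observed string.

```python
# Toy "universe of Gs": each hypothetical grammar is reduced to a test of
# whether it generates a given string over the alphabet {a, b}.
UNIVERSE = {
    "any string": lambda s: True,
    "a's then b's": lambda s: "ba" not in s,
    "equal a's then b's": lambda s: s == "a" * (len(s) // 2) + "b" * (len(s) // 2),
    "even length": lambda s: len(s) % 2 == 0,
    "starts with a": lambda s: s.startswith("a"),
    "palindromes": lambda s: s == s[::-1],
}


def consistent(universe, data):
    """Names of the candidate grammars that generate every observed string."""
    return [
        name
        for name, generates in universe.items()
        if all(generates(s) for s in data)
    ]


# A meager sample of the language to be acquired.
data = ["ab", "aabb"]
survivors = consistent(UNIVERSE, data)
print(survivors)

# The surviving candidates are genuinely different grammars: they disagree
# about a string the learner has not yet seen.
verdicts = {name: UNIVERSE[name]("abab") for name in survivors}
print(verdicts)
```

With only two observed strings, five of the six candidates survive, and they still disagree about the unseen string "abab". Even in a universe this small, then, the data alone do not single out one grammar.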
One step towards solving this problem would be to hypothesize that the universe of Gs has structure (i.e., is not so large), and this is the direction that Chomsky takes. This amounts to the claim that there are features of Gs which are built-in: certain properties which distinguish the natural class of Gs from the rest. There is a kind of meta-grammar of the Gs, then, which is sometimes referred to with the label Universal Grammar. Chomsky further hypothesizes that these properties are biologically given: that it is something about the construction of the human brain/mind that is responsible for the fact that the class of Gs is the way it is.

This argument, the one that leads from the observation that GLs have features that are too complex to be learned to the conclusion that the universe of Gs is constrained, is often called “The Poverty of the Stimulus” argument. It is a classic from Epistemology, imported with specific force by Chomsky into linguistics and given a biological interpretation.

This way of setting up the problem, note, allows for the Universe of Gs to be larger than the learnable Gs. There could be, for instance, constraints imposed by the parsing and production procedures which limit the set of Gs that can be attained. And it’s conceivable that there are properties of the learning procedure itself – properties that are independent of the structure of Gs imposed by Universal Grammar – which could place a limit on the learnable Gs. Universal Grammar places an outside bound on the learnable grammars, but it needn’t be solely responsible for fitting the actual outlines of that boundary. It’s therefore a little misleading to say that the set of “learnable Gs” is the set characterized by Universal Grammar, since there may be these other factors involved in determining whether a grammar is learnable or not. I should probably say that Universal Grammar carves out the “available Gs,” or something similar.
But I will instead be misleading, and describe Universal Grammar as fixing the set of learnable Gs, always leaving tacit that this is just grammar’s contribution to the learnability question.

Chomsky proposes, then, that a goal of syntactic theory should be to contribute towards structuring the universe of Gs. He makes some specific proposals about how to envision this in Aspects of the Theory of Syntax. He suggests that syntactic theory should include an evaluation metric which “ranks” Gs. A syntactic theory that has this feature he calls “explanatory.” Thus “explanatory theory” has a specific, technical, sense in linguistic theory. A theory is explanatory if and only if it encapsulates the features that rank Gs in such a way that it contributes to the learnability problem, distinguishing the learnable Gs from the unlearnable ones. This criterion can help the analyst decide whether the model of GL he or she has proposed corresponds exactly to GL. In particular, the many descriptively adequate models of GL can be distinguished on this basis: select only the model that is ranked highest by the evaluation metric. This model will meet the criterion of explanatory adequacy. It alone will have the properties that enable, under a particular learnability theory, the acquisition of the GL that is being described.

A very important role, therefore, is played by the evaluation metric. At the time of Aspects, the learning procedure was conceived of as a process very much like that which the linguist goes through. The child builds a battery of rules which generate the strings of L. The evaluation metric steering this process was thought to have essentially two parts: a simplicity metric, which guides the procedure in its search through the space of grammars, and inviolable constraints, which partition the set of Gs into the learnable ones and the unlearnable ones.
Thus, for example, we might imagine that rules which used fewer symbols could be defined as “simpler” than ones that used a greater number of symbols. Inviolable constraints might be those, for example, expressed as part of X̄ Theory, which places constraints on phrase structure grammar and therefore simply removes from the universe of Gs a great many possible Gs. Let’s call these models of Gs “rule based,” because the simplicity metric is defined as a rule construction procedure, and let’s call the companion picture of the acquisition process the “Little Linguist” model.

To take a concrete example, if X̄ Theory – the theory that places limits on phrase structure in Universal Grammar¹ – imposes the constraints expressed in (1) on all phrase structure rules, then the evaluation metric leaves to the learner only the matter of filling in the variables W, X, Y and Z, discovering their linear order, and determining what coöccurrence restrictions there are on the phrases.

(1) a. XP → { (ZP), X̄ }
    b. X̄ → { X̄, (YP) }
    c. X̄ → { X⁰, (WP) }

(Understand “{α, β}” to signify that α and β are sisters, “(α)” to indicate that α is optional, and “α → β” to mean that α immediately dominates β.) As the child goes from step to step in matching the grammar he or she is constructing with the information coming in, these are the only decisions that have to be made. If we imagine that this set of options were to be operationalized into a concrete decision tree, then we could see this as constituting a kind of “simplicity metric.” It would constitute a procedure for searching through the space of learnable grammars that ranks the grammars. Additionally, X̄ Theory provides information which places an absolute cap on the possible phrase markers. In this respect it also illustrates an inviolable constraint.

¹ This will be the subject of the following chapter.

Let’s consider another example, one that Chomsky often points to, involving transformational rules.
Transformational rules map one representation to another, typically by way of relocating constituents. Interestingly, it appears that all such rules are “structure dependent.” That is, they make reference to the relative structural positions of the moved thing and the position it is moved to. They don’t, for example, make reference to points in a string on the basis of their position relative to some numerical count of formatives. Thus “Wh-Movement” moves maximal projections that meet certain criteria to particular positions in a phrase marker. And this operation is governed by a set of constraints that make reference to the relation between these points solely in terms of structure. There is no rule, for example, like Wh-Movement which affects terms based on how far apart they are numerically. Thus, the learning procedure will never have to entertain the hypothesis that GL should contain such rules.

In both cases, the classic argument for distinguishing the inviolable constraint from the simplicity metric follows very closely the logic of the poverty of stimulus argument. Because it is difficult (maybe even provably impossible) to see how such things as X̄ Theory or structure dependence could be learned, they must belong to the features that define the universe of Gs. And because they are overarching properties of the rules in some GL, they also have the right form to be inviolable constraints.

There is another argument towards the same end which has gained increasing influence in the last couple of decades; and this one comes to us through the narrowly linguistic study of language typology, and only tangentially from learnability considerations. I will call it “Humboldt’s argument,” though it no doubt has an earlier champion. Humboldt’s argument is based on the observation that there are certain properties that appear to hold true of all GLs.
This can be explained, Humboldt argues, only if the universe of Gs is constrained to just those which have the relevant, universal, properties. Like Chomsky, Humboldt relates this to the construction of the mind, and uses the language of learnability in his account. He puts it this way:²

    Since the natural inclination to language is universal to man, and since all men must carry the key to the understanding of all languages in their minds, it follows automatically that the form of all languages must be fundamentally identical and must always achieve a common objective. The variety among languages can lie only in the media and the limits permitted the attainment of the objective. (von Humboldt 1836)

² One might read the last sentence of this passage as making the distinction, touched on above, between aspects of Universal Grammar (“the media”) and the limits our cognition places on exploiting UG (“the limits permitted the attainment of the objective”).