
Arden of Faversham, Shakespearean PDF

43 Pages·2017·6.61 MB·English


Chapter 9

Arden of Faversham, Shakespearean Authorship, and 'The Print of Many'

JACK ELLIOTT AND BRETT GREATLEY-HIRSCH

The butchered body of Thomas Arden is found in the field behind the Abbey. After reporting this discovery to Arden's wife, Franklin surveys the circumstantial evidence of footprints in the snow and blood at the scene:

I fear me he was murdered in this house
And carried to the fields, for from that place
Backwards and forwards may you see
The print of many feet within the snow.
And look about this chamber where we are,
And you shall find part of his guiltless blood;
For in his slip-shoe did I find some rushes,
Which argueth he was murdered in this room.
(14.388-95)

Although The Chronicles of England, Scotland, and Ireland by Raphael Holinshed and his editors records the identities and fates of Arden's murderers in meticulous detail (Holinshed 1587, 4M4r-4M6v; 1587, 5K1v-5K3v), those responsible for composing the dramatization of this tragic episode remain unknown. We know that the bookseller Edward White entered The Lamentable and True Tragedy of Arden of Faversham in Kent into the Stationers' Register on 3 April 1592 (Arber 1875-94, 2: 607), and we believe, on the basis of typographical evidence, that Edward Allde printed the playbook later that year (STC 733). We know that Abel Jeffes also printed an illicit edition of the playbook, as outlined in disciplinary proceedings brought against him and White in a Stationers' Court record dated 18 December 1592 (Greg and Boswell 1930, 44). No copies of this pirate edition survive, but it is assumed to have merely been a reprint of White's; Jeffes was imprisoned on 7 August 1592, so his edition must have appeared prior to this date. James Roberts printed a second edition for White in 1599 (STC 734).
White presumably transferred the rights to publish the play to Edward Allde in 1624, and Edward's widow Elizabeth issued a third edition in 1633 (STC 735).[1]

[1] On the early publication history of the play, see the Introductions to the Malone Society Reprints and Revels Plays editions (Macdonald and Smith 1947, v-vii; Wine 1973, xix-xxiv).

The three extant editions of Arden of Faversham bear no indication of authorship or of the auspices under which the play was first produced. This is not uncommon for plays printed from 1580 to 1599, even those associated with professional companies: out of 84 such playbooks printed during this period, 54 (or 63 per cent) do not identify their authors, and 21 (or 25 per cent) mention neither author nor repertory. (These figures, derived from the online DEEP: Database of Early English Playbooks, exclude academic and closet drama, translations, dramatic interludes, inns-of-court plays, pageants, and occasional entertainments.) Despite the lack of company ascription and records of performance prior to the eighteenth century, critics generally assume that Arden of Faversham belongs to the professional theatre. For Alexander Leggatt, the ability of the writer of Arden of Faversham 'to open a vein of realism in Elizabethan drama' shows a heightened level of professionalism and a remarkable familiarity with the drama of the time, given that 'realism is a sophisticated form, and realism as he practises it is often complex and mysterious', and he noted parallels with Shakespeare's 1 Henry VI (Leggatt 1983, 133). Will Sharpe describes 'whoever wrote Arden of Faversham' as 'one of the most innovative and daring talents the Renaissance theatre ever saw' (Sharpe 2013, 650).
Likewise, Martin White's assessment of the 'undoubted strengths of the play', including 'the complexity of its characterization, the linking of language and themes, the interweaving of public and private issues, and the constant awareness of the potential of the theatrical experience', leads him to conclude 'that the author was a master playwright' (White 1982, xiii). In his recent edition of the play, Martin Wiggins challenges this prevailing critical consensus, arguing that the author 'was not a theatre professional', but rather 'an enthusiastic amateur' (Wiggins 2008, 285-6). MacDonald P. Jackson offers a careful rebuttal of Wiggins's argument (Jackson 2014a, 104-13). Even if Wiggins is right, he does not go so far as to categorize the play as closet drama, acknowledging that the author of Arden of Faversham was 'far more likely to have been a man' writing 'in the milieu of the developing commercial theatre' (Wiggins 2008, 284). If Arden of Faversham is a closet drama, it is an unusual, if not unique, example: it lacks the characteristic neo-Senecan 'high style' and declamation, and, while closet drama is frequently concerned with familial subjects, Arden of Faversham's 'native, bourgeois, homely settings and characters' (Hackett 2013, 156), the generic features of domestic tragedy, are at odds with the tragedies of state more typical of the closet drama and better suited to its readership, 'Tragoedia cothurnata, fitting kings' (The Spanish Tragedy 4.1.154). Critics have given insightful readings of Arden of Faversham as a domestic tragedy (Adams 1943; Orlin 1994; Berek 2008). Moreover, a search of DEEP shows that all of the extant playbooks printed from 1580 to 1599 and designated by modern scholars as closet drama explicitly name their authors.

The only external evidence for authorship appears in a catalogue of playbooks appended to his 1656 edition of Thomas Middleton's The Old Law (Wing M1048) by the publisher Edward Archer.
This list attributes Arden of Faversham to Richard Bernard, a clergyman and author of a popular edition of Terence's plays in both Latin and English. Alongside ascriptions that are accurate, Archer's list has others that are highly unlikely or impossible. W. W. Greg thought that at least some of the errors were compositorial in nature, resulting from a misalignment of the columns when the table was set for print, such that Archer may have intended to designate Shakespeare as the author of Arden of Faversham, since his name appears in the misaligned entry directly above it. Even if Archer meant this, the evidence is unreliable because although Archer 'shows occasional signs of rather unexpected knowledge', according to Greg, 'his blunders ... are so many and so gross that very little reliance can be placed upon any particular ascription' (Greg 1945, 135).

With the external evidence unreliable, scholars have turned to close analysis of the play's style for internal evidence of its authorship. Several of the major figures actively writing for the professional London theatre during the 1590s, including Robert Greene, Thomas Kyd, Christopher Marlowe, and Shakespeare, have been proposed as author(s) of Arden of Faversham (Kinney 2009a, 80-91; Sharpe 2013, 650-7), with Kyd, Marlowe, and Shakespeare emerging as the primary suspects. Our purpose here is to subject Arden of Faversham to rigorous statistical and computational analysis using the most advanced and recent techniques. Before turning to the results of this analysis, we will give our rationale for text selection and preparation, outline the methods employed, and explain their strengths and weaknesses.
Text Selection and Preparation

Computational authorship attribution needs a corpus of machine-readable (that is, electronic) texts, what we will call our authors-corpus, which are searched for stylistic patterns in order to generate authorial profiles that may be compared with a correspondingly generated profile for the text to be attributed. In an ideal universe, such an authors-corpus would consist only of well-attributed, sole-authored texts of sound provenance, with each of the individual authors represented by equally sized bodies of writing.

The full body of surviving English drama of the 1580s and 1590s is far from this ideal. Many playbooks in print (the primary form in which these plays come down to us) were anonymously published and/or collaboratively written (Masten 1997; Hirschfield 2004; Nicol 2012; Jackson 2012a). External evidence for plays' authorship is seldom unambiguous and often unreliable. Whether by accident or fraud, publishers named the wrong authors on their playbooks. Early modern commentators were as prone to err as we are, as with Archer's list or the gross inaccuracies of later cataloguers such as Edward Phillips. Some external evidence is simply inscrutable: scholars continue to puzzle over what Philip Henslowe meant by the letters 'ne' inserted alongside records of particular performances in his Diary (Foakes 2002, xxxiii-xxxv).[2]

The texts of 1 and 2 Tamburlaine the Great exemplify the problem. Early printed editions of these plays name no author, nor do the plays' entries in the Stationers' Register. The only external evidence for Marlowe's authorship appears in The Arraignment of the Whole Creature at the Bar of Religion, Reason, and Experience (STC 13538.5), a theological treatise published in 1632, decades after the play's composition, in which the marginal gloss 'Marlow in his Poem' appears alongside a passage describing episodes in the Tamburlaine story (2H4v).
It is not without irony that the source of this Marlovian attribution is itself frequently misattributed: until scholars noted a marginal direction to 'See my Preface before Origens Repentance' (T1v) identifying Stephen Jerome as the author, The Arraignment was erroneously ascribed to Robert Henderson and Robert Harris, both mistaken for Robert Hobson, the volume's editor, who signed his dedication 'R. H.'. Thus 'By the most conservative standards of cataloguing', Lukas Erne remarks, 'Tamburlaine would in fact have to be regarded as an anonymous play' (Erne 2013b, 64 n. 25).

The difficulties of establishing authorial canons are surveyed in relation to Shakespeare in the chapters by Gary Taylor and Rory Loughnane ('Canon and Chronology', Chapter 25) and Gabriel Egan ('A History of Shakespearean Authorship Attribution', Chapter 2) in this volume. Despite the problems, in many cases there exists a sufficiently broad scholarly consensus for a study such as this one to treat these as known entities that may be used to explore such an unknown as the authorship of Arden of Faversham. The authors-corpus constructed for the present study reflects a bibliographical conservatism built on the broadest scholarly consensus.

[2] Subsequent references to the Diary are to this edition. The conventional understanding of 'ne' as 'new' has been challenged by Winifred Frazer, who suggests it is an abbreviation designating plays performed at Newington Butts (Frazer 1991). Terence G. Schoone-Jongen points out the problems with this interpretation, not least that Henslowe continued to designate plays as 'ne' after the Newington Butts playhouse closed (Schoone-Jongen 2008, 152-3).
Since Arden of Faversham was, we think, written for commercial performance or, at the very least, with the professional theatre in mind, we exclude from our authors-corpus academic and inns-of-court plays as well as civic pageants, masques, and other occasional pieces. Translations and closet drama were also excluded, with the exception of Cornelia, which was retained in the interest of considering all available plays by Kyd. Because Arden of Faversham was probably composed around 1590, we use plays likely to have been first performed in 1580-94 for our authors-corpus. We seek the ideal of relying only upon well-attributed, sole-authored plays within these limits, so plays of uncertain authorship, dubious attribution, and questionable textual provenance are excluded. Thus we exclude John a Kent and John a Cumber because our only early witness is Anthony Munday's holograph manuscript ('Huntington Library MS 500'), and although it is unlikely that in this case a professional dramatist served merely as a scribe copying out others' work (unlikely but not impossible), we cannot assume that the play was sole-authored rather than co-authored by Munday (Werstine 2013, 107-47, 245-8).[3] Doctor Faustus and Dido, Queen of Carthage are excluded as collaborations, though we retain The Jew of Malta minus its Heywoodian prologues and epilogue. We have departed from our main bibliographical source, Alfred Harbage's Annals of English Drama, 975-1700, only where new research is persuasive and sound, as with Soliman and Perseda being Kyd's (Harbage and Schoenbaum 1964; Freeman 1967, 140-6; Wiggins and Richardson 2012; 2013; Erne 2001, 157-67; 2014). Table 9.1 lists the resulting authors-corpus of 34 plays (plus Arden of Faversham), along with their dates of first performance, the source texts we use, their dates of publication, and genres.
In constraining our authors-corpus to well-attributed, sole-authored plays between 1580 and 1594, we use fewer plays and proceed in a different manner from the most recent attribution studies of Arden of Faversham. Arthur F. Kinney relied upon a corpus of 112 plays dated 1580-1619, considerably larger than ours; and larger still is the selection of 135 plays listed in the Literature Online (LION) database as 'first performed within the two decades 1580-1600' employed by MacDonald P. Jackson for his attributions (Kinney 2009a, 91; Jackson 2006a, 256-7; 2014a, 18-19). Ward E. Y. Elliott and Robert J. Valenza were explicitly undiscerning in the creation of their corpus: 'we would start with whatever text we could get, not troubling over which version we had, or what vagaries might be presented by the original-spelling text' (Elliott and Valenza 1996, 208).

There is a trade-off to be made between, on the one hand, including as many plays as possible in an authors-corpus, which is helpful because random fluctuations cancel each other out in large datasets (by the so-called Law of Large Numbers), and, on the other hand, confining oneself to plausible candidate authors. In eliminating unlikely candidates for authorship (such as Ben Jonson, whose career seems to have started in the late 1590s) and excluding non-professional drama, our trade-off privileges the most demonstrably relevant evidence while necessarily risking the omission of pertinent outliers. Decisions about the size of an authors-corpus to be tested naturally affect the representation of particular writers in that corpus, and we include only four early Shakespeare plays where other investigators use rather more. When counting how many features are shared by a text to be attributed and the known corpus of each candidate author, candidate authors represented by only a few plays have, as it were, few 'opportunities' to display the common features.
Thus, all careful experimenters adjust the raw figures from their tests to compensate for bias arising from particular writers forming larger or smaller proportions of the total authors-corpus.

[3] As a result of these constraints, Munday is entirely excluded from the corpus. Munday's uncontested dramatic output is post-1594 and/or collaborative. Moreover, the auspices and identity of the translator of Fedele and Fortunio (STC 19447), though probably Munday, remain uncertain (Hirsch 2014).

Table 9.1. Arden of Faversham and well-attributed, sole-authored plays, 1580-1594.

Author | Title | First performance | Genre | Source | Source date
Greene, Robert | Alphonsus, King of Aragon | 1587 | Heroical Romance | STC 12233 | 1599
Greene, Robert | Friar Bacon and Friar Bungay | 1589 | Comedy | STC 12267 | 1594
Greene, Robert | James the Fourth | 1590 | History | STC 12308 | 1598
Greene, Robert | Orlando Furioso | 1591 | Romantic Comedy | STC 12265 | 1594
Kyd, Thomas | Cornelia | […] | Tragedy | STC 11622 | 1594
Kyd, Thomas | The Spanish Tragedy | 1587 | Tragedy | STC 15086 | 1592
Kyd, Thomas | Soliman and Perseda | 1592 | Tragedy | STC 22894 | 1592
Lodge, Thomas | The Wounds of Civil War | 1588 | Classical History | STC 16678 | 1594
Lyly, John | Campaspe | 1583 | Classical Legend | STC 17088 | 1632
Lyly, John | Endymion | 1588 | Classical Legend | STC 17050 | 1591
Lyly, John | Gallathea | 1585 | Classical Legend | STC 17080 | 1592
Lyly, John | Love's Metamorphosis | 1590 | Pastoral | STC 17082 | 1601
Lyly, John | Midas | 1589 | Comedy | STC 17083 | 1592
Lyly, John | Mother Bombie | 1591 | Comedy | STC 17088 | 1632
Lyly, John | Sappho and Phao | 1583 | Classical Legend | STC 17086 | 1584
Lyly, John | The Woman in the Moon | 1593 | Comedy | STC 17090 | 1597
Marlowe, Christopher | Tamburlaine the Great 1 | 1587 | Heroical Romance | STC 17425 | 1590
Marlowe, Christopher | Tamburlaine the Great 2 | 1587 | Heroical Romance | STC 17425 | 1590
Marlowe, Christopher | Edward the Second | 1592 | History | STC 17437 | 1594
Marlowe, Christopher | The Jew of Malta | 1589 | Tragedy | STC 17412 | 1633
Marlowe, Christopher | The Massacre at Paris | 1593 | Foreign History | STC 17423 | 1594
Nashe, Thomas | Summer's Last Will and Testament | 1592 | Comedy | STC 18376 | 1600
Peele, George | The Arraignment of Paris | 1581 | Classical Legend | STC 19530 | 1584
Peele, George | The Battle of Alcazar | 1589 | Foreign History | STC 19531 | 1594
Peele, George | David and Bethsabe | 1594 | Biblical History | STC 19540 | 1599
Peele, George | Edward the First | 1591 | History | STC 19535 | 1593
Peele, George | The Old Wife's Tale | 1590 | Romance | STC 19545 | 1595
Shakespeare, William | The Comedy of Errors | 1594 | Comedy | STC 22273 | 1623
Shakespeare, William | Richard the Third | 1592 | History | STC 22314 | 1597
Shakespeare, William | The Taming of the Shrew | 1591 | Comedy | STC 22273 | 1623
Shakespeare, William | The Two Gentlemen of Verona | 1590 | Comedy | STC 22273 | 1623
Uncertain | Arden of Faversham | 1591 | Tragedy | STC 733 | 1592
Wilson, Robert | The Cobbler's Prophecy | 1590 | Comedy | STC 25781 | 1594
Wilson, Robert | The Three Ladies of London | 1581 | Moral | STC 25784 | 1584
Wilson, Robert | The Three Lords and Three Ladies of London | 1588 | Moral | STC 25783 | 1590

The source text used for each of the plays is also outlined in Table 9.1, with base transcriptions from LION checked and corrected against facsimiles from Early English Books Online-Text Creation Partnership (EEBO-TCP). Since our analysis concerns word-use and distribution, and not orthography, spelling was regularized using VARD 2, a software tool developed by Alistair Baron for regularizing spelling variations in historical corpora (Baron, Rayson, and Archer 2009; Lehto, Baron, and Rayson 2010).
Spelling was modernized, but early modern English word forms with present-tense -eth and -est verb-endings, such as liveth and darest, were retained.[4] Homographs such as the noun and verb spelt as will were tagged in the source texts to enable distinct counts for each, which is particularly important for a play like Arden containing a character named Black Will.

Methods

In English writing, the words that appear most often are those that perform syntactic rather than semantic functions, such as the, and, and at, and other so-called function words.[5] Nouns, adjectives, and verbs appear less frequently. The various computerized tests used in authorship attribution operate on words that appear at different general rates of occurrence in English. The Zeta test employed by Hugh Craig and Arthur F. Kinney operates on words that are relatively infrequently found in English. In our analysis of this layer of language, we have adopted Zeta and introduced a newer method, Random Forests. The counting of frequently occurring words, especially function words, has had considerable success in the field of authorship attribution. For these we employ the Delta and Nearest Shrunken Centroid techniques. Our techniques and the kinds of words they look for will now be described in language that simplifies the technical details available elsewhere (Juola 2006).

[4] According to David Crystal, the Northern -(e)s form gradually replaced the Southern -(e)th during the seventeenth century (Crystal 2008, 188-92). Interpretation of the choices between verb endings 'is not straightforward', although 'metrical constraints are the usual explanation', and while dialect is a possible explanation for the use of particular verb-endings, Crystal demonstrates other reasons, such as rhyme, use of formulaic or mock-formulaic language, archaism, and characterization (Crystal 2008, 188, 190-1).
[5] The 221 function words used in our analysis, with differentiations between homograph forms (using our labels) indicated within square brackets, are: a, about, above, after, again, against, all, almost, along, although, am, among, amongst, an, and, another, any, anything, are, art, as, at, back, be, because, been, before, being, besides, beyond, both, but, by[adverb], by[preposition], can, cannot, canst, could, dare, darest, dareth, did, didst, do, does, doing, done, dost, doth, down, durst, each, either, enough, ere, even, ever, every, few, for[adverb], for[conjunction], for[preposition], from, had, hadst, has, hast, hath, have, having, he, hence, her[adjective], her[personalPronoun], here, him, himself, his, how, I, if, in[adverb], in[preposition], into, is, it, itself, least, like[adjective], like[adverb], like[preposition], likest, liketh, many, may, mayst, me, might, mightst, mine, most, much, must, my, myself, neither, never, no[adjective], no[adverb], no[exclamation], none, nor, not, nothing, now, O, of, off, oft, often, on[adverb], on[preposition], one, only, or, other, our[royalPlural], our[truePlural], ourselves, out, over, own, past, perhaps, quite, rather, round, same, shall, shalt, she, should, shouldst, since, sith, so[adverbDegree], so[adverbManner], so[conjunction], some, something, somewhat, still, such, than, that[conjunction], that[demonstrative], that[relative], the, thee, their, them, themselves, then, there, these, they, thine, this, those, thou, though, through, thus, thy, thyself, till, to[adverb], to[infinitive], to[preposition], too, under, until, unto, up[adverb], up[preposition], upon[adverb], upon[preposition], us[royalPlural], us[truePlural], very, was, we[royalPlural], we[truePlural], well, were, wert, what, when, where, which[interrogative], which[relative], while, whilst, who[interrogative], who[relative], whom, whose, why, will[verb], with, within, without, would, wouldst, ye, yet, you, your, yours, yourself, and yourselves.
Delta

Introduced by John Burrows in 2002, the Delta test first counts the frequency of occurrence of a large number of high-frequency words in the text to be attributed and in the authors-corpus considered collectively (Burrows 2002; 2003; Hoover 2004; Argamon 2008). When counting the frequencies of occurrence in the authors-corpus, the individual counts for individual texts in that corpus are retained; that retention allows us to derive (a) a mean figure for the whole authors-corpus, and (b) a measure of the variation from that mean shown by various texts in the authors-corpus. That variation is called the standard deviation from the mean.[6]

Delta next derives the z-score for each word's frequency of appearance in the text to be attributed, which score reflects its difference from the frequency of that word's occurrence in the authors-corpus. For each word, this z-score is calculated by subtracting the mean frequency of its occurrence in the authors-corpus from the frequency of its occurrence in the text to be attributed, and then dividing this figure by the standard deviation of the word's occurrence in the authors-corpus. This last step is vital as it allows us to express the frequencies of occurrence of each word in the text to be attributed (in this case, Arden of Faversham) in terms of the general variability in using that word shown by all the candidate authors considered collectively. This method reveals which words are being used in the text to be attributed at rates that are significantly larger or smaller than the corresponding rates in the authors-corpus. Moreover, division by the standard deviation amplifies the effect of those cases where the authors-corpus is relatively consistent (has a low standard deviation) in its frequency of use of a particular word.
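The z-score step described above, and the averaging of absolute z-score differences with which Delta concludes, can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function names, the two-word vocabulary, and every rate below are hypothetical, and the population form of the standard deviation (dividing by the number of texts) is an assumption.

```python
from statistics import mean, pstdev

def z_scores(rates, corpus_rates):
    """Express each word's rate as a z-score against the whole corpus.

    rates:        {word: rate in the text or author sub-corpus being scored}
    corpus_rates: {word: [rate in each individual text of the authors-corpus]}
    """
    scores = {}
    for word, per_text in corpus_rates.items():
        sd = pstdev(per_text)  # population standard deviation (assumption)
        if sd == 0:
            continue  # a word used at one identical rate everywhere cannot discriminate
        scores[word] = (rates.get(word, 0.0) - mean(per_text)) / sd
    return scores

def delta(z_a, z_b):
    """Mean of the absolute differences between two z-score lists."""
    shared = z_a.keys() & z_b.keys()
    return mean(abs(z_a[w] - z_b[w]) for w in shared)

# Hypothetical per-text rates for two words across a five-text corpus.
corpus_rates = {"the": [8, 7, 5, 9, 11], "and": [2, 17, 5, 3, 13]}

# Hypothetical rates for the text to be attributed and for two candidates.
z_target = z_scores({"the": 10, "and": 8}, corpus_rates)
z_x = z_scores({"the": 9, "and": 14}, corpus_rates)
z_y = z_scores({"the": 4, "and": 2}, corpus_rates)

# The candidate with the lower delta is the closer stylistic match.
print("delta to X:", round(delta(z_target, z_x), 3))
print("delta to Y:", round(delta(z_target, z_y), 3))
```

With these invented figures candidate X yields the smaller delta, so X would be preferred over Y; a real test would use hundreds of words and rates normalized for text length.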
Because the z-scores are scaled to both (a) absolute rates of usage of a word and (b) the standard deviation in the rates of usage of that word, they are equally comparable, one with another, for words that are fairly common and those that are fairly uncommon in the works being tested.

Having derived z-scores that represent, for each word, the differences between the authors-corpus considered collectively and a text to be attributed (in our case, Arden of Faversham), the Delta test next calculates the z-scores for the differences between the authors-corpus considered collectively and each of its sub-corpora belonging to each of the authors in the authors-corpus. That is, it now derives z-scores for how, on each word, all the Greene plays in the authors-corpus differ from the authors-corpus considered collectively, then how all the John Lyly plays do, and so on. This produces z-scores lists for Greene, Lyly, Kyd, and so on.

Next we compare the z-scores list from Arden of Faversham with the list derived by the same method for each candidate author. The phenomenon we want to track is where a particular author uses certain words more often than is usual (that is, 'usual' for the authors-corpus) and also uses certain other words less often than is usual (again, 'usual' for the authors-corpus) and where the same thing is true of the same words in the list for Arden of Faversham. The calculation to achieve this is, for each word, to deduct the candidate author's z-score from the Arden of Faversham z-score, throwing away the sign if negative to leave just the absolute difference, and then to take the numerical mean of all the differences to produce the statistic called delta for the collective difference of Arden of Faversham from that candidate author. The author with the lowest delta score is the one most likely amongst the candidate field examined to be author of Arden of Faversham.

[6] For instance, the five-figure sets [8, 7, 5, 9, 11] and [2, 17, 5, 3, 13] each have a mean of 8, since this is one-fifth of 8 + 7 + 5 + 9 + 11 (= 40) and also one-fifth of 2 + 17 + 5 + 3 + 13 (= 40). But the figures in the second set are more widely different from their mean than are those in the first set. To express this greater variation the standard deviation for each set is derived by multiplying by itself (that is, squaring) each figure's difference from its set's mean, taking the mean of the resulting squares, and then finding that mean's square root. For the first set this is the square root of one-fifth of (8 - 8)² + (7 - 8)² + (5 - 8)² + (9 - 8)² + (11 - 8)², which comes to 2, and for the second set this is the square root of one-fifth of (2 - 8)² + (17 - 8)² + (5 - 8)² + (3 - 8)² + (13 - 8)², which comes to about 5.9. The second set's higher standard deviation reflects the greater 'spread' from the mean that its figures would show if visualized spatially along an axis.

Nearest Shrunken Centroid

This approach was originally developed in 2002 for use in bioinformatics, but has since been adopted for stylometry (Tibshirani et al. 2002; Jockers and Witten 2010). The method constructs a series of authorial profiles based on the counts of the frequencies of each of a set of words (typically, function words) in each text or, more commonly, in each arbitrarily sized subsection (segment) thereof. To understand first the notion of Nearest Centroid we may take a trivial case and suppose that we are counting the occurrences of just two words, the and a, in three equally sized short segments labelled A, B, and C.
The counts might be as follows:

Segment | the | a
A | 26 | 10
B | 15 | 21
C | 31 | 12

The method treats a pair of counts as the Cartesian coordinates for a point in two-dimensional space, as in a traditional x/y flat graph:

[Figure: segments A, B, and C plotted as three points on a two-dimensional graph]

It is clearly visible that, on these counts, segments A and C are closer to one another than either is to segment B, and there are simple mathematical formulas for quantifying these distances across the two-dimensional plane of the graph. We could imagine extending this procedure to count the occurrences of three words across text segments A, B, and C, and for each segment this would give us a triplet of numbers that we could treat as Cartesian coordinates in three-dimensional space and plot as an x/y/z graph. In this space too, the distances between points (each representing a segment's three occurrence-counts) can be seen and easily computed. Beyond three dimensions it is difficult for human minds to visualize the resulting spaces and graphs, but the mathematical formulas for measuring the distances between points work just as well no matter how many dimensions are used.

In this method we use as many dimensions as there are function words to be counted, so that for each text segment there is produced a list of coordinates, each representing the count for that segment's occurrences of a particular function word. Thus if we are tracking the occurrences of 100 function words then segment A will generate a list of 100 coordinates that might begin 15, 35, 3, and after 94 more numbers end with 53, 44, and 76. This list represents a point in 100-dimensional space and its distance from any other point in that space (representing any other segment's counts for these same function words) may be calculated.
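The distances in the two-word example above can be checked with a short Python sketch. The chapter does not say which distance formula it has in mind; straight-line (Euclidean) distance is assumed here, and the variable names are illustrative only. The same call works unchanged for coordinate lists of any length, which is the point made about 100-dimensional space.

```python
from itertools import combinations
from math import dist  # Euclidean distance between two points (Python 3.8+)

# Counts of "the" and "a" for the three segments in the example above.
segments = {"A": (26, 10), "B": (15, 21), "C": (31, 12)}

# Distance between every pair of segments, treating counts as coordinates.
distances = {
    (p, q): dist(segments[p], segments[q])
    for p, q in combinations(sorted(segments), 2)
}

for pair, d in distances.items():
    print(pair, round(d, 2))

# The pair of segments separated by the smallest distance.
closest = min(distances, key=distances.get)
print("closest pair:", closest)  # ('A', 'C')
```

Under this assumption A and C lie about 5.4 units apart, while B is more than 15 units from either, which agrees with what the graph shows.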
When this is done for text segments by different authors the points tend to cluster in space, with the points for segments by the same author appearing nearer to one another than they are to the points for segments by different authors. The centroid of a cluster of points is, as its name implies, the point at its centre that represents the average of all the points' coordinates. The average of all points for a given author is that author's centroid. When we count the function word occurrences for a new text segment whose authorship is to be attributed we arrive at a new list of 100 coordinates that can be plotted in the existing 100-dimensional space, and we can find the author whose centroid is nearest to this new point, and attribute the segment to that author.

The refinement that gives Nearest Shrunken Centroid its name involves diminishing the significance of the counts for function words that are inconsistently used by authors. If the segments for a given author have widely differing occurrences of a particular word (high standard deviation) then this word's contribution to the authorial centroid is proportionally scaled down. At the end of this process, authors are represented by a profile of scores for only their consistently used words. Using the same distance-measuring process the new segment is attributed to the authorial centroid closest to it.

Random Forests

This method chooses a likely author for a text segment using a large number of decision trees (Breiman 2001; Tabata 2014; Kennedy and Hirsch 2016). A decision tree is a machine-learning algorithmic technique for deriving from a set of existing data with a shared attribute, called the training set, a series of rules for predicting that attribute for a new piece of data.
For example, a hypothetical family may choose to go to their local cinema based upon various criteria such as the weather (in fine weather they prefer to have a picnic), the genre of what is showing in the cinema (they prefer romantic comedies to horror stories), and the availability of a family ticket price reduction. From the data about their cinema-going in the past year, including the weather (temperature, wind-speed, and precipitation), the genres of the films they saw, and the prices they paid for tickets, a decision tree algorithm would attempt to reconstruct
