Computers and the Humanities. Vol. 37


Computers and the Humanities 37: 1–2, 2003.

The ACH Page

TEI Consortium Members Meet in Chicago

One consequence of being an incorporated legal entity, whether or not for-profit, is that you have to hold annual members’ meetings. The TEI Consortium – that august inheritor of the pioneering work in standardizing and expressing a coherent view of what text really is and how it should be managed and represented in digital form carried out over the last decade of the 20th century (partly under the sponsorship of the ACH) – is no exception. Its second annual members’ meeting duly took place in the pleasant surroundings of the Newberry Library, Chicago over a sunny weekend this last October.

This year’s programme had a strong digital library theme full of controversy and debate, with keynotes from Susan Hockey (an elder statesperson of the TEI, and now a professor of library science at University College London) and from John Price Wilkin (doyen developer of the digital library at the University of Michigan), nicely complemented by thought-provoking contributions from Mark Olsen (University of Chicago), Bill Kretzschmar (University of Georgia) and Wendell Piez (Mulberry Technologies). All the presentations given are available from the TEI Members’ website; see http://www.tei-c.org/Members/2002-Chicago/.

At the conclusion of its second year of operation as a membership Consortium, the TEI already has something on which to congratulate itself: the complete translation into XML of its Guidelines which was published in June 2002. Perhaps of more long-term significance however is the fact that the technical work needed to bring that standard fully up to date has also begun. New TEI workgroups reporting at the Meeting include one focussing on character encoding issues, chaired by Christian Wittern from Kyoto University, and one on stand-off markup techniques, chaired by David Durand of Ingenta PLC. The TEI editors Syd Bauman and Lou Burnard also presented plans for moving the production of the Guidelines into a new XML schema based environment.
A newly formed work group on issues relating to SGML to XML migration also met; its recommendations, due early next year, will be of considerable impact.

As a community-driven initiative, the TEI must focus on the interests of its members as well as undertaking the necessary technical work to maintain the standard. One interesting indication of how this community-focus may develop was the notion of forming particular Special Interest Groups which might act as advocacy groups, identifying training and support opportunities within particular areas, notably perhaps the digital library community. Another major talking point throughout the meeting was the challenge of how to build up membership, at a time when academic budgets are under increasing pressure.

Members were realistic, but optimistic: the TEI scheme has become part of the intellectual landscape. No serious alternative has yet emerged as a solution to the problems addressed by the Poughkeepsie Conference of 1987 which set up the TEI, though those same problems are still being re-discovered. A major task for the new TEI is therefore to expand its outreach activities, to ensure that its training and support services are widely available and of a high professional standard. With that in view, the meeting endorsed proposals to organize a major training event in the summer of 2003, in addition to next year’s inevitable annual meeting in November.

The investment made in the TEI by hundreds of organizations and individuals world-wide suggests that the momentum which led to the formation of the TEI Consortium will continue to roll it forward as the only way of maintaining the TEI Guidelines. For, as Basil Bunting said of Pound’s ‘Cantos’, “they resemble the Himalayas: you can ignore them if you like – but you will have to go an awfully long way round.”

Lou Burnard
European Editor of the TEI Guidelines

Computers and the Humanities 37: 3, 2003.
Introduction: ACH/ALLC 2001 Proceedings

LORNA HUGHES¹ and JOHN LAVAGNINO²
¹Humanities Computing Group, New York University, USA
²Centre for Computing in the Humanities, King’s College London, UK

We are delighted to present a selection of work presented at the ACH/ALLC 2001 conference (New York University, June 2001). Thanks to the contributions of many dedicated people who took on the work of doing research, writing papers, reviewing papers, assembling the program, and making the endless arrangements to provide housing, food, water, and Internet access for the delegates, we were able to preserve our reputations as the Rebecca of Sunnybrook Farm and the Sarastro (respectively) of the humanities-computing world.

In one of the conference’s highlights, the Roberto Busa Award was presented to Professor John F. Burrows of the University of Newcastle, Australia. This triennial award was instituted by the ACH and ALLC “to recognize outstanding achievement in the application of information technology in humanistic research”. The citation for the presentation of the award read:

The Association for Computers and the Humanities and the Association for Literary and Linguistic Computing grant the Roberto Busa Award for 2001 to John F. Burrows, for exemplary contribution to scholarship in humanities computing. His imaginative application of statistics to the literature of the 17th to the 20th centuries has inspired a generation of colleagues and students. More than anyone else he has bridged the gap between literary criticism and statistics, enriching both areas and making the latter a part of mainstream literary scholarship. In doing so he has helped to put humanities computing on solid ground.

Professor Burrows’s Busa Award lecture is followed in this issue by a small selection of other work presented at the conference.
This year’s theme was “Digital Media and Humanities Research”, and a comparison with past conference programs and proceedings shows how large a place questions of publication have come to occupy in the field: where once the principal focus of research was in scholarly analysis, today as much work goes into publication, and it is almost routine for scholars to assume that they are addressing an unlimited audience around the world. No collection this small can represent the breadth even of what went on at the conference, but we hope that this sample will serve as some indication of where the field was at the start of the millennium.

Computers and the Humanities 37: 5–32, 2003.
© 2003 Kluwer Academic Publishers. Printed in the Netherlands.

Questions of Authorship: Attribution and Beyond
A Lecture Delivered on the Occasion of the Roberto Busa Award
ACH-ALLC 2001, New York

JOHN BURROWS
Centre for Literary and Linguistic Computing, University of Newcastle, Australia

Just here, in Washington Square, not long after the inception of this great university, Henry James’s Catherine Sloper is last seen when, “picking up her morsel of fancy work, she had seated herself with it again – for life, as it were”. By 1901, a couple of generations later, she might well have found intellectual and imaginative enrichment by enrolling as a woman student at NYU. By 2001, she might even have joined our ranks. Had she taken up our sort of work (for life, as it were), she would have found herself, as we do now, among friends from all around the world.

In an occasional lecture of this kind, you are very much at risk. Given so free a hand, I might move at a stately pace from acknowledgements to reminiscences, and then onward to a wealth of geriatric platitudes. I shall try not to overdo the reminiscences. Like everybody else, I shall confine my platitudes to those I optimistically regard as the priceless wisdom of old age. I shall even try to show you something new. But my debts, to begin with them, are many and profound.
I am most grateful to you all for this award. When I consider the wonderful things that so many of you are doing, opening up far-reaching new avenues in humanities research, I am astonished at being chosen as successor to Father Busa. I do feel bound, of course, to accept your decision. After this news first reached us, one of our daughters brought out an old cartoon from The New Yorker. It shows a man reading his wife a letter he has just received. He says, “This is the one I have been waiting for. It is a letter of approval from everyone I have ever known for everything I have ever done”. I have your letter and I thank you.

In an era when we all rely increasingly on institutional support for our researches, I have been generously treated by my university, by the Australian Research Council, and (in the vital early stages) by St John’s College, Cambridge. I have had excellent advice over the years from first-rate statisticians and programmers.¹ My research-colleagues in Newcastle and elsewhere have given me their support and encouragement for up to twenty years.² Only my family have surpassed them in patience and enthusiasm.

When your choice falls upon an old man, you must expect some tales of those far-off days when our public universities were still recognizable as universities; when gender was a term we used in grammar; and when a computer less powerful than a modern laptop would fill a room. In those days – il y avait une fois (once upon a time) – I spent a sabbatical leave in Oxford working on Jane Austen’s literary vocabulary. I was compiling a little concordance of Mansfield Park, focussing on what I took to be a set of interesting words.
My method was to read attentively, to underline my chosen words, and then to enter them in an elaborate card-index. On most mornings of the week, we drove our youngest daughter to a day-nursery in Banbury Road. A little way down the road, the Oxford University Computing Centre was already in business and Susan Hockey and Lou Burnard were soon to begin helping people like me to use smart new mark-up systems like COCOA. But computers were not yet on my horizon and I did not even know that the Centre existed. I was not to benefit from Susan and Lou’s help and good advice for some years after that.

Another memorable episode occurred when I gave my first paper on the outcome of these hand-counts. It was at a conference at the University of Adelaide and the Chair of my session was an elderly Professor of English. As I walked towards our room for the session, I came up behind him and one of his old friends. They paused briefly at our door. The friend glanced into the room and said, “Well, I wish you joy of that one”. Then he went off happily to hear a different paper. My Chairman, however, rose handsomely to the occasion. After a few introductory remarks, he said, “And now Dr Burrows will take us back to the roots of our subject”. He then sat down beside me and slept peacefully for half an hour. As I finished my presentation, he stirred himself and said, “Dr Burrows has taken us back to the roots of our subject. Are there any questions, or shall we break for tea?” More than a quarter of a century later, I am still digging busily away. I do hope that you will not all go to sleep.

My hand-counting of words taught me three things. The many words I had singled out fell into such revealing patterns that it became clear that the scheme was well worth pursuing and ought to be enlarged. It was clear that Jane Austen wrote so exactly that she was a perfect subject for this approach. And the task became so onerous that hand-counting would not do. So, in 1979, we began making machine-readable versions of Jane Austen’s novels. From the first, we had the full support of John Lambert, then Director of Computing Services in our university.
(After he retired in 1991, John joined my research-centre as its programmer. He continued enthusiastically and effectively in that role until his death last month.) But, back there in 1979, the preparatory work moved on more rapidly in Cambridge, where St John’s College made me welcome for a memorable year and where John Dawson’s generosity and many skills came into play.

For what seemed a long time, it was all input and I despaired of intelligible output. In those days, however, scholars were still free to pursue their ideas without being forced into premature publication. It was also feasible, at that time, to seek research-funding for projects based more on what might be done than on what one had already done and even to succeed against the opposition of some assessors. At the time of my first application, I had yet to publish in this field and I was faced with an assessor who bluntly declared that it would be a gross misuse of public funds to support what I proposed. Fortunately there were other assessors and the committee did not agree with him. The project was funded and our sacrilegious assault on Jane Austen went ahead at a better pace. Our able and good-hearted research-assistant, Alexis Antonia, joined the team and I employed my first programmer.

Enough of these fond memories. It is time for me to take a leap of twenty years and examine some of the longer-term results. I have chosen, as my title indicates, to consider the matter of authorship, my chief preoccupation in recent years.³ After glancing at the present state of the art in this area of computer-assisted research, I would like to describe a promising new approach for which you are the first public audience. I shall turn, finally, to contemplate some possible developments for a future in which many of you will play more part than I.

The idea that texts of doubtful origin might be attributed to their true authors by counting up the occurrences of salient features originated long before the advent of the computer. But the computer has enabled us to pick out less conspicuous features and to count them far more swiftly and accurately than our predecessors.
If only in the initial gathering of data, the heroic labors of scholars like Thomas Mendenhall or G. Udney Yule can soon be surpassed by anyone with the ability to run a user-friendly program on texts downloaded from an online archive. That is not to say, of course, that our analyses of the results are always as judicious as those of the pioneers.

After a lively period of experiment and controversy during the 1970s and 1980s, when M. W. A. Smith was a stalwart gatekeeper, we have entered a phase of quiet but worthwhile progress in the area of attribution. We have not escaped the battles that so excite the media whenever a putative specimen of Shakespeariana is located. On matters to be considered later, we have not yet, as Joseph Rudman (1998, 2000, p. 170) would wish, fixed upon a single method of analysis or identified a “verifiably unique style”. But our methods are increasingly reliable, our use of them is ever more rigorous, and we have vast new corpora to strengthen our comparisons.

Most of the recent and current work focuses on phenomena that occur very frequently rather than on those whose rarity is their hallmark. The most common words of the language have been given more attention than ever before, whether in studies where the words are allowed to choose themselves on no other ground than their relative frequency or else in studies where distinctions between lexical words and function-words are attempted, the former then being discarded as too subject-specific. In a searching comparative study, Richard Forsyth and David Holmes (1996) have shown that very common strings of characters (often overriding word-boundaries) can yield somewhat more accurate results than the common words themselves. The case for still maintaining an interest in the common words and for allowing them to choose themselves rests upon two grounds. Such procedures involve the least possible intrusion by the investigator and they offer the most transparently intelligible results.
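Letting the common words “choose themselves” by relative frequency, and counting common character strings that may run across word-boundaries, can be sketched roughly as follows. This is a minimal illustration only, assuming naive lowercase tokenization on letters and apostrophes and fixed-length character strings; the helper names `top_words` and `top_char_ngrams` are hypothetical, and the procedure used by Forsyth and Holmes may well differ in detail.

```python
import re
from collections import Counter
from typing import List, Tuple


def top_words(text: str, n: int) -> List[Tuple[str, float]]:
    """Return the n most frequent words with their relative frequencies.

    The words 'choose themselves' purely by frequency: no stop-list and no
    lexical/function-word distinction. The tokenizer is a crude assumption.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return [(word, count / total) for word, count in counts.most_common(n)]


def top_char_ngrams(text: str, k: int, n: int) -> List[Tuple[str, int]]:
    """Return the n most frequent character strings of length k.

    Runs of whitespace are collapsed to a single space but kept, so the
    counted strings can straddle word-boundaries.
    """
    flat = re.sub(r"\s+", " ", text.lower())
    grams = Counter(flat[i:i + k] for i in range(len(flat) - k + 1))
    return grams.most_common(n)
```

For instance, `top_words("the cat and the dog and the bird", 2)` gives `[("the", 0.375), ("and", 0.25)]`, and the three-character strings of “the the” include “e t”, which spans the gap between the two words.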
The precise form of a given authorial problem has a strong bearing on the choice of a suitable method of analysis. Among the main analytical tools currently in use, artificial neural networks (Waugh, 2000) and discriminant analysis (Craig, 1999a) are both at their best in closed inquiries where the only question is whether Specimen X belongs to Set A or Set B. Although such problems do arise, this degree of closure is usually confined to the last stages of a larger inquiry, when most candidates have been eliminated. Neither of these approaches is as serviceable in more open games, especially because they lack the transparency so useful for exploring the evidence. Cluster analysis (Craig, 2000) is preferable to either of these methods for such exploratory purposes and can also yield corroborative evidence. Its disadvantages are that the detailed evidence remains opaque and that, for just this reason, two rather similar specimens can turn away from each other in the early iterations of the process and end up much more widely separated than they should. The life of the literary statistician is full of such little disappointments. But there is much to be said for using methods that allow such outcomes to be studied and their causes easily understood.

It is partly for this reason that, as David Holmes (1998, p. 114) has said, principal component analysis (pca) is currently the first port of call in computer-assisted studies of authorship. It has been put to good use, both in authorial and in quite other studies, by a growing number of scholars. See, for example, Baayen (1996), Binongo (1995), Binongo and Smith (1999), Burrows and Craig (2001), Burrows and Love (1999), Craig (1999b), Forsyth (1999), McKenna (1999), Tabata (1994), Tweedie (1998). Some of this work deals successfully with French and Latin texts. The fact that pca displays the phenomena most responsible for a given outcome means that the evidence is more transparent than in the methods mentioned above.
And, especially with the shrewd use of control-specimens evident in some recent studies, pca can yield extremely accurate inferences.

In turning to its chief limitation and proposing a remedy, I do not wish to displace pca but to complement it and so to consolidate it in the role for which it is best fitted, in the middle stages of the game. The crucial point is that pca is not intrinsically a test of authorship but only of comparative resemblance. This offers us great versatility, but can create a subtle trap. Figure 1, for example, treats of two sets of English Restoration verse, written by Shadwell and Tate respectively. It also shows that entry X, representing a further specimen, lies closer to Tate’s entries than to Shadwell’s. But entry X actually represents Absalom and Achitophel, a poem that is undoubtedly the work of John Dryden. This does not mean that the test has failed or even that the result is genuinely misleading. It simply means that, though X may be more like Tate than Shadwell, the authorship of X is not adequately tested here. Properly stated, the original question here is not “Who is the author of X?” but “Do the entries in this scatter-plot fall into any intelligible pattern?” Since the two main sets of entries do fall into authorial sets, further authorial inferences are not inappropriate. But more stringent testing would be needed before any such inferences could be taken seriously. Much of our recent progress has rested upon our increasing reluctance to accept facile results.

Figure 1. Shadwell, Tate, and a test piece. Text-plot for the 99 most common words of the corpus.

A test like this would serve better if we knew in advance that X was the work of either of a given pair of candidates and not of anybody else. But, as I have said, such cases are so unusual that even a stringent test in the closed form “A vs B” is seldom of much use except in the end-game.
The authorship of a group of three Restoration poems, variously assigned to Aphra Behn and Rochester, is of this form (Burrows, 1995). The extent of Tate’s additions to Dryden’s text in The Second Part of “Absalom and Achitophel” is a subtler example (Burrows and Love, 1999, pp. 169–174). Where there are several candidates, successive iterations in the closed form can offer a ponderous way forward. The selection of particular “marker words” on the ground that a given author makes much more (or much less) use of them than other authors of the time is helpful in the later stages of these more complex analyses, allowing us to take full advantage of the latent power of principal component analysis.

And yet there is a conflict between what we wish to do and what our tests permit. Whereas our tests are best fitted for cases in the closed form, we would wish to tackle more open cases – including those cases where A is a principal but uncertain candidate with no single obvious rival and even those cases where there is no recognized candidate at all. The clandestine political and erotic verse of the Restoration and (as Harold Love points out in a forthcoming book) the intellectual journalism of the nineteenth and early twentieth centuries abound in open cases of both kinds. Until we can make progress with problems like these, our role will remain strictly ancillary to the traditional work of scholarship, corroborating or casting doubt on the product of other sorts of evidence but rarely opening fresh ground. We are still bound, it appears, by Richard Bailey’s dictum (1979, p. 7), proposed over twenty years ago and lately put even more strictly by Binongo and Smith (1999, p. 464). We should confine ourselves, that is to say, to cases where the choice lies within a narrow range of well-matched sets and we should proceed with only two authors’ texts at a time.
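The transparency claimed for principal component analysis in the discussion above comes from the fact that it yields both a score for each text (its position on a text-plot) and a loading for each word (how much that word contributes to the separation). A minimal pure-Python sketch follows, assuming a small texts-by-words table of relative frequencies and computing only the first component by power iteration on the covariance matrix. A full analysis would normally use a linear-algebra library and often the correlation matrix instead; `first_principal_component` is a hypothetical helper, not the procedure used in the studies cited.

```python
from typing import List, Tuple


def first_principal_component(
    rows: List[List[float]], iters: int = 500
) -> Tuple[List[float], List[float]]:
    """First principal component of a texts-by-words frequency table.

    rows[i][j] is the relative frequency of word j in text i.
    Returns (loadings, scores): one loading per word, one score per text.
    """
    n, m = len(rows), len(rows[0])
    # Centre each word's column on its mean across texts.
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centred = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix of the words (m x m).
    cov = [
        [sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(m)]
        for i in range(m)
    ]
    # Uneven start vector, so we do not begin orthogonal to a contrast
    # such as "word 0 up, word 1 down".
    v = [1.0 / (j + 1) for j in range(m)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # no variance in that direction; give up
            break
        v = [x / norm for x in w]
    # Project each centred text onto the component to get its score.
    scores = [sum(c[j] * v[j] for j in range(m)) for c in centred]
    return v, scores
```

With a toy table (invented numbers, not data from any study) such as `[[0.050, 0.010, 0.030], [0.052, 0.012, 0.029], [0.020, 0.040, 0.031], [0.021, 0.039, 0.030]]`, where two texts favour word 0 and two favour word 1, the scores fall into two groups of opposite sign and the loadings for words 0 and 1 dominate the loading for word 2. The sign of a component is arbitrary, so only relative positions matter; and, as the lecture stresses, such a plot measures resemblance, not authorship.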
Against this background, I wish to sketch a new path forward. At the time I was told of this award and invited to deliver this lecture, I was just beginning to write an article about the work I have been doing in the last two years or so. It was to be entitled “Delta: a measure of stylistic difference and a guide to likely authorship”. (The term “Delta” was chosen to represent D for difference and also as a gesture of respect for Udney Yule and those other pioneers in our field who tried to derive simple expressions of stylistic difference. Udney Yule’s Characteristic K remains one of the most remarkable of these attempts.) A version of my article will still be necessary because a public lecture is no place for the thorough exposition of a new technique. But it seemed appropriate to acknowledge the immense compliment you have paid me by presenting you with a fresh contribution to our procedures.

The first step is to establish a frequency-hierarchy for the most common words in a large group of suitable texts. The texts are grouped in subsets representing the work of numerous authors of an appropriate era. With texts of a bygone era, it is usual and desirable to standardize spelling and to expand contracted forms of expression in order to reduce the influence of trivial or accidental variations. (Just such variations were studied by some of the pioneers of stylometry. But when one works with common word-counts, they are merely a distortion.) It has also been our practice, in Newcastle, to tag some of the more common homographic forms in order to distinguish the different uses of words like “so” and “that”. (The effects of tagging are beneficial but the cost is high, the intrusion upon the data is regrettable, and the interchange of information with colleagues is made more difficult.) When the word-counts have been made, the frequencies are standardized as proportions of each authorial subset so that the larger subsets do not exert an undue influence on the composition or ranking of the hierarchy.
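The first step described above, ranking the common words after standardizing each author’s counts as proportions of that author’s subset, might be sketched as follows. This assumes the texts are already tokenized, with spelling standardized and any homograph tagging applied, and it assumes that the per-author proportions are combined by a simple unweighted mean across authors; the lecture does not specify that detail, and `frequency_hierarchy` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List


def frequency_hierarchy(subsets: Dict[str, List[str]], n: int) -> List[str]:
    """Rank the n most common words across authorial subsets.

    Each author's counts become proportions of that author's own total, and
    the proportions are averaged with equal weight per author (an assumed
    combining rule), so a large subset cannot dominate the ranking.
    """
    per_author = []
    for tokens in subsets.values():
        counts = Counter(tokens)
        total = sum(counts.values())
        per_author.append({w: c / total for w, c in counts.items()})
    vocab = set().union(*per_author)
    mean_share = {
        w: sum(p.get(w, 0.0) for p in per_author) / len(per_author)
        for w in vocab
    }
    # Descending mean proportion; ties broken alphabetically for determinism.
    ranked = sorted(vocab, key=lambda w: (-mean_share[w], w))
    return ranked[:n]
```

As a toy illustration (invented counts), suppose author A contributes 90 tokens of “alpha” and 10 of “the”, author B 5 of “the” and 5 of “beta”, and author C 6 of “the” and 4 of “gamma”. Pooled raw counts would put “alpha” first, but averaging the per-author proportions ranks the words `["the", "alpha", "beta", "gamma"]`, because “the” is common in every subset.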
Working on these lines, we have formed a database of verse by twenty-five poets of the English Restoration period.⁴ These yielded the frequency-hierarchies we have used for several recent studies of authorship based on principal component

Kluwer, 2003. 480 pp. After 2004: Language Resources and Evaluation. Issue 1, February 2003.
