Computers and the Humanities. Vol. 33


Computers and the Humanities 33: 1–9, 1999. © 1999 Kluwer Academic Publishers. Printed in the Netherlands.

The Text Encoding Initiative at 10: Not Just an Interchange Format Anymore – But a New Research Community

ELLI MYLONAS and ALLEN RENEAR
Scholarly Technology Group, Brown University

Abstract. Mylonas and Renear introduce a volume of selected papers from The Text Encoding Initiative 10th Anniversary Conference, held at Brown University in November 1997. The Text Encoding Initiative (TEI) was launched in 1987 and sponsored by the Association for Computers and the Humanities, the Association for Literary and Linguistic Computing, and the Association for Computational Linguistics. It had as its original objective the development of an interchange language for textual data. This effort was completely successful and the TEI Guidelines are now widely accepted as the standard interchange format for textual data. Mylonas and Renear also note that the TEI has accomplished two other major achievements: it has produced a powerful new data description language (which is influencing the development of new WWW standards); and, most importantly, it has motivated the development of an entirely new research community, focused on understanding the role of text structure and markup in the use of emerging information technologies in culture, scholarship, and communication.

Keywords: TEI10, SGML, TEI, markup conference, research communities

This special issue of CHum contains a selection of papers from TEI10, the Text Encoding Initiative’s tenth anniversary conference. Over one hundred people from three continents attended the meetings in Providence, Rhode Island in November 1997. Andy van Dam of Brown University opened the proceedings with a retrospective keynote on computing, text and hypertext, and Jon Bosak of Sun Microsystems concluded with his vision of the impact of the TEI’s principles for humanities text encoding, through their contributions to XML, on the management of information in the corporate world.
A summary of Bosak’s address is included in this special issue. Attendees heard presentations of 21 papers, on a broad range of topics including knowledge representation, electronic publication, linguistic corpora and markup theory. Papers took many forms: project reports, theoretical analyses, and encoding conundrums. Unfortunately, there was not room to include all the papers in this volume, although all were informative and thought-provoking. We selected thirteen, eight of which are included as full papers and five as project reports.

We begin the volume with DeRose’s description of XML, the TEI’s influence on its development and its relevance to the TEI and its users; this is a look at the history of the future of text encoding. Romary et al. provide an overview of a server and front end for accessing and using disparate collections of TEI encoded data. Popham and Burnard describe progress in developing and applying the TEI header, an early contribution of the TEI to metadata in SGML, and one that is particularly relevant now, with the proliferation of electronic texts and digital libraries. Several papers and reports primarily address encoding theory, both for SGML in general and the TEI in particular. Birnbaum, Cournane and Flynn discuss the problems of using the TEI writing system declaration to encode texts in multiple character sets; Welty and Ide examine ways to use content markup to represent and access knowledge representation and semantic structures. Simons shows one approach to integrating HyTime’s architectural forms with object-oriented modeling systems, and Smith explores some alternative ways of encoding textual variants. Finally, Bauman and Catapano’s report weighs alternative ways of encoding the physical structure of books.

The last group of papers and reports focuses on implementation and usage of the TEI.
The first two papers delve into some linguistic encoding problems: Resnick, Broman and Diab describe their use of TEI markup to encode parallel corpora; Estival and Nicholas show their application of TEI encoding to capture syntactic analyses. The final three reports describe the experience of using TEI to encode electronic theses and dissertations (Erickson), the conversion and representation of WWW pages for the analysis of linguistic usage (Walker) and the impact of the TEI on electronic text delivery by the Oxford Text Archive (Morrison and Fix). The volume concludes with Jon Bosak’s closing keynote, which ended the conference with an uplifting but sobering message: the TEI community has been ahead of the pack with respect to text encoding up to now. TEI knowledge and experience have influenced the development of new widely adopted standards, but the rest of the world has caught up, and we’re now faced with an almost insurmountable opportunity.

This volume, while it provides a permanent record of some aspects of the conference, of necessity omits many fascinating conversations, good papers, good meals, and funny songs. Some things just can’t fit, even into a special issue.

November 1987, Poughkeepsie, New York

On a wet snowy day in early November of 1987 a diverse group of 32 scholars gathered for a meeting at Vassar College in Poughkeepsie, New York. They came from many different disciplines and represented leading professional societies, libraries, archives, and projects in a number of countries in Europe, North America, and Asia. They were meeting to address what they believed to be a very large and urgent problem that was, fairly suddenly it seemed, facing the humanities – an area of human endeavor that is not exactly known for often being confronted with large urgent problems. But these scholars believed they had identified such a problem, and they had traveled from various countries around the world to Poughkeepsie, in November, to solve it.
The problem in question was the proliferation of systems for representing textual material on computers. These systems were almost always incompatible, often poorly designed, and growing in number at nearly the same rapid rate as the electronic text projects themselves. This threatened to block the development of the full potential of computers to support humanistic inquiry – by inhibiting the sharing of data and theories, by making the development of common tools arduous and inefficient, and by slowing the development of a body of best practice in encoding system design.

Perhaps because of the diversity of professions, disciplines, and nationalities – or perhaps simply because the participants were mostly humanities scholars – the discussions over those two days were often difficult: assumptions varied, interests diverged, participants sometimes became frustrated with one another, and the weather continued to be uniformly miserable. Those present could not have been particularly confident of success, even at their initial and apparently, if deceptively, modest goal of developing a common interchange format for electronic texts. But we doubt that any of them could have had even a glimmer of what would in fact result from those two November days in Poughkeepsie:

First, that they would indeed create, against rather large odds, a successful and widely accepted interchange format for textual data in the humanities.

Second, that this representation scheme would be not merely an interchange format, but in fact a new data description language, one that would provide scholars using emerging information technologies with a powerful new purchase on cultural material (so powerful in fact, that its innovations would be picked up by the wider commercial publishing and information management community and become part of the foundation for the global information infrastructure).
And, third, and perhaps most importantly, that an unexpected result of these efforts would be something that is one of the rarest and most valuable of intellectual events: the emergence of an entirely new research community – and, moreover, a community organized around one of the most important scientific projects facing us today: developing a deep understanding of textual communication in order to support and theorize the use of emerging information technologies in culture and communication.

In what follows we will expand a little on these achievements.1 They are of course rather astonishing accomplishments to trace to that small, improbable meeting in Poughkeepsie – but the facts speak for themselves.

A Brief Partial History

A short account of the history of TEI will situate our subsequent remarks – and the occasion of the conference.2

The meeting described above was convened by the Association for Computers and the Humanities and funded by the National Endowment for the Humanities. It resulted in a statement – “The Poughkeepsie Principles” – articulating nine principles that were to frame the development of a set of text encoding guidelines.3 The organization of the work of developing the guidelines was then undertaken by the three TEI sponsoring organizations: the Association for Computers and the Humanities, the Association for Literary and Linguistic Computing, and the Association for Computational Linguistics. A Steering Committee was organized from representatives of the sponsoring organizations, and an Advisory Board of delegates from various professional societies was formed. To lead the actual work two editors were chosen, and four working committees appointed. By the end of 1989 well over 50 scholars were already directly involved and the size of the effort was growing rapidly.

The initial phase resulted in the release of the first draft (known as “P1”) of the Guidelines in June 1990.
A second phase, now involving 15 working groups making revisions and extensions, immediately began and released its results throughout 1990–1993. Then, after another round of revisions, extensions, and supplements, the first official version of the Guidelines was released in May 1994 (“P3”).

Early on in this process a number of leading humanities textbase projects adopted the Guidelines as their encoding scheme – while the Guidelines were still very much a moving target of rapidly changing drafts – identifying problems and needs and proposing remedies. A wider community was introduced to the Guidelines through workshops and seminars, which ensured a steady source of experience useful to support continuing development. As more scholars became acquainted with the Guidelines, comments, corrections, and requests for extensions arrived from around the world.

In the end well over 100 scholars from many disciplines, professions, and countries were active in the core group that was developing the Guidelines. This alone would make the Guidelines an exemplary achievement in collaboration, one on a scale fairly rare in the history of the humanities.

The large and collaborative nature of the TEI also makes it hard to identify and distribute credit – without sliding down a long slippery slope.4 But as this is an introduction to a celebratory volume, we will not entirely relegate credit to a footnote: so we salute here the two editors, Michael Sperberg-McQueen (University of Illinois at Chicago) and Lou Burnard (Oxford University), who were – and still are – the uncontested intellectual leaders of this enterprise.

Now we would like to say a few more words about the three specific achievements mentioned above – interchange guidelines, data description language, and research community.
The TEI as Interchange Guidelines

The original motivation of TEI was to develop interchange guidelines that would allow projects to share data (and theories about data) and promote the development of common tools. It was evident by 1987 that this was urgent and important. It was also a major challenge, given the diversity of data, disciplines, national communities, and the rapidly accelerating rate at which projects were being conceived and carried out. The only hope for a successful solution would be to secure a wide involvement of the relevant disciplines, professions, and communities. But this itself was a problem. How could one have, working productively together, Danish data archivists, German medieval philologists, American corpus linguists, Japanese computer scientists, Norwegian philosophers, British Chaucer specialists, Canadian software engineers, American librarians, ... and so on? But this is exactly what the TEI did, as over 100 scholars, from almost as many specialties, dozens of countries, and representing many different major communities, professional societies, libraries, and projects labored long and hard to produce the Guidelines, making it almost as much a monument to courage and social imagination as to scientific innovation and scholarship.

It is easy to talk about accommodating diversity, about interdisciplinarity, about multiculturalism, about communication across various intellectual gaps and divides. But few efforts along these lines are more than superficial. The experience of the TEI makes it evident why this is so. Not only do different disciplines have quite different interests and perspectives, but also, it seems, different conceptual schemes: fundamentally different ways of dividing up the world. What is an object of critical contest and debate for one discipline is theory-neutral data for another, and then completely invisible to a third. What is structured and composite for one field is atomic for another and an irrelevant conglomeration to a third.
Sometimes these differences of perspective occurred within a discipline, sometimes across historical periods of interest, sometimes across national or professional communities. Practices that would seem to have much in common could vary radically – and yet have enough in common for differences to be a problem! And even where agreement in substance was obtained, disagreements over nuances, of terminology for instance, could derail a tenuous agreement.

It was hard not to feel that this was a rare opportunity to see genuine, which is to say, difficult, communication across disciplines – if not for the first time, then perhaps for the first time in such a committed and sustained way. It was an extraordinarily liberal experience, for those of us lucky enough to participate in even a small way, as specialties, points of view, and interests were not discounted or obliterated, but, as a matter of policy and principle, taken up, accommodated, and brought into productive heuristic contact with one another.

We won’t pretend to have figured out just how this was brought off so successfully. It goes without saying that part of the explanation was the intellect and determination of the editors; part the political skills of the Steering Committee, and part the collective wisdom, good will, and industry of the participants in the committees and working groups. But the obstacles – rooted in such differing research cultures, interests, assumptions, and long-standing differences of approach – make the success of the TEI, purely as a social achievement, something very extraordinary, and improbable, and still without a completely satisfying explanation.

But a success it was. And today most scholarly textbases, as well as much data in other fields, use some form of the TEI Guidelines.

The TEI as a Data Description Language

In a sense there may not appear to be much of a difference, technically, between a common language for exchanging data and a language for describing it.
But there is a difference in nuance and ambition, and it’s a difference that makes a difference. The principal original objective of the Poughkeepsie meeting was simply to develop a language to allow researchers, regardless of how they were representing their information within their projects, to share their data, and their claims about it, with other projects, without ambiguity, and in a format that made it available to machine-processing with generic tools. Many features – line breaks, alphabetic characters, diacritical marks, punctuation, pagination – seemed easily susceptible to this sort of standardization; simple conventions were needed to distinguish added information from original content; some standards for indicating provenance were in order, etc. This much seemed manageable certainly. But ambition grew and the task became both broader and more specific: to define a standard language for usefully identifying and describing the salient features of text as viewed from the perspective of some particular discipline or methodology. Now this was daunting to be sure, given the social and technological situation described above. But it was also in its way still fairly modest: an effort simply to standardize how we would present our data on those occasions when we wished to share it.

But very quickly it became clear that there was an instability in the effort to elaborate a language that only represented, without advancing or improving, our current practice and understanding. This may be partly because current practice was already so subtle, partly because of the difficult constraints and demands of interdisciplinarity, and partly because the discipline of formal definition – which requires us to make explicit our implicit understanding – can also help improve and extend our knowledge. But it was inevitably also partly the recognition that here we had an opportunity, not just to represent past practice, but to improve and enable future practice.
The TEI thus quickly became a data description language of far greater subtlety and power than any that had before been produced. It improved our ability to describe our data, not just our ability to exchange descriptions. It supported a disciplined elucidation of our practices, methods, and concepts, and it opened the way to new methods of analysis, new understandings, and new possibilities for representation and communication. Evidence that this is indeed a language of new expressive capabilities can be found in the experience of pioneering textbase projects which draw on the heuristic nature of the TEI Guidelines to illuminate textual issues and suggest new analyses and new techniques.

But we also would note confirmation of this accomplishment from an unusual direction, at least for the humanities. As Jon Bosak (Sun Microsystems) and Steve DeRose (INSO Corporation and Brown) make clear in their contributions to this volume, techniques pioneered by the Text Encoding Initiative have been taken up into wider development of technical and engineering standards supporting networked communication. So much so that it now seems likely that in a year or so, when anyone follows a link – whether to look at medical x-rays, buy an appliance, or watch a rock video – they will be using protocols, namely XML’s XPointer and XLink, based directly on techniques developed by the TEI.5

The TEI as a Research Community

As impressive as the two previous accomplishments – a successful interchange format and a powerful data description language – are, we believe that in fact they pale before the third.

Try this exercise: thumb through this issue of CHum, glancing at the titles, abstracts, author names and affiliations, and bibliographic citations. The impression should be clear and incontrovertible: an entire research community is flourishing now that did not exist at all before 1987.
This new community appeared almost from out of nowhere, and spread rapidly around the world, infiltrating a wide variety of professions, disciplines, and institutions in many countries. Like any research community it defines itself with its shared interests, concepts, tools, and techniques, and has developed, or is developing, the characteristic organizational apparatus: conferences, journals, research centers, email lists, and the like.

The subject matter of this new research community is textual communication, with the principal goal of improving our general theoretical understanding of textual representation, and the auxiliary practical goal of using that improved understanding to develop methods, tools, and techniques that will be valuable to other fields and will support practical applications in publishing, archives, and libraries.

By empirical or sociological standards this community is already enormously successful, as shown by its rapid growth in participants, published research, and influence. Ultimately, of course, its significance will be assessed by analytic and normative standards, not by a popularity poll, however expert and plausible the respondents. We might ask, for instance, following the framework proposed by the Hungarian philosopher of science Imre Lakatos, whether this is a research community with “degenerating problemshifts”, where ad hoc hypotheses of little new predictive power are adduced only to protect older theories from refutation; or whether it is one with “progressive problemshifts”, where problems generate new theories, each more powerful and explanatory than the preceding.

We know that it is too soon to answer this question judiciously, and that a credible answer would require the production of an analysis which no one has yet carried out. But, made incautious by the festive mood of an anniversary occasion, the editors of this introduction confess they consider the evidence already in to be decisive.
It is clear from the literature, including the articles in this volume, that there is an explosion of new connections (to knowledge representation systems, formal semantics and ontology, object orientation methodologies, etc.), new theorizing (non-hierarchical views of text, antirealism, etc.), and new applications and tools. There can be little doubt of the vitality of this community. And that is itself an explanation of its sociological success: researchers and practitioners around the world have taken a look at what we think may be called “the TEI community” ... and are placing their bets.

In our opinion this is the most important final result of the Poughkeepsie conference: a flourishing research community, providing new insights into the nature of text, and new techniques for exploiting the emerging information technologies. Given the sudden and staggering contemporary significance of information technologies in culture and scholarship we think this new community has formed not a moment too soon.

November 1997, Providence, Rhode Island

Ten years almost to the day after the Poughkeepsie meeting, scholars from many disciplines and from around the world again gathered on another snowy day in November. This time the meeting was in Providence, Rhode Island; over 100 people from North America, Europe and Australia attended, and they were there not to confront a seemingly intractable problem of interchanging textual data, but to celebrate the enormous success of the Text Encoding Initiative in accomplishing a resolution of that problem – and much, much more besides.

Happy Birthday TEI! Hats off!

Notes

1 The Text Encoding Initiative, like the proverbial elephant, looks like different things to different people. This introduction in no way represents an official or standard account of the TEI, and is certainly not a balanced, let alone comprehensive account. We suspect that our perspective may differ quite a bit, in specific claims as well as emphasis, from that of those who were more centrally involved.
2 A history of the TEI introduces the special triple issue of CHum of papers from the initial work groups: Ide and Sperberg-McQueen, “The Text Encoding Initiative: Its History, Goals and Future Development.” CHum 29:1, 1995. pp. 5–15. A detailed bibliography of articles written about and for the TEI is at: http://www.uic.edu/orgs/tei/talks/teij32.html. The main TEI page at UIC is at: http://www.uic.edu/orgs/tei. See also Susan Hockey (with Donald Walker). “Developing Effective Resources for Research on Texts: Collecting Texts, Cataloging Texts, Tagging Texts, Using Texts and Putting Texts in Context” in Literary and Linguistic Computing, 8, 1993, pp. 235–242.

3 http://www-tei.uic.edu/orgs/tei/info/pcp1.html

4 But we’ll take one tentative step down that slippery slope and mention a few people who we think of as the “fathers” and “mothers” of the TEI; they were all Steering Committee members who played particularly important roles in its formation and development: Susan Hockey (then at Oxford, now at the University of Alberta), Nancy M. Ide (Vassar), David Barnard (then at Queen’s University; now at Regina); Donald E. Walker (Bell Communications Research; deceased); and Antonio Zampolli (Pisa). Throughout this period the TEI received support from the US National Endowment for the Humanities, Directorate General XIII of the Commission of the European Union, and the Andrew W. Mellon Foundation.

5 See DeRose and Durand, “The TEI Hypertext Guidelines,” CHum 29:3 (1995). Steve DeRose, who was instrumental in developing the TEI linking structures, is the editor of the XLink and XPointer specifications. Michael Sperberg-McQueen is also a co-editor of the W3C XML specification.
