
k dixez? A corpus study of Spanish Internet orthography

20 Pages·2010·0.36 MB·English

Mark Myslín and Stefan Th. Gries
University of California, Santa Barbara, CA, USA

Abstract
New technologies have always influenced communication, by adding new ways of communication to the existing ones and/or changing the ways in which existing forms of communication are utilized. This is particularly obvious in the way in which computer-mediated communication (CMC) has had an impact on communication. In this exploratory article, we are concerned with some characteristics of a newly evolving form of Spanish Internet orthography that differ from standard Spanish spelling. Three types of deviations from 'the norm' are considered: a reduction (post-vocalic d/[ð] deletion in -ado), a transformation (namely the spelling change from ch to x), and reduplication (of characters). Based on a corpus of approximately 2.7 million words of regionally balanced informal Internet Spanish compiled in 2008, we describe the spelling changes and discuss a variety of sometimes interacting factors governing the rates of spelling variants, such as overall frequency effects, functional (pragmatic, sociolinguistic, and iconicity-related) characteristics, and phonological constraints. We also compare our findings to data from Mark Davies's (2002) Corpus del Español (100 million words, 1200s–1900s, http://www.corpusdelespanol.org) as well as other sources and relate them to the discussion of the register/genre of Internet language.

Correspondence: Stefan Th. Gries, Department of Linguistics, University of California, Santa Barbara, Santa Barbara, CA 93106-3100, USA. E-mail: stgries@linguistics.ucsb.edu
Literary and Linguistic Computing, Vol. 25, No. 1, 2010. © The Author 2010. Published by Oxford University Press on behalf of ALLC and ACH. All rights reserved. doi:10.1093/llc/fqp037. Advance Access published on 6 January 2010.

1 Introduction

New technologies have always influenced communication, by adding new ways of communication to the existing ones and/or changing the ways in which existing forms of communication are utilized. This is particularly obvious in the way in which computer-mediated communication (CMC) has had an impact on communication. One very obviously visible way in which CMC has been influencing communication is the large number of new linguistic expressions. These include 'regular' words (e.g. bcc, blog, podcasting, etc.), emoticons and similar symbols (e.g. ';-)', ':-|', ':-S', etc.), and abbreviations standing for complete phrases (e.g. lol for 'laughing out loud', brb for 'be right back', IMHO for 'in my humble opinion', AFAIK for 'as far as I know', etc.).

In this article, we are concerned with an aspect of communication that is often regarded as somewhat peripheral, namely orthography. CMC and other forms of electronic discourse have given rise to forms of orthography that deviate from standardized conventions. These forms are motivated by segmental phonology, discourse pragmatics, and other exigencies of the channel (e.g. the fact that typed text does not straightforwardly exhibit prosody). More specifically, we will explore several new trends in the orthography of Internet Spanish, which is by now the third most widely used language on the Internet (Fig. 1).

Fig. 1 Top 10 languages on the Internet (in millions of users; Internet World Stats 2009)

In keeping with the dominant role of English on the Internet, there is now quite a lot of work on Internet English. However, in spite of its growing importance, there is still very little work on the characteristics of Internet Spanish (e.g. Cervera 2001, Morala 2001, Moreno de los Ríos 2001, and Llisterri 2002). Most of these studies are strictly impressionistic, offer no quantitative data, and address only the chat genre, some under the assumption that it is representative of all Internet Spanish. In fact, there is not even a modestly comprehensive overview of the many different facets of Internet Spanish, which, when combined, can change the orthographic characteristics of standard Spanish considerably. Cf. (1) for an example of Spanish Internet orthography (hereafter SIO), with its standardized orthography in (2).

(1) hace muxo k no pasaba x aki,, jaja,, pz aprovehio pa saludart i dejar un komentario aki n tu space q sta xidillo:)) ps ia m voi
<http://profile.myspace.com/index.cfm?fuseaction=user.viewprofile&friendID=198138943>

(2) Hace mucho que no pasaba por aquí, jaja. Pues aprovecho para saludarte y dejar un comentario aquí en tu space que está chidillo. Pues ya me voy.
('I haven't been around here in a while, haha. So I'm taking the opportunity to say hi and leave a comment here on your space, which is cool. Well, I'm off now.')

SIO cannot be characterized as a rigid one-to-one grapheme mapping from standard Spanish. While it is somewhat systematic in some respects, it also exhibits considerable internal variation. For example, in (1) above, que is spelt in two different ways: k and q. In this article, we attempt to explore and characterize several of the most visible ways in which SIO differs from standard Spanish.

Before we discuss a few case studies in more detail, we would like to provide a brief overview of the kinds of patterns we observed in our corpus (whose makeup will be outlined below) for future work on this topic. We classified the deviations from standard Spanish spellings into two categories: one with differences that were fairly clearly related to informal Spanish phonology, and one where phonological relations were much less apparent. The distinction between the categories was drawn heuristically; nothing theoretically relevant hinges on it (cf. Tables 1 and 2 for overviews).1

Obviously, space does not permit a full-fledged analysis of all these ways in which SIO differs from standard Spanish orthography. In this largely exploratory article, we therefore decided to focus on three different mechanisms by which SIO differs from standard Spanish:
– a deletion, namely from -ado to -ao;
– a change, namely from ch to x;
– repetitions, e.g. from hola to hoola.

The remainder of this article is structured as follows. Section 2 discusses our data and methods, in particular how we compiled a corpus of SIO. Sections 3, 4, and 5 discuss our case studies in detail, providing details on the quantitative methods we used, the retrieval and cleaning of the data, the linguistic factors we studied, and the results. Section 6 concludes.

One terminological remark is in order. Strictly speaking, the medium of communication is written. Nevertheless, we will refer to interlocutors and their communication as speakers and utterances, because SIO, while shaped by the medium, exhibits many of the characteristics of spoken language (cf., e.g. Baron 2000, 2003 and Crystal 2001 for good overviews of the different
kinds of CMC and some of their characteristics).

Table 1 Phonologically motivated features of SIO
Standard orthography | Internet orthography | Examples | Phonological correlate
([aeiou])[bdg]([aeiou]) | \1\2 | hablabas → hablaas; saludos → saluos; me gusta → me usta | Intervocalic voiced plosive elision
u([aeio]) | w\1 | buena → wena; igual → iwal | Pre-/w/ voiced plosive elision
([aeiou])s | \1h | somos → somoh | Post-vocalic /s/ debuccalization
([aeiou])s | \1 | llegamos → llegamo | Post-vocalic /s/ elision
^es[^aeiou] | ^s? | está → ta; espero → spero | Pre-/sC/ /e/ aphaeresis (can combine with post-vocalic /s/ elision)
([aeiou])[bv] | \1v | iba → iva | Post-vocalic /b/ spirantization
([^aeiou])[bv] | \1b | vemos → bemos | Non-post-vocalic /b/ is a plosive
h | – | hacer → acer | h has no phonetic value
ch | sh | echo → esho | /tʃ/ deaffrication

Table 2 Non-phonologically motivated features of SIO
Standard orthography | Internet orthography | Examples
[sz]; c([ei]) | [csz]; [sxz]\1 | hermosa → hermoza; hice → hize; hizo → hiso; hace → haxe
[usz]; ch | x | cuidate → cxidate; hizo → hixo; mucho → muxo
t([aeiou]) | th\1 | besitos → besithos
c([aou]); qu([ei]); g([^ei]) | k\1 | poco → poko; cuidate → kuidate; quiero → kiero; agrega → akreka
cu | qu | cuando → quando
[iy]; ll | [iy] | muy → mui; mis → mys; llego → iego; llamar → yamar
([dmtq])u?e$ | \1 | porque → porq; te → t
ie | e | quiero → kero

2 Data

As a first step, we needed to compile a corpus of informal SIO. To that end, we used the scripting language R to crawl selected forums and social networking web sites (cf. Gries 2009 for details as well as R Development Core Team 2008). In May 2008, we compiled a corpus of approximately 2.7 million words of informal Internet Spanish. It consists of user-generated descriptions of photos and videos, comments on these, and postings on social networking site profiles. Although the profile postings are generally termed comments, they often express greetings and messages rather than stance toward the actual profile pages. The mean length of entry in the corpus is 19.5 words (sd = 36.2). Table 3 provides an overview of the web sites from which the data were obtained.

Table 3 Composition of the Spanish Internet Orthography corpus
Web site | Genre | Approximate percentage of corpus
www.fotolog.com | Comments | 43
www.hi5.com | Comments | 27
www.fotolog.com and www.youtube.com | Descriptions | 21
www.youtube.com | Viewer comments | 9

It is hard to assess to what degree this corpus is representative of, or balanced with regard to, Internet Spanish. We nevertheless consider it relatively representative in one sense: the highly personal discourse of the social networking sites and the less intimate, more diversified discussions of the photo and video sites should go some way toward representing differently involved sub-categories of Internet language.

In addition, further efforts were made to ensure some degree of dialectal representativity, as Spanish varies widely by country and region. To this end, we used the search-by-country feature of both Fotolog and hi5.com and selected the first user returned for each of three Spanish-speaking countries. Using R scripts, we then harvested all of the comments on the profile pages of each of these users' friends (each of the three country representatives had between 100 and 200 friends). A surprising majority of the country representatives' friends were in fact from other Spanish-speaking countries. These seemed roughly distributed by population, with Mexico, Spain, Argentina, and the USA well represented. No measure was taken to 'correct' this phenomenon, as it is a kind of self-balancing middle ground between equal representation of different geographic varieties of Spanish and proportional representation of speakers of each variety.2 No region-based sorting feature exists for YouTube videos. The sampling method there was therefore simply to use the web site's search-by-language function and then use R to automatically harvest all comments and descriptions for several thousand of the most viewed videos uploaded by Spanish-speaking users.

In order to compare our SIO data to other data, we utilized two other sources. First, we used Mark Davies's (2002) 100-million-word Corpus del Español (CdE) as a reference corpus to represent standard Spanish orthography. Second, much Internet discourse involves colloquial and vulgar terms, so we also compiled a list of general Spanish vulgarities in all of their inflections. This list is based on the vulgar-tagged words in the Wiktionary open-source Spanish dictionary (<http://en.wiktionary.org/wiki/Category:es:Vulgarities>, accessed June 1, 2009), since this Internet-user-generated list seemed more inclusive and up-to-date than formally published dictionaries.

3 Reductions in Spelling: Post-vocalic [ð] Deletion in Words Ending in -ado

3.1 Introduction
The first feature of SIO we investigate is the deletion of a single character in a way that reflects pronunciation in certain speech varieties. Intervocalic voiced stops are generally spirantized. In rapid or informal speech, however, they can be deleted completely in the onset of an unstressed syllable, with d being the most commonly affected segment (cf., for example, Piñeros 2009, p. 319). Llisterri (2002, p. 69) reports d as the most commonly elided word-interior segment in his chat corpus. He looks closely at words ending in -ado, generally a past participle marker, and its various inflections (pp. 73–76), and correlates orthographic omission with colloquial and especially Andalusian Spanish phonology.

3.2 Methods
In our own investigation of d-deletion, we focus only on -ado, without its feminine and plural inflected variants -ada, -ados, and -adas. In order to compare frequencies of words with deletion to words without deletion, we searched our corpus for all words ending in -ado or -ao.

To avoid interference from phenomena other than d-deletion, such as apparently typographically erroneous d-insertion, we immediately discarded the handful of word forms that end in -ao in standard Spanish spelling, such as cacao 'cacao'. We took these words to be those occurring in the CdE more than five times with -ao but never with -ado. We further refined our list of -ado and -ao matches by discarding
– alternate spellings of the above-mentioned standard -ao words, such as shao in the case of chao 'ciao';
– the proper name Pao;
– nao when occurring as the Portuguese não 'no';
– standalone occurrences of ado and ao.

English words are not infrequent in the corpus.3 We therefore checked for English words by comparing our list of matches with words occurring more than five times in the British National Corpus (based on Kilgarriff 1996), but did not find any matches that were not also Spanish words. For this study, we then decided to examine the 50 most frequent forms (by combined -ado and -ao occurrence).

3.3 Results 1: our SIO corpus versus Llisterri's (2002) chat corpus
With the above considerations, we found 1589 types of -ado/-ao words, distributed as shown in Table 4 (with Llisterri's (2002) type frequencies for comparison).4

Table 4 Type frequencies of -ado/-ao forms in the SIO corpus and in Llisterri (2002)
Forms | SIO corpus | Llisterri's (2002) chat corpus
only -ado | 893 | 96
only -ao | 419 | 104
-ado and -ao | 277 | 65
Total | 1589 | 265

To determine whether the frequencies of types that allow -ao differ between Llisterri's data and ours, we computed a chi-square test for independence on the two rows of Table 4 for such types (only -ao; -ado and -ao). According to this test, the two data sets do not differ significantly in the numbers of forms that take -ado and -ao versus those that only take -ao (χ2 = 0.1, df = 1, P ≈ 0.75).5 The two genres differ in terms of interactivity: Llisterri's (2002) chat data are more interactive than the comment/description data in our SIO corpus. One conclusion from the test is that, despite this difference, they exhibit the same degree of d-deletion so characteristic of informal speech.

3.4 Results 2: the 50 most frequent words in the SIO corpus versus the CdE
In order to examine d-deletion in -ado more closely in frequent words, and to quantify differences with non-Internet Spanish, we compared the spellings of the fifty most frequent forms in our list (after the above-mentioned modifications) with their spellings in the CdE.6 For each word type, we constructed a 2×2 table of the kind exemplified in Table 5 (on the basis of the word pasado 'past, passed'). For each such table, we then determined the percentage of d-deletion in each corpus. For pasado, this is 91/662 or 13.75% in the SIO corpus, and 2/6425 or 0.03% in the CdE.

Table 5 Distribution of the two spelling variants of pasado across both corpora
Data | -ado | -ao | Total
SIO corpus | 571 | 91 | 662
CdE | 6423 | 2 | 6425
Total | 6994 | 93 | 7087

Figure 2 shows the difference in percentage of d-deletion in the SIO corpus and the CdE as a function of overall frequency in the SIO corpus (both axes are on a logarithmic scale). Each plotted word form shows the word's percentage of d-deletion against its overall frequency in SIO. The corresponding diamonds, linked by dashed lines, show the word's percentage of d-deletion in the CdE (where the word is attested in the CdE).

Fig. 2 Percentages of -ao in SIO (compared to CdE) as a function of frequency

In Figure 2, d-deletion is far more frequent in our Internet corpus than in the CdE. Most of the 50 most frequent word forms occurred with deletion zero times in the CdE, and only a handful of these did the same in SIO. However, d-deletion does not simply apply across the board. Rather, several factors, sometimes competing or interacting, motivate different proportions of d-deletion.

First, there is the factor of word frequency. In SIO, the percentage of d-deletion appears to have a roughly inverse relationship to frequency, so that more frequent words tend to exhibit less deletion. One reason for this may be that the most frequent words are more entrenched in the speakers' linguistic systems, and especially entrenched in the standard Spanish spelling. Very few of them have special pragmatic functions that would make them particularly frequent in Internet Spanish. Speakers are therefore more likely to simply fall back on their standard orthography.

A second, related factor is pragmatics. The most frequent words exhibit less d-deletion not only for the above-mentioned frequency reason. They are also not 'good' places to exhibit 'coolness' or, more formally, to indicate one's affiliation to the social group of young Internet users. Metalinguistic awareness offers confirmation of this. For example, a pragmatically neutral word such as English in does not attract much attention even when spelled innovatively (e.g. innn). By contrast, the spelling of a pragmatically relevant word such as dude is salient to speakers. Its innovative spellings are more creative and varied (e.g. dood, dyude) and can be explicitly relevant to discourse functions. Speakers sometimes describe them as indexing more coolness than the traditional spelling (cf., for example, <http://www.urbandictionary.com/define.php?term=dood>, accessed July 16, 2009).

However, Figure 2 also reveals that SIO has a strategy for making even the most 'boring' word a good place to exhibit 'coolness', and thereby a likely place for d-deletion. Even frequent and pragmatically rather neutral words are likely to undergo d-deletion if they already exhibit other features of SIO. For example, d-deletion occurs in 7.22% of the instances of the standard form demasiado 'too (much)', but at the appreciably higher rate of 23.21% for the c-substituted variant demaciado. Likewise, the standard form estado 'been' exhibits 5.21% deletion and the shortened stado 21.88%, while the even shorter tado (not among these top 50) exhibits 92.11% deletion (35 of 38 instances). Speakers of Internet Spanish thus appear to construct a distinct style/social identity through spelling that reflects two interrelated rules: 'modify words that have special pragmatic functions and, if you are really determined to modify a common-or-garden kind of word, then make big/several changes.'7

A final determinant is phonology. Only two words in the top 50 are not stressed on the penultimate syllable: sábado 'Saturday' and agradó 'pleased'. Such words virtually never undergo d-deletion in speech, and neither of them exhibits d-deletion at all in SIO.

3.5 Results 3: vulgar words in SIO
We have already seen that d-deletion is more frequent among words with a special pragmatic function. This is confirmed by a closer look at the vulgar terms represented in Figure 2 and by a comparison with the list of vulgar terms from our Wiktionary source.

3.5.1 Vulgar words among the 50 most frequent words
Among the top 50 words in the corpus, slang and vulgar words exhibit the highest proportion of d-deletion. In Figure 2, the six forms with the highest percentage of d-deletion are relatively clearly differentiated from the bulk of the data, and all fall into this category. Three of them are attested in both the SIO corpus and the CdE: cagado 'fucked up', pesado 'heavy, annoying, jerk', and helado 'ice cream, blowjob'. The other three are only attested in the SIO corpus: qliado/culiado 'motherfucker', aweonado 'asshole' (standard spelling ahuevonado), and pelado 'thug, dude'. Qliado/culiado and aweonado are not attested in the CdE with or without deletion, and in SIO all of these (except one token of culiado) occur exclusively with d-deletion.

Two of these forms, helado and pesado, are particularly interesting. They are the only two with additional non-slang meanings, and they also exhibit a lower rate of d-deletion, which reinforces the correlation between informal meaning and informal orthography. More specifically, the correlations are very strong: the reduced spelling is strongly preferred with the vulgar meaning but strongly dispreferred with the non-vulgar meaning. These correlations are represented in cross-tabulation plots (cf. Gries to appear: Section 4.1.2.2) in Figure 3. Observed frequencies that are larger than expected are plotted in black and those smaller than expected in grey, and the physical size of each number reflects the size of the effect (based on the residuals). For helado in the left panel, the Marascuilo procedure shows that 'ice cream' and the ambiguous meaning of the word do not differ significantly from each other, whereas the meaning 'blowjob' differs significantly from both.

Fig. 3 The interaction of meaning and spelling for helado (left panel, χ2 = 35.38; df = 2; P < 0.001; V = 0.65) and pesado (right panel, χ2 = 26.26; df = 1; P < 0.001; V = 0.68)

In contrast to this frequent deletion among slang and vulgar terms, most words that occurred exclusively without d-deletion in SIO have more formal meanings or functions: actualizado 'updated', educado 'polite', confirmado 'confirmed', significado 'meaning', agrado '(a) pleasure', feriado 'holiday'.

3.5.2 Vulgar words as determined in Wiktionary
Comparing the list of words tagged as vulgar in Wiktionary to the -ado and -ao forms in our corpus yielded the six matches in Table 6.

Table 6 Percentage d-deletion among vulgar words
Form | -ado | -ao | Percentage of -ao
culiado 'motherfucker' | 1 | 106 | 99.07
cagado 'fucked up' | 14 | 39 | 73.58
aweonado 'asshole' | 0 | 45 | 100.00
tirado 'fucked (pp.)' | 16 | 9 | 36.00
chingado 'fucked (pp.)' | 4 | 1 | 20.00
cachado 'screwed (pp.)' | 0 | 2 | 100.00
Total | 35 | 202 | 85.23

Of the 10,367 non-vulgar -ado words, 2077 occurred with deletion (20.03%). By contrast, 202 of the 237 tokens of vulgar words (85.23%) occurred with d-deletion. According to a binomial test, this difference is virtually impossible by chance (P < 0.001).

In sum, d-deletion correlates strongly with words that have special pragmatic functions in general, and particularly strongly with vulgar words. This in turn supports our observation in the previous section about the influence of phonology on d-deletion. The vulgar word fuckin' in English is hardly ever pronounced with the standard word-final [ŋ] (cf. Kiesling 1998). In the same way, the vulgar Spanish words here exhibit a strong dispreference for the standard pronunciation with -ado, testifying to the influence of phonological patterns on orthographic regularities.

4 Changes in Spelling: From ch to x

4.1 Introduction
The next feature we explore is a substitution that reduces the number of graphemes representing a single phonological segment to one, resulting in a one-to-one sound-to-character ratio. This phenomenon is not uncommon in SIO: ll can shorten to i or y to represent [j], and qu can shorten to q or k to represent [k].

The spelling change of ch to x to represent the pronunciation [tʃ] is interesting in at least two respects. First, it is slightly more complicated than other such changes, because x has a variety of phonetic values in SIO (cf. Table 2). Moreover, although x represents [ks] in standard Spanish, it has no obvious and widespread connection to [tʃ] in non-Internet Spanish. Morala (2001) speculates that ch→x occurs exclusively in Spain and is potentially explicable on the basis of bilingualism with Catalan, in which x can represent the phonetically similar [ʃ]. In our corpus, constructed seven years later, however, we have no trouble finding ch→x attested by users from Latin America. We therefore expect factors besides Catalan bilingualism to be at play.

Second, we have seen above that changes of spelling are related to matters of 'coolness' and social group affiliation. Mayans i Planells (2000) suggests that shortenings of this kind are largely socially motivated, representing a deliberate eschewal of tradition and formality. The ch→x change would seem to be a particularly good candidate to study this, because the character x indexes coolness in Spanish and English alike:
(i) it is frequently used in supposedly hip pop culture marketing situations (cf., e.g. Xbox, X-men, X-files, etc.), much like the e (e.g. in e-commerce and e-surance) and the i (e.g. iPod and iPhone);
(ii) it is a character that is generally rather infrequent: it accounts for less than 0.3% of all letters of all word types in the BNC, so each occurrence is more noteworthy than an occurrence of, say, a t;
(iii) it is a character that readily invokes the word sex, because that word is among the most frequent content words with this letter. In our SIO corpus, even though it is a loan word, sexy is the second most frequent word with an x that has not undergone ch→x.

4.2 Methods
In order to find instances of ch→x in our corpus, we searched for all word forms that contained ch and then checked each one for alternation with x in the corpus. Presence of alternation was not a necessary condition for inclusion. However, we did not consider forms that occurred only with x (and never with ch) to be part of this alternation. We examined the 50 most frequent forms by combined ch and x occurrence and identified several potentially problematic types. Standalone ch/x as well as […] were discarded, as the only six tokens of each were found in entries written in German.

Examining each individual concordance of the remaining ambiguous forms, we made the following decisions.
– We retained the three tokens of chk, since they were used as a form of chico/a 'boy/girl'.
– We revised the frequency of xk to zero. Each of its tokens was used not as an alternation of chk but as a form of por qué 'why' or porque 'because' (on the basis of the pronunciation of the multiplication symbol x as por).
– We revised the frequency of xo to 2, as it occurred as an alternation of (e)cho 'I miss' only twice. In other cases it served as a form of pero 'but' (again on the basis of the multiplication symbol) or as an entry-final iconic representation of hug and kiss.
– We likewise revised the frequency of xoxo to 1, as it occurred only once as an alternation of chocho 'cunt'.
– We retained all ch and x forms of bechos and grachias (i.e. besos 'kisses' and gracias 'thanks'). These can reflect an affricated pronunciation variant rather than a direct s→x or c→x orthographic alternation, and we had no reason to assume the pronunciations were not intended to be affricated.
As above for -ado/-ao, we searched our results for English words in the manner described in Section 3.2 and discarded 207 types that were not also Spanish words or proper nouns. Finally, we used the same list of vulgarities as in the previous section.

The very nature of the -ado/-ao deletion process determines much of the change's phonological context (cf. above). The change ch→x, by contrast, can occur in many different places. We therefore decided to investigate to what degree the place of ch in the word or the syllable correlates with the rate of change to x, and looked at the following contrasts:
– word-initial versus elsewhere;
– pre-vocalic versus elsewhere;
– post-vocalic versus elsewhere;
– intervocalic versus elsewhere;8
– hard letters (a, o, u, before which c and g are realized as stops) versus soft letters (e and i, before which c and g are realized as fricatives).

Take the first contrast for illustration. The currently most common way to evaluate such data would be to generate a table such as Table 7 and compute a chi-square test, ideally together with an effect size measure such as φ/Cramer's V.

Table 7 The frequencies of ch and x in word-initial positions and elsewhere
Position | ch | x | Total
word-initial | 13,553 | 7,460 | 21,013
elsewhere | 22,244 | 7,961 | 30,205
Total | 35,797 | 15,421 | 51,218

However, many movie descriptions and/or comments contribute more than one ch/x spelling to these data, which violates the chi-square test's assumption that data points are independent (cf. Evert 2009 for detailed discussion). Evert (2009) therefore recommends making each description/comment a data point, rather than each use of ch/x. Following this, we computed an index for each description/comment that quantifies the degree to which it prefers ch or x. The index we chose is the difference coefficient that has been used in, for example, Leech and Fallon (1992). Imagine a comment containing five word-initial forms with ch and three with x. The difference coefficient is then computed as in (3):

(3) $$\frac{\text{occurrences of } x - \text{occurrences of } ch}{\text{occurrences of } x + \text{occurrences of } ch} = \frac{3 - 5}{3 + 5} = -0.25$$

That is, the value of the difference coefficient ranges from −1 to +1: the smaller it is, the less x is preferred, and the larger it is, the more x is preferred. We computed such difference coefficients for each contrast. For the first contrast, for example, we computed one coefficient over all word-initial uses in each description/comment and one over all other uses, giving two coefficients per description/comment. We then computed, for each description/comment, the difference between them (word-initial minus not word-initial). We interpreted this difference as follows:
– when the word-initial position prefers x, the difference is > 0, and we considered a difference larger than 2/3 as reflecting a strong preference for x;
– when both positions have the same preference, the difference is close to 0;
– when the word-initial position disprefers x, the difference is < 0, and we considered a difference smaller than −2/3 as reflecting a strong preference for ch.
These three preferences were then compared across the positions to determine which position prefers which spelling.

Fig. 4 Percentages of ch→x in SIO (compared to CdE) as a function of frequency

4.3 Results 1: the 50 most frequent words in the SIO corpus versus the CdE
We found ch→x in 902 out of 5252 word types (17.2%) and in 15,421 out of 51,218 word tokens (30.1%). In order to examine this alternation more closely in the 50 most frequent forms occurring with either variant, we calculated the percentage of x spellings for each word type in SIO and in the CdE, in the way described for d-deletion in Section 3.4.9 The results are shown in Figure 4.

Figure 4 confirms that ch→x is a largely web-exclusive phenomenon. Only one possible token of ch→x, graxias 'thanks', occurred in the CdE, and even this is ambiguous since, unless it is assumed to represent the affricated […]
