
A Cognitively Grounded Measure of Pronunciation Distance

Martijn Wieling (1,*), John Nerbonne (2), Jelke Bloem (3), Charlotte Gooskens (2), Wilbert Heeringa (2), R. Harald Baayen (1,4)

1 Department of Quantitative Linguistics, University of Tübingen, Tübingen, Germany; 2 Center for Language and Cognition Groningen, University of Groningen, Groningen, The Netherlands; 3 Amsterdam Center of Language and Communication, University of Amsterdam, Amsterdam, The Netherlands; 4 Department of Linguistics, University of Alberta, Edmonton, Alberta, Canada

Abstract

In this study we develop pronunciation distances based on naive discriminative learning (NDL). Measures of pronunciation distance are used in several subfields of linguistics, including psycholinguistics, dialectology and typology. In contrast to the commonly used Levenshtein algorithm, NDL is grounded in cognitive theory of competitive reinforcement learning and is able to generate asymmetrical pronunciation distances. In a first study, we validated the NDL-based pronunciation distances by comparing them to a large set of native-likeness ratings given by native American English speakers when presented with accented English speech. In a second study, the NDL-based pronunciation distances were validated on the basis of perceptual dialect distances of Norwegian speakers. Results indicated that the NDL-based pronunciation distances matched perceptual distances reasonably well, with correlations ranging between 0.7 and 0.8. While the correlations were comparable to those obtained using the Levenshtein distance, the NDL-based approach is more flexible as it is also able to incorporate acoustic information other than sound segments.

Citation: Wieling M, Nerbonne J, Bloem J, Gooskens C, Heeringa W, et al. (2014) A Cognitively Grounded Measure of Pronunciation Distance. PLOS ONE 9(1): e75734. doi:10.1371/journal.pone.0075734
Editor: Johan J. Bolhuis, Utrecht University, Netherlands
Received June 25, 2013; Accepted August 16, 2013; Published January 9, 2014
Copyright: © 2014 Wieling et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work has been funded by the Netherlands Organisation for Scientific Research and the Alexander von Humboldt Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
* E-mail: [email protected]

Introduction

Obtaining a suitable distance measure between two pronunciations is important, not only for dialectologists who are interested in finding the relationship between different dialects (e.g., [1]), but also for sociolinguists investigating the effect of political borders on vernacular speech [2], language researchers investigating the typological and genealogical relationships among the world's languages (e.g., [3]), applied linguists attempting to gauge the degree of comprehensibility among related languages [4], and researchers measuring the atypicality of the speech of the bearers of cochlear implants [5]. Furthermore, having a distance measure between word pronunciations enables quantitative analyses in which the integrated effect of geography and sociolinguistic factors can be investigated (e.g., [6]). Standard sociolinguistic analyses focus on whether specific categorical differences are present in the speech of people from different social groups. By using a measure of pronunciation difference, we allow more powerful numerical analysis techniques to be used. For these analyses to be meaningful, however, the measurements of pronunciation distance need to match perceptual distances as closely as possible.

There are various computational methods to measure word or pronunciation distance (or similarity), of which the Levenshtein distance has been the most popular [1,7,8,9,10]. The Levenshtein distance determines the pronunciation distance between two transcribed strings by calculating the number of substitutions, insertions and deletions needed to transform one string into the other [11]. For example, the Levenshtein distance between two accented pronunciations of the word Wednesday, [wenzdeɪ] and [wen sde], is 3, as illustrated by the alignment in Table 1.

Table 1. Basic Levenshtein distance alignment.

  w   e   n       z   d   e   ɪ
  w   e   n       s   d   e
              1   1           1
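To make the unit-cost variant concrete, here is a minimal Python sketch; it is not the implementation used in the work cited above, and the segment lists are illustrative stand-ins for the transcriptions in Table 1, with 'X' as a placeholder segment.

```python
def levenshtein(a, b):
    """Plain (unit-cost) Levenshtein distance over lists of phonetic segments."""
    prev = list(range(len(b) + 1))       # distances for the empty prefix of a
    for i, seg_a in enumerate(a, start=1):
        curr = [i]
        for j, seg_b in enumerate(b, start=1):
            cost = 0 if seg_a == seg_b else 1
            curr.append(min(prev[j] + 1,          # deletion of seg_a
                            curr[j - 1] + 1,      # insertion of seg_b
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[-1]

# Illustrative stand-ins for the 'Wednesday' transcriptions in Table 1;
# 'X' is a placeholder segment, not an IPA symbol.
print(levenshtein(list("wenzdeI"), list("wenXsde")))  # -> 3
```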
A clear drawback of this variant of the Levenshtein distance is that it does not distinguish the substitution of similar sounds (such as [o] and [u]) from more different sounds (such as [o] and [i]). Consequently, effort has been made to integrate more sensitive segment distances in the Levenshtein distance algorithm [1,12]. As manually determining sensitive segment distances is time-consuming and language-dependent, Wieling and colleagues [13] developed an automatic method to determine sensitive segment distances. Their method calculated the pointwise mutual information between two segments, assigning lower distances to segments which aligned relatively frequently and higher distances to segments which aligned relatively infrequently. Results indicated that the obtained segment distances were acoustically sensible and resulted in improved alignments [14]. Applying the adapted method to the example alignment shown above yields the associated costs shown in Table 2.

Table 2. Levenshtein distance alignment with sensitive sound distances.

  w   e   n           z       d   e   ɪ
  w   e   n           s       d   e
              0.031   0.020           0.030
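One way to read the PMI-based procedure of [13,14] is sketched below in Python. It is only an approximation of that method: it assumes a flat list of aligned segment pairs as input, and the exact normalisation and rescaling used in the original work may differ.

```python
import math
from collections import Counter

def pmi_segment_distances(aligned_pairs):
    """Turn aligned segment pairs (harvested from many pairwise alignments)
    into PMI-based distances: frequently co-aligned segments get a high PMI
    and therefore a low distance. The normalisation here is a simplification,
    not a reimplementation of the published method."""
    pair_counts = Counter(aligned_pairs)
    seg_counts = Counter()
    for x, y in aligned_pairs:
        seg_counts[x] += 1
        seg_counts[y] += 1
    n_pairs = len(aligned_pairs)
    n_slots = sum(seg_counts.values())   # = 2 * n_pairs
    distances = {}
    for (x, y), c in pair_counts.items():
        p_xy = c / n_pairs
        p_x = seg_counts[x] / n_slots
        p_y = seg_counts[y] / n_slots
        pmi = math.log2(p_xy / (p_x * p_y))
        # Negate so that frequent (similar) pairs get small values; a real
        # implementation would likely rescale these to be non-negative.
        distances[(x, y)] = -pmi
    return distances

# Toy usage: [o]~[u] aligned often, while [i] mostly aligns with itself,
# so the (o, i) pairing is comparatively rare and gets a larger distance.
pairs = [('o', 'u')] * 8 + [('o', 'i')] * 2 + [('i', 'i')] * 10
d = pmi_segment_distances(pairs)
print(d[('o', 'u')] < d[('o', 'i')])  # True
```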
While Levenshtein distances correlate well (r = 0.67) with perceptual dialect distances between Norwegian dialects [15], there is no cognitive basis linking the Levenshtein distance to perceptual distances (but see [16] for an attempt to adapt the Levenshtein algorithm in line with theories about spoken word recognition). This is also exemplified by the fact that the Levenshtein distance is symmetrical (i.e. the distance between speaker A and B is the same as the other way around), while perceptual dialect distances may also show an asymmetrical pattern [15].

As exposure to language shapes expectations and affects what is judged similar to one's own pronunciation and what is different, we turn to one of the most influential theories about animal and human (discrimination) learning: the model of Rescorla and Wagner [17]. The basic assumption of this model is that a learner predicts an outcome (e.g., the meaning of a word) based on the set of available cues (e.g., the sounds of a word). Depending on the correctness of the prediction, the association strengths between the outcome and the cues are adjusted so that future prediction accuracy improves. Concretely, if an outcome is present together with a certain cue, its association strength increases, while the association strength between an absent outcome and that cue decreases. When an outcome is found together with multiple cues (i.e. when there is cue competition), the adjustments are more conservative (depending on the number of cues). The learning theory of Rescorla and Wagner is formalized in a set of recurrence equations which specify the association strength V_i^{t+1} of cue C_i with outcome O at time t+1 as V_i^{t+1} = V_i^t + ΔV_i^t, where the change in association strength ΔV_i^t is defined as:

  ΔV_i^t = 0                                          if ABSENT(C_i, t)
  ΔV_i^t = α_i β_1 (λ − Σ_{PRESENT(C_j, t)} V_j)      if PRESENT(C_i, t) & PRESENT(O, t)
  ΔV_i^t = α_i β_2 (0 − Σ_{PRESENT(C_j, t)} V_j)      if PRESENT(C_i, t) & ABSENT(O, t)

In this definition, PRESENT(X, t) denotes the presence of cue X at time t and ABSENT(X, t) its absence at time t. Whenever the cue occurs without the outcome being present, the association strength is decreased, whereas it is increased when both the cue and outcome are present. The adjustment of the association strength depends on the number of cues present together with the outcome. The standard settings for the parameters are λ = 1, all α's equal, and β_1 = β_2.
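The recurrence above can be written down compactly. The following Python sketch performs a single Rescorla-Wagner update for one outcome under the standard settings (β_1 = β_2 folded into one `beta`); the learning-rate values are illustrative, not taken from the paper.

```python
def rescorla_wagner_update(weights, cues_present, outcome_present,
                           alpha=0.1, beta=0.1, lam=1.0):
    """One learning step for a single outcome, following the equations above:
    only cues present on the current trial are adjusted, towards lambda when
    the outcome is present and towards 0 when it is absent."""
    v_total = sum(weights.get(c, 0.0) for c in cues_present)
    target = lam if outcome_present else 0.0
    for c in cues_present:
        weights[c] = weights.get(c, 0.0) + alpha * beta * (target - v_total)
    return weights

# Toy trials: the cue 'wIh' always occurs with the outcome, the cue '#wI' also
# occurs without it, so 'wIh' ends up with the stronger association.
w = {}
for _ in range(200):
    rescorla_wagner_update(w, ['#wI', 'wIh'], outcome_present=True)
    rescorla_wagner_update(w, ['#wI'], outcome_present=False)
print(w['wIh'] > w['#wI'])  # True
```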
The Rescorla-Wagner model has been used to explain findings in animal learning and cognitive psychology [18] and, more recently, Ramscar and colleagues [19,20,21] have successfully used this model in the context of children's language acquisition. For example, Ramscar and colleagues [21] showed that the Rescorla-Wagner model clearly predicted that exposure to regular plurals (such as rats) decreases children's tendency to over-regularize irregular plurals (such as mouses) at a certain stage in their development.

Danks [22] proposed parameter-free equilibrium equations (i.e. where V_i^{t+1} = V_i^t) for the recurrence equations presented above: Pr(O|C_i) − Σ_{j=0}^{n} Pr(C_j|C_i) V_j = 0, where Pr(C_j|C_i) represents the conditional probability of cue C_j given cue C_i, and Pr(O|C_i) the conditional probability of outcome O given cue C_i. Consequently, it is possible to directly calculate the association strengths between cues and outcomes in the stable (i.e. adult) state where further learning does not substantially change the association weights. Baayen and colleagues [23] have proposed an extension to estimate multiple outcomes in parallel. Their 'naive discriminative learning' (NDL) approach (implementing the Danks equations [22]) lends itself to efficient computation and is readily available via their R package 'ndl'. More details about the underlying computations can also be found in [23].
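For a single outcome, the Danks equilibrium equations form a linear system Pr(O|C_i) = Σ_j Pr(C_j|C_i) V_j that can be solved directly. The sketch below illustrates this with NumPy; the event format (cue set, outcome set, frequency) is an assumption made for illustration, and this is not the implementation used by the 'ndl' package.

```python
import numpy as np

def equilibrium_weights(events, outcome):
    """Equilibrium association strengths for one outcome, obtained by solving
    Pr(O|C_i) = sum_j Pr(C_j|C_i) * V_j (the Danks equations given above).
    `events` is a list of (cue_set, outcome_set, frequency) learning events;
    this input format is illustrative only."""
    cues = sorted({c for cue_set, _, _ in events for c in cue_set})
    index = {c: k for k, c in enumerate(cues)}
    n = len(cues)
    cooc = np.zeros((n, n))      # cooc[i, j]: how often cues i and j co-occur
    with_outcome = np.zeros(n)   # how often cue i co-occurs with the outcome
    for cue_set, outcome_set, freq in events:
        present = [index[c] for c in cue_set]
        for i in present:
            for j in present:
                cooc[i, j] += freq
            if outcome in outcome_set:
                with_outcome[i] += freq
    cue_totals = np.diag(cooc)             # total count of each cue
    m = cooc / cue_totals[:, None]         # m[i, j] = Pr(C_j | C_i)
    b = with_outcome / cue_totals          # b[i]    = Pr(O | C_i)
    # lstsq handles cues that always co-occur (a singular system) gracefully.
    v, *_ = np.linalg.lstsq(m, b, rcond=None)
    return dict(zip(cues, v))

# Toy events in the spirit of Table 3 (cue and outcome labels are made up):
events = [({'#wI', 'wIh', 'Ih#'}, {'with'}, 10),
          ({'#wI', 'wIð', 'Ið#'}, {'with'}, 10),
          ({'#wI', 'wIz', 'Iz#'}, {'wiz'}, 1)]
# Cues occurring only with 'with' get positive weights; the [wIz] cues do not.
print(equilibrium_weights(events, 'with'))
```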
After all association strengths of the adult state are determined, the activation (i.e. activation strength) of an outcome given a set of cues can be calculated by summing the corresponding association strengths. Especially these activations are important for prediction. For example, Baayen and colleagues [23] found that the estimated activation of words correlated well with experimental reaction times to those words.

Here we propose to use naive discriminative learning to determine pronunciation distances. The intuition behind our approach is that a speaker of a certain dialect or language variety is predominantly exposed to speakers who speak similarly, and this input shapes the network of association strengths between cues (in our case, sequences of three sound segments representing the pronunciation, i.e. substrings of the phonetic transcription) and outcomes (in our case, the meaning of the pronounced word) for the speaker. The use of sequences of three segments, so-called trigrams, allows the measure to become sensitive to the adjustments sounds undergo in the context of other sounds (trigrams have been experimented with in dialectology before [24]; for comparison, we will also report results when using unigram and bigram cues). By exposing the speaker to a new pronunciation (in the form of its associated cues) we can measure how well the speaker is likely to understand that pronunciation by inspecting the activation strength of the corresponding outcome. The activation strength of the outcome will depend on the association strengths between the outcome and the cues involved in the pronunciation. If only cues are present which have a high association strength with the outcome, the activation of the outcome will be high, whereas the activation of the outcome will be somewhat lower if one of the cues has a low association strength with the outcome. By calculating the activation strength difference for two different pronunciations of the same word, we obtain a (gradual) measure of pronunciation distance. For example, the word 'with' would be highly activated when a native English listener hears [wɪh]. However, when a Mandarin speaker would incorrectly pronounce 'with' as [wɪz], this would result in a somewhat lower activation.
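In code, the activation of an outcome and the resulting activation difference between two pronunciations are no more than sums and a subtraction. The weights below are hypothetical values for a few 'with' cues, not estimates from the study.

```python
def activation(weights, cues, outcome):
    """Sum the cue-to-outcome association strengths; cues that were never seen
    during estimation simply contribute nothing."""
    return sum(weights.get((cue, outcome), 0.0) for cue in cues)

def pronunciation_distance(weights, reference_cues, other_cues, outcome):
    """Activation difference between a reference pronunciation and another
    pronunciation of the same word (a larger difference means more distant)."""
    return (activation(weights, reference_cues, outcome)
            - activation(weights, other_cues, outcome))

# Hypothetical association strengths for a few 'with' cues (made-up values):
weights = {('#wI', 'with'): 0.25, ('wIh', 'with'): 0.37, ('Ih#', 'with'): 0.37}
native = ['#wI', 'wIh', 'Ih#']
accented = ['#wI', 'wIz', 'Iz#']     # [wIz]: only the first cue is familiar
print(activation(weights, native, 'with'))                         # 0.99
print(pronunciation_distance(weights, native, accented, 'with'))   # 0.74
```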
Of course, using an adult state with fixed association weights between cues and outcomes is a clear simplification. Language change is a continuous process and the experience of a listener (i.e. the association weights between cues and outcomes) will obviously be affected by this. However, as the new language experience only makes up a small part of the total language experience of a listener, the effect of past experience is most important in determining the association weights. As a consequence, and in line with the results of Labov's ([25]: Ch. 4) Cross-Dialectal Comprehension (CDC) studies (which evaluated how well American English speakers understand speakers from their own and other regions), our model will yield lower meaning activations (i.e. more misunderstandings) when sound change is in progress (i.e. the original sound segments will have a higher association strength with the meaning than the new sound segments). In similar fashion, our model predicts higher meaning activations for pronunciations closer to one's own pronunciation variant (i.e. the ''local advantage''). We also emphasize that our model is able to capture differences in understandability per word (as each word has its own frequency of occurrence) – which might explain Labov's finding that certain sounds are not always correctly identified, even if they are characteristic of local speakers ([25]: pp. 84–85).

Furthermore, the model we propose is general, as it does not focus on a selection of linguistic features (such as vowels), but takes into account all sound (sequences) in determining the understandability of a certain pronunciation.

Besides being grounded in cognitive theory of competitive reinforcement learning, a clear benefit of this approach is that the pronunciation distances obtained do not need to be symmetrical, as they depend on the association strengths between cues and outcomes, which are different for every speaker. This is illustrated in Section 2.2 below.

To evaluate the effectiveness of this approach, we conducted two experiments. The first experiment focused on investigating foreignness ratings given by native American English (AE) speakers when judging accented English speech, while the second experiment focused on the asymmetric perceptual distances of Norwegian dialect speakers.

As we noted in the introduction, the Levenshtein distance has been applied to pronunciation transcriptions to assay the degree to which non-local pronunciations sound ''different'' from local ones (in dialectology, see [1]), but also to predict the comprehensibility of other language varieties (in applied sociolinguistics, see [4]). Since pronunciations may sound non-native or non-local without suffering in comprehensibility, one might suspect that the two notions are not the same, even if they are clearly related. In the present paper we construct a model of an artificial listener which discriminates well enough between words given sound trigrams, which is essentially a comprehension task. But we shall evaluate the same model on how well it predicts human judgments of how similar the speech is to one's own pronunciation (i.e. how native-like foreign accents sound, or how close a pronunciation is to one's own dialect). To the degree to which these experiments succeed, we may conclude that the degree of comprehensibility is largely the same as the degree of nativeness (or localness).

Materials and Methods

1. Accented English speech

1.1. Material: the Speech Accent Archive. The Speech Accent Archive [26] is digitally available at http://accent.gmu.edu and contains a large collection of speech samples in English from people with various language backgrounds. Each speaker read the same paragraph of 69 words (55 of which are unique) in English:

Please call Stella. Ask her to bring these things with her from the store: six spoons of fresh snow peas, five thick slabs of blue cheese, and maybe a snack for her brother Bob. We also need a small plastic snake and a big toy frog for the kids. She can scoop these things into three red bags, and we will go meet her Wednesday at the train station.

All speech samples were transcribed by three phonetically trained transcribers (consensus was reached in the few cases where the transcriptions differed; [26]) according to the International Phonetic Alphabet (IPA). The transcriptions include diacritics, and the associated audio files are available. For this study, we extracted 395 transcribed speech samples and their audio from the Speech Accent Archive. The total number of native U.S.-born English speakers in this dataset was 115. The remaining 280 speech samples belonged to speakers with a different native language or who were born outside of the United States.

1.2. Obtaining NDL-based pronunciation distances. For every transcribed pronunciation, we extracted all possible sequences of three sound segments (diacritics were ignored, and a separate segment was added to mark word boundaries) as cues. To model a native AE listener, we randomly selected about half (i.e. 58) of the native AE speakers. We used their pronunciations to generate the pronunciation cues, and paired these with meanings as outcomes (i.e. the pronunciation trigrams were linked to the corresponding meanings). We used only half of the native speakers for the listener model in order to prevent overfitting, i.e. learning the peculiarities of the speakers rather than the features of native American English. The pronunciations of the other half of the speakers are used to represent average American English speech, to which the pronunciations of individual speakers are compared. (While we could have used the speech of a single speaker for the listener model and the speech of another individual speaker to represent native American English speech, this would have biased the model to the specific dialectal variants of these speakers.) As the association strength between cues and outcomes depends on the frequency with which they co-occur, we extracted word frequency information from the Google N-Gram Corpus [27]. The total frequency of each meaning outcome was equally divided among all different pronunciations associated with it. For example, if the frequency of the word 'frog' equals 580,000, the frequency of each of the 58 pronunciations was set to 10,000. We then estimated the weights of the model using the 'ndl' package in R (version 0.2.10), which implements the Danks equations [23] introduced above. The resulting network of association strengths between pronunciation cues and meaning outcomes represents a native AE listener. As an example, Table 3 shows part of the input used for estimating the weights and Table 4 shows the association strengths obtained after the weights have been estimated (i.e. the 'adult' association weights of a native AE listener).

Table 3. Part of the table used for estimating the association strengths. The '#' marks the word boundary.

Speaker      Outcome   Pronunciation   Cues              Frequency
english23    with      [wɪh]           #wɪ, wɪh, ɪh#     28,169,384
english167   with      [wɪð]           #wɪ, wɪð, ɪð#     28,169,384
english23    her       [h ]            #h , h , #        852,131
english167   her       [ ]             # #               852,131

It is clear from Table 4 that the cues found together with a certain outcome generally have a positive value. The more likely it is that a cue is found together with the associated outcome (and, crucially, not with other outcomes), the higher the association strength between the two will be.

Table 4. The association strengths for the cues and outcomes in Table 3 for our simulated native AE listener, after these have been estimated on the basis of the input of 58 randomly selected native AE speakers.

Cue    Association strength for 'with'   Association strength for 'her'
#wɪ    0.2519                            0.0000
wɪh    0.3738                            0.0000
ɪh#    0.3738                            0.0000
wɪð    0.3741                            0.0000
ɪð#    0.3741                            0.0000
#h     0.0000                            0.4973
h      0.0000                            0.2433
 #     0.0000                            0.2594
# #    0.0000                            1.0000
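A small Python sketch of the cue extraction and frequency splitting described in this subsection; the underscore separator and the helper names are illustrative choices, not taken from the paper or from the 'ndl' package.

```python
def trigram_cues(segments, boundary='#'):
    """Overlapping three-segment cues for one transcribed pronunciation, with a
    boundary marker added at both word edges (diacritics assumed already
    stripped); the underscore separator is only for readability."""
    padded = [boundary] + list(segments) + [boundary]
    return ['_'.join(padded[i:i + 3]) for i in range(len(padded) - 2)]

def pronunciation_frequency(total_word_frequency, n_pronunciations):
    """Divide a word's corpus frequency evenly over its attested pronunciations
    (e.g. 580,000 over 58 pronunciations gives 10,000 each)."""
    return total_word_frequency / n_pronunciations

print(trigram_cues(['w', 'I', 'h']))           # ['#_w_I', 'w_I_h', 'I_h_#']
print(pronunciation_frequency(580_000, 58))    # 10000.0
```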
Given the table of association strengths representing a simulated native AE listener, it is straightforward to determine the activations of each outcome for a certain pronunciation (converted to cues) by summing the association strengths between the cues in the pronunciation and the outcome. The top half of Table 5 shows that the pronunciations of native AE speakers strongly activate the corresponding outcome (the values are equal or very close to the maximum of 1).

Of course, we can also use the association strengths (of the simulated native AE listener) to calculate the activations for accented speech. The bottom part of Table 5 clearly shows that accented speech results in lower activations (and thus reduced understanding), compared to the pronunciations of native AE speakers (shown in the top part of Table 5). In some cases, a foreign speaker might use a cue which would never be used by a native AE speaker (such as '#x ' in Table 5). As these cues were not encountered during the estimation of the model, no association strengths have been set for those cues and, consequently, their values do not contribute to the activation of the outcome.

Table 5. The activations of different outcomes on the basis of the association strengths between the cues and outcomes for our simulated native AE listener (shown in Table 4).

Speaker      Outcome   Pronunciation   Cues              Activation of outcome
english23    with      [wɪh]           #wɪ, wɪh, ɪh#     0.9995
english167   with      [wɪð]           #wɪ, wɪð, ɪð#     1.0000
english23    her       [h ]            #h , h , #        1.0000
english167   her       [ ]             # #               1.0000
mandarin10   with      [wɪz]           #wɪ, wɪz, ɪz#     0.2519
serbian10    her       [x ]            #x , x , #        0.2594

To determine pronunciation distances with respect to native American English, we exposed our model of a native AE listener to both native American English speech and accented English speech and investigated the activation differences of the meaning outcomes. We used the following procedure:

1. For each of the native American English speakers not considered when constructing the listener model (i.e. the remaining 57 native AE speakers), we calculated the activation of the listener model for each of the 55 different meaning outcomes (i.e. all unique words in our dataset). Whenever an outcome occurred more than once (such as 'we', which occurs twice in the paragraph of text), we averaged the activations associated with the corresponding pronunciations (i.e. the associated cues). For each outcome, we subsequently averaged the activations across all 57 speakers. This is our baseline and can be interpreted as the activations (for 55 individual meanings) of our native AE listener model when being exposed to the speech of an average native AE speaker.

2. For each individual speaker (mostly non-native, see below), we obtained the activations of our native AE listener model for each of the 55 meanings. Again, whenever an outcome occurred more than once, we averaged the activations associated with the corresponding pronunciations.

3. For each individual speaker, we calculated the activation difference compared to the baseline for all 55 meanings separately. We then averaged these activation differences across the 55 meanings. This resulted in a single value for each speaker and represents the NDL-based pronunciation distance with respect to an average native AE speaker.

As the specific sample of speakers used for estimating the native American English listener model may influence the results, we repeated the random sampling procedure (in which 58 speakers were selected whose pronunciations were used to estimate the listener model) 100 times to generate 100 slightly different native AE listener models. Obviously, this also resulted in a change of the remaining 57 speakers who were used to represent an average AE speaker (see step 1, above). Consequently, we obtained 100 (slightly different) NDL-based pronunciation distances for each individual speaker compared to an average AE speaker.
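The three steps above reduce to averaging activations per meaning and then averaging differences against a baseline. The sketch below assumes the weights are available as a (cue, outcome) to strength mapping and that each speaker's data maps meanings to the cue lists of their occurrences; both representations are illustrative, not the study's actual data structures.

```python
from statistics import mean

def activation(weights, cues, outcome):
    # Sum of cue-to-outcome association strengths (unknown cues contribute 0).
    return sum(weights.get((c, outcome), 0.0) for c in cues)

def mean_activations(weights, pronunciations):
    """Average activation per meaning for one speaker; `pronunciations` maps
    each meaning to the cue lists of its (possibly repeated) occurrences."""
    return {meaning: mean(activation(weights, cues, meaning) for cues in cue_lists)
            for meaning, cue_lists in pronunciations.items()}

def ndl_distance(weights, baseline, pronunciations):
    """Steps 2 and 3: per-meaning activation differences with respect to the
    baseline, averaged into a single distance value for the speaker."""
    speaker = mean_activations(weights, pronunciations)
    return mean(baseline[m] - speaker[m] for m in baseline)

# Step 1 (the baseline) averages mean_activations() over the 57 held-out native
# speakers; repeating everything for 100 random samplings of the 58 training
# speakers gives 100 slightly different distances per speaker.
```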
1.3. Validating automatically obtained foreignness ratings. We evaluated the computed pronunciation distances by comparing them to human native-likeness ratings. For this purpose, we developed an online questionnaire for native U.S. English speakers. In the questionnaire, participants were presented with a randomly ordered subset of 50 speech samples from the Speech Accent Archive. We did not include all speech samples, as our goal was to obtain multiple native-likeness judgments per sample. For each speech sample, participants had to indicate how native-like the sample was. This question was answered using a 7-point Likert scale (ranging from 1: very foreign sounding to 7: native AE speaker). Participants were not required to rate all samples, but could rate any number of samples.

Of course, more advanced methods are possible to measure native-likeness, such as indirect measures which assess the understandability of the accented pronunciations in a certain context (cf. [25]: Ch. 4). However, as our dataset was limited to a small fixed paragraph of text, we used a simple rating approach which, nevertheless, resulted in consistent ratings (see Results, below).

Via e-mail and social media we asked colleagues and friends to forward the online questionnaire to people they knew to be native AE speakers. In addition, the online questionnaire was advertised on Language Log by Mark Liberman. Especially that announcement led to an enormous amount of responses. As a consequence, we replaced the initial set of 50 speech samples five times with a new set to increase the number of speech samples for which we could obtain native-likeness ratings. As there was some overlap in the native AE speech samples present in each set (used to calibrate the ratings), the total number of unique samples presented for rating was 286, of which 280 were samples from speakers who were not born in the U.S.

2. Norwegian dialects

2.1. Material. The Norwegian dialect material is taken from the study of Gooskens and Heeringa [15], who perceptually evaluated the Levenshtein distance on the basis of IPA-transcribed audio recordings of 15 Norwegian dialect speakers reading the fable ''The North Wind and the Sun'' (containing 58 unique words). The original dataset was created by Jørn Almberg and Kristian Skarbø and is available at http://www.ling.hf.ntnu.no/nos. The transcriptions (including diacritics) were made by the same person, ensuring consistency. Perceptual distances (reported in Table 1 of [15]) were obtained by asking 15 groups of high school pupils (in the corresponding dialect areas) to rate all 15 dialectal audio samples on a scale from 1 (similar to own dialect) to 10 (not similar to own dialect). Perceptual dialect distances were then calculated by averaging these ratings per group.
2.2. Methods. Following the same procedure as described in Section 1.2, we converted the pronunciations of each of the 15 speakers in our sample to cues consisting of three sequential sound segments (diacritics were ignored, and a separate segment was added to mark word boundaries). The word frequencies were extracted from a Norwegian word frequency list (on the basis of subtitles and obtained from http://invokeit.wordpress.com/frequency-word-lists).

To determine the pronunciation distance between dialects D_i and D_j from the perspective of a listener of dialect D_i, we used the following procedure:

1. We estimated the NDL model (i.e. resulting in a specific weight matrix associating cues with outcomes) using the cues on the basis of the pronunciations from the speaker of dialect D_i. This model can be seen as representing an experienced listener (L_i) of dialect D_i.

2. We expose L_i to the cues on the basis of the pronunciations from dialect D_i and measure the activation of each of the corresponding 58 meaning outcomes. (Because we only had a single speaker in our sample for each dialect, we could not use separate pronunciations for estimating the listener model and representing the speaker.) Whenever an outcome occurred more than once (some words were repeated), we averaged the activations associated with the corresponding pronunciations (i.e. the associated cues). These activations are used as the baseline, and can be interpreted as the activations (for the 58 individual meanings) of L_i when being exposed to speech of its own dialect.

3. We expose L_i to the cues on the basis of the pronunciations of another dialect D_j and measure the (averaged, when a word occurred more than once) activation of each of the corresponding 58 meaning outcomes.

4. For all 58 individual meaning outcomes, we calculated the difference between the activations of L_i for D_j and the baseline for D_i, and averaged these 58 differences to get a single value representing the NDL-based pronunciation distance between D_i and D_j (from the perspective of L_i).

The above procedure is repeated for all combinations of D_i and D_j, resulting in 210 NDL-based pronunciation distances (15 × 15, but the 15 diagonal values are excluded as they are always equal to 0). Table 6 shows these distances for a set of three Norwegian dialects. Note that the NDL-based pronunciation distances between these dialects are clearly asymmetric. The dialect of Bjugn is closer to the dialect of Bergen from the perspective of Bergen (0.545) than the dialect of Bergen is from the perspective of Bjugn (0.559).

Table 6. Part of the NDL-based Norwegian dialect pronunciation distances.

          Bergen   Bjugn   Bodø
Bergen    X        0.545   0.584
Bjugn     0.559    X       0.319
Bodø      0.574    0.314   X

To evaluate these distances, we correlated them with the corresponding perceptual distances (obtained from [15]).
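Looping this procedure over all ordered dialect pairs yields the asymmetric distance matrix illustrated in Table 6. In the Python sketch below, `estimate_listener` and `activations_for` are placeholders for steps 1 and 2-3 described above, not functions from any existing package.

```python
from statistics import mean

def dialect_distance_matrix(dialects, estimate_listener, activations_for):
    """Asymmetric NDL-based distances for every ordered pair of dialects.
    `estimate_listener(d)` stands for step 1 (training listener L_d) and
    `activations_for(listener, d)` for steps 2-3 (per-meaning activations);
    both are placeholders for the procedure described above."""
    distances = {}
    for d_i in dialects:
        listener = estimate_listener(d_i)            # step 1
        baseline = activations_for(listener, d_i)    # step 2: own dialect
        for d_j in dialects:
            if d_j == d_i:
                continue                             # diagonal is always 0
            other = activations_for(listener, d_j)   # step 3: other dialect
            distances[(d_i, d_j)] = mean(            # step 4: averaged difference
                baseline[m] - other[m] for m in baseline)
    return distances

# With 15 dialects this yields the 210 ordered, generally asymmetric distances.
```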
Results

1. Results for accented English speech

A total of 1143 native American English participants filled in the questionnaire (658 men: 57.6%, and 485 women: 42.4%). Participants were born all over the United States, with the exception of the state of Nevada. Most people came from California (151: 13.2%), New York (115: 10.1%), Massachusetts (68: 5.9%), Ohio (66: 5.8%), Illinois (64: 5.6%), Texas (55: 4.8%), and Pennsylvania (54: 4.7%). The average age of the participants was 36.2 years (SD: 13.9) and every participant rated on average 41 samples (SD: 14.0). Every sample was rated by at least 50 participants and the judgments were consistent (Cronbach's alpha: 0.853).

To determine how well our NDL-based pronunciation distances on the basis of trigram cues matched the native-likeness ratings, we calculated the Pearson correlation r between the averaged ratings and the NDL-based pronunciation distances for the 286 speakers. Since we had 100 sets of NDL-based pronunciation distances (based on 100 different random samplings of the native American English speakers used to estimate the model), we averaged the corresponding correlation coefficients, yielding an average correlation of r = −0.72 (p < 0.001). Note that the direction of the correlations is negative as the participants indicated how native-like each sample was, while the NDL-based pronunciation distance indicates how foreign a sample is. As a scatterplot clearly revealed a logarithmic relationship (see Figure 1), we log-transformed the NDL-based pronunciation distances, increasing the correlation to r = −0.80 (p < 0.001). The logarithmic relationship suggests that people are relatively sensitive to small differences in pronunciation in judging native-likeness, but as soon as the differences have reached a certain magnitude (i.e. in our case an NDL-based pronunciation distance of about 0.2) they hardly distinguish them anymore. The sensitivity to small differences is also illustrated by the (slight) increase in performance when trigram cues are used which incorporate diacritics. In that case, the correlation strength increases to r = −0.75 (r = −0.82 for the log-transformed NDL-based pronunciation distances). These results are comparable with the performance of the Levenshtein distance when applied to this dataset (r = −0.81, p < 0.001 for the log-transformed Levenshtein distance; unpublished data). In fact, the Levenshtein distances and the NDL-based pronunciation distances also correlate highly, r = 0.89 (p < 0.001).

We should note that this correlation is close to how well individual raters agree with the average native-likeness ratings (on average: r = 0.84, p < 0.0001). Consequently, the NDL-based method is almost as good as a human rater, despite ignoring suprasegmental pronunciation differences (such as intonation).

Figure 1 also shows that pronunciations which are perceived as native (i.e. having a rating very close to 7) may correspond to NDL-based pronunciation distances greater than 0. In this case, the NDL-based method classifies certain native-like features as being non-native. This may be caused by our relatively small sample of only 58 speakers whose pronunciations were used to model the native AE listener. Real listeners have much more experience with their native language, and therefore can more reliably distinguish native-like from foreign cues.

The aforementioned results are all based on using trigram cues. When using unigram cues instead, the correlation between the perceptual native-likeness ratings and the NDL-based pronunciation distances dropped to r = −0.54 (log-transformed: r = −0.57). When using bigram cues, the performance was almost on par with using trigram cues (r = −0.69, log-transformed: r = −0.79). Using unigram and/or bigram cues together with trigram cues did not affect performance, as these simpler cues are not discriminative in the presence of trigram cues.
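For completeness, the evaluation itself amounts to a Pearson correlation, optionally after log-transforming the distances; the sketch below uses NumPy with illustrative variable names, not the study's data.

```python
import numpy as np

def rating_correlations(ratings, distances):
    """Pearson r between averaged native-likeness ratings and NDL-based
    distances, raw and after a log transform (distances must be positive)."""
    ratings = np.asarray(ratings, dtype=float)
    distances = np.asarray(distances, dtype=float)
    r_raw = np.corrcoef(ratings, distances)[0, 1]
    r_log = np.corrcoef(ratings, np.log(distances))[0, 1]
    return r_raw, r_log
```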
2. Results for Norwegian dialects

The correlation between the NDL-based pronunciation distances and the perceptual distances was r = 0.68 (p < 0.001), which is comparable to the correlation Gooskens and Heeringa [15] reported on the basis of the Levenshtein distance (i.e. r = 0.67). Similar to the first study, log-transforming the NDL-based pronunciation distances increased the correlation strength to r = 0.72 (p < 0.001). In line with the results for the accent data, the Levenshtein distances and the NDL-based pronunciation distances correlate highly, r = 0.89 (p < 0.001).

The aforementioned results are all based on using trigram cues. Using unigram cues instead of trigram cues severely reduced performance (r = 0.10, log-transformed: r = 0.31), whereas using bigram cues was almost as good as using trigram cues (r = 0.67, log-transformed: r = 0.71). Similar to before, adding unigram and/or bigram cues to the trigram cues did not really improve performance. In contrast to the accent data, incorporating diacritics in the cues also did not help; the correlation then dropped to r = 0.65 (log-transformed: r = 0.66). This is likely caused by the relatively small dataset.

Figure 1. Logarithmic relationship between NDL-based pronunciation distances and perceptual distances.

Discussion

In the present paper we have shown that pronunciation distances derived from naive discriminative learning match perceptual accent and dialect distances quite well. While the results were on par with those on the basis of the Levenshtein distance, the advantage of the present approach is that it is grounded in cognitive theory of comprehension based on fundamental principles of human discrimination learning. Furthermore, the Levenshtein distance is theoretically less suitable for modeling the degrees of difference in the perception of non-local and non-native speech because it is a true distance, i.e. always symmetric, while perceptions of similarity may also be asymmetric [15]. The NDL-based approach naturally generates asymmetrical distances.

We noted above that the task of recognizing words based on phonetic cues is essentially a comprehensibility task. A second contribution of the present paper is therefore to demonstrate that models constructed to comprehend local speech automatically assign scores of non-nativeness (or of non-localness among dialects) in a way that models native speakers' judgments.

One may wonder why the NDL-based method only slightly improved upon the results of the Levenshtein distance for the Norwegian dataset, especially since that dataset is characterized by asymmetric perceptual distances. We note here that the 15 NDL models (one for each listener) are only based on the pronunciation of a single speaker. Consequently, the approach does not take into account the variation within each dialect (taken into account by listeners living in the dialect area), which would have allowed for more precise estimates of the association weights. A general limitation is that Gooskens and Heeringa [15] already indicated that intonation is one of the most important characteristics of Norwegian dialects, and no such cues have been used here (as these were not available to us), thereby limiting the ability to detect relevant asymmetries.

Nerbonne and Heeringa ([28]: 563–564), on the other hand, speculate that there is a limit to the accuracy of validating pronunciation difference measures on the basis of aggregate judgments of varietal distance. If one supposes that poorer measures are noisier – but not more biased – than better ones, then the noise will simply be eliminated in examining large aggregates. If this is right, we cannot expect to change mean differences by adopting more accurate measurements. They suggest that improved validation will therefore have to focus on smaller units such as individual words.

While we have not explored this in the present paper, another important advantage of the NDL approach is that cues are not restricted to phonetic segments. Cues with respect to pronunciation speed or other acoustic characteristics (such as intonation) can be readily integrated in an NDL model (e.g., linking cues representing different intonation patterns to the individual meanings). A problem of the NDL method, however, is that it only accepts discrete cues. A continuous measurement therefore needs to be discretized into separate cues, and this introduces a subjective element in an otherwise parameter-free procedure.

As our datasets only consisted of a few dozen words, our model was highly simplified compared to the cognitive model of a human listener, who will have access to thousands of words. It is nevertheless promising that pronunciation distances on the basis of our simplified models match perceptual distances at least as well as current gold standards.

Acknowledgments

We are very grateful to Mark Liberman for his post on Language Log inviting native American English speakers to rate the native-likeness of the English speech samples used in this study. We also thank Michael Ramscar for interesting discussions about the ideas outlined in this paper.
Author Contributions

Conceived and designed the experiments: MW RHB. Performed the experiments: MW. Analyzed the data: MW. Contributed reagents/materials/analysis tools: JB CG WH. Wrote the paper: MW JN JB CG WH RHB.

References

1. Heeringa W (2004) Measuring Dialect Pronunciation Differences using Levenshtein Distance. PhD thesis, Rijksuniversiteit Groningen.
2. Valls E, Wieling M, Nerbonne J (2013) Linguistic advergence and divergence in Northwestern Catalan: A dialectometric investigation of dialect leveling and border effects. LLC: Journal of Digital Scholarship in the Humanities 28(1): 119–146.
3. Bakker D, Müller A, Velupillai V, Wichmann S, Brown CH, et al. (2009) Adding typology to lexicostatistics: A combined approach to language classification. Linguistic Typology 13(1): 169–181.
4. Beijering K, Gooskens C, Heeringa W (2008) Predicting intelligibility and perceived linguistic distances by means of the Levenshtein algorithm. Linguistics in the Netherlands 15: 13–24.
5. Sanders NC, Chin SB (2009) Phonological distance measures. Journal of Quantitative Linguistics 43: 96–114.
6. Wieling M, Nerbonne J, Baayen RH (2011) Quantitative social dialectology: Explaining linguistic variation geographically and socially. PLOS ONE 6(9): e23613. doi:10.1371/journal.pone.0023613.
7. Kessler B (1995) Computational dialectology in Irish Gaelic. In: Proceedings of the Seventh Conference of the European Chapter of the Association for Computational Linguistics. pp. 60–66.
8. Nerbonne J, Heeringa W (1997) Measuring dialect distance phonetically. In: Coleman J, editor. Workshop on Computational Phonology. Madrid: Special Interest Group of the Association for Computational Linguistics. pp. 11–18.
9. Wichmann S, Holman EW, Bakker D, Brown CH (2010) Evaluating linguistic distance measures. Physica A 389: 3632–3639.
10. Wieling M, Heeringa W, Nerbonne J (2007) An aggregate analysis of pronunciation in the Goeman-Taeldeman-van Reenen-Project data. Taal en Tongval 59(1): 84–116.
11. Levenshtein V (1965) Binary codes capable of correcting deletions, insertions and reversals. Doklady Akademii Nauk SSSR 163: 845–848. In Russian.
12. Heeringa W, Braun A (2003) The use of the Almeida-Braun system in the measurement of Dutch dialect distances. Computers and the Humanities 37(3): 257–271.
13. Wieling M, Prokić J, Nerbonne J (2009) Evaluating the pairwise alignment of pronunciations. In: Borin L, Lendvai P, editors. Proceedings of the EACL 2009 Workshop on Language Technology and Resources for Cultural Heritage, Social Sciences, Humanities, and Education. pp. 26–34.
14. Wieling M, Margaretha E, Nerbonne J (2012) Inducing a measure of phonetic similarity from dialect variation. Journal of Phonetics 40(2): 307–314.
15. Gooskens C, Heeringa W (2004) Perceptive evaluation of Levenshtein dialect distance measurements using Norwegian dialect data. Language Variation and Change 16(3): 189–207.
16. Wieling M, Nerbonne J (2007) Dialect pronunciation comparison and spoken word recognition. In: Osenova P, et al., editors. Proceedings of the RANLP Workshop on Computational Phonology. pp. 71–78.
17. Rescorla RA, Wagner AR (1972) A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In: Black AH, Prokasy WF, editors. Classical conditioning II: Current research and theory. New York: Appleton-Century-Crofts. pp. 64–99.
18. Siegel SG, Allan LG (1996) The widespread influence of the Rescorla-Wagner model. Psychonomic Bulletin and Review 3(3): 314–321.
19. Ramscar M, Yarlett D, Dye M, Denny K, Thorpe K (2010) The effects of feature-label-order and their implications for symbolic learning. Cognitive Science 34(6): 909–957.
20. Ramscar M, Dye M, Popick HM, O'Donnell-McCarthy F (2011) The Enigma of number: Why children find the meanings of even small number words hard to learn and how we can help them do better. PLOS ONE 6: e22501. doi:10.1371/journal.pone.0022501.
21. Ramscar M, Dye M, McCauley S (2013) Error and expectation in language learning: The curious absence of 'mouses' in adult speech. Language: In press.
22. Danks D (2003) Equilibria of the Rescorla-Wagner model. Journal of Mathematical Psychology 47: 109–121.
23. Baayen RH, Milin P, Filipović Durdević D, Hendrix P, Marelli M (2011) An amorphous model for morphological processing in visual comprehension based on naive discriminative learning. Psychological Review 118: 438–482.
24. Heeringa W, Kleiweg P, Gooskens C, Nerbonne J (2006) Evaluation of string distance algorithms for dialectology. In: Nerbonne J, Hinrichs E, editors. Linguistic Distances. Sydney: COLING/ACL. pp. 51–62.
25. Labov W (2010) Principles of Linguistic Change, Cognitive and Cultural Factors, Vol. 3. Malden: Wiley-Blackwell.
26. Weinberger SH, Kunath SA (2011) The Speech Accent Archive: Towards a typology of English accents. Language and Computers 73: 265–281.
27. Brants T, Franz A (2009) Web 1T 5-gram, 10 European languages. Version 1. Philadelphia: Linguistic Data Consortium.
28. Nerbonne J, Heeringa W (2010) Measuring dialect differences. In: Auer P, Schmidt JE, editors. Language and Space: Theories and Methods. Berlin: Mouton De Gruyter. pp. 550–566.
