Mandarin Chinese Words and Parts of Speech. A Corpus-based Study PDF

292 Pages·2017·2.222 MB·english

Preview Mandarin Chinese Words and Parts of Speech. A Corpus-based Study

Mandarin Chinese Words and Parts of Speech
A Corpus-based Study

Chu-Ren Huang, Shu-Kai Hsieh and Keh-Jiann Chen

First published 2017 by Routledge
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN
and by Routledge
711 Third Avenue, New York, NY 10017

Routledge is an imprint of the Taylor & Francis Group, an informa business

© 2017 Chu-Ren Huang, Shu-Kai Hsieh and Keh-Jiann Chen

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

Library of Congress Cataloguing-in-Publication Data
Names: Huang, Chu-Ren, editor. | Hsieh, Shu-Kai, editor. | Chen, Keh-jiann, editor.
Title: Mandarin Chinese words and parts of speech : a corpus-based study / edited by Chu-Ren Huang, Shu-Kai Hsieh and Keh-Jiann Chen.
Description: Milton Park, Abingdon, Oxon ; New York, NY : Routledge, [2017] | Series: Routledge studies in Chinese linguistics | Includes bibliographical references and index.
Identifiers: LCCN 2016052262 | ISBN 9781138949447 (hardback : alk. paper) | ISBN 9781315669014 (ebook)
Subjects: LCSH: Mandarin dialects–Parts of speech–Data processing. | Mandarin dialects–Terms and phrases–Data processing. | Mandarin dialects–Grammatical categories–Data processing. | Chinese language–Dialects–Data processing. | Computational linguistics.
Classification: LCC PL1893 .M26 2017 | DDC 495.15–dc23
LC record available at https://lccn.loc.gov/2016052262

ISBN: 978-1-138-94944-7 (hbk)
ISBN: 978-1-315-66901-4 (ebk)

Typeset in Times New Roman by Out of House Publishing

Contents

Foreword
Overview: From Linguistics Studies to Language Resources Annotation and Processing
  0.1 Two Foundational Issues in Linguistic Studies of Chinese
    0.1.1 Chinese Word Segmentation
    0.1.2 Chinese Parts of Speech
  0.2 Language Resources Annotation and Processing
    0.2.1 Language Resources
    0.2.2 From Annotation to Linguistic Knowledge
  0.3 Summary

PART I Words, Segmentation Units, and Segmentation Standards
1 Introduction
  1.1 Origin and Background Information
  1.2 Objectives
  1.3 Research and Implementation Plan
    1.3.1 Research plan
    1.3.2 Implementation Timeline
  1.4 Characteristics of the Standard
2 Word Segmentation Standard and Levels of Standards
  2.1 Word Segmentation Standard
    2.1.1 Definition
    2.1.2 Basic Principles
    2.1.3 Subsidiary Principles
  2.2 Levels of Segmentation Standard
3 “Sou” Wen Jie Zi: Studies on Identification of Words and Segmentation Units
  3.1 Segmentation Standards for the Determiner–Measure Construction
  3.2 Segmentation Principles of Reduplication Construction
  3.3 Segmentation Principles of Affixation
  3.4 Segmentation Principle of Verb-complement Compounds
  3.5 Segmentation Principle of Post-verbal ‘yu’
  3.6 Segmentation Rules for Constructions like 為 wei2 ‘for/as’ / 成 cheng2 ‘to become’ / 作 zuo4 ‘as’
  3.7 Segmentation Rules for Construction “verb + 給 gei3”
  3.8 Segmentation Rules for Construction “verb + 有 you3”
  3.9 Segmentation rules for 的 de0, 地 de0, 之 zhi1
  3.10 Segmentation Principle for Negation/negative
  3.11 Segmentation Principle for 没(有) mei2(you3)
    3.11.1 Segmentation Principle for 非 fei1
    3.11.2 Segmentation Principle for 别 bie2, 休 xiu1, 甭 beng2
  3.12 Segmentation principle for A-not-A Questions
  3.13 Segmentation Principle for Words with Inserted Elements
  3.14 Segmentation Principle for Blend Words
    3.14.1 Sharing Initial/ending/both Words
    3.14.2 Telescopic Compounds
  3.15 Segmentation Principle for Post-verbal Modification
  3.16 Segmentation Principle for Proper Nouns
  3.17 Segmentation Principle for Idiom Chunk
4 Illustrative Examples of Implementation of Segmentation Standard
  4.1 Word Segmentation Standard
  4.2 Segmentation of Words in Different Levels
5 Comparison of Two Segmentation Standards
  5.1 The Differences between Mainland and Taiwan Word Segmentation Standards
    5.1.1 Principle Differences
    5.1.2 Detail Comparison
  5.2 Future Developments for Word Segmentation Standards
    5.2.1 The Effectiveness of Definition
    5.2.2 Applicable Scope of Combining Bound Morpheme with Adjacent Words into Segmentation Unit
    5.2.3 About Combination of Modifier–Head Constructions
    5.2.4 The Dependency and Independence between the Segmentation Principle and the Standard Lexicon
  5.3 Conclusion

PART II PoS Analysis of Contemporary Chinese
6 Introduction to CKIP Parts of Speech System
  6.1 Word and its POS Tag in the CKIP Lexicon
    6.1.1 Annotation Guidelines for Bound Morphemes
    6.1.2 Annotation Guidelines for Sentences
    6.1.3 Annotation Guidelines for Determiner–Measure Compounds
    6.1.4 Annotation Guidelines for Reduplicated Words
    6.1.5 Annotation Guidelines for Verb–Complement Compounds
  6.2 POS Annotation
    6.2.1 Polyfunctionality of Words
    6.2.2 Multiple Syntactic Classification of Words
7 V: Verbs
  7.1 Principles of Verb Classification
    7.1.1 Activity or State
    7.1.2 Transitivity of Verbs
    7.1.3 Phrasal Forms of Arguments
    7.1.4 Thematic Roles of Arguments
    7.1.5 Syntactic Behaviors of Verbs
  7.2 Verb Classes
    7.2.1 VA: Intransitive Activity Verbs
    7.2.2 VB: Quasi-transitive Activity Verbs
    7.2.3 VC: Activity Transitive Verbs
    7.2.4 VD: Ditransitive Verbs
    7.2.5 VE: Sentential Object Action Verbs
    7.2.6 VF: Verb Phrase Object Activity Verbs
    7.2.7 VG: Classificatory Verbs
    7.2.8 VH: State Intransitive Verbs
    7.2.9 VI: State Quasi-transitive Verbs
    7.2.10 VJ: State Transitive Verbs
    7.2.11 VK: State Sentential-object Verbs
    7.2.12 VL: Stative VP-object Verbs
8 A: Non-Predicative Adjectives
  8.1 List of Possible Subclasses
  8.2 Classification Guidelines
9 N: Content Words
  9.1 Classification of Content Words
    9.1.1 Na: Nouns
    9.1.2 Nb: Proper Name
    9.1.3 Nc: Place Words
    9.1.4 Nd: Time Words
    9.1.5 Ne: Determinatives
    9.1.6 Nf: Measure Words
    9.1.7 Ng: Localizers
    9.1.8 Nh: Pronouns
  9.2 Conceptual Structure of Nouns
    9.2.1 Framework
10 D: Adverbs
  10.1 Da: Quantity Adverb
    10.1.1 Syntactic Features
    10.1.2 Subcategories
  10.2 Dba: Modal Adverb
    10.2.1 Analysis Principles
    10.2.2 Subcategories
  10.3 Dbb/Dbc: Evaluative Adverb
    10.3.1 Principles of Analysis
    10.3.2 Subcategories
  10.4 Dc: Negative Adverb
  10.5 Dd: Time Adverb
  10.6 Df: Degree Adverb
    10.6.1 Subcategories
  10.7 Dg: Location Adverb
  10.8 Dh: Manner Adverb
    10.8.1 Principles of Analysis
  10.9 Di: Aspect Adverb
  10.10 Dj: Interrogative Adverb
    10.10.1 Principles of Analysis
  10.11 Dk: Sentential Adverb
11 P: Preposition
  11.1 Syntactic Characteristics of Prepositions
  11.2 Principles of Analysis
12 C: Conjunction
  12.1 Ca: Juxtaposing Conjunctions
    12.1.1 Subcategories
    12.1.2 Principles of Analysis
  12.2 Cb: Correlative Conjunctions
    12.2.1 Subcategories
13 T: Particles
  13.1 Subcategory
  13.2 Lists of Particles
14 I: Interjections

PART III Resources
15 Online Resources
16 Further Reading/Recent Studies

Appendix I: A Complete List of Inflectional, Derivational, and Compounding Affixes
Appendix II: Affixes from GB13715
Appendix III: Comparison Table of GB13715 and CNS14366
Appendix IV: Primary Sources of Part A
Appendix V: Sample Segmented Text [Ya (Elegant) Level]
Appendix VI: A Complete List of Parts of Speech in Mandarin Chinese
Appendix VII: A Complete Table of Localizers
Appendix VIII: Conceptual Structure of Nouns
References
Index

Foreword

The Chinese Knowledge Information Processing Group (CKIP) was founded based on the vision of Prof. Chin-Chun Hsieh to build an infrastructure and to lay a foundational knowledge for computational Chinese language processing in 1986. Keh-jiann Chen soon joined to lead the group as a computer scientist and Chu-Ren Huang joined as a co-leader as a linguist in 1987. One of the first decisions that was made by the CKIP group was that to get the computational processing right, we had to get the linguistic facts and generalizations right first. Since Chinese linguistics, especially research adopting modern linguistics theories and approaches, was still in the early stages of development, this decision meant that solid knowledge of linguistic generalizations of Chinese would be the first priority.
Hence, the first task that the group embarked on was not to write code and programs, but to create a full grammar as well as a full machine-readable dictionary of Mandarin Chinese. And very naturally, the group soon adopted the data-driven, corpus-based approach that was starting to gain recognition in the study of English. It is this on-the-ground approach that made the analysis of data and discussion of issues first conducted more than 20 years ago still relevant today. We hope that, by making these contents, which were originally published in Chinese in Taiwan, available in English, we can bring attention to some of the basic linguistic facts and generalizations underlining issues in Chinese wordhood and parts of speech classification that may have been lost in scholarly debates. It is beyond the scope of this volume to deliberate on different approaches to such issues but some examples are discussed in our overview chapters and in the publications listed for further reading.

The research reported here couldn’t have been carried out without the help of many colleagues. Chin-chun Hsieh lit the first spark and has always been supportive. Many theoretical and computational linguists provided advice at different stages of our work; they include but are not limited to Yung-O Biq, Fenfu Tsao, Charles T.-C. Tang, Chi-chen Jane Tang, Keh-yih Su, Shuan-fan Huang, Jason S. Chang, and Shiwen Yu. Research reported here was supported by various grants in Taiwan, including Academia Sinica, ITRI, NSC, and the National Standardization Bureau. We are also grateful to the generous support from the Routledge team, including Andrea Hartill and Camille Burns.

This work couldn’t have been done without the dedication of all past CKIP members, including those who may not have been directly involved in the drafting of the original text. We thank you all and apologize for not being able to list all of you except for those who were involved in drafting the two original technical reports. They are Li-Li Chang 張麗麗, Li-Ping Chang 張莉萍, Feng-yi Chen 陳鳳儀, Jing-yu Chen 陳鏡瑜, Lian-chen Chief 漆聯成, Yun-chin Chou 周芸青, Wei-Mei Hong 洪偉美, Zhao-ming Gao 高照明, Hui-ting Huang 黃惠婷, Rui-ju Huang 黃瑞珠, Huan-Hui Lin 林煌賄, Shu-mei Liu 劉淑梅, Wen-juan Mao 毛文娟, Ruo-ping Mo 莫若萍, Wen-jen Wei 魏文貞, Jiunn-hsiung Wu 吳俊雄, and Meili Yeh 葉美利.

In addition, we would like to thank the translation team who provided the first translation and some initial editing. The team members are mainly from the Hong Kong Polytechnic University and National Taiwan University but also include other colleagues in Taiwan who were former CKIP members: Stella Cong, Menghan Jiang, Chris Kwan, Yunfei Long, Hongchao Liu, Hongzhi Xu, Meili Yeh, Shu-Kai Hsieh, Chen-Chun E, Shu-Ling Huang, Meng-Xian Shih, Chang-Chia Hsu, Yu-Yun Chang, Chih-Yao Lee. In addition, we would like to thank Kathleen Ahrens (who was also a CKIP member) and Karl Neergaard for providing help in technical editing at different stages of the manuscript.

Last, but not least, we would like to thank the PolyU PKU Joint Research Centre on Chinese Linguistics, as well as the National Taiwan University’s LOPE lab for providing logistical support, including the coordination work done by Stella Cong. Any remaining errors are, of course, ours.

Chu-Ren Huang
Shu-Kai Hsieh
Keh-Jiann Chen
Hong Kong and Taiwan, 12 September, 2016

Overview
From Linguistics Studies to Language Resources Annotation and Processing

Work presented in this volume is a translation, with corrections and some updates, of two classical volumes written and published by the Chinese Knowledge Information Processing (詞庫小組 CKIP) group at Academia Sinica (CKIP 1993 and 1996).
The content presented here is the result of the first corpus-based, data-driven comprehensive research on two of the most basic issues in Chinese linguistics: the definitions of words (in light of computational word segmentation) and the definition and classification criteria for grammatical categories (or Parts of Speech, PoS). The research on word segmentation and PoS classification reported here was the theoretical and empirical underpinning for the construction of the Academia Sinica Balanced Chinese Corpus (Sinica Corpus, Chen et al. 1996); the volume can also be considered as a companion volume for Sinica Corpus (http://asbc.iis.sinica.edu.tw/).

0.1 Two Foundational Issues in Linguistic Studies of Chinese

The issues and data analysis presented here remain central to any theoretical inquiry, pedagogical practice, and computational applications of Chinese because of two characteristics of the language. First, Chinese writing does not mark word boundaries (unlike other languages such as English, where spaces can be treated as word delimiters). Second, Chinese lacks morphological marking in association with categorical alternations.

0.1.1 Chinese Word Segmentation

This part addresses the wordhood issue in Chinese. In particular, a segmentation standard is proposed to achieve linguistic felicity, computational feasibility, and data uniformity. Linguistic felicity is maintained by the definition of a segmentation unit that is equivalent to the theoretical definition of a word, as well as a set of segmentation principles that are equivalent to the functional definition of a word. Specifically, since the proposed segmentation standard is intended for Chinese natural language processing, it is very important that it reflects linguistic reality as well as computational applicability. Computational feasibility is thus ensured by the fact that the above functional definitions are
