ALR 12

The 12th Workshop on Asian Language Resources

Proceedings of the Workshop

December 12, 2016
Osaka, Japan

Copyright of each paper stays with the respective authors (or their employers).

ISBN 978-4-87974-722-8

Preface

This 12th Workshop on Asian Language Resources (ALR12) focuses on language resources in Asia, which has more than 2,200 spoken languages. There are now increasing efforts to build multi-lingual, multi-modal language resources, with varying levels of annotation, through manual, semi-automatic and automatic approaches, as the use of ICT spreads across Asia. Correspondingly, the development of practical applications of these language resources has also been advancing rapidly. The ALR workshop series aims to forge better coordination and collaboration among researchers on these languages and in the NLP community in general, and to develop common frameworks and processes for promoting these activities.

ALR12 collaborates with ISO/TC 37/SC 4, which develops international standards for "Language Resources Management," and with ELRA, which is campaigning for the LRE Map, in order to integrate efforts to develop an Asian language resource map. The workshop is also supported by AFNLP, which has a dedicated Asian Language Resource Committee (ALRC), whose aim is to coordinate the important ALR initiatives with different NLP associations and conferences in Asia and other regions.

This workshop consists of twelve oral papers and seven posters, plus a special session to introduce ISO/TC 37/SC 4 activities to the community, to stimulate further interactions between research and standardization.
ALR12 program co-chairs

Koiti Hasida
    President, GSK
    The University of Tokyo
Kam-Fai Wong
    President, AFNLP
    The Chinese University of Hong Kong
Nicoletta Calzolari
    Honorary President, ELRA
    ILC-CNR
Key-Sun Choi
    Secretary, ISO/TC 37/SC 4
    KAIST

Organisers

Koiti Hasida
Kam-Fai Wong
Nicoletta Calzolari
Key-Sun Choi

Programme Committee

Kenji Araki
Normaziah Aziz
Khalid Choukri
Kohji Dohsaka
Kentaro Inui
Hitoshi Isahara
Kai Ishikawa
Satoshi Kinoshita
Kiyoshi Kogure
Haizhou Li
Joseph Mariani
Fumihito Nishino
Win Pa Pa
Ayu Purwarianti
Lu Qin
Hammam Riza
Hiroaki Saito
Kiyoaki Shirai
Virach Sornlertlamvanich
Keh-Yih Su
Kumiko Tanaka-Ishii
Takenobu Tokunaga
Masao Utiyama

Table of Contents

An extension of ISO-Space for annotating object direction
    Daiki Gotou, Hitoshi Nishikawa and Takenobu Tokunaga

Annotation and Analysis of Discourse Relations, Temporal Relations and Multi-Layered Situational Relations in Japanese Texts
    Kimi Kaneko, Saku Sugawara, Koji Mineshima and Daisuke Bekki

Developing Universal Dependencies for Mandarin Chinese
    Herman Leung, Rafaël Poiret, Tak-sum Wong, Xinying Chen, Kim Gerdes and John Lee

Developing Corpus of Lecture Utterances Aligned to Slide Components
    Ryo Minamiguchi and Masatoshi Tsuchiya

VSoLSCSum: Building a Vietnamese Sentence-Comment Dataset for Social Context Summarization
    Minh-Tien Nguyen, Dac Viet Lai, Phong-Khac Do, Duc-Vu Tran and Minh-Le Nguyen

BCCWJ-DepPara: A Syntactic Annotation Treebank on the ‘Balanced Corpus of Contemporary Written Japanese’
    Masayuki Asahara and Yuji Matsumoto

SCTB: A Chinese Treebank in Scientific Domain
    Chenhui Chu, Toshiaki Nakazawa, Daisuke Kawahara and Sadao Kurohashi

Big Community Data before World Wide Web Era
    Tomoya Iwakura, Tetsuro Takahashi, Akihiro Ohtani and Kunio Matsui

An Overview of BPPT’s Indonesian Language Resources
    Gunarso Gunarso and Hammam Riza
Creating Japanese Political Corpus from Local Assembly Minutes of 47 prefectures
    Yasutomo Kimura, Keiichi Takamaru, Takuma Tanaka, Akio Kobayashi, Hiroki Sakaji, Yuzu Uchida, Hokuto Ototake and Shigeru Masuyama

Selective Annotation of Sentence Parts: Identification of Relevant Sub-sentential Units
    Ge Xu, Xiaoyan Yang and Chu-Ren Huang

The Kyutech corpus and topic segmentation using a combined method
    Takashi Yamamura, Kazutaka Shimada and Shintaro Kawahara

Automatic Evaluation of Commonsense Knowledge for Refining Japanese ConceptNet
    Seiya Shudo, Rafal Rzepka and Kenji Araki

SAMER: A Semi-Automatically Created Lexical Resource for Arabic Verbal Multiword Expressions Tokens Paradigm and their Morphosyntactic Features
    Mohamed Al-Badrashiny, Abdelati Hawwari, Mahmoud Ghoneim and Mona Diab

Sentiment Analysis for Low Resource Languages: A Study on Informal Indonesian Tweets
    Tuan Anh Le, David Moeljadi, Yasuhide Miura and Tomoko Ohkuma

Conference Program

Monday, December 12, 2016

09:00–09:05  Opening

09:05–10:25  Oral Session 1: Annotation

    An extension of ISO-Space for annotating object direction
    Daiki Gotou, Hitoshi Nishikawa and Takenobu Tokunaga

    Annotation and Analysis of Discourse Relations, Temporal Relations and Multi-Layered Situational Relations in Japanese Texts
    Kimi Kaneko, Saku Sugawara, Koji Mineshima and Daisuke Bekki

    Developing Universal Dependencies for Mandarin Chinese
    Herman Leung, Rafaël Poiret, Tak-sum Wong, Xinying Chen, Kim Gerdes and John Lee

    Developing Corpus of Lecture Utterances Aligned to Slide Components
    Ryo Minamiguchi and Masatoshi Tsuchiya

10:25–10:35  Coffee Break

10:35–11:55  Oral Session 2: Data

    VSoLSCSum: Building a Vietnamese Sentence-Comment Dataset for Social Context Summarization
    Minh-Tien Nguyen, Dac Viet Lai, Phong-Khac Do, Duc-Vu Tran and Minh-Le Nguyen

    BCCWJ-DepPara: A Syntactic Annotation Treebank on the ‘Balanced Corpus of Contemporary Written Japanese’
    Masayuki Asahara and Yuji Matsumoto

    SCTB: A Chinese Treebank in Scientific Domain
    Chenhui Chu, Toshiaki Nakazawa, Daisuke Kawahara and Sadao Kurohashi

    Big Community Data before World Wide Web Era
    Tomoya Iwakura, Tetsuro Takahashi, Akihiro Ohtani and Kunio Matsui

12:00–14:00  Lunch Break

14:00–14:30  Poster session

    An Overview of BPPT’s Indonesian Language Resources
    Gunarso Gunarso and Hammam Riza

    Creating Japanese Political Corpus from Local Assembly Minutes of 47 prefectures
    Yasutomo Kimura, Keiichi Takamaru, Takuma Tanaka, Akio Kobayashi, Hiroki Sakaji, Yuzu Uchida, Hokuto Ototake and Shigeru Masuyama

    Selective Annotation of Sentence Parts: Identification of Relevant Sub-sentential Units
    Ge Xu, Xiaoyan Yang and Chu-Ren Huang

14:35–15:55  Oral Session 3: Analysis

    The Kyutech corpus and topic segmentation using a combined method
    Takashi Yamamura, Kazutaka Shimada and Shintaro Kawahara

    Automatic Evaluation of Commonsense Knowledge for Refining Japanese ConceptNet
    Seiya Shudo, Rafal Rzepka and Kenji Araki

    SAMER: A Semi-Automatically Created Lexical Resource for Arabic Verbal Multiword Expressions Tokens Paradigm and their Morphosyntactic Features
    Mohamed Al-Badrashiny, Abdelati Hawwari, Mahmoud Ghoneim and Mona Diab

    Sentiment Analysis for Low Resource Languages: A Study on Informal Indonesian Tweets
    Tuan Anh Le, David Moeljadi, Yasuhide Miura and Tomoko Ohkuma