Lecture Notes in Computer Science 6910
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, TU Dortmund University, Germany
Madhu Sudan, Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Moshe Y. Vardi, Rice University, Houston, TX, USA
Gerhard Weikum, Max Planck Institute for Informatics, Saarbruecken, Germany

Ngoc Thanh Nguyen (Ed.)
Transactions on Computational Collective Intelligence V

Volume Editor
Ngoc Thanh Nguyen
Wroclaw University of Technology
Wyb. Wyspianskiego 27
50-370 Wroclaw, Poland
E-mail: [email protected]

ISSN 0302-9743 (LNCS), e-ISSN 1611-3349 (LNCS)
ISSN 2190-9288 (TCCI)
ISBN 978-3-642-24015-7, e-ISBN 978-3-642-24016-4
DOI 10.1007/978-3-642-24016-4
Springer Heidelberg Dordrecht London New York
© Springer-Verlag Berlin Heidelberg 2011

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)

Preface

Welcome to the fifth volume of Springer’s Transactions on Computational Collective Intelligence (TCCI). It is the third issue in 2011 of this journal which is devoted to research in computer-based methods of computational collective intelligence (CCI) and their applications in a wide range of fields such as group decision making, knowledge integration, consensus computing, the Semantic Web, social networks and multi-agent systems. TCCI strives to cover new computational, methodological, theoretical and practical aspects of collective intelligence understood as the form of intelligence that emerges from the collaboration and competition of many individuals (artificial and/or natural).

This volume of TCCI includes ten interesting and original papers. The first of them, entitled “Improved N-grams Approach for Web Page Language Identification” by Ali Selamat, presents an improved N-grams approach for Web page language identification, which is based on a combination of an original N-grams approach and a modified N-grams approach that has been used for language identification of Web documents. In the second paper with the title “Image-Edge Detection Using Variation-Adaptive Ant Colony Optimization” the authors, Jing Tian, Weiyu Yu, Li Chen, and Lihong Ma, present a novel image-edge detection approach using ant colony optimization techniques, in which a pheromone matrix representing edges at pixel positions of an image is built according to the movements of a number of ants which are dispatched to move on the image. The next paper, “An Iterative Process for Component-Based Software Development Centered on Agents” by Yves Wautelet, Sodany Kiv, and Manuel Kolp, includes a formalization of the process for component-based software development through the use of the agent paradigm.
In the fourth paper entitled “Cellular Gene Expression Programming Classifier Learning” the authors, Joanna Jędrzejowicz and Piotr Jędrzejowicz, present a method for integrating two collective computational intelligence techniques: gene expression programming and cellular evolutionary algorithms with a view to inducing expression trees. This paper also includes a discussion of the validating experiment results confirming the high quality of the proposed ensemble classifiers. The next paper, “A Situation-Aware Computational Trust Model for Selecting Partners” by Joana Urbano, Ana Paula Rocha, and Eugénio Oliveira, contains the description of a model for selecting partners in a society, in which the authors focus on contextual fitness, a component of the model that adds a contextual dimension to existing trust aggregation engines. The sixth paper entitled “Using the Perseus System for Modelling Epistemic Interactions” by Magdalena Kacprzak et al., includes a model for agent knowledge acquisition, using a logical puzzle in which agents increase their knowledge about the hats they wear and the software tool named Perseus. In the seventh paper, “Reduction of Faulty Detected Shot Cuts and Cross-Dissolve Effects in Video Segmentation Process of Different Categories of Digital Videos,” the author, Kazimierz Choroś, presents a description of experiments confirming the effectiveness of four methods of faulty video detection referring to five different categories of movie: TV talk-show, documentary movie, animal video, action and adventure, and pop music video. In the next paper, “Using Knowledge-Integration Techniques for User Profile Adaptation Methods in Document Retrieval Systems” by Bernadetta Mianowska and Ngoc Thanh Nguyen, a model for integrating the archival knowledge included in a user profile with the new knowledge delivered to an information retrieval system, detecting and proving its properties, is presented.
The ninth paper entitled “Modeling Agents and Agent Systems,” by Theodor Lettmann et al., contains a universal and formal description for agent systems that can be used as a core model with other existing models as special cases. The authors show that owing to this core model a clear specification of agent systems and their properties can be achieved. The last paper, “Online News Event Extraction for Global Crisis Surveillance” by Jakub Piskorski et al., presents a real-time and multilingual news event extraction system developed at the Joint Research Centre of the European Commission. The authors show that with this system it is possible to accurately and efficiently extract violent and natural disaster events from online news.

TCCI is a peer-reviewed and authoritative reference dealing with the working potential of CCI methodologies and applications as well as emerging issues of interest to academics and practitioners. The research area of CCI has been growing significantly in recent years and we are very thankful to everyone within the CCI research community who has supported the TCCI and its affiliated events including the International Conferences on Computational Collective Intelligence (ICCCI). The first ICCCI event was held in Wroclaw, Poland, in October 2009. ICCCI 2010 was held in Kaohsiung, Taiwan, in November 2010 and ICCCI 2011 took place in Gdynia, Poland, in September 2011. For ICCCI 2011 around 300 papers from 25 countries were submitted and only 105 papers were selected for inclusion in the proceedings published by Springer in LNCS/LNAI series. We will invite authors of the ICCCI papers to extend them and submit them for publication in TCCI.

We are very pleased that TCCI and the ICCCI conferences are strongly cemented as high-quality platforms for presenting and exchanging the most important and significant advances in CCI research and development. It is also our pleasure to announce the creation of the new Technical Committee on Computational Collective Intelligence within the Systems, Man and Cybernetics Society (SMC) of IEEE.
We would like to thank all the authors for their contributions to TCCI. This issue would not have been possible without the great efforts of the editorial board and many anonymous reviewers. We would like to express our sincere thanks to all of them. Finally, we would also like to express our gratitude to the LNCS editorial staff of Springer, in particular Alfred Hofmann, Ursula Barth, Peter Strasser and their team, who supported the TCCI journal.

July 2011
Ngoc Thanh Nguyen

Transactions on Computational Collective Intelligence

This Springer journal focuses on research in computer-based methods of computational collective intelligence (CCI) and their applications in a wide range of fields such as the Semantic Web, social networks and multi-agent systems. It aims to provide a forum for the presentation of scientific research and technological achievements accomplished by the international community.

The topics addressed by this journal include all solutions of real-life problems for which it is necessary to use computational collective intelligence technologies to achieve effective results. The emphasis of the papers published is on novel and original research and technological advancements. Special features on specific topics are welcome.
Editor-in-Chief
Ngoc Thanh Nguyen, Wroclaw University of Technology, Poland

Co-Editor-in-Chief
Ryszard Kowalczyk, Swinburne University of Technology, Australia

Editorial Board
John Breslin, National University of Ireland, Galway, Ireland
Shi-Kuo Chang, University of Pittsburgh, USA
Longbing Cao, University of Technology Sydney, Australia
Oscar Cordon, European Centre for Soft Computing, Spain
Tzung-Pei Hong, National University of Kaohsiung, Taiwan
Gordan Jezic, University of Zagreb, Croatia
Piotr Jędrzejowicz, Gdynia Maritime University, Poland
Kang-Huyn Jo, University of Ulsan, Korea
Radosław Katarzyniak, Wroclaw University of Technology, Poland
Jozef Korbicz, University of Zielona Gora, Poland
Hoai An Le Thi, Metz University, France
Pierre Lévy, University of Ottawa, Canada
Tokuro Matsuo, Yamagata University, Japan
Kazumi Nakamatsu, University of Hyogo, Japan
Toyoaki Nishida, Kyoto University, Japan
Manuel Núñez, Universidad Complutense de Madrid, Spain
Julian Padget, University of Bath, UK
Witold Pedrycz, University of Alberta, Canada
Debbie Richards, Macquarie University, Australia
Roman Słowiński, Poznan University of Technology, Poland
Edward Szczerbicki, University of Newcastle, Australia
Kristinn R. Thorisson, Reykjavik University, Iceland
Gloria Phillips-Wren, Loyola University Maryland, USA
Sławomir Zadrożny, Institute of Research Systems, PAS, Poland

Table of Contents

Improved N-grams Approach for Web Page Language Identification
    Ali Selamat
Image Edge Detection Using Variation-Adaptive Ant Colony Optimization
    Jing Tian, Weiyu Yu, Li Chen, and Lihong Ma
An Iterative Process for Component-Based Software Development Centered on Agents
    Yves Wautelet, Sodany Kiv, and Manuel Kolp
Cellular Gene Expression Programming Classifier Learning
    Joanna Jędrzejowicz and Piotr Jędrzejowicz
A Situation-Aware Computational Trust Model for Selecting Partners
    Joana Urbano, Ana Paula Rocha, and Eugénio Oliveira
Using the Perseus System for Modelling Epistemic Interactions
    Magdalena Kacprzak, Piotr Kulicki, Robert Trypuz, Katarzyna Budzynska, Paweł Garbacz, Marek Lechniak, and Paweł Rembelski
Reduction of Faulty Detected Shot Cuts and Cross Dissolve Effects in Video Segmentation Process of Different Categories of Digital Videos
    Kazimierz Choroś
Using Knowledge Integration Techniques for User Profile Adaptation Method in Document Retrieval Systems
    Bernadetta Mianowska and Ngoc Thanh Nguyen
Modeling Agents and Agent Systems
    Theodor Lettmann, Michael Baumann, Markus Eberling, and Thomas Kemmerich
Online News Event Extraction for Global Crisis Surveillance
    Jakub Piskorski, Hristo Tanev, Martin Atkinson, Eric van der Goot, and Vanni Zavarella
Author Index

Improved N-grams Approach for Web Page Language Identification

Ali Selamat
Software Engineering Research Group, Faculty of Computer Science & Information Systems, Universiti Teknologi Malaysia, UTM Johor Baharu Campus, 81310, Johor, Malaysia
[email protected]

Abstract. Language identification has been widely used for machine translations and information retrieval. In this paper, an improved N-grams (ING) approach is proposed for web page language identification. The improved N-grams approach is based on a combination of original N-grams (ONG) approach and a modified N-grams (MNG) approach that has been used for language identification of web documents. The features selected from the improved N-grams approach are based on N-grams frequency and N-grams position.
The features selected from the original N-grams approach are based on a distance measurement, and the features selected from the modified N-grams approach are based on a Boolean matching rate for language identification of Roman- and Arabic-script web pages. A large real-world document collection from the British Broadcasting Corporation (BBC) website, composed of 1000 documents in each of the languages (Azeri, English, Indonesian, Serbian, Somali, Spanish, Turkish, Vietnamese, Arabic, Persian, Urdu, Pashto), has been used for evaluation. The precision, recall and F1 measures have been used to determine the effectiveness of the proposed improved N-grams (ING) approach. From the experiments, we have found that the improved N-grams approach improves the language identification of the contents of Roman- and Arabic-script web page documents from the available datasets.

Keywords: Monolingual, multilingual, web page language identification, N-grams approach.

N.T. Nguyen (Ed.): Transactions on CCI V, LNCS 6910, pp. 1–26, 2011.
© Springer-Verlag Berlin Heidelberg 2011

1 Introduction

Language identification (LID) is the process of identifying the predefined language that has been used to write various types of documents. When it comes to identifying the language of web documents, humans are the most accurate language identifiers. Within seconds of reading a passage of text, humans can determine whether it is a language they can understand. If it is a language that they are unfamiliar with, they can often make subjective judgments as to its similarity to a language that they already know. In this research, the term “language” is used to refer to

