Tansel Özyer, Keivan Kianmehr, Mehmet Tan (Editors)
Recent Trends in Information Reuse and Integration

Editors

Tansel Özyer
Department of Computer Engineering
TOBB University
Söğütözü Caddesi No. 43, Söğütözü
Ankara, Turkey

Mehmet Tan
Department of Computer Engineering
TOBB University
Söğütözü Caddesi No. 43, Söğütözü
Ankara, Turkey

Keivan Kianmehr
Department of Electrical and Computer Engineering
University of Western Ontario
Building 373
London, ON, N6A 5B9, Canada

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically those of translation, reprinting, re-use of illustrations, broadcasting, reproduction by photocopying machines or similar means, and storage in data banks.

Product Liability: The publisher can give no guarantee for all the information contained in this book. This does also refer to information about drug dosage and application thereof. In every individual case the respective user must check its accuracy by consulting other pharmaceutical literature.

The use of registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

© 2012 Springer-Verlag/Wien
SpringerWienNewYork is part of Springer Science+Business Media
springer.at

Typesetting: SPi Publisher Services
Printed on acid-free paper and chlorine-free bleached paper
SPIN: 80032388
With 134 (partly coloured) Figures
Library of Congress Control Number: 2011937549
ISBN 978-3-7091-0737-9
e-ISBN 978-3-7091-0738-6
DOI 10.1007/978-3-7091-0738-6

Foreword

We are delighted to see this edited book as the result of our intensive work over the past year. We succeeded in attracting high-quality submissions, of which we could include only 19 papers in this edited book.
The present text aims to help the reader, whether researcher or practitioner, grasp the basic concept of reusability, which is essential in this rapidly growing information era. The authors emphasize the need for reusability and how it can be applied in the right way. In effect, reusability serves a multiobjective optimization goal, namely minimizing the time, cost, and effort spent developing new products, technologies, information repositories, etc. Instead of always starting from scratch and reinventing the wheel in a process that consumes and wastes time, effort, and resources, practitioners and developers should always look into the possibility of reusing some of the existing entities to produce new ones. In other words, reuse and integration are essential concepts that must be enforced to avoid duplicating effort. This problem is investigated from different perspectives. In organizations, high volumes of data from different sources pose a serious challenge to filtering out the information needed for effective decision making. To address all these vital and serious concerns, this book covers the most recent advances in information reuse and integration. It contains high-quality research papers written by experts in the field. Some of them are extended versions of the best papers presented at the IEEE International Conference on Information Reuse and Integration, held in Las Vegas in August 2010.

Chapter 1 by Udai Shanker, B. Vidya Reddi, and Anupam Shukla studies a real-time commit protocol that improves performance through two approaches:
1. Some of the locked data items that are no longer used once a transaction has completed processing can be unlocked immediately after the end of the processing phase, reducing data contention.
2. A lending transaction can lend its dirty data to more than one cohort by creating only a single shadow in the case of write–read conflicts, reducing data inaccessibility.
The performance of the protocol has been compared with that of existing protocols.
Chapter 2 by Ladjel Bellatreche estimates the complexity of the problem of selecting dimension table(s) to partition a fact table. It also proposes selection strategies that take into account the main characteristics of queries, such as access frequencies, table sizes, and the sizes of intermediate join results. Experimental studies were carried out using a mathematical cost model, and the results were validated on the Oracle 11g DBMS. A tool has also been implemented to assist data warehouse administrators in their horizontal partitioning selection tasks.

Chapter 3 by Shabnam Pourdehi, Dena Karimipour, Navid Noroozi, and Faridoon Shabani addresses an adaptive fuzzy controller for a large class of nonlinear systems in the presence of uncertainties, input nonlinearities, and unknown time delay. Based on the combination of sliding mode control (SMC) with fuzzy adaptive control, it presents a design algorithm to synthesize a robust fuzzy sliding mode controller. Next, an adaptive fuzzy observer-based SMC scheme is proposed for stabilization. In both proposed control schemes, the unknown nonlinear functions are approximated with fuzzy logic systems, and the asymptotic stability of the corresponding closed-loop system is shown with the Lyapunov–Krasovskii approach. As an application of the control schemes, the synchronization of two nonidentical time-delayed chaotic systems is investigated, with simulated examples showing the effectiveness of the proposed techniques.

Chapter 4 by Ladjel Bellatreche, Guy Pierra, and Eric Sardet presents a technique for automatically integrating ontology-based data sources in a materialized architecture, together with a framework dealing with the asynchronous versioning problem. Existing solutions proposed in traditional databases are adapted to manage instance and schema changes. The problem of managing ontology changes is overcome by introducing a distinction between ontology evolution and ontology revolution.
Finally, it proposes a floating version model for ontology evolution. The approach fully automates all the steps of building an ontology-based integration system (OBIS), and it is validated by a prototype using the ECCO environment and the EXPRESS language.

Chapter 5 by Iyad Suleiman, Shang Gao, Maha Arslan, Tamer Salman, Faruk Polat, Reda Alhajj, and Mick Ridley suggests a computerized tool for assessing school readiness that can learn the user's skills and adjust the assessment tests accordingly. The user plays various sessions from various games, while a genetic algorithm (GA) selects the upcoming session or group of sessions for the user according to his/her skills and status. The chapter describes the modified GA, the learning procedure integrated into the GA together with a penalizing system, and a fitness heuristic for selecting the best choice. Two learning methods are proposed: a memory system and a no-memory system. Furthermore, the chapter includes several methods for improving the speed of learning. In addition, learning mechanisms based on the social network paradigm are used to extend the automation of assessment.

Chapter 6 by Grégory Claude, Gaël Durand, Marc Boyer, and Florence Sèdes defends the idea that a defect carries information by the simple fact that it exists, through the characteristics of the detected incident (the problem) and of the protocol applied to resolve it (the solution). The various actors who took part in its description and its resolution collect this information. This knowledge is essential for assisting corrective actions on future defects and for preventing their emergence. Taking advantage of this knowledge by working out a model of defects makes it possible to define a set of criteria for grouping defects that were resolved in the past. These groups are the cornerstone of the corrective and preventive processes for new defects.

Chapter 7 by Abdullah M. Elsheikh, Tamer N. Jarada, Taher Naser, Kelvin Chung, Armen Shimoon, Faruk Polat, Panagiotis Karampelas, Jon Rokne, Mick Ridley, and Reda Alhajj presents a comprehensive approach for mapping between the object database language (ODL) and XML. It covers both structure specification and database content, and it concentrates on deriving a separate set of transformation rules for each direction of the mapping, the first being the mapping from ODL to XML.

Chapter 8 by Taghi M. Khoshgoftaar, Kehan Gao, and Jason Van Hulse proposes a novel approach to feature selection for imbalanced data in the context of software quality engineering. The approach repeatedly applies data sampling followed by feature ranking, and finally aggregates the results generated during this repetitive process. It is compared against a filter-based feature ranking technique applied alone to the original data, and against data sampling and feature ranking techniques under two different scenarios.

Chapter 9 by Zifang Huang and Mei-Ling Shyu presents a k-nearest neighbors (k-NN)-based least squares support vector machine (LS-SVM) model with multi-value integration to tackle the long-term time series prediction problem. It deploys a new distance function that incorporates both the Euclidean distance and the dissimilarity of the trend of a time series.

Chapter 10 by Carlos M. Cornejo, Iván Ruiz-Rube, and Juan Manuel Dodero discusses their approach to providing a complete set of services and applications that integrate diverse web-based contents of the cultural domain. Its purpose is to extend the knowledge base of cultural institutions over the web, build user communities around it, and enable its exploitation in several environments.

Chapter 11 by Abdelghani Bakhtouchi, Ladjel Bellatreche, Chedlia Chakroun, and Yamine Aït-Ameur proposes an ontology-based integration system with a mediator architecture. It exploits the presence of an ontology referenced by the selected sources to make their semantics explicit. Instead of annotating the different candidate keys of each ontology class, a set of functional dependencies is defined on the class's properties.
Chapter 12 by Mark McKenney gives an efficient algorithm for the map construction problem (MCP). The algorithm is implemented, and experiments show that it is significantly faster than the naive approach but is prone to large memory usage when run over large data sets. An external memory version of the algorithm is also presented; it is efficient for very large data sets and requires significantly less memory than the original algorithm.

Chapter 13 by Jairo Pava, Fausto Fleites, Shu-Ching Chen, and Keqi Zhang proposes a system that integrates storm surge projection, meteorological, topographical, and road data to simulate storm surge conditions. The motivation behind the system is to serve local governments seeking to overcome difficulties in persuading residents to adhere to evacuation notices.

Chapter 14 by Richa Tiwari, Chengcui Zhang, Thamar Solorio, and Wei-Bang Chen describes a framework that extracts information about co-expression relationships among genes from published literature using a supervised machine learning approach, with Dynamic Conditional Random Fields (DCRFs) used to train the model. It then ranks those papers to provide users with a complete, specialized information retrieval system. The authors show its superiority over Bayes Net, SVM, and Naïve Bayes.

Chapter 15 by Brandeis Marshall identifies similar artists, which serves as a precursor to how music recommendation can handle the more complex issues of multiple-genre artists and artist collaborations. It treats the individual most-similar-artist rankings from three public-use Web APIs (Idiomag, Last.fm, and Echo Nest) as different perspectives on artist similarity, and then aggregates these three rankings using five rank fusion algorithms.

Chapter 16 by Ken Q. Pu and Russell Cheung presents a query facility called tag grid that supports fuzzy search and queries over both tags and data items alike.
By combining methods of Online Analytical Processing (OLAP) from multidimensional databases with collaborative filtering from information retrieval, tag grid enables users to search for interesting data items by navigating and discovering interesting tags.

Chapter 17 by Stefan Silcher, Jorge Minguez, and Bernhard Mitschang introduces an SOA-based solution to the integration of all Product Lifecycle Management (PLM) phases. It uses an Enterprise Service Bus (ESB) as the service-based integration and communication infrastructure. Three exemplary scenarios illustrate the benefits of using an ESB compared to alternative PLM infrastructures. Furthermore, the chapter describes a service hierarchy that extends PLM functionality with value-added services by mapping business processes to data integration services.

Chapter 18 by Awny Alnusair and Tian Zhao describes an ontology-based approach for identifying and retrieving relevant software components in large reuse libraries. It uses domain-specific ontologies to enrich a knowledge base initially populated with ontological descriptions of API components.

Chapter 19 by Du Zhang proposes an algorithm for detecting several types of firewall rule inconsistency. It also defines a special type of inconsistency called setuid inconsistency and highlights various other types of inconsistency in information security and digital forensics. Finally, it concludes that inconsistency is a very important phenomenon whose utility should never be underestimated in these areas.

Last but not least, we would like to acknowledge the hard workers behind the scenes whose significant, unseen contributions made this valuable source of knowledge possible. We would like to thank the authors who submitted papers and the reviewers whose detailed, constructive reports improved the quality of the papers. Various people from Springer deserve great credit for their help and support in all issues related to publishing this book. In particular, we would like to thank Stephen Soehnlen for his dedication, seriousness, and generous support in terms of time and effort; he answered our emails promptly despite his busy schedule, even when he was traveling.

Tansel Özyer, Keivan Kianmehr, Mehmet Tan