
Quality Issues in the Management of Web Information PDF

209 Pages·2013·3.591 MB·English

Preview Quality Issues in the Management of Web Information

INTELLIGENT SYSTEMS REFERENCE LIBRARY
Volume 50

Gabriella Pasi, Gloria Bordogna, Lakhmi C. Jain (Eds.)

Quality Issues in the Management of Web Information

Series Editors
J. Kacprzyk, Warsaw, Poland
L. C. Jain, Adelaide, Australia

For further volumes: http://www.springer.com/series/8578

Editors

Gabriella Pasi
Dipartimento di Informatica Sistemistica e Comunicazione
Università degli Studi di Milano Bicocca
Milano, Italy

Gloria Bordogna
CNR-IDPA – Istituto per la Dinamica dei Processi Ambientali
Dalmine, Italy

Lakhmi C. Jain
University of Canberra, Canberra, Australia
and
University of South Australia, South Australia, Australia

ISSN 1868-4394, ISSN 1868-4408 (electronic)
ISBN 978-3-642-37687-0, ISBN 978-3-642-37688-7 (eBook)
DOI 10.1007/978-3-642-37688-7
Springer Heidelberg New York Dordrecht London
Library of Congress Control Number: 2013934982
© Springer-Verlag Berlin Heidelberg 2013

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer.
Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Foreword

Professor Dr. Carlo Batini
Department of Informatics, Systems and Communication
University of Milan Bicocca, Milan, Italy

A book on Quality issues in the management of Web Information has to deal with a potentially wide number of issues. The concept of quality is pervasive, so pervasive that it is even difficult to provide a shared and usable definition of the concept of quality. The difficulty is reduced (not too much...) if we delimit the area of considered technologies and related resources. This book is focused on web retrieval technologies and on the information resource, that web retrieval technologies access and manipulate to provide knowledge access services to human beings and computer applications.

Information is in turn a pervasive concept, that is inherently related to other two concepts, data and knowledge. We can say with Boisot [1999] that “data is discrimination between physical states of things (black, white, etc.) that may convey or not convey information to an agent.
Whether it does so or not depends on the agent’s prior stock of knowledge ... thus, whereas data can be characterized by a property of things, knowledge is a property of agents ... information establishes a relationship between things and agents.”

The world is dirty, and also the Web, often a too vivid and faithful representation of the world, is dirty. Yet, the Web is and will be more and more in the future, the most accessed source of knowledge for the human beings. From this scenario, we can understand why the issue of information quality is of growing relevance in Computer Science literature and Information Systems applications, and in a wide spectrum of research areas and real life applications.

The issue of quality has been historically investigated first in the simplest case, data stored in databases, structured in domains and tables, and managed in transactional applications, under the rigid control of the organization. The second age corresponds to the dispersion of data of interest for the organization in a multiplicity of databases, heterogeneous in format, content and semantics, that lead typically to represent in the information systems of the organization the same entity of the real world with multiple heterogeneous representations, characterized usually by different levels of quality.

A number of dimensions and related metrics have been proposed to formally characterize quality of data in these two scenarios. An analysis of the literature on data quality (see Batini and Scannapieco [2006] and Batini et al. [2009]), reports more than 50 dimensions and about 100 metrics, and at least 12 methodologies for the assessment and improvement of data quality in Information Systems using the database technology.
Among dimensions, the most relevant are accuracy, currency, completeness and consistency, for the definitions see Batini and Scannapieco [2006]. Techniques range from record linkage and entity identification, to data cleansing, quality driven query answering, edit imputation and correction, outlier identification.

With the advent of networks and the planetary diffusions of the Web, new types of information systems and data access and usage paradigms had to be considered. Among information systems, cooperative information systems allow different autonomous organizations to share data, applications and services, while peer-to-peer systems are characterized by higher autonomy and heterogeneity and absence of common management of data. Among new data access and usage paradigms, the evolution of information systems to cover a wide range of information representations, such as semistructured texts, unstructured texts, maps, images, videos, sounds, lead to develop access mechanisms where searches are based on metadata, tags and full-text indexing, giving rise to the Information retrieval research discipline.

In the area of data and information quality, the above diversification resulted in the investigation of dimensions, methodologies and techniques that cover all of the above mentioned types of information representations, previously in the world of single-organization information systems and cooperative information systems, and now in the different articulations of peer-to-peer information systems and the immense world of the Web. And while a fil rouge can be identified among dimensions defined in the different types of information representations (see Batini et al. [2012] for a discussion), when the other coordinates are considered (types of information systems and the Web), the need arises to consider new dimensions and new techniques. Among dimensions and related determinants, due to the uncontrolled and “anarchic” character of the Web, the attention is shifted to dimensions such as trustworthiness, provenance, authority, age and popularity (see e.g. for a discussion on Ramachandran et al.
[2009]) that refer to quality of sources, besides the data and information they convey.

Focusing on the main theme of this volume, techniques are a wide range and cover issues such as quality driven retrieval, quality aware similarity search, quality of volunteered geographical information systems, quality based knowledge discovery in specific domains, quality of web engines. Such techniques are investigated in several papers of the present volume.

References

1. Batini, C., Scannapieco, M.: Data Quality: Concepts, Methodologies and Techniques. Springer (2006)
2. Batini, C., Cappiello, C., Francalanci, C., Maurino, A.: Methodologies for data quality assessment and improvement. ACM Computing Surveys (2009)
3. Batini, C., Palmonari, M., Viscusi, G.: The many faces of information and their impact on information quality. In: Proceedings of International Conference on Information Quality, Paris (2012)
4. Boisot, M.: Knowledge Assets: Securing Information Advantage in the Information Economy. Oxford University Press (1998)
5. Ramachandran, S., Paulraj, S., Joseph, S., Ramaraj, V.: Enhanced Trustworthy and High-Quality Information Retrieval System for Web Search Engines. IJCSI International Journal of Computer Science Issues 5 (2009)

Preface

The main focus of this book is on the quality issue in the management of information used in Web applications. A variety of tasks are concerned with and affected by the assessment of quality. The chapters included in this book are related to the tasks of Information Retrieval, Geographic Information Retrieval, Information Filtering and Knowledge Extraction. These areas demonstrate that by modelling and exploiting the quality dimensions of the information objects considered, it is possible to improve systems’ effectiveness.

The problem of assessing the quality of textual information has been investigated for a long time. Several distinct proposals have been formulated. There is not a single unifying consensual definition of a text’s suitability for the task in hand.
The problem of texts’ quality assessment may be considered in relation to the information content itself (objective criteria), or from the user point of view. For example, in the context of Information Retrieval it is clear that the relevance of the documents to a request depends on several aspects. These are related to the distinct properties of the documents, the search, the user who formulated the query and the users who accessed the documents previously. It may include other information such as user ratings and tags, and the context of both documents and queries. One of the relevance dimensions may be related to the quality of documents. In case of Web pages well known algorithms (such as PageRank) have been defined.

This book has been organised into nine chapters. It includes recent contributions related to quality-based information management on the Web. Academic and applied researchers working on the issue of information quality will find the book a valuable reference resource. The methods, models and systems proposed in this book can inspire and motivate further research on important issues. It is hoped that final year undergraduate, masters and PhD students in computer science, and information systems will find in this book an excellent compiled reference text for their future studies.

We wish to thank all the contributors and referees for their excellent work and assistance and Springer-Verlag in producing this publication.

Gabriella Pasi, Italy
Gloria Bordogna, Italy
Lakhmi C. Jain, Australia

Editors

Gabriella Pasi

Gabriella Pasi graduated in Computer Science at the Università degli Studi di Milano, Italy, and took a PhD in Computer Science at the Université de Rennes, France. She worked as a researcher at the National Council of Research in Italy till 2005.
Since 2005 she is Associate Professor at the Università degli Studi di Milano Bicocca, Milano, Italy, where, within the Department of Informatics, Systems and Communication she leads the Information Retrieval research Lab. Her research activity mainly concerns the modelling and design of flexible and personalised systems for the management and access to information (such as Information Retrieval Systems, Information Filtering Systems and Data Base Management Systems).

She is member of organizing and program committees of several international conferences. She has co-edited eight books and several special issues of International Journals. She has published more than 180 papers on International Journals and Books, and on the Proceeding of International Conferences. She is involved in several activities for the evaluation of research, in particular, she was appointed as an expert of the Computer Science panel for the Starting Grants of the Programme Ideas at the European Research Council. She is the Vice-President of the European Society for Fuzzy Logic and Technologies (EUSFLAT).

She is a member of the Editorial Board of the international journals Fuzzy Sets and Systems, Journal of Computational Intelligence Systems, Web Intelligence and Agent Systems, Intelligent Decision Technology: An International Journal, and ACM Applied Computing Review.

She was the coordinator of the European Project PENG (Personalized News Content Programming), a STREP (Specific Targeted Research or Innovation Project), within the VI Framework Programme, Priority II, Information Society Technology. She organized several International events among which the IEEE / WIC / ACM International Joint Conference on Web Intelligence and Intelligent Agent
