Metode analize podatkov o raziskovalni dejavnosti na primeru aplikacije IST World

107 Pages·2007·3.67 MB·Slovenian


UNIVERSITY OF LJUBLJANA
FACULTY OF COMPUTER AND INFORMATION SCIENCE

Jure Ferlež

Metode analize podatkov o raziskovalni dejavnosti na primeru aplikacije IST World
(Methods for Analysis of Research Related Data in the IST-World Application)

M.SC. THESIS

Supervisor: Acad. Prof. Dr. Ivan Bratko
Co-Supervisor: Doc. Dr. Dunja Mladenič

Ljubljana, 2007

POVZETEK (translated from Slovenian)

In this master's thesis we develop and apply (1) machine learning methods for integrating data from multiple data sources and (2) methods for mining research activity data in support of partner search in the context of the knowledge transfer process. We first describe the IST World web information system, available at http://www.ist-world.org. It is the environment in which the described methods for integrating and analyzing research activity data operate. We then describe the applied machine learning methods used to support the integration of data originating from different data sources. To solve this problem in the context of the IST World system, we developed a data integration methodology based on modern data analysis methods such as text mining, inverted indexing of words to documents, and machine learning. We confirmed the effectiveness of the methodology empirically with an experiment in integrating research data from the CORDIS database of European projects.

In the second part of the thesis we present the methods used to analyze research activity data in support of the partner search process. The goal of the analysis is to automatically identify and track the intensity of the work topics and collaboration patterns of scientific actors. In this part we describe methods for mining texts and collaboration graphs that enable the recognition of competences, consortia, competence development and consortium development. We demonstrate the effectiveness of the analysis algorithms in experiments which confirm that the results agree with human intuition.

Keywords: data integration, record linkage, duplicate record detection, text mining, edit distance, active learning, machine learning, support vector machine, partner search, data visualization, competence, consortium, data mining, singular value decomposition, multidimensional scaling and clustering.

ABSTRACT

In this master's thesis we implement and apply (1) machine learning algorithms to support the integration of data coming from different data sources and (2) data mining algorithms for the analysis of research-related data to support the partner search process in the knowledge transfer scenario. We begin with an overview of the IST World portal, an online information system we developed to support partner search in the knowledge transfer process. The portal, accessible at http://www.ist-world.org, is the environment in which the described algorithms for data integration and data analysis are put to use. We then describe the machine learning methods applied to integrate research-related data from several data sources into a single integrated dataset. To solve the record linkage problem within the IST World system, we developed an integration approach based on state-of-the-art data analysis methods such as text mining, inverted indexing and active learning. The approach was evaluated empirically in an experiment that integrated research-related data from the European CORDIS database of research projects.

The second part of the thesis focuses on the analysis of research-related data for the purpose of supporting the partner search process. The goal of the developed and applied data mining algorithms is to automatically identify and track the topics of work and the collaboration communities of the analyzed research actors. We describe the text and graph mining algorithms used, which enable the identification of competences, consortia, competence development and consortium development. We conclude by illustrating the effectiveness of these algorithms in several experiments and by showing that the results agree with human intuition.

Keywords: data integration, record linkage, duplicate detection, machine learning, text mining, string kernel, edit distance, active learning, support vector machine, partner search, data mining, data visualisation, competence, consortium, latent semantic indexing, singular value decomposition, multidimensional scaling and clustering.
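
The abstract names inverted indexing and edit distance among the building blocks of the record linkage step. The following is a minimal Python sketch of how those two ideas can fit together, not the thesis implementation: an inverted index over name tokens limits the comparison to records that share at least one token, and a normalized edit distance scores each candidate pair. All function names, the sample records and the fixed 0.8 threshold are assumptions made for illustration. The thesis also lists active learning and a support vector machine for this task; the fixed threshold here is a simple stand-in for a learned classifier.

```python
from collections import defaultdict


def tokenize(name):
    """Lowercase a name and split it into tokens (hypothetical helper)."""
    cleaned = name.lower().replace(",", " ").replace(".", " ")
    return [tok for tok in cleaned.split() if tok]


def edit_distance(a, b):
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Edit distance normalized to [0, 1]; 1.0 means identical strings."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def build_inverted_index(records):
    """Map each token to the set of record ids whose name contains it."""
    index = defaultdict(set)
    for record_id, name in records.items():
        for tok in tokenize(name):
            index[tok].add(record_id)
    return index


def candidate_pairs(index):
    """Pairs of record ids that share at least one token."""
    pairs = set()
    for ids in index.values():
        ordered = sorted(ids)
        for pos, first in enumerate(ordered):
            for second in ordered[pos + 1:]:
                pairs.add((first, second))
    return pairs


def link_records(records, threshold=0.8):
    """Return (id_a, id_b, score) for candidate pairs scoring >= threshold.

    The threshold is an assumed illustrative value, not one from the thesis.
    """
    index = build_inverted_index(records)
    matches = []
    for first, second in sorted(candidate_pairs(index)):
        score = similarity(records[first], records[second])
        if score >= threshold:
            matches.append((first, second, score))
    return matches


if __name__ == "__main__":
    # Illustrative records only; not taken from CORDIS or IST World.
    sample = {
        1: "Jozef Stefan Institute",
        2: "Jozef Stefan Institut",
        3: "University of Ljubljana",
        4: "Institute of Physics",
    }
    for first, second, score in link_records(sample):
        print(first, second, round(score, 3))
```

Using the index for candidate generation avoids comparing every record with every other record, which matters once the integrated dataset grows beyond a few thousand entries. Very common tokens such as "of" still produce many candidates, so a practical version would likely filter or down-weight them.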

