IDENTIFYING REMARKABLE RESEARCHERS USING CITATION NETWORK ANALYSIS

EPHRANCE ABU UJUM

DISSERTATION SUBMITTED IN FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

INSTITUTE OF MATHEMATICAL SCIENCES
FACULTY OF SCIENCE
UNIVERSITY OF MALAYA
KUALA LUMPUR
2014

UNIVERSITI MALAYA
ORIGINAL LITERARY WORK DECLARATION

Name of Candidate: EPHRANCE ABU UJUM
I.C./Passport No.: 780923-12-5195
Registration/Matric No.: SGP070002
Name of Degree: MASTER OF SCIENCE
Title of Project Paper/Research Report/Dissertation/Thesis ("this Work"): "IDENTIFYING REMARKABLE RESEARCHERS USING CITATION NETWORK ANALYSIS"
Field of Study: MATHEMATICAL MODELING

I do solemnly and sincerely declare that:
(1) I am the sole author/writer of this Work;
(2) This work is original;
(3) Any use of any work in which copyright exists was done by way of fair dealing and for permitted purposes and any excerpt or extract from, or reference to or reproduction of any copyright work has been disclosed expressly and sufficiently and the title of the Work and its authorship have been acknowledged in this Work;
(4) I do not have any actual knowledge nor do I ought reasonably to know that the making of this work constitutes an infringement of any copyright work;
(5) I hereby assign all and every rights in the copyright to this Work to the University of Malaya ("UM"), who henceforth shall be owner of the copyright in this Work and that any reproduction or use in any form or by any means whatsoever is prohibited without the written consent of UM having been first had and obtained;
(6) I am fully aware that if in the course of making this Work I have infringed any copyright whether intentionally or otherwise, I may be subject to legal action or any other action as may be determined by UM.
(Candidate Signature) Date:
Subscribed and solemnly declared before,
Witness's Signature Date:
Name: PROFESSOR DR KURUNATHAN A/L RATNAVELU
Designation: PROFESSOR

ABSTRACT

Experts or authorities within a research field exhibit specific traits in how they publish as well as in how they are cited by others. An analysis of such citation dependencies requires a network approach whereby a researcher's impact depends not only on the number of citations he/she has accumulated (over a given period of time) but also on the prominence of the researchers who depend on his/her work.

This thesis explores how to distinguish researchers based on temporal patterns in their publication and citation records. As intuition may suggest, the influence of a researcher is proportional to the number of citations he/she has acquired as well as the influence of his/her citing authors. Authority can also be conferred on a researcher by virtue of his/her (co)authored works that continue to accrue citations long after the year of publication.

In this thesis, experts or authorities are identified using the "temporal citation network analysis" approach of Yang, Yin, and Davison (2011). This method assigns a high influence score to researchers who are still actively and persistently publishing, have a long publication track record, and are heavily cited (especially by influential peers).

As a case study, the method proposed by Yang and co-workers is used to identify authorities within the ISI Web of Knowledge category of "BUSINESS, FINANCE" spanning the period 1980-2011 inclusive. The thesis also explores a modification of this method to predict rising stars within the same dataset.

ABSTRAK

Experts in a research field exhibit specific traits in how they publish articles and in how they are cited by other researchers. An analysis of citation dependencies needs to be approached using a network concept in which a researcher's impact depends not only on the number of citations obtained (within a given period of time), but also on the authority of the other researchers who depend on his or her work.

This dissertation examines how to distinguish researchers by exploiting temporal patterns in their publication and citation records. As intuition suggests, a researcher's influence is directly proportional to the number of citations obtained as well as the influence of the researchers who cite his or her articles. Authority is also conferred on a researcher through co-authored works that continue to receive citations many years after the year of publication.

This dissertation identifies experts using the "temporal citation network analysis" method proposed by Yang et al. (2011). This method gives a high influence score to researchers who are still active and publishing persistently, have an extensive publication record, and are cited intensively (especially by influential groups).

As a case study, the method proposed by Yang et al. is used to identify experts in the subject category "BUSINESS, FINANCE" from the ISI Web of Knowledge database over the period from 1980 up to (and including) 2011. The dissertation also examines a modification of the method of Yang et al. to predict future experts using the same dataset.

ACKNOWLEDGEMENTS

First and foremost, I offer thanks to God for giving me the opportunity to explore ideas. I express great gratitude to my supervisor Professor Dr. Kurunathan Ratnavelu for his unwavering support and belief in me over the years. His guidance and wisdom have made, and will continue to make, an enduring impact on my life. I also wish to express heartfelt gratitude to my collaborator and friend, Dr. Choong Kwai Fatt, for relentlessly encouraging the pursuit of this work and other related problems.
I hereby thank the hardworking staff of the Institute of Mathematical Sciences, Faculty of Science, and the Institute of Postgraduate Studies for their invaluable help and counselling, especially during the various setbacks I faced while completing this thesis. I am profoundly indebted to them for their patience and generosity. Some critical directions in this work were developed under grants RG146/10HNE and RG298/11HNE provisioned under the UMRG scheme, and later through the UM High Impact Research (HIR) grant.

I wish to thank Thomson Reuters for the data used in this thesis, obtained specifically via institutional access to the Web of Knowledge. I also wish to acknowledge the ingenuity of the Open Source community, specifically in Linux, R, Perl, Python, Gephi, and LaTeX.

To my friends and colleagues: I am grateful to Chin Jia Hou, Melody Tan, and Tan Hui Xuan for the help they have given me from time to time. A great deal of the ideas and questions pursued in this work stemmed from useful discussions I have had with these wonderful people. All in all, I owe my family for their endless love and understanding, without which this work would have been impossible to accomplish. It is to them that I dedicate this work.
TABLE OF CONTENTS

ORIGINAL LITERARY WORK DECLARATION
ABSTRACT
ABSTRAK
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF APPENDICES

CHAPTER 1: INTRODUCTION
1.1 Background
1.2 Literature Review
    1.2.1 Quantifying authority and expertise
    1.2.2 Identifying authorities and experts on networks
    1.2.3 Citation network of research papers
    1.2.4 Citation network of authors

CHAPTER 2: METHODOLOGY
2.1 Definitions and notation
    2.1.1 Basic definitions
    2.1.2 Network properties
2.2 Data
    2.2.1 Processing the data
    2.2.2 Extracting citations
2.3 Network analysis
    2.3.1 Document citation network (DCN)
    2.3.2 Author citation network (ACN)
    2.3.3 Yang-Yin-Davison link weighting scheme
    2.3.4 Goodness of prediction
2.4 Outline of Methodology
2.5 Software used

CHAPTER 3: ANALYSIS
3.1 Document citation network
3.2 Journal citation network
3.3 Identifying experts and authorities
3.4 Identifying rising stars

CHAPTER 4: CONCLUSION

APPENDICES
REFERENCES

LIST OF FIGURES

Figure 2.1 ISI data field tags
Figure 2.2 Sample ISI data
Figure 2.3 Parsing ISI data.
Figure 2.4 Snapshot of document citation network (DCN) centred on one paper, i.e. "fama.ef_1993_j.financ.econ_v33_p3". Numerical values on links correspond to CIR values. Inset: illustration of hierarchical structure due to time ordering of papers on the DCN.
Figure 2.5 Snapshot of author citation network (ACN) centred on one author, i.e. "fama.ef". Numerical values on links correspond to CI values.
Figure 2.6 Outline of Coarse-Grain (CG) scheme.
Figure 2.7 Outline of YYD scheme.
Figure 3.1 Giant weakly connected component of document citation network (DCN). Nodes are color-coded via the community detection method of Blondel, Guillaume, Lambiotte, and Lefebvre (2008) and plotted using an open source graph visualisation and exploration tool called Gephi (Bastian, Heymann, & Jacomy, 2009).
Figure 3.2 Document citation network (DCN) for nodes in the top 20 list by citation count (in-degree centrality). Nodes are color-coded by year and sized by citation count on the entire DCN. Plotted with Gephi.
Figure 3.3 Document citation network (DCN) for nodes in the top 20 list by PageRank. Nodes are color-coded by year and sized by PageRank score on the entire DCN. Plotted with Gephi.
Figure 3.4 Journal citation network for "Business, Finance" (1980–2011). Community detection was carried out using the hierarchical optimization of modularity method developed by Blondel et al. (2008). Community (module) membership is as listed in Table 3.4. Plotted with Gephi.
Figure 3.5 Giant weakly connected component of author citation network (ACN). Nodes are color-coded via the community detection method of Blondel et al. (2008) and plotted using Gephi.
Figure 3.6 Scatter plot of correlation matrix in Table 3.8. Graphic is produced using the PerformanceAnalytics package in R (Carl, Peterson, Boudt, & Zivot, 2009).
Figure A.1 A seminal paper spans a structural hole in the citation network, i.e., advances work in different groups of densely connected papers (indicated by different colours).
Figure A.2 An integrative paper cites a set of papers that themselves do not cite each other.

LIST OF TABLES

Table 1.1 Number of articles in 30 journals under the "BUSINESS, FINANCE" dataset that maintain forward/reverse alphabetical ordering at least 50% of the time. Each journal has at least 50 articles co-authored by 2 or more workers over the period 2005–2010.
Table 2.1 Source data parameters
Table 2.2 Coverage of articles and citations within the "Business, Finance" study dataset. See text for details.
Table 2.3 Citation and in-degree statistics.
Table 3.1 Properties of document citation network (DCN).
Table 3.2 The top 20 cited articles. JAR, JF, JFE, and RFS denote the journals Journal of Accounting Research, The Journal of Finance, Journal of Financial Economics, and Review of Financial Studies, respectively. The asterisk (*) denotes articles with PageRank-to-CiteRank ratio larger than 10.
Table 3.3 The top 20 articles by Google PageRank score. JF, JFE, JME, and MF denote the journals The Journal of Finance, Journal of Financial Economics, Journal of Monetary Economics, and Mathematical Finance, respectively. The asterisk (*) denotes articles with CiteRank-to-PageRank ratio larger than 10.
Table 3.4 Module membership for journals in Figure 3.4.
Table 3.5 Centrality of "Business, Finance" journals based on inter-journal citation links spanning the 5-year period 2007–2011. Journals are listed by decreasing structural influence score, S. C_D, C_C, and C_B denote degree, closeness, and betweenness centrality, respectively. The in and out superscripts denote in-link and out-link versions of the corresponding centrality algorithm. PR0.86, PR0.5, auth, and hub denote the Google PageRank score with d = 0.86, PageRank with d = 0.5, HITS authority, and HITS hub score, respectively.
Table 3.6 Rank of "Business, Finance" journals based on inter-journal citation links spanning the 5-year period 2007–2011. Journals are listed by decreasing structural influence score S.
Table 3.7 Properties of author citation network (ACN).
Table 3.8 Spearman rank correlation coefficient for node attributes on the giant component of the author citation network constructed in this study. h-index scores are estimated based on articles limited to journals in the study dataset (i.e. ISI-indexed articles published under the "Business, Finance" subject category spanning the period 1980-2011). Values in the lower triangle correspond to correlation p-values.