ebook img

A quantitative analysis of measures of quality in science PDF

0.18 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview A quantitative analysis of measures of quality in science

A QuantitativeAnalysisofMeasures ofQualityinScience Sune Lehmann ∗ InformaticsandMathematicalModeling,TechnicalUniversityofDenmark,Building321,DK-2800Kgs.Lyngby,Denmark. Andrew D. Jackson and Benny E. Lautrup The Niels Bohr Institute, Blegdamsvej 17, DK-2100 København Ø, Denmark. (Dated:February2,2008) 7 Condensingtheworkofanyacademicscientistintoaone-dimensionalmeasureofscientificqualityisadiffi- 0 cultproblem.Here,weemployBayesianstatisticstoanalyzeseveraldifferentmeasuresofquality.Specifically, 0 wedetermineeachmeasure’sabilitytodiscriminatebetweenscientificauthors. Usingscalingarguments, we 2 demonstratethatthebestofthesemeasuresrequireapproximately50paperstodrawconclusionsregardinglong n termscientificperformance withusefullysmallstatisticaluncertainties. Further, theapproach describedhere a permitsthevalue-free(i.e.,statistical)comparisonofscientistsworkingindistinctareasofscience. J 7 PACSnumbers:89.65.-s,89.75.Da 2 ] I. INTRODUCTION and citations (in-links). Citation networks have frequently h beenusedasanexampleofgrowingnetworkswithpreferen- p tial attachment [13]. For reviews on this extensive subject, - It appears obvious that a fair and reliable quantification c see [14, 15, 16]. The aim of the present paper is to take of the ‘level of excellence’ of individualscientists is a near- o such studies in a novel direction by addressing the question s impossibletask[1,2,3,4,5]. Mostscientistswouldagreeon ofwhich one-dimensionalmeasureof citationdata is best in . twoqualitativeobservations: (i)Itisbettertopublishalarge s a manner which is both quantitative and free of value judg- c numberofarticlesthanasmallnumber.(ii)Foranygivenpa- i per,itscitationcount—relativetocitationhabitsinthefieldin ments. Given the remarks above, the ability to answer this s questiondependsonacarefuldefinitionoftheword‘best’. y whichthepaperispublished—providesameasureofitsqual- h ity.Itseemsreasonabletoassumethatthequalityofascientist Theprimarypurposeofanalyzingandcomparingthecita- p isa functionofhisor herfullcitationrecord1. Thequestion tionrecordsofindividualscientistsistodiscriminatebetween [ iswhetherthisfunctioncanbedeterminedandwhetherquan- them, i.e., to assign some measure of quality and its associ- 1 titativelyreliablerankingsofindividualscientistscanbecon- ated uncertainty to each scientist considered. Whatever the v structed. A varietyof ‘best’measuresbased oncitation data intrinsic and value-basedmerits of the measure, m, assigned 1 have been proposed in the literature and adopted in practice toeveryauthor,itwillbeofnopracticalvalueunlessthecorre- 1 [6,7]. Thespecificmeritsclaimedforthesevariousmeasures spondinguncertainty,d missufficientlysmall.Fromthispoint 3 rely largely on intuitive argumentsand value judgmentsthat of view, the best choice of measure will be that which pro- 1 arenotamenabletoquantitativeinvestigation.(Honestpeople vides maximal discrimination between scientists and hence 0 7 candisagree,forexample,ontherelativemeritsofpublishing thesmallestvalueofd m. We willdemonstratethattheques- 0 a single paper with 1000 citations and publishing 10 papers tionofdecidingwhichofseveralproposedmeasuresis most / with100citationseach.) Theabsenceofquantitativesupport discriminating,andtherefore‘best’,canbeaddressedquanti- s c for anygivenmeasure of quality based on citation data is of tativelyusingstandardstatisticalmethods. i concern since such data is now routinely considered in mat- Although the approach is straightforward, it is useful first s y tersof appointmentandpromotionwhichaffecteverywork- todescribeitingeneral. We beginbybinningallauthorsby h ingscientist. sometentativemeasure,m,ofthequalityoftheirfullcitation p Citationpatternsbecamethetargetofscientificscrutinyin record. Theprobabilitythatanauthorwilllieinbina isde- v: the1960saslargecitationdatabasesbecameavailablethrough notedp(a ). Similarly,webineachpaperaccordingtothetotal i the work of Eugene Garfield [8] and other pioneers in the numberofcitations2. Thefullcitationrecordforanauthoris X field ofbibliometrics. A surprisingly,largebodyofworkon simply the set ni , where ni is the number of his/her paper ar thestatisticalanalysisofcitationdatahasbeenperformedby incitationbini{. F}oreachauthorbin,a , we thenempirically physicists. Relevant papers in this tradition include the pio- constructtheconditionalprobabilitydistribution,P(ia ),that | neering work of D. J. de Solla Price, e.g. [9], and, more re- a singlepaperbyanauthorinthisbin willlie in citationbin cently, [7, 10, 11, 12]. In addition, physicists are a driving i. Theseconditionalprobabilitiesarethecentralingredientin forceintheemergingfieldofcomplexnetworks.Citationnet- our analysis. They can be used to calculate the probability, worksrepresentone popularnetworkspecimenin whichpa- P( ni a ),thatanyfullcitationrecordwasactuallydrawnat perscorrespondtonodesconnectedbyreferences(out-links) ran{do}m|ontheconditionaldistribution,P(ia )appropriatefor | ∗Electronicaddress:[email protected] 2WeusetheGreekalphabetwhenbinningwithrespecttotomandtheRo- 1Citationdatais,infact,publiclyavailableforallacademicscientists. manalphabetforbinningcitations. 2 afixedauthorbin,a . Bayes’theoremallowsustoinvertthis probabilitytoyield P(a n ) P( n a )p(a ), (1) i i |{ } ∼ { }| whereP(a n )istheprobabilitythatthecitationrecord n i i wasdrawn|{atr}andomfromauthorbina . Byconsidering{th}e actualcitationhistoriesofauthorsinbinb ,wecanthuscon- structtheprobabilityP(a b ),thatthecitationrecordofanau- thorinitiallyassignedtob|inb wasdrawnonthethedistribu- tionappropriateforbina . Inotherwords,wecandetermine the probability that an author assigned to bin b on the basis of the tentative quality measure should actually be placed in bin a . This allows us to determine both the accuracy of the FIG.1: Logarithmicallybinnedhistogramofthecitationscountsof initialauthorassignmentitsuncertaintyinapurelystatistical all papers by authors withmore than 25 publications in the theory fashion. subsectionofSPIRES.Thedataisnormalizedandtheaxesareloga- Whileagoodchoiceofmeasurewillassigneachauthorto rithmic. thecorrectbinwithhighprobabilitythiswillnotalwaysbethe case. Considerextremecasesinwhereweelecttobinauthors on the basis of measures unrelated to scientific quality, e.g., set consists of all citable papers written by academic scien- byhair/eyecolororalphabetically. ForsuchmeasuresP(ia ) tists from the theory subfield, ultimo 2003. All citations to and P( n a ) will be independentof a , and P(a n ) w|ill papers outside of SPIRES were removed. In the context of i i become{pr}o|portionalto prior distribution p(a ). A|s{a}conse- this paper, we define an academic scientist as someone who quence,the proposedmeasurewillhave nopredictivepower haspublished25papersormore. Thisdefinitionisintended whatsoever. Itisobvious,forexample,thata citationrecord to include almost everyone with a permanent academic po- provides no information of its author’s hair/eye color. The sition and exclude those who leave academia early in their utility of a given measure (as indicated by the statistical ac- careers(andgenerallyceaseactivejournalpublication)inthe curacy with which a value can be assigned to any given au- interestsof maintainingthe homogeneityofthe datasample. thor)willobviouslybeenhancedwhenthebasicdistributions For more see [17], Chapters 3 and 4. The resulting data set P(ia ) depend strongly on a . These differences can be for- includes6737authorsandatotalof274470papers. Theac- | malized using the standard Kullback-Leiblerdivergence. As tualnumberofpapersissmallerthanthissinceeachmultiple we shallsee, there aresignificantvariationsin the predictive authorpaperis countedonce per co-author. Thetheorysub- powerofvariousfamiliarmeasuresofquality. field is, however,thatpartof highenergyphysicswherethis The organization of the paper is as follows. Section II is effectisleastpronounced. Thisisduetotherelativelysmall devotedtoadescriptionofthedatausedintheanalysis,Sec- number of co-authors(typically 1 3) per theoretical paper. − tionIIIintroducesthevariousmeasuresofqualitythatwewill Inthecaseofthetheorysubfield,thisweightingofpapersby consider. In Sections IV and V, we providea more detailed the numberof co-authorshas been shown to have negligible discussion of the Bayesian methodsadopted for the analysis effects[11]. ofthesemeasuresandadiscussionofwhichofthesemeasures ThetheorysubsectionoftheSPIRESdatahasapower-law isbestinthesensedescribedaboveofprovidingthemaximum structure. Specifically the probability that a paper will re- discriminatorypower. ThiswillallowusinSectionVItoad- cieve k citations is approximately proportional to (k+1) g − dresstothequestionofhowmanypapersarerequiredinorder with g = 1.11 for k 50 and g = 2.78 for k > 50. The tomakereliableestimatesofagivenauthor’sscientificqual- transition between the≤se two power laws is found to be sur- ity; finally, SectionA discussesthe originof asymmetriesin prisingly sharp [11]. These features of the global distribu- some the measures. A discussion of the results and various tion are also present in the conditionalprobabilitiesfor sub- conclusionswillbepresentedinSectionVII. groupsofauthorsbinnedaccordingtomostmeasuresofqual- ity. In virtually all cases, these conditionalprobabilitiescan also be described accurately by separate power laws in each II. DATA of two regionswith a relatively sharp transition between the regions. As one might expect, authors with more citations The analysis in this paper is based on data from the are described by flatter distributions (i.e., smaller values of SPIRES3 databaseofpapersinhighenergyphysics.Ourdata g ) anda somewhathighertransitionpoint. Figure1 displays the total distribution of citations as a binnedand normalized 3SPIRES is an acronym for Stanford Physics Information RE- trieval System. The database is open and can be found at electronically viaeprints orjournal articles, publications such as mono- http://www.slac.stanford.edu/spires/. Citations in SPIRES are gath- graphsorconferenceproceedingsaretreatedinconsistentlyandtherefore ered only from the papers in the database that have references entered notincludedinthisstudy. 3 anindicationofhowoftenthecontentofthatpaperhasbeen usedintheworkofothers6. Note, however,theobviousfact that citations can only be interpreted as a meaningfulproxy ofqualityrelativetothecitationhabitsofone’speersor,put slightlydifferently,inthecontextofthecitationhabitsofthe fieldinwhichthepaperispublished. In[11],wehaveshown thatthetheorysubsectionofSPIRESisindeedaveryhomo- geneousdataset. Inthissense, wewillassumethatthe cita- tioncountofapaperisaproxyoftheintrinsicqualityofthat paper. Thequestionsremain,however,ofhowtoextractameasure ofthequalityofanindividualscientistfromhiscitationrecord and how fairly to project this record onto a scalar measure. FIG.2: Logarithmicallybinned histogram of thecitations inbin 6 This question is non-trivial because the probability, p(k) of ofthemedianmeasure. The pointsshowthecitationdistribution △ ⋆ findinga scientific paper with k citationsroughlyfollowsan ofthefirst25papersbyallauthors. Thepointsmarkedby show thedistributionofcitationsfromthefirst50papersbyauthorswho asymptoticpower-lawdistribution,seeFigs.1and2.Thisfact havewrittenmorethan50papers. Finally, the(cid:3) datapointsshow was documentedfor the SPIRESdata in Ref. [11] andholds thedistributionofallpapersbyallauthors.Theaxesarelogarithmic. true in many other scientific fields [9, 10, 16]. Thus, it is usefultoconsidersomeofthepropertiesofthedistributionof citationsforallauthorsbeforediscussingthevariousspecific histogram4. measuresofqualitytobeconsideredhere. Studiesperformedonthefirst25,first50andallpapersfor Empiricalevidenceindicatesthatmostcitationdistributions g agivenvalueofmshowtheabsenceoftemporalcorrelations. arelargelypower-lawdistributedwith p(k) k . Forsmall − It is of interest to see this explicitly. Consider the following values of k, g 1; for larger values, 2<g∼<3. Although ≈ example. InFigure2,wehaveplottedthedistributionforbin theaveragenumberofcitationsperpaperiswell-defined,the 6ofthemedianmeasure5. Thereare674authorsinthisbin. asymptotic power-law tails of these distributions cause their Two thirds of these authors have written 50 papers or more. variancetobeinfinite7. Whenthevarianceisnotdefined(or Only this subset is used when calculating the first 50 papers very large), the mean values of a finite sample fluctuate sig- results. In this bin, the means for the total, first 25 and first nificantly as a function of sample size. As a consequence, 50papersare11.3,12.8,and12.9citationsperpaper,respec- the average number of citations, k , in the citation record h i tively. The median of the distributions are 4, 6, and 6. The of a given author (which is precisely a finite sample drawn plotin Figure 2 confirmsthese observations. The remaining fromapower-lawprobabilitydistribution)isapotentiallyun- binsandtheothermeasuresyieldsimilarresults. reliable measure of the quality of an author’scitation record NotethatFigure2confirmsthegeneralobservationsonthe sincetheadditionorremovalofasinglehighlycitedpapercan shapesoftheconditionaldistributionsmadeabove. Figure2 materiallyalter anauthor’smean. Nevertheless, themeanof alsoshowstwodistinctpower-laws.Bothofthepower-lawsin anauthor’scitationsiscommonlyusedasanintensivescalar thisbinareflatterthantheonesfoundinthetotaldistribution measureofauthorquality. and the transitionpointis lower than in the totaldistribution Thereservationsjustexpressedabouttheuseofmeancita- fromFigure1. tions per paper apply with even greater force if one chooses tomeasureauthorqualitybythenumberofcitationsofeach author’s single most highly cited paper, k . Virtually all max of the stabilizing statistical power of the full citation record III. MEASURESOFSCIENTIFICEXCELLENCE has been discarded, and even greater fluctuationscan be ex- pectedinthismeasureasthesamplesizechanges. Inspiteof Despite differing citation habits in different fields of sci- such statistical arguments, there are reasons for considering ence, most scientists agree that the number of citations of a the maximum cited paper as a measure of quality. It is per- givenpaperisthebestobjectivemeasureofthequalityofthat fectly tenable to claim that the authorof a single paper with paper. Thebeliefunderlyingtheuseofcitationsasameasure of quality is that the numberof citations to a paper provides 6Werealizethatthereareanumberofproblemsrelatedtotheuseofcita- tionsasaproxyforquality. Papersmaybecitedornotforreasonsother 4Duetomattersofvisualpresentation,thebinningusedinthisandthefol- thantheirhighquality.Geo-and/orsocio-politicalcircumstancescankeep lowing figurehere is different from the binning used whenconstructing worksofhighqualityoutofthemainstream. Creditforanimportantidea theP(ia )usedlaterinthepaper. ThecorrectbinningisdescribedinAp- canbeattributedincorrectly. Paperscanbecitedforhistoricalratherthan | pendixB scientific reasons. Indeed, theveryquestionofwhetherauthors actually 5Sincethisplotisconstructedfromauthorsassignedtobin6,eachpaperis readthepaperstheyciteisnotasimpleone[18].Nevertheless,weassume weightedbythenumberofitsauthorspresentinthisbin.Weighingpapers thatcorrectcitationusagedominatesthestatistics. bythenumberofco-authors, however, doesnotsignificantly changethe 7Diverginghighermomentsofpower-lawdistributionsarediscussedinthe distributionofcitations[11]. literature.E.g.[19]. 4 1000citationsisofgreatervaluetosciencethantheauthorof ofq(x)as 10paperswith100citationseach(eventhoughthelatterisfar x less probablethan the former). In this sense, the maximally Q(x)= q(x)dx (2) Z ′ ′ cited paper might provide better discrimination between au- thorsof‘high’and‘highest’quality,andthismeasuremerits Evidently,Q(x)growsmonotonicallyfrom0to1independent consideration. ofq(x). The‘median’ofthissampleisdefinedasthatvalue ofxsuchthat(i)onedrawhasthevaluex, (ii)N drawshave Another simple and widely used measure of scientific ex- avaluelessthanorequaltox,and(iii)N drawshaveavalue cellenceistheaveragenumberofpaperspublishedbyanau- greaterthanorequaltox. Theprobabilitythatthemedianis thor per year. This would be a good measure if all papers atxisnowgivenas were cited equally. As we have just indicated, scientific pa- persareemphaticallynotcitedequally,andfewscientistshold (2N+1)! P (x)= q(x)Q(x)N[1 Q(x)]N . (3) theviewthatallpublishedpapersarecreatedequalinquality x1/2 1!N!N! − andimportance.Indeed,roughly50%ofallpapersinSPIRES ForlargeN,themaximumofP (x)occursatx=x where arecited 2times(includingself-citation).Thisfactaloneis x1/2 1/2 sufficient≤to invalidate publication rate as a measure of sci- Q(x1/2)=1/2.ExpandingPx1/2(x)aboutitsmaximumvalue, entific excellence. If all paperswere of equal merit, citation weseethat analysiswouldprovideameasureofindustryratherthanone 1 (x x )2 1 ofintrinsicquality. P (x)= exp[ − 1/2 ], s 2= . x1/2 √2ps 2 − 2s 2 4q(x1/2)2N Inanattemptorderto remedythisproblem,ThomsonSci- (4) entific(ISI)introducedtheImpactFactor8 whichisdesigned A similar argument applies for every percentile. The statis- to be a “measure of the frequency with which the ‘average ticalstabilityofpercentilessuggeststhattheyarewell-suited article’in a journalhasbeencited in a particularyearorpe- for dealing with the power laws which characterize citation riod”9. The Impact Factor can be used to weight individual distributions. papers. Unfortunately,citationsto articlesin a givenjournal Recently, Hirsch [7] proposed a different measure, h, in- also obey power-lawdistributions[12]. This has two conse- tendedtoquantifyscientificexcellence. Hirsch’sdefinitionis quences. First,thedeterminationoftheImpactFactorissub- asfollows: “A scientisthasindexh ifh ofhis/herNp papers jecttothelargefluctuationswhicharecharacteristicofpower- have at least h citations each, and the other (Np h) papers − lawdistributions. Second,thetailofpower-lawdistributions havefewerthanhcitationseach”[7]. Unlikethemeanandthe displaces the mean citation to higher values of k so that the median,whichareintensivemeasureslargelyconstantintime, majorityofpapershavecitationcountsthataremuchsmaller hisanextensivemeasurewhichgrowsthroughoutascientific thanthemean. Thisfactisforexampleexpressedinthelarge career. Hirsch assumes that h grows approximately linearly differencebetweenmeanandmediancitationsperpaper. For withanauthor’sprofessionalage,definedasthetimebetween the totalSPIRES database, the medianis 2 citationsperpa- thepublicationdatesofthefirstandlastpaper.Unfortunately, per; the meanis approximately15. Indeed, only22%of the thisdoesnotleadtoanintensivemeasure.Consider,forexam- papersinSPIREShaveanumberofcitationsinexcessofthe ple,thecaseofauthorswithlargetimegapsbetweenpublica- mean,cf.[11]. Thus,thedominantroleplayedbyarelatively tions,orthecaseofauthorswhosecitationdataarerecorded smallnumberofhighlycitedpapersindeterminingtheImpact in disjoint databases. A properly intensive measure can be Factorimpliesthatitissubjecttorelativelylargefluctuations obtained by dividing an author’s h-index by the number of andthatittendsoverestimatethelevelofscientificexcellence his/her total publications. We will consider both approaches of high impact journals. This fact was directly verified by below. Seglen [20], who showed explicitly that the citation rate for The h-index represents an attempt to strike a balance be- individual papers is uncorrelated to the impact factor of the tween productivity and quality and to escape the tyranny of journalinwhichitwaspublished. power law distributions which place strong weight on a rel- atively small number of highly cited papers. The problem An alternate way to measure excellence is to categorize isthatHirschassumesanequalitybetweenincommensurable eachauthorbythemediannumberofcitationsofhispapers, quantities.Anauthor’spapersarelistedinorderofdecreasing k . Clearly,themedianisfarlesssensitivetostatisticalfluc- citationswithpaperihavingC(i)citations. Hirsch’smeasure 1/2 tuationssinceallpapersplayanequalroleindeterminingits isdeterminedbytheequality,h=C(h),whichpositsanequal- value. Todemonstratetherobustnessofthemedian,itisuse- itybetweentwoquantitieswithnoevidentlogicalconnection. g fultonotethatthemedianofN =2N+1randomdrawson Whileitmightbereasonabletoassumethath C(h),thereis anynormalizedprobabilitydistribution,q(x),isnormallydis- noreasontoassumethatg andtheconstantof∼proportionality tributedinthelimitN ¥ . Tothisendwedefinetheintegral areboth1. → We will also include one intentionally nonsensical choice inthefollowinganalysisofthevariousproposedmeasuresof author quality. Specifically, we will consider what happens 8Forafulldefinitionseehttp://scientific.thomson.com/. whenauthorsarebinnedalphabetically.Intheabsenceofhis- 9Ibid. toricalinformation,itisclearthatanauthor’scitationrecord 5 shouldprovideuswithnoinformationregardingtheauthor’s ForanymeasurechosenEq.(6)providesuswiththeprob- name.Binningauthorsinalphabeticordershouldthusfailany abilitythatanauthorliesinauthorbina . Whilethevalueof statistical test of utility and will provide a useful calibration anymeasure(suchasthemeannumberofcitationsperpaper) ofthemethodsadopted.Themeasuresofqualitydescribedin canbecalculateddirectly,thecalculatedvaluesofP(a n ) i |{ } thissectionaretheoneswewillconsiderintheremainderof providefarmoredetailedandmorereliableinformationusing thispaper. allstatisticalinformationcontainedinthedata.Thelargefluc- tuations which can be encounteredin identifying authors by theirmeancitationrateorbytheirmaximallycitedpaperare IV. ABAYESIANANALYSISOFCITATIONDATA reduced.Further,byprovidinguswithvaluesofP(a n )for i all a , we obtain a statistically trustworthy gauge of|{wh}ether The rationale behind all citation analyses lies in the fact the resulting uncertainties in a are sufficiently small for the that citation data is strongly correlated such that a ‘good’ measureunderconsiderationto be a reliable indicatorof au- scientist has a far higher probability of writing a good (i.e., thor quality. In short, Eq. (6) providesus with a measure of highlycited)paperthan a ‘poor’scientist. Suchcorrelations an author’s ranking independent of the total number papers are clearly present in SPIRES [11, 21]. We thus categorize currentlypublished,andwithinformationwhichallowsusto eachauthorbysometentativequalityindexbasedontheirto- assess the reliability of this determination. The accuracy of tal citation record. Once assigned, we can empirically con- the resulting value of a increasesdramaticallywith the total structthe priordistribution, p(a ), that an authoris in author number of published papers. We will return to this point in bina andtheprobabilityP(N a )thatanauthorinbina has SectionV. a total of N publications. We|also construct the conditional Fig. 3 shows the probabilities P(a ni ) that A will lie in probabilityP(ia ) that a paperwritten by an authorin bin a eachofthedecilebinsusingthemeasu|{res}discussedinsection | willlieincitationbini. Aswehaveseenearlier,studiesper- II.Thesemeasuresinclude: (a)thefirstinitialoftheauthor’s formedon the first 25, first 50 and all papersof authorsin a name,(b)the averageyearlyoutputofpapers, (c)Hirsch’sh given bin revealno signs of additionaltemporalcorrelations normalizedbytheauthor’sprofessionalageT,(d)theh-index in the lifetimecitation distributionsofindividualauthors. In normalizedbythenumberofpublishedpapers,(e)thecitation performingthisconstruction,wehaveelectedtobinauthorsin countofthe single mostcited paper, (f) themean numberof deciles. WebinpapersintoLbinsaccordingtothenumberof citations per paper, (g) the median number (50th percentile) citations. Thebinningofpapersisapproximatelylogarithmic ofcitationsperpaper,and(h)a65thpercentilemeasure. Itis (see AppendixA). We have confirmedthat the results stated clearfromthefigurethattherearesignificantdifferences,both belowarelargelyindependentofthebin-sizeschosen. intheaccuracyofoftheinitialassignmentsand,moreimpor- We now wish to calculate the probability, P( n a ), that tantly,inthecorrespondinguncertainties. Largeuncertainties i an author in bin a will have the full (binned)ci{tati}o|n record are due to the fact that the conditional probabilities, P(ia ) n . Inordertoperformthiscalculation,weassumethatthe arelargelyindependentofa . Suchindependenceistobe|ex- i { } various counts n are obtained from N independent random pectedinthecaseofthealphabeticbinningofauthors,where i drawsontheappropriatedistribution,P(ia ). Thus, theinabilityofthecitationrecordtoidentifythefirstinitialof | authorA’snameishardlysurprising.Thefigurealsosuggests P({ni}|a )=P(N|a )N!(cid:213)i=L1P((in|ai))!ni . (5) tIhnaittiathleasnsiugmnmbeernotsfopfapaeurtshopruAblibsahseeddpoenrmyeeaarni,smneodtiarenl,ia6b5lteh. percentile, and maximum citations all appear to provide an Although large scale temporal correlations are known to be accurate reflection of his full citation record with a satisfac- absent, transientcorrelationsare possible. For example, one torily small uncertainty. Hirsch’s measures falls somewhere particularlywell-citedpapercouldleadtoanincreasedprob- betweenthebestandworstchoiceofmeasures. ability of high citations for its immediate successor(s). It is Given the large variations in the accuracy and confidence difficulttodemonstratetheirpresenceorabsence,butthere- of decile assignments as a function of the measure selected, sults of following section will provide a posteriori evidence itis ofinterestto investigatein greaterdetailthequestionof thatsuchcorrelations,ifpresent,arenotimportant. whichofthesemeasuresisbest. We addressthisquestionin WecannowinverttheprobabilityP( n a )usingBayes’ j thenextsection. { }| theoremtoobtain P( n a )p(a ) P(a |{ni}) = {Pi(}|ni ) V. WEIGHINGTHEMEASURES { } p(a )P(N a )(cid:213) P(ja )nj j = (cid:229) b p(b )P(N| |b )(cid:213) kP(|k|b )nk , (6) ityInofoardgeirvteonombteaainsuarem,owreegcraalpchuilcatreepthreespenrotabtaiobniliotyf,thPe(bquaa)l-, where we have inserted Eq. (5) and used marginalization to that an author initially assigned to bin a is predicted to|lie obtain the normalization. The combinatoric factors cancel. in binb . In practice, we determineP(b a )as the averageof ThequantityP(a n ),whichrepresentstheprobabilitythat theprobabilitydistributionsP(b n )fo|reachauthorin bin i i anauthorwithbin|{ned}citationrecord n isinauthorbina . a . The results are shown ‘stack|e{d’}in Fig. 4 for the various i Itcanbeusedintwoways—eachofw{hic}hisinteresting. measuresconsidered. Here,rowa showsthe(average)prob- 6 (a)Firstinitial (b)Papers/year (c)Hirsch(age) (d)Hirsch(papers) PHΑÈAL PHΑÈAL PHΑÈAL PHΑÈAL 1 1 1 1 0.8 0.8 0.8 0.8 0.6 0.6 0.6 0.6 0.4 0.4 0.4 0.4 0.2 0.2 0.2 0.2 Α Α Α Α 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10 (e)Max (f)Mean (g)Median (h)65thpercentile PHΑÈAL PHΑÈAL PHΑÈAL PHΑÈAL 1 1 1 1 0.8 0.8 0.8 0.8 0.6 0.6 0.6 0.6 0.4 0.4 0.4 0.4 0.2 0.2 0.2 0.2 Α Α Α Α 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10 FIG.3: Asingleauthorexample. WeanalyzethecitationrecordofauthorAwithrespecttotheeightdifferentmeasuresdefinedinthetext. AuthorAhaswrittenatotalof88papers. Themeanofthiscitationrecordis26citationsperpaper,themedianis13citations,theh-indexis 29,themaximallycitedpaperhas187citations,andpapershavebeenpublishedattheaveragerateof2.5papersperyear.Thevariouspanels showtheprobabilitythatauthorAbelongstoeachofthetendecilesgivenonthecorrespondingmeasure;theverticalarrowdisplaystheinitial assignment. Panel(a)displaysP(firstinitialA)(b)showsP(papersperyearA),(c)showsP(h/T A),(d)showsP(h/N A),panel(e)shows | | | | P(kmax|A),panel(f)displaysP(hki|A),(g)showsP(k1/2|A),andfinally(h)showsP(k.65|A). abilities that an author initially assigned to bin a belongs in ditionalprobabilitydistributions, P(ia ), for differentauthor decilebinb . Thisprobabilityisproportionaltotheareaofthe binsarelargest. Thesedifferencesca|nquantifiedbymeasur- correspondingsquares. Obviously, a perfect measure would ingthe‘distance’betweentwosuchconditionaldistributions place all of the weight in the diagonalentries of these plots. with the aid of the Kullback-Leibler (KL) divergence (also Weightsshouldbecenteredaboutthediagonalforanaccurate know as the relative entropy). The KL divergence between identificationofauthorqualityandthecertaintyofthisiden- twodiscreteprobabilitydistributions, pand p isdefined10as ′ tificationgrowsasweightaccumulatesinthediagonalboxes. NotethatanassignmentofadecilebasedonEq.(6)islikely (cid:229) pi KL[p,p]= p ln . (7) to be more reliable than the value of the initial assignment ′ i p i (cid:18) ′i(cid:19) since the formeris basedonall informationcontainedin the citationrecord. TheKullback-Leiblerdivergenceispositiveandhasdesirable Figure4emphasizesthat‘firstinitial’and‘publicationsper convexityproperties. It is, however, not a metric due to the year’ are not reliable measures. The h-index normalized by factthatKL[p′,p]=KL[p,p′]. Whilethisasymmetryisoflit- 6 professionalageperformspoorly;whennormalizedbynum- tleconcernwhenthedifferencesbetween pand p′ aresmall, berofpapers,thetrendtowardsthediagonalisenhanced.We some care is required when such differencesare large. This note the appearanceof verticalbarsin each figure in the top can occur when the data set is so small that some citation row. ThisfeatureisexplainedinAppendixA. Allfourmea- binsareemptyorwhenwebinauthorsbykmax,inwhichcase sures in the bottom row perform fairly well. The initial as- emptybinsareinevitableasnotedabove.WeconsidertheKL signment of the k measure always underestimates an au- distancebetweenadjacentdistributions,Fig.5showsthedis- max thor’scorrectbin.Thisisnotanaccidentandmeritscomment. tancesKL[P(ia ),P(ia +1)]forvariousmeasures.Theprob- Specifically,ifanauthorhasproducedasinglepaperwithci- ability P(b =|a +1a|) is exponentially sensitive to the KL | tations in excess of the values contained in bin a , the prob- divergence.MeasureswithlargeKLdivergencesbetweenad- ability that he will lie in this bin, as calculated with Eq. (6), jacent bins provide the most certain assignments of authors. is strictly 0. Non-zeroprobabilitiescan beobtainedonlyfor The KL divergencesfor the measures not shown are signifi- binsincludingmaximumcitationsgreaterthanorequaltothe cantlysmallerthanthosedisplayed.TheresultsofFig.5pro- maximumvaluealreadyobtainedbythisauthor.(Thefactthat vide quantitative support for the roughly equal performance theprobabilitiesforthesebinsshowninFig.4arenotstrictly ofmean,median,and65thpercentilemeasures11seeninFig- 0isaconsequenceoftheuseoffinitebinsizes.)Thus,binning ure 4. The h-indexnormalizedby numberof publicationsis authorsonthebasisoftheirmaximallycitedpapernecessarily underestimatestheirquality.Themean,medianand65thper- centileappeartobethemostbalancedmeasureswithroughly equalpredictivevalue. 10Thenon-standardchoiceofthenaturallogarithmratherthanthelogarithm basetwointhedefinitionoftheKLdivergence,willbejustifiedbelow. ItisclearfromEq.(6)thattheabilityofagivenmeasureto 11Figure5givesamisleadingpictureofthekmaxmeasure,sincetheKLdi- discriminateisgreatestwhenthedifferencesbetweenthecon- vergencesKL[P(ia +1),P(ia )]areinfiniteasdiscussedabove. | | 7 (a)FirstInitial (b)Papers/year (c)Hirsch(age) (d)Hirsch(papers) 10 10 10 10 9 9 9 9 8 8 8 8 7 7 7 7 6 6 6 6 Α Α Α Α 5 5 5 5 4 4 4 4 3 3 3 3 2 2 2 2 1 1 1 1 1 2 3 4 5Β6 7 8 9 10 1 2 3 4 5Β6 7 8 9 10 1 2 3 4 5Β6 7 8 9 10 1 2 3 4 5Β6 7 8 9 10 (e)Max (f)Mean (g)Median (h)65thpercentile 10 10 10 10 9 9 9 9 8 8 8 8 7 7 7 7 6 6 6 6 Α Α Α Α 5 5 5 5 4 4 4 4 3 3 3 3 2 2 2 2 1 1 1 1 1 2 3 4 5Β6 7 8 9 10 1 2 3 4 5Β6 7 8 9 10 1 2 3 4 5Β6 7 8 9 10 1 2 3 4 5Β6 7 8 9 10 FIG.4: Eightdifferentmeasures. Eachhorizontalrowshowstheaverageprobabilities(proportionaltotheareasofthesquares)thatauthors initiallyassignedtodecilebina arepredictedtobelonginbinb .PanelsasinFig.3. KL Hirsch Max Mean 0.1 Median 65th 0.08 0.06 0.04 0.02 1 2 3 4 5 6 7 8 9 FIG. 5: The Kullback-Leibler divergences KL[P(ia ),P(ia +1)]. | | FIG.6: Binning according todeciles. Thisplot displays anormal Resultsare shown for the following distributions: h-index normal- distribution(solidblackline)asanexampleof aprobability distri- ized by number of publications, maximum number of citations, butionpeakedaroundanon-zeromaximum. Thegreyverticallines mean,median,and65thpercentile. marktheboundariesofthe10deciles. dramaticallysmallerthantheothermeasuresshownexceptfor theextremedeciles. The reduced ability of all measures to discriminate in the middledecilesisimmediatelyapparentfromFig.5. Thisisa directconsequenceanypercentilebinninggiventhatthedis- between authors with very similar citation distributions. As tributionof authorqualityhas a maximumatsome non-zero a result, the statistical accuracy of percentile assignments is value, the binsize ofa percentiledistributionnearthe maxi- high at the extremes and relatively low in the middle of the mumwillnecessarilybesmall. Theaccuracywithwhichau- distributionwhereweareattemptingtomakefinedistinctions thorscanbeassignedto a givenbininthe regionaroundthe betweenscientists ofsimilarability. Thiseffectisillustrated maximum is reduced since one is attempting to distinguish inFig.6. 8 65thpercentilecitationrateasa measure(Similarresultsare obtainedwhenusingthemeanormediancitationrates). The figureindicatesthatN =50papersismorethansufficientto identifyauthorsinthefirstandtenthdeciles. Infact,approxi- mately25and20papersrespectivelyaresufficienttoplaceau- thorsinthesedecilesatthe90%confidencelevel. Fig.7also indicates that 50 published papers are sufficient to make ≈ meaningfulassignments of authors to the second, third, and ninth deciles. All measures have difficulty in assigning au- thorstodeciles5 8. Asindicatedbythesmallvaluesofthe − KL divergencein these binsforallmeasuresconsidered,the citation distributions of these authors are simply too similar topermitaccuratediscrimination(seeargumentsintheprevi- oussection).Ontheotherhand,theprobabilitythatanauthor can be correctly assigned to one of these middle binson the FIG. 7: The probability that a typical (i.e., most probable) author basis of 50 publication is high. This difficulty is due to the with50publishedpaperswillbeassignedtothecorrectdecileasa relatively small range of citations ranges which cover these functionofactualauthordecile. Themediannumberofcitationsis usedasameasure. bins:the65thpercentile-bins5though8containauthorswith a 65th percentile between 5 and 13 citations (cf. the narrow rangesof the middle binsin the case of the mean, displayed VI. SCALING inTableII). Inthissection,weconsiderthequestionofhowmanypub- lishedpapersarerequiredinordertomakea reliablepredic- VII. CONCLUSIONS tionofthepercentilerankingofagivenauthor. (Weconsider resultsonly usingthe 65thpercentilemeasure.) If this num- There are two distinct questionswhich must be addressed ber is sufficiently small, analysis along the lines presented in any attemptto use citation data as an indicationof author herecanprovideapracticaltoolofpotentialvalueinpredict- quality. Thefirstiswhetherthemeasurechosentocharacter- inglong-termscientificperformance. Inordertoaddressthis ize a given citation distribution or even the citation distribu- question,wewillconsiderhowP(a ni )scalesasafunction tion itself reflects the qualities that we would like to probe. |{ } of the total number of publications for an average author in Thesecondquestioniswhetheragivenmeasureiscapableof each bin. Assume thatan averageauthorbelongingto bin a discriminatingbetweenauthorsina statistically reliableway drawsNpapersatrandomfromthedistributionofP(ia ). The and,byextension,whichofseveralmeasuresisbest. Wehave | mostprobablenumberofpapersineachcitationbinwillthus shown that the use of Bayesian statistics and the Kullback- begivenasni=NP(ia ). InsertingthisresultintoEq.(6)and Leiblerdivergencecananswerthisquestioninavalue-neutral | discardingallfixedfactors,wefindthat and statistically compelling manner. It is possible to draw reliable conclusionsregarding an author’s citation record on N P(a n ) p(a ) (cid:213) P(ia )P(ia ) . (8) thebasisofapproximately50papers,anditispossibletoas- |{ i} ∼ i | | ! sign meaningful statistical uncertainties to the results. The highlevelofdiscriminationobtainedin thehighestandlow- est deciles providesindirect support for our assumption that For the same citation record, n , a similar expression per- i { } anauthor’scitationrecordisdrawnatrandomfromanappro- mitsdeterminationoftheprobabilitythatthisaverageauthor willbeassignedtoanybin,b . Weseethat priateconditionaldistributionandsuggeststhatpossibleaddi- tionalcorrelationsincitationdataarenotimportant. Further, 1 P(b n ) thedifficultyindiscriminatingbetweenauthorsinthemiddle Nli→m¥ N ln(cid:18)P(a ||{{nii}})(cid:19)=−KL[P(•|a ),P(•|b )] . (9) dneocni-lzeesrsouvgagleuset.sthatintrinsicauthorabilityispeakedatsome Thisequationillustratestheutility oftheKL divergenceand Theprobabilisticmethodsadoptedherepermitmeaningful explains the origin of its lack of symmetry. It is clear from comparison of scientists working in distinct areas with only Eqs.(8)and(9)thattheprobabilityofassigningthisaverage minimal value judgments. It seems fair, for example, to de- author to the wrong bin will ultimately vanish exponentially clareequalitybetweenscientistsinthesamepercentileoftheir with N. Givenenoughpapers,the largestbinwill ultimately peergroups. Itissimilarlypossibleto combineprobabilities dominate. inordertoassignameaningfulrankingtoauthorswithpubli- To obtain a quantitativesense of how manypapersare re- cationsinseveraldisjointareas. Allthatisrequiredisknowl- quired in practice, we pose the following question: What is edgeoftheconditionalprobabilitiesappropriateforeachho- the probability that a typical author from each author decile mogeneoussubgroup. withN=50publishedpaperswillbeassignedtothecorrect Wenote,however,thatthenumberofpublicationsrequired decile?TheanswerisplottedasahistograminFig.7usingthe to make meaningful author assignments is large enough to 9 limittheutilityofsuchanalysesintheacademicappointment (a)Papers/yearP˜(a ′ a ) (b)Papers/yearP(a ′ a ) | | process.Thisraisesthequestionofwhethertherearemoreef- 10 10 ficientmeasuresofanauthor’sfullcitationrecordthanthose 9 9 considered here. Our object has been to find that measure 8 8 whichisbestabletoassignthemostsimilarauthorstogether. 7 7 Straightforward iterative schemes can be constructed to this Α6 Α6 5 5 end and are found to converge rapidly (i.e., exponentially) 4 4 to an optimal binning of authors. (The result is optimal in 3 3 the sense that it maximizes the sum of the KL divergences, 2 2 KL[P( a ),P( b )], over all a and b .) The results are only 1 1 •| •| marginallybetterthanthoseobtainedherewiththemean,me- 1 2 3 4 5Β6 7 8 9 10 1 2 3 4 5Β6 7 8 9 10 dianor65thpercentilemeasures. Finally,itisalsoimportanttorecognizethatittakestimefor FIG.8:AcomparisonoftheapproximateP˜(b a )fromEq.(A2)and apapertoaccumulateitsfullcomplementofcitations. While theexactP(b a )forthepaperspublishedpery|earmeasure. theirareindicationsthatanauthor’searlyandlatepublications | are drawn (at random) on the same conditional distribution [11],manyhighlycitedpapersaccumulatecitationsatacon- stant rate for manyyears after their publication. This effect, resultsofthisapproximateevaluationareshowninFig.8and which has not been addressed in the present analysis, repre- comparedwith the exactvaluesof P(b a )for the papersper | sentsaseriouslimitationonthevalueofcitationanalysesfor year measure. The approximationsdo notaffectthe qualita- youngerauthors.Thepresenceofthiseffectalsoposesthead- tivefeaturesofinterest. ditionalquestionofwhetherthereareotherkindsofstatistical We nowassumethatthemeasuredefiningtheauthorbins, publication data that can deal with this problem. Co-author a , providesa poorapproximationto the true bins, A. Inthis linkagesmayprovideapowerfulsupplementoralternativeto case, authors will be roughly uniformly distributed, and the citation data. (Preliminarystudiesofthe probabilitythatau- factorP(Aa )appearinginEq.(A2)willnotshowlargevari- | thors in bins a and b will co-author a publication reveal a ations. Significant structure will arise from the exponential strikingconcentrationalongthediagonala =b .) Sinceeach terms, where the presence of the factor N (assumed to be paperiscreatedwithitsfullsetofco-authors,suchinforma- large),willamplifythedifferencesintheKLdivergences.The tioncouldbeusefulinevaluatingyoungerauthors.Thiswork KLdivergencewillhaveaminimumvalueforsomevalueof willbereportedelsewhere. A=A0(b ),andthissingletermwilldominatethesum. Thus, P˜(b a )reducesto | APPENDIXA:VERTICALSTRIPES P˜(b a ) P(A0 a )exp( NKL[Q( A0),P( b )]). (A3) | ∼ | − •| •| TheverticalstripesprominentinFigs.4(a)and(b)emergeas ThemoststrikingfeatureofthecalculatedP(b |a )shownin aconsequenceofthedominantb -dependentexponentialfac- Fig.4ispresenceofvertical‘stripes’. Thesestripesaremost tor. The present arguments also apply to the worst possible pronouncedforthepoorestmeasuresanddisappearasthere- measure, i.e., a completelyrandomassignmentof authorsto liabilityofthemeasureimproves. Here,weofferaschematic the bins a . In the limit of a large number of authors, N , aut butqualitativelyreliableexplanationofthisphenomenon. To allP(ib )willbeequalexceptforstatisticalfluctuations. The thisend,imaginethateachauthor’scitationrecordisactually | resulting KL divergenceswill respond linearly to these fluc- drawn at random on the true distributions Q(i|A). For sim- tuations.12 Thesefluctuationswillbeamplifiedasbeforepro- plicity,assumethateveryauthorhaspreciselyNpublications, videdonlythatN growslessrapidlythanN2.Theargument aut that each author in true class A has the same distribution of here does not apply to good measures where there is signif- citations with nAi =NQ(i|A), and that there are equal num- icant structure in the term P(Aa ). (For a perfect measure, btheerns odfisatruibthuotersdiinnteoacahutthrourebaiuntsh,oar,calacscso.rdTinhgesteoasuotmhoerschaore- P(A|a )=d Aa .) In the case of|good measures, the expected dominanceofdiagonalterms(seeninthelowerrowofFig.4) senqualitymeasure. ThemethodsofSectionsIVandV can remainsunchallenged. then be used to determine P(ia ), P( n(A) b ), P(b n(A) ) | { i }| |{ i } andP(b a ). Giventheformofthen(A) andassumingthatN islarge,|wefindthat i APPENDIXB:EXPLICITDISTRIBUTIONS P(b |{n(iA)})≈exp(−NKL[Q(•|A),P(•|b )]) (A1) Forconveniencewepresentalldatatodeterminetheprob- and abilitiesP(a ni )forauthorswhopublishinthetheorysub- |{ } sectionofSPIRES.Dataispresentedonlyforcaseofthemean P˜(b a ) (cid:229) P(Aa )exp( NKL[Q( A),P( b )]), (A2) | ∼ | − •| •| A whereP(Aa )istheprobabilitythatthecitationrecordofan authorassig|nedtoclassa wasactuallydrawnonQ(iA). The 12ThisistruebecausetherewillbenochoiceofAsuchthatQ(iA)=P(ia ). | | | 10 P(ia ) P(N a ) numberof citations. All citationsare binnedlogarithmically | | according to the citation bins listed in column one and two Binnumber Citationrange BinNumber Totalpaperrange of Table I. The author bins are determined on the basis of i=1 k=1 m=1 N=25 decilesofthetotaldistributionofmeancitations, p( k ). Ta- i=2 k=2 m=2 N=26 h i bleII showsthe relevantquantitiesforthese bins. Giventhe i=3 2<k 4 m=3 26<N 28 definitionsofboththeauthor-andcitationbins,wecandeter- ≤ ≤ i=4 4<k 8 m=4 28<N 32 minetheconditionalcitationdistributionsP(ia )empirically. ≤ ≤ | i=5 8<k 16 m=5 32<N 40 ThesearegiveninTableIII. ≤ ≤ i=6 16<k 32 m=6 40<N 56 ≤ ≤ i=7 32<k 64 m=7 56<N 88 ≤ ≤ i=8 64<k 128 m=8 88<N 152 ≤ ≤ i=9 128<k 256 m=9 152<N Nmax ≤ ≤ i=10 256<k 512 ≤ i=11 512<k kmax ≤ TABLEI:Thebinningofcitationsandtotalnumberofpapers. The firstandsecondcolumnshowthebinnumberandbinrangesforthe citationbinsusedtodeterminetheconditionalcitationprobabilities P(ia )foreacha ,showninTableIII. Thethirdandfourthcolumn | displaythebinnumberandtotalnumberofpaperrangesusedinthe creationoftheconditionalprobabilitiesP(ma )foreacha ,displayed | inTableIV. a k -range #authors p(a ) n¯(a ) h i 1 0 – 1.69 673 0.1 37.0 2 1.69 – 3.08 673 0.1 41.8 3 3.08 – 4.88 675 0.1 44.0 4 4.88 – 6.94 673 0.1 46.8 We also need the probabilities P(N a ) describing that an 5 6.94 – 9.40 674 0.1 52.2 authorinbina hasN publications. Be|causeofthelownum- 6 9.40 – 12.56 674 0.1 54.3 berofauthorsineachbin,weneedtobinthetotalnumberof 7 12.56– 16.63 673 0.1 59.5 publicationswhencalculatingthisprobability;weusethelet- 8 16.63– 22.19 674 0.1 59.0 term toenumeratetheN-bins. BecauseP(N a ) isdescribed 9 22.19– 33.99 674 0.1 65.4 byapower-lawdistribution13 andsinceweon|lyconsiderau- 10 33.99–285.88 674 0.1 72.2 thorswithmorethan25publications,wechoosetobinNlog- arithmically as displayed in the third and fourth column of TABLEII:Theauthorbins. Thistableshowsthemeannumbersof Table I. The conditionalprobabilities, P(ma ) are displayed citationsthatdefinethelimitsofthe10authorbins. inTableIV. | 13ThisfactisknownasLotka’sLaw[22]. [1] D.Adam. Thecountinghouse. Nature,415:726,2002. output. ProceedingsoftheNationalAcademyoftheSciences, [2] R.P.Dellavalle, E.J. Hester, L.F.Heilig, A. L.Drake, J.W. 102:16569,2005. Kuntzman,M.Graber,andL.M.Schilling.Going,going,gone: [8] E.Garfield. EssaysofanInformationScientist, volume1-15. Lostinternetreferences. Science,302:787,2003. ISIPress,1977-1993. [3] G.Franck. Scientificcommunication–Avanityfair. Science, [9] D. de Solla Price. Networks of scientific papers. Science, 286:53,1999. 149:510,1965. [4] J. Adams and Z. Griliches. Measuring science: An explo- [10] S.Redner. Howpopularisyourpaper? Anempericalstudyof ration.ProceedingsoftheNationalAcademyofSciences,USA, the citationdistribution. European Physics Journal B, 4:131, 93:12664,1996. 1998. [5] S.Lehmann, A.D.Jackson, andB.E.Lautrup. Measuresfor [11] S.Lehmann, B.E.Lautrup, andA.D.Jackson. Citationnet- measures. Nature,444:1003,2006. worksinhighenergyphysics. PhysicalReviewE,68,2003. [6] A. F. J. van Raan. Statistical properties of bibliometric in- [12] S.Redner.Citationstatisticsfrom110yearsofphysicalreview. dicators: Research group indicator distributions and correla- PhysicsToday,58:49,2005. tions.JournaloftheAmericanSocietyforInformationScience, [13] A.-L.Baraba´siandR.Albert. Emergenceofscalinginrandom 57:408,2005. networks. Science,286:509,1999. [7] J. E. Hirsch. An index to quantify an individual’s scientific [14] R.AlbertandA.-L.Baraba´si.Statisticalmechanicsofcomplex

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.