RESEARCH ARTICLE
PLOS ONE | DOI:10.1371/journal.pone.0148257 | February 17, 2016

Open Access Meets Discoverability: Citations to Articles Posted to Academia.edu

Yuri Niyazov¹, Carl Vogel², Richard Price¹*, Ben Lund¹, David Judd¹, Adnan Akil¹, Michael Mortonson¹, Josh Schwartzman¹, Max Shron²

1 Academia.edu, San Francisco, California, United States of America, 2 Polynumeral, New York, New York, United States of America

* [email protected]

Abstract
Using matching and regression analyses, we measure the difference in citations between articles posted to Academia.edu and other articles from similar journals, controlling for field, impact factor, and other variables. Based on a sample of 31,216 papers, we find that a paper in a median impact factor journal uploaded to Academia.edu receives 16% more citations after one year than a similar article not available online, 51% more citations after three years, and 69% more after five years. We also find that articles posted to Academia.edu in addition to other online venues had 58% more citations after five years than articles posted only to other online venues, such as personal and departmental homepages.

OPEN ACCESS

Citation: Niyazov Y, Vogel C, Price R, Lund B, Judd D, Akil A, et al. (2016) Open Access Meets Discoverability: Citations to Articles Posted to Academia.edu. PLoS ONE 11(2): e0148257. doi:10.1371/journal.pone.0148257

Editor: Pablo Dorta-González, Universidad de Las Palmas de Gran Canaria, SPAIN

Received: August 31, 2015
Accepted: January 15, 2016
Published: February 17, 2016

Copyright: © 2016 Niyazov et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: Data from the "Open Access Meets Discoverability" paper are available at https://github.com/polynumeral/academia-citations.

Funding: Academia.edu paid its employees, contractors and an external consultancy (Polynumeral) to perform this study. Authors Yuri Niyazov, Richard Price, Ben Lund, David Judd, Adnan Akil, Michael Mortonson and Josh Schwartzman are employed by Academia.edu. Authors Carl Vogel and Max Shron are employed by Polynumeral. Academia.edu provided support in the form of salaries for authors YN, RP, BL, DJ, AA, MM and JS, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the "author contributions" section. Polynumeral provided support in the form of salaries for authors CV and MS, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the "author contributions" section.

Competing Interests: The authors of this manuscript have read the journal's policy and have the following competing interests: Academia.edu paid its employees, contractors and an external consultancy (Polynumeral) to perform this study. Authors Yuri Niyazov, Richard Price, Ben Lund, David Judd, Adnan Akil, Michael Mortonson and Josh Schwartzman are employed by Academia.edu. Authors Carl Vogel and Max Shron are employed by Polynumeral. There are no patents, products in development or marketed products to declare. This does not alter the authors' adherence to all the PLOS ONE policies on sharing data and materials.

Introduction
Academia.edu is a website where researchers can post their articles and discover and read articles posted by others. It combines the archival role of repositories like arXiv, SSRN, or PubMed with social networking features, such as profiles, news feeds, recommendations, and the ability to follow individuals and topics. The site launched in 2008 and as of January 2016 has approximately 30 million registered users who have uploaded approximately 8.5 million articles. Registration on the site is free, and users can freely download all papers posted to the site.

There is a large body of research on the citation advantage of open access articles, and researchers are still debating the size and causes of the advantage. Some studies have found that open access articles receive substantially more citations than pay-for-access articles, even after controlling for characteristics of the articles and their authors [1,2]. Other studies using experimental and quasi-experimental methods have concluded that any measured citation advantage is mostly due to selection bias and other unobserved differences between free and paid articles [3–5].

Both the supportive and critical studies have focused on the accessibility of articles: once found, can the article be obtained for free? They have given less consideration to the discoverability of articles: how easily can the article be found? This makes sense; the methods researchers often use to find articles don't privilege open access over paid sources or vice versa. Google Scholar, for example, returns both free and paid sources, as do many library databases.

Academia.edu, on the other hand, has unique features for discovering articles, making it an interesting venue for analyzing a citation advantage. Users are notified when authors they follow post articles to the site. They can then share those articles with their followers. A user can tag an article with a subject like "High Energy Physics", and users following that subject will be notified about the paper.

A number of users have reported to the Academia.edu team that they observed increased citations after posting their articles to the site [6,7]. Motivated by those anecdotal reports, we conducted a formal statistical analysis of the citation advantage associated with posting an article.

We find that a typical article posted on Academia.edu receives approximately 16% more citations than similar articles not available online in the first year after upload, rising to 51% after three years and 69% after five years. We also find that a typical article posted on Academia.edu receives more citations than an article available online on a non-Academia.edu venue, such as a personal homepage, a departmental homepage, or a journal site. A typical paper posted only to Academia.edu receives 15% fewer citations than an article uploaded to a non-Academia.edu site in the first year, but 19% more after three years, and 35% more after five years.

Our study is observational, requiring us to carefully account for possible sources of selection bias. We find that the citation advantage persists even after controlling for a number of possible selection biases.
Background

The Open Access Citation Advantage
Even though Academia.edu differs from traditional venues for open access, the hypotheses and methods in this paper overlap with research on the open access citation advantage. The term "open access" typically refers to articles made freely available according to specific Open Access policies of academic journals: for example, "Gold Open Access" policies, where authors or institutions pay the journal to make an article freely available, or "Green Open Access", where an author may archive a free version of their article online. Sometimes, though, "open access" is used more loosely to refer to any manner by which articles are made freely available online. Some authors use the term "free access" for this broader definition, to distinguish it from Green and Gold Open Access policies. Our study does not rely on these distinctions, and we will use the terms "open access" and "free access" interchangeably to refer to the broader definition of freely downloadable articles.

Many researchers, beginning with [8], have found that free-access articles tend to have more citations than pay-for-access articles. This citation advantage has been observed in a number of studies, spanning a variety of academic fields including computer science [8], physics [9], and biology and chemistry [1].

The estimated size of the citation advantage varies across and even within studies, but is often measured to be between 50% and 200% more citations for open access articles [10]. The variety of estimates is unsurprising, since both open access and citation practices vary widely across disciplines, and citations accumulate at different rates for different articles published in different venues. Different statistical methods also lead to different estimates. Some studies have simply compared unconditional means of citations for samples of free and paid articles [8], while others, such as [1], measured the advantage in a regression analysis with a battery of controls for characteristics of the articles and their authors.
Critiques of the Citation Advantage
Other studies have presented evidence against an open access citation advantage, arguing that although there is a correlation between open access and more citations, open access does not cause more citations. (See, e.g., [11] and [12] for critical reviews of the citation advantage literature.)

Kurtz et al. [13], in a framework adopted by several subsequent authors (e.g., [3,11,14]), put forth three postulates to explain the correlation between open access and increased citations:

1. The Open Access postulate. Since open access articles are easier to obtain, they are easier to read and cite.
2. The Early View postulate. Open access articles tend to be available online prior to their publication. They can therefore begin accumulating citations earlier than paid-access articles published at the same time. When comparing citations at fixed times since publication, the open access articles will have more citations, because they have been available for longer.
3. The Selection Bias postulate. If more prominent authors are more likely to provide open access to their articles, or if authors are more likely to provide access to their "highest quality" articles, then open access articles will have more citations than paid-access articles.

Kurtz et al. [13], and later [14], concluded that the Early View and Selection Bias effects were the main drivers of the correlation between open access and increased citations. A lack of a causal open access effect was further supported in other studies, such as the randomized trials in [3] and [4], and the instrumental variables regressions in [5].
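The Early View postulate can be made concrete with a toy calculation. The sketch below assumes a hypothetical model (not taken from [13] or from this study) in which citations accrue at a constant rate from the moment an article becomes available; the function `citations_at`, the rate of 4 citations per year, and the half-year early-view lead are all illustrative assumptions. Two articles with identical rates, compared at fixed times since publication, show an apparent advantage for the one that was available earlier, even though access has no causal effect in the model.

```python
def citations_at(years_since_publication: float, rate_per_year: float,
                 early_view_years: float = 0.0) -> float:
    """Expected citations under a hypothetical constant-rate model.

    Citations accrue at `rate_per_year` from the moment an article becomes
    available; an early-view article becomes available `early_view_years`
    before its formal publication date.
    """
    exposure = max(0.0, years_since_publication + early_view_years)
    return rate_per_year * exposure


# Two hypothetical articles with identical citation rates: any "advantage"
# below comes purely from the longer exposure of the early-view article.
rate = 4.0
for t in (1, 3, 5):
    early = citations_at(t, rate, early_view_years=0.5)
    normal = citations_at(t, rate)
    print(f"t={t}: early-view {early:.1f}, publication-only {normal:.1f}, "
          f"apparent advantage {100 * (early / normal - 1):.0f}%")
```

Under these assumptions the apparent advantage is 50% at one year, about 17% at three years, and 10% at five years: the head start matters most early on and becomes a smaller share of total exposure over time, which is why comparisons at fixed times since publication need to account for when each article actually became available.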
But even these studies are not conclusive. For example, Kurtz et al. [13] point out that their conclusions may be specific to their sample: articles published in the top few astronomy journals. The experimental treatment in [3] and [4] was to make randomly chosen articles free to download on the publisher's website. How easily researchers could determine that these articles were available for free is unclear. And, while the instrumental variable analysis of [5] found evidence of selection bias in open access, its authors still estimated a statistically and practically significant citation advantage even after controlling for that bias.

Regardless of the validity or generality of their conclusions, these studies do establish that any citation advantage analysis must take into account the effects of time and selection bias on citation differentials.

Sources of Selection Bias in Academia.edu Citations
Like most citation advantage studies, ours is observational, not experimental. Articles are not uploaded to Academia.edu randomly. Authors choose to register as users on the site, and then choose which of their articles to upload. When making comparisons to articles not posted to the site, this creates several potential sources of bias in unconditional citation comparisons.

1. Self-selection of disciplines. Academia.edu users may be more likely to come from particular disciplines. Since citation frequency differs across disciplines, a citation advantage estimate that doesn't control for academic discipline might over- or underestimate the true advantage.
2. Self-selection of authors. Researchers who post papers on Academia.edu might differ from those who do not. Users might skew younger, or be more likely to work at lesser-known institutions. If so, we would expect to find that papers posted to the site tend to have fewer citations than those not posted. Or users might skew in the other direction, having more established reputations or coming from better-known institutions, in which case we could overestimate the actual advantage. Furthermore, users who post papers may also be generally more proactive about distributing and marketing their work, both through Academia.edu and through other venues online and off. If this were true, it would also cause us to overestimate the actual advantage.
3. Self-selection by article quality. Even if Academia.edu users were not systematically different from non-users, there might be systematic differences between the papers they choose to post and those they do not. As [13] and others have hypothesized, users may be more likely to post their most promising, "highest quality" articles to the site, and not post articles they believe will be of more limited interest.
4. Self-selection by type of article. Academic journals publish content besides original research or scholarship: book reviews, errata, responses to recently published articles, conference abstracts, editorials, etc. These other types of content typically receive fewer citations than research articles. If Academia.edu users are less likely to post these other types of content to the site, then we might overestimate the advantage relative to an off-Academia group that contains more "non-research" content.
5. Self-selection by article availability. A user may be more likely to post a paper to the site if they have already made it available through other venues, such as their personal website or institutional or subject-specific repositories. In this case, a citation advantage estimated for Academia.edu papers might be measuring, in part or in whole, a general open access effect from the articles' availability at these other venues.
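Item 1 above can be illustrated with hypothetical numbers. In the toy data below (invented for illustration, not drawn from this study), posted and unposted articles in the same field have identical citation counts, so the true within-field advantage is zero by construction; posted articles are simply concentrated in a higher-citation field. The function names `unconditional_ratio` and `within_field_ratios` are hypothetical helpers, not part of the paper's code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: (field, posted_to_academia, citations).
# Within each field, posted and non-posted articles have identical citations.
articles = (
    [("A", True, 20)] * 8 + [("B", True, 5)] * 2      # posted: mostly high-citation field A
    + [("A", False, 20)] * 2 + [("B", False, 5)] * 8  # not posted: mostly low-citation field B
)


def unconditional_ratio(rows):
    """Ratio of mean citations, posted vs. not posted, ignoring field."""
    on = [c for _, posted, c in rows if posted]
    off = [c for _, posted, c in rows if not posted]
    return mean(on) / mean(off)


def within_field_ratios(rows):
    """Same ratio computed separately within each field."""
    groups = defaultdict(lambda: {True: [], False: []})
    for field, posted, c in rows:
        groups[field][posted].append(c)
    return {f: mean(g[True]) / mean(g[False]) for f, g in groups.items()}


print(f"unconditional ratio: {unconditional_ratio(articles):.3f}")
print("within-field ratios:", within_field_ratios(articles))
```

Here the unconditional mean ratio is 17 / 8 = 2.125, suggesting a 112.5% "advantage", while both within-field ratios are 1.0. Controlling for discipline removes this particular bias; it does not address selection on unobserved traits such as article quality (item 3).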
Many of these factors cannot be observed directly or completely, and their aggregate effect on citation advantage estimates is difficult to predict. We have collected data and employed matching and regression strategies to mitigate each of the above potential biases, and continue to find a substantive citation advantage for articles posted to Academia.edu.

Materials and Methods
We rely on data from several sources: (1) articles from the Academia.edu website, (2) citation counts and free-access status from Google Scholar, (3) journal rankings from SCImago/Scopus, and (4) journal research fields from the Australian Research Council. All data and code used in the analysis are available for download at https://github.com/polynumeral/academia-citations.

On-Academia and Off-Academia Articles
Our analysis is a comparison of citations between articles posted to Academia.edu and articles not posted. We refer to these two samples as the "on-Academia" sample and the "off-Academia" sample. Articles comprising each sample were selected in the following way.

On-Academia Sample: The articles in our analysis were uploaded to Academia.edu between 2009 and 2012, inclusive. We chose to start at 2009 because this was the first full year that the site was active. We stopped at 2012 so that all articles in the sample are at least two years old and have had time to accumulate citations. We restrict our sample to articles that were posted to the site in the same year they were published. We refer to this as the "P=U" (Published=Uploaded) restriction. This ensures that all of the articles are exposed to any citation advantage effects starting from their publication. It also mitigates bias from authors favoring, ex post, their most-cited articles when uploading to the site.

Our analysis relies on information from Google Scholar and CrossRef. The latter is a database containing journals, articles, authors, and Digital Object Identifiers (DOIs). Therefore, we restricted the on-Academia sample to articles that could be matched by title and author to both Google Scholar results and CrossRef entries.

Table 1. Sample size of papers, by cohort.
Year     Off-Academia    On-Academia
2009            4,600            149
2010            5,768            490
2011            6,989          2,236
2012            8,368          2,616
Total          25,725          5,491
doi:10.1371/journal.pone.0148257.t001

Off-Academia Sample: Using the CrossRef database, we selected a random subset of articles published in the same years as articles in the on-Academia sample, but which had not been posted to Academia.edu.

Citation Counts
For all articles in both the on- and off-Academia samples, we obtained citation counts from Google Scholar between April and August 2014.

Table 1 shows the number of articles in each cohort and sample. The on-Academia sample each year is a subset of papers posted to the site that year. We excluded papers uploaded to the site that were published in an earlier year, and papers that could not be matched to a Google Scholar search result or a CrossRef entry based on their titles and authors. Users manually enter a paper's title when they upload it to the site, and what they enter may differ from the paper's canonical title. (For example, a user may add "forthcoming in PLoS" to the title.) This sort of discrepancy was a common reason for a failure to match. We do not believe that failure to match a paper is related to its citations, and therefore these exclusions should not bias our results.

Articles in the sample come from 5,725 different journals, but there is a concentrated representation of journals. Table 2 lists the ten journals with the highest number of articles in our sample. The most-represented journal, Analytical Chemistry, comprises 4.6% of the sample, and the top ten journals comprise 12%.

Table 2. Journals with the most articles in the sample.
Journal                                                 # Articles   % Total
Analytical Chemistry                                         1,422     4.56%
Biological and Pharmaceutical Bulletin                         329     1.05%
Analytical Methods: advancing methods and applications         316     1.01%
Analytical Biochemistry                                        303     0.97%
Bioconjugate Chemistry                                         285     0.91%
Applied Mechanics and Materials                                282     0.90%
PLoS One                                                       194     0.62%
Applied Physics Letters                                        179     0.57%
AAPS PharmSciTech                                              164     0.53%
Anesthesia and Analgesia                                       155     0.50%
doi:10.1371/journal.pone.0148257.t002

Journal Impact Factors and Divisions
We used the impact factor of an article's journal as a matching variable and regression predictor. Journal impact factors were obtained from SCImago Journal and Country Rank, which uses citation data from Scopus [15]. The metric we refer to as the "impact factor" is the "Cites per Doc, 2 year" metric on the SCImago site. A journal's impact factor in, for example, 2012 is calculated as the average number of citations received in 2012 by papers that were published in the journal in 2010 and 2011. We matched each article to its journal's impact factor in the year the article was published. This ensures that the impact factor was not affected by the article itself, only by articles published in the journal in prior years. The journals in our sample with the highest impact factors are listed in Table 3.

Table 3. Top ten journals in sample by impact factor. Impact factor is averaged by year.
Journal                                        Impact Factor
Chemical Reviews                                       41.92
Annual Review of Immunology                            39.88
Chemical Society Reviews                               31.76
Annual Review of Biochemistry                          31.52
Annual Review of Astronomy and Astrophysics            28.48
Nature Reviews Neuroscience                            28.34
Nature Materials                                       28.26
Progress in Polymer Science                            26.7
Nature                                                 25.87
Lancet Oncology                                        25.48
doi:10.1371/journal.pone.0148257.t003

We also obtained data on the journals' fields of research from the Australian Research Council's Excellence in Research for Australia report [16]. The report contains data on academic journals, including labels for their Fields of Research, defined using a hierarchical taxonomy from the Australian and New Zealand Standard Research Classification [17]. Field of Research is the second level of the taxonomy, and the journals in our sample cover around 200 different Fields.
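The "Cites per Doc, 2 year" definition of the impact factor reduces to a simple ratio. The following sketch assumes hypothetical per-year counts for a single journal (the function name `cites_per_doc_2y`, its argument layout, and the numbers are illustrative, not SCImago's data or code). It also shows why matching an article to the impact factor of its own publication year keeps the article out of the calculation: only documents from the two preceding years enter the ratio.

```python
def cites_per_doc_2y(year: int,
                     docs_published: dict[int, int],
                     cites_in_year_to_cohort: dict[int, int]) -> float:
    """Average citations received in `year` by documents published in the two prior years.

    docs_published[y]          -- documents the journal published in year y
    cites_in_year_to_cohort[y] -- citations received during `year` by documents published in y
    """
    prior_years = (year - 2, year - 1)  # e.g. 2010 and 2011 for a 2012 impact factor
    n_docs = sum(docs_published.get(y, 0) for y in prior_years)
    n_cites = sum(cites_in_year_to_cohort.get(y, 0) for y in prior_years)
    if n_docs == 0:
        raise ValueError(f"no documents published in {prior_years}")
    return n_cites / n_docs


# Hypothetical journal: 100 documents from 2010 and 120 from 2011 received
# 300 and 250 citations, respectively, during 2012.
docs = {2010: 100, 2011: 120, 2012: 130}
cites_2012 = {2010: 300, 2011: 250, 2012: 40}  # the 2012 entry is never used
impact_factor_2012 = cites_per_doc_2y(2012, docs, cites_2012)
print(impact_factor_2012)  # 550 / 220 = 2.5
```

An article published in this hypothetical journal in 2012 would be assigned the value 2.5; its own citations (part of the 2012 entry) never enter the ratio.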
Rather than the Field of Research labels, we rely on the first level of the taxonomy, the "Division" of the journal, which describes broad disciplines of research. There are 22 Divisions in the taxonomy, and a journal can be labeled with up to three different Divisions. Multidisciplinary journals, which cover more than three Fields of Research, are labeled with a 23rd Division label of "Multidisciplinary."

All of the analyses in the paper were also conducted with the "Field of Research" labels, using text analysis and dimension reduction techniques to account for the large number of labels and the high correlations amongst them. These analyses gave nearly identical results to those based on the Division labels, so we use the latter since they are easier to interpret.

Table 4 provides summary data about the Divisions in our sample: the share of articles in the full, on-Academia, and off-Academia samples in each discipline, and the median impact factor of journals in our sample in each Division. Nearly a third of articles in our sample are in Medical and Health Sciences journals, while Engineering and Biological Sciences each represent a fifth of articles. The columns add up to more than 100% because journals can be labeled with up to three disciplines.

Document Types
We include in our analysis only articles with original research, analysis or scholarship, or survey articles. We exclude book reviews, editorials, errata, and other "non-research" content. Our procedure for obtaining on- and off-Academia articles yielded 37,266 articles. From this sample, we removed any articles not identified as original research.

Table 4. Journal Divisions, defined according to the taxonomy in [17]. Share of articles in the full sample, the on-Academia sample, and the off-Academia sample in each Division, and the median impact factor of sample articles in the Division. Journals can be labeled with between one and three disciplines.
Division                                      % All    % On    % Off   Median IF
Medical and Health Sciences                   33.0%   18.6%    36.1%        2.58
Engineering                                   22.9%   12.0%    25.3%        2.77
Biological Sciences                           20.6%   19.6%    20.8%        2.55
Chemical Sciences                             18.7%    6.3%    21.4%        3.79
Psychology and Cognitive Sciences              7.7%   17.5%     5.6%        2.46
Physical Sciences                              7.2%    8.3%     7.0%        2.41
Mathematical Sciences                          7.1%    5.0%     7.5%        1.36
Multidisciplinary                              5.1%   11.5%     3.7%        3.20
Information and Computing Sciences             4.9%    5.2%     4.8%        1.95
Earth Sciences                                 4.0%    8.7%     2.9%        2.28
Studies in Human Society                       3.7%    9.8%     2.4%        1.15
Agricultural and Veterinary Sciences           3.7%    4.6%     3.5%        2.16
Environmental Sciences                         3.4%    5.3%     3.0%        2.48
Commerce, Management, Tourism and Services     2.8%    4.4%     2.5%        1.30
Technology                                     2.2%    1.9%     2.3%        1.96
Education                                      1.8%    4.5%     1.2%        1.12
Economics                                      1.6%    1.9%     1.5%        1.15
Language, Communication and Culture            1.4%    4.6%     0.8%        0.63
Philosophy and Religious Studies               1.4%    4.2%     0.8%        0.64
History and Archaeology                        1.3%    4.8%     0.5%        0.92
Built Environment and Design                   0.9%    1.7%     0.8%        1.84
Creative Arts and Writing                      0.5%    1.4%     0.3%        0.76
Law and Legal Studies                          0.4%    0.8%     0.3%        0.77
doi:10.1371/journal.pone.0148257.t004

To identify the type of each article, we used Amazon Mechanical Turk (MTurk), a crowdsourcing marketplace. Common uses of MTurk in academic research include collecting survey data, performing online experiments, and classifying data to train and validate machine learning algorithms. An appendix with a more complete description of the document classification process, including the worker questionnaire and accuracy statistics, is available at this paper's GitHub repo, https://github.com/polynumeral/academia-citations/. The repo also includes underlying data on worker responses.

We provided DOI links to articles in our sample to over 300 MTurk workers. The workers were asked to fill out an online form based on information from the abstract or full text at the DOI link. They were finally asked to classify the article as one of the following types:

1. A summary of a meeting or conference
2. An Editorial or Commentary
3. A response to a recent article in the same journal
4. An article with original research, analysis or scholarship, or a broad survey of research on a topic
5. This is a Book Review, Software Review, or review of some other recent work or performance
6. An Erratum, Correction, or Retraction of an earlier article
7. Something else

Workers might fail to categorize an article, giving one of these reasons: the link was broken, there was no abstract or text available on the site, the article was in a foreign language, or they otherwise couldn't tell. Some workers' results were excluded if they exhibited suspicious patterns, such as giving all articles the same classification, or completing a large number of tasks in an unreasonably short time. Their tasks were then resubmitted so that each article had three independent reviews.

Each article was reviewed by three different workers. Our sample only includes articles that all three workers identified as "original research" (option 4). Of the original 37,266 articles, this left 31,216 "original research" articles. Relying on a majority, 2-of-3 vote to classify articles would have resulted in 35,311 "original research" articles. Unanimity is a conservative classification rule, but given that false positive classification of "original research" articles could upwardly bias our result, we consider it appropriate.

Online Availability
In the Background section, we considered several potential sources of selection bias in the on-Academia sample. One was that users might be more likely to upload articles to the site if they have also made those articles available elsewhere online. To examine this possibility, we collected data on whether each paper in our sample was freely available from non-Academia sources. For the on-Academia articles, this would mean they were available from at least two online sources.

To determine whether a paper was available elsewhere, we searched for its title on Google Scholar, and checked whether the results contained a link to a non-paywalled full-text article.
This method is subject to false negatives, but since the failure to match a title, or to correctly identify a full-text article on a non-Academia site, should be independent of whether the article is also posted to Academia, we expect its error rate to be similar for both on- and off-Academia articles.

Table 5 lists the number of articles searched, and the percentage with free access to full text on non-Academia.edu sites. We find that papers in the on-Academia.edu sample are more likely to be available online than papers in the off-Academia sample. This indicates that there may be some self-selection by availability in our data. Our regression analyses control for online availability, mitigating potential bias from the discrepancy.

The use of a binary indicator for online availability does conceal some potentially useful information about the article's availability: for example, how many different venues it is available on, or what those specific venues are. Such metrics are difficult to measure accurately, but could be interesting. Indeed, this paper argues that venue-specific effects can be meaningful. Nonetheless, we do not believe this unmeasured information will contribute to any substantial bias, for several reasons. The primary one is that we find a significant citation advantage amongst articles that are not online on any non-Academia venue, an effect generally larger than the average online advantage we measure with the binary variable. Were we to use a richer metric for online availability, those articles would not be affected, and their Academia advantage would remain roughly the same.

Table 5. Share of sample articles freely available from non-Academia.edu sites.
                                     Off-Academia   On-Academia
a. Full-text available elsewhere            9,487         3,652
b. Articles searched                       25,725         5,491
c. Share (a / b)                            36.9%         66.5%
doi:10.1371/journal.pone.0148257.t005

Table 6. Citations summary statistics.
Sample          Min.   1st Qu.   Median    Mean   3rd Qu.   Max.
off-Academia       0         2        5   10.19        12   1237
on-Academia        0         3        7   12.77        15    721
doi:10.1371/journal.pone.0148257.t006

Quantifying the Citation Advantage
Our general empirical strategy is to estimate the distribution of the citation count of article i, published in journal j at time t, conditional on it being posted to Academia.edu, and compare this distribution to that of the same article, but conditional on it not being posted to the site. Denoting the number of citations as a random variable Y, we are interested in the distributions

$$P^1_{ijt}(y) = \Pr(Y \le y \mid j, t, \text{on-Academia})$$
$$P^0_{ijt}(y) = \Pr(Y \le y \mid j, t, \text{off-Academia}).$$

We can compute the change in an article's citations associated with posting to Academia.edu, $\Delta_{ijt}$, by comparing summary statistics of these distributions. For example, the difference in means,

$$\Delta_{ijt} = E^1_{ijt}(Y) - E^0_{ijt}(Y),$$

or medians,

$$\Delta_{ijt} = \mathrm{Med}^1_{ijt}(Y) - \mathrm{Med}^0_{ijt}(Y).$$

One approach would be to directly estimate these summary statistics by computing average or median citations within each journal × year group. Unfortunately, many of these groups contain too few articles to accurately estimate summary statistics. Instead, we use journal-specific covariates to represent journals, most prominently the journal's impact factor. This leads to two approaches: a non-parametric matching analysis, and a regression analysis.

Properties of Citation Count Distributions
Citation counts are non-negative integers with a highly right-skewed distribution. This can be seen in Table 6 and Fig 1, the latter of which also shows that the modal article has one or no citations. Our matching analysis accounts for this aspect of the data by comparing quantiles of on- and off-Academia citation counts. Our regression analysis applies several parametric models that accommodate right-skewed count data.
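A minimal sketch of the non-parametric comparison, assuming hypothetical `(impact_factor, citations)` records for a single cohort: it cuts the on-Academia impact factors into deciles (the binning described under Results below), assigns off-Academia articles to the same bins, and reports the within-bin differences in means and medians, i.e. sample analogues of $\Delta_{ijt}$ with journals represented by their impact-factor bin. The function names are illustrative, and the default interpolation method of Python's `statistics.quantiles` may differ from whatever quantile rule the original analysis used.

```python
from bisect import bisect_right
from statistics import mean, median, quantiles

Record = tuple[float, int]  # (journal impact factor, citation count)


def decile_edges(on_impact_factors: list[float]) -> list[float]:
    """Nine cut points that split on-Academia impact factors into ten bins."""
    return quantiles(on_impact_factors, n=10)


def bin_index(impact_factor: float, edges: list[float]) -> int:
    """Bin 0 (lowest) through 9 (highest); values equal to an edge go to the upper bin."""
    return bisect_right(edges, impact_factor)


def delta_by_bin(on_rows: list[Record], off_rows: list[Record]) -> dict[int, dict[str, float]]:
    """Within-bin differences (on minus off) in mean and median citations for one cohort."""
    edges = decile_edges([f for f, _ in on_rows])  # off-Academia articles reuse these edges
    groups: dict[int, dict[str, list[int]]] = {}
    for label, rows in (("on", on_rows), ("off", off_rows)):
        for impact_factor, citations in rows:
            b = bin_index(impact_factor, edges)
            groups.setdefault(b, {"on": [], "off": []})[label].append(citations)
    result = {}
    for b, g in sorted(groups.items()):
        if g["on"] and g["off"]:  # skip bins with nothing to compare against
            result[b] = {
                "delta_mean": mean(g["on"]) - mean(g["off"]),
                "delta_median": median(g["on"]) - median(g["off"]),
            }
    return result


# Hypothetical cohort: every on-Academia article has 10 citations, every off-Academia article 6.
on = [(0.5 * k, 10) for k in range(1, 21)]
off = [(0.5 * k, 6) for k in range(1, 21)]
for b, d in delta_by_bin(on, off).items():
    print(b, d)
```

In this toy cohort every bin reports a difference of 4 citations in both the mean and the median. With real, right-skewed counts like those in Table 6, the two summaries can diverge sharply, and the median difference is the less sensitive of the two to a few highly cited articles.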
Results

Matching by Impact Factor
Our first analysis compares citations of on- and off-Academia articles grouped by cohort and by their journals' impact factors. This is effectively a matching strategy with year and impact factor as the covariates; its purpose is to provide a relatively simple non-parametric estimator of the difference while controlling for important covariates. The regression analyses in the subsequent sections will expand on this analysis with a larger array of controls.

To match on-Academia articles to off-Academia articles, we computed decile bins of impact factors amongst the on-Academia articles in a cohort. Therefore, each impact factor bin represents 10% of articles in the on-Academia sample for that year. We then grouped the off-Academia articles into those bins, and compared samples within each bin.

Fig 1. Distributions of citations (x-axis is truncated at 100).
doi:10.1371/journal.pone.0148257.g001

Fig 2 shows box plots of citations to on- and off-Academia articles in each cohort and impact factor bin. (Bornmann et al. [18], among others, advocate using box plots to compare citation differences across samples.) The figure shows that older papers have more citations, and that articles published in higher impact factor journals have more citations. Furthermore, we find that the median number of citations to on-Academia articles is consistently higher than that of off-Academia articles across cohorts and impact factor bins. Table 7 provides the medians and