BMC Bioinformatics (BioMed Central)

Research, Open Access

Scratchpads: a data-publishing framework to build, share and manage information on the diversity of life

Vincent S Smith*†, Simon D Rycroft†, Kehan T Harman, Ben Scott and David Roberts

Address: Natural History Museum, Cromwell Road, London, SW7 5BD, UK

E-mail: Vincent S Smith* [email protected]; [email protected]; [email protected]; [email protected]; [email protected]

*Corresponding author †Equal contributors

Published: 10 November 2009

BMC Bioinformatics 2009, 10(Suppl 14):S6 doi:10.1186/1471-2105-10-S14-S6

This article is available from: http://www.biomedcentral.com/1471-2105/10/S14/S6

Publication of this supplement was made possible thanks to sponsorship from the Encyclopedia of Life and the Consortium for the Barcode of Life.

© 2009 Smith et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background: Natural History science is characterised by a single immense goal (to document, describe and synthesise all facets pertaining to the diversity of life) that can only be addressed through a seemingly infinite series of smaller studies. The discipline's failure to meaningfully connect these small studies with natural history's goal has made it hard to demonstrate the value of natural history to a wider scientific community. Digital technologies provide the means to bridge this gap.

Results: We describe the system architecture and template design of "Scratchpads", a data-publishing framework for groups of people to create their own social networks supporting natural history science. Scratchpads cater to the particular needs of individual research communities through a common database and system architecture. This is flexible and scalable enough to support multiple networks, each with its own choice of features, visual design, and constituent data.
Our data model supports web services on standardised data elements that might be used by related initiatives such as GBIF and the Encyclopedia of Life. A Scratchpad allows users to organise data around user-defined or imported ontologies, including biological classifications. Automated semantic annotation and indexing are applied to all content, allowing users to navigate intuitively and curate diverse biological data, including content drawn from third party resources. A system of archiving citable pages allows stable referencing with unique identifiers and provides credit to contributors through normal citation processes.

Conclusion: Our framework http://scratchpads.eu/ currently serves more than 1,100 registered users across 100 sites, spanning academic, amateur and citizen-science audiences. These users have generated more than 130,000 nodes of content in the first two years of use. The template of our architecture may serve as a model to other research communities developing data publishing frameworks outside biodiversity research.

Background

Taxonomic, systematic, and biodiversity studies (herein referred to as 'natural history') are data-intense sciences that draw information from many disciplines in order to build a coherent picture of the extent and trajectory of life on earth [1]. These data are essential to our discovery, understanding, and responsible use of the natural world [2]. Natural historians have traditionally relied on manual systems and techniques to gather, organise, and publish this information. This is collectively enshrined in scientific papers that form an archive spanning more than 250 years of published research. But as ever more synthetic and integrated accounts of the natural world are required to understand and mitigate threats to the environment, the failings of traditional publication methods for disseminating and using natural history research have become ever more apparent.

Traditional papers struggle to accommodate the high volume of data supporting accounts of natural history. In recent years this has been exacerbated by the rapid growth of semi-automated data gathering techniques producing large-scale datasets such as those incorporating genomic, phylogenetic, and image based components. The size and diversity of these datasets mean that they are at best marginalised to an electronic ghetto on publishers' websites. But all too often natural history data are simply never published [3]. The low impact of most natural history research, coupled with the high transaction costs associated with publication and access, means that much (perhaps most) natural history data only exists tacitly and informally within expert networks of specialists (i.e. in the minds, notebooks and computers of the people generating the data). Such data are imperilled by a decline in the number of professional specialists engaged in these networks, such as that reported amongst biological taxonomists. Indeed, this has led some to question the long-term viability of natural history as a professional scientific discipline [4]. Arguably natural history's salvation lies in better use (reuse) of the underlying data.

Meaningful forecasting and sustainable use of biotic resources requires large volumes of primary biodiversity data. However, this leaves informaticians with the challenge of integrating data from numerous, disparate natural history data providers, each with their own specific user communities, and diverse data types and sources, including taxonomic names and concepts, specimens in museum collections, scientific publications, genomic and phenotypic data, and images [5]. One approach is to data-mine existing publications in an attempt to reverse engineer scholarly publications into a database [6]. This approach is arguably the only option for legacy natural history data. But for new information, or that which has never been formally published, such reverse engineering should not be required so long as a technical, social and policy framework for data publication can be found. This paper is about the creation of a data-publishing framework for natural history.

Data publication is a new sort of scholarship that involves construction of large-scale datasets whose governance, organisation and use can be implemented using the web as a platform for generating, repurposing, and using data [7]. These systems offer the possibility of radical new ways of conducting scholarship and challenge some established ideas of how particular disciplines operate. To facilitate reuse and repurposing of data, these systems embrace 'Open Science' - the proposition of a model of communication inspired by the Free/Open Source software and Creative Commons movement [8]. A central theme of Open Science is to make clear accounts of the methodology, along with data and results extracted therein, freely available via the Web. This permits a massively distributed collaboration.

Design concerns relevant to data publication frameworks include the successful management of large scale distributed research programmes, how to support networks of independent researchers, the management of individual research careers, the development of new inter-disciplinary collaborations, and engagement with non-scholarly communities as both producers and consumers of research. These concerns are highly relevant to the challenges facing natural history scientists [4]. From a technical perspective these systems involve the development of tools for sharing natural history data, building and maintaining collaborations both formally and informally, and managing workflow and outputs.

A defining feature of data publication frameworks is that they primarily rely on social information flows, motivations and relations to organise the group. Individuals self-identify, mostly, for tasks, and through a variety of peer-review mechanisms contributions are recognised by the group and incorporated into what emerges as the collaborative output. A feature critical to their success is the ability of the framework to be broken down into discrete modules, capable of independent completion in relatively fine-grained increments. Because of this, people can contribute a little or a lot depending upon their motivations, such that some combination of 'true believers', occasional contributors, and people paid to participate can sustain a project.

In this paper we describe the system architecture and template design of "Scratchpads", a data-publishing framework for groups of people to create their own social networks supporting natural history science. This infrastructure is a combination of databases, network protocols and computational services that brings people, information and computational tools together to perform and publish natural history. Our goal was to build a system that could motivate individual researchers in the generation, management and dissemination of their own data for their own needs, while empowering a wider constituency of potential users who are free to repurpose this information for other uses.

Implementation

Design considerations and related work

Standard tools can be designed for the codification and dissemination of data and knowledge for communities with standard practices. But in cutting-edge disciplines, those spanning multiple specialities, or where standards are nascent and cannot be well defined, developing a common approach can be extremely difficult. Natural history science has all these challenging criteria. Natural history scientists work in fragmented, highly distributed and parochial communities, each with domain specific requirements and methodologies [9]. Their output is heterogeneous, high volume and typically of low impact, but with a citation half-life that may run into centuries. This output (e.g. species descriptions) broadly conforms to a power law (long-tail) distribution where the least regularly accessed content accounts for more than half of the total and is proportionally more important than the smaller fraction of more regularly accessed (popular) content. Consequently a high level and flexible approach to developing the software and workflow is needed, covering broad subjects and themes in order to encourage adoption by a range of natural history scientists.

Fundamental to the design was the need to build a truly scalable and flexible data publishing framework accessible through a web browser that supports 1) large numbers of users as passive readers and active contributors; 2) editorial hierarchies serving individual and community needs; 3) the epistemological richness and diversity of all contributors; 4) flexible data models that can be modified or added by contributors; 5) automated integration of third party content; 6) automated semantic enrichment of contributed and third party content; 7) content workflows and curation tools; 8) content archival and citation; 9) content licensing and a conditions of use framework; 10) web services; and 11) ease of use.

Within the context of natural history science, some websites and services meet many of these needs across specific data types (e.g. TreeBASE for phylogenetic trees [10], Catalogue of Life for taxonomic names [11], GBIF for biological occurrence records [12]), or most of these data needs across narrow taxonomic domains (e.g., FishBase for fish [13], Avibase for birds [14], AmphibiaWeb for amphibians [15]). Arguably, though, none of these systems scale to the breadth of all taxonomic diversity for all natural history data types. Indeed most limit their scope in order to place some practical boundary on their development, and to simplify the process of establishing credibility within their chosen domain. Further, these systems all struggle to accommodate conflicting or alternative hypotheses about data. Natural history science is well known for its epistemological richness and diversity [16]. It is difficult, if not impossible, to find researchers completely agreeing with each other within and between domains (e.g. in the taxonomic classification or phylogenetic relations of particular taxa). Electronic systems that force contributors to adopt a single representation of a particular data set (e.g. a single taxonomic hierarchy for navigating data) risk disenfranchising potential contributors, often to the exclusion of their data and interpretations.

To address these design requirements and social challenges the Scratchpads consist of a loosely coupled platform for publishing natural history research that enables contributors to build, share, and manage data with minimal barriers beyond our highly generic design constraints. We accommodate epistemological diversity and the problem of establishing trust within natural history domains by enabling contributors to create independent sites whose purpose, destiny and brand rest in the hands of the contributing community. These communities are often well established and have a strong sense of purpose. Nevertheless, our software platform "Scratchpads" needed to establish credibility and trust across natural history science domains. This was achieved by basing the project at the Natural History Museum (NHM), London, and managing the project through the European Distributed Institute of Taxonomy (EDIT). The NHM has a well-established brand as a world leader in natural history research, while EDIT is a network of 28 leading European, North American and Russian institutions specialising in natural history. This structure was intended to minimise individual and institutional rivalries that might jeopardise the long-term sustainability of the project, thus supporting our goal of creating an open community resource for natural history.

The unusual nature of this project demands agile development methodologies to promote frequent inspection and adaptation of the software in response to user needs [17]. Throughout this ongoing process we are mindful that our approach must be generic enough to scale to the widest constituency of possible users. Short development cycles and informal project management, sometimes spanning hours to generate several iterations of a feature in reciprocal response to user feedback, make traditional documentation and planning difficult. However, our experience suggests this leads to software that better meets user needs, and fosters a community of developers and users that arguably builds a path toward long-term sustainability.

Architecture and workflow

Rather than expand the taxonomic or content scope of an existing system supporting natural history science, we decided that a more generic solution would be required that could be tailored in a sustainable way toward the bespoke needs of natural history scientists. Content Management describes the set of processes and technologies that support the evolutionary life cycle of digital information. Content Management Systems (CMS) can provide generic informatics solutions for web publication of content created by individuals acting alone, or within organisations and research communities of almost any size. These tools aid in managing the development of software, collaborations, documents and websites. They are highly extensible and can be developed to support other research-specific activities such as handling large distributed datasets, data visualisation and analytical tools [18]. The emphasis of CMS tools on instant (web) publication and content maintenance allows contributors to focus on content development, instead of administration.

Our decision to adapt an existing CMS for managing biodiversity data is a significant departure from the conventional approach employed within the biodiversity informatics community. Traditionally this involves developing customised database models, usually after extensive mapping and observation of the target user community, followed by the development of a bespoke software application that formalizes workflows and processes. A current high profile example of this approach is the Common Data Model (CDM) [19], which is in development by EDIT Workpackage 5 [20]. This is intended to provide a generic data model and service framework for bespoke biodiversity applications that are collectively referred to as the "CyberPlatform" [21]. Another high profile example is CATE (Creating A Taxonomic E-science) [22], which augments the CDM library and server with additional logic to present the data as web pages and in a workflow geared toward revisionary taxonomy. In our experience the challenge with these and similar bespoke biodiversity applications is threefold:

1. The data types, structures and workflows modelled within these systems typically do not capture the full gamut of data and practices engaged in by the wider user community. Consequently, take up amongst the potential pool of candidate users may be low beyond those surveyed, because the transaction costs for users engaging with systems that do not meet the full spectrum of their needs are too high. For example, biodiversity informatics applications often assume data held by users is more structured, and therefore more readily modelled within a database, or structured differently than it typically is. The effort (transaction cost) required by users to sufficiently structure (or restructure) their data is too high, relative to their perceived benefit from using the system.
2. The relationship between content (text, images, video, data etc) and context (layout presentation, branding, ownership, identity, audience etc) is crucial to understanding how and why users engage with information technology systems. This textography (sensu Swales [23,24]) is crucial to scholarly discourse, but is challenging to accommodate in biodiversity informatics systems that usually disarticulate the process of capturing content (e.g. the act of populating a database) from content presentation (e.g. the display of a formatted database query on a webpage).
3. The heterogeneity of biodiversity data, coupled with multiple small, niche user communities, often with distinct needs and different audiences, requires highly bespoke informatics solutions. These are expensive and challenging to sustain, and usually lack a clear business model beyond intermittent cycles of grant funding.

Our reason for adopting a CMS as a platform rather than building a bespoke application was to directly address these challenges. The 'content' in CMS is typically loosely defined and can be accommodated in various ways, from highly unstructured 'pages' or nodes, through to highly structured normalized datasets. This provides the flexibility necessary to accommodate use cases that were not originally envisaged at the outset of the project. CMS minimize the distinction between the organization of content and its final presentation. This helps the content provider visualise how content will be presented to their audience without having to second-guess how an informatician will re-present content on their behalf. Finally, generic CMS systems are used extensively in many scholarly and non-scholarly settings and support a range of generic functionality that is required by all web-based applications regardless of the size or purpose of the userbase. Developer communities writing the underlying CMS software are completely independent and several orders of magnitude larger than the niche informatics communities working to support taxonomy and systematics. This gives generic CMS software much greater sustainability than bespoke biodiversity applications written by a very small number of developers, which are supported by intermittent research grants. Building on top of a CMS removes the burden of having to develop generic functionality common to all applications (e.g. user management and content versioning), allowing developers to focus on specialised functionality that directly meets the needs of the target user community.

CMS are usually built on top of content management frameworks (CMF) and open source programming languages. Many CMS tools exist [25] but the top four based on a recent study [26] include one written in Python, Plone [27], founded on the Zope CMF; and three written in PHP: WordPress [28], a commonly used blogging platform; Joomla [29], widely recognised for its ease of use; and Drupal [30], commonly used in multi-user collaborative sites. For the Scratchpad project we selected Drupal because it offers a good balance between the sophistication and ease of use required for managing large and distributed user communities. Crucially Drupal met 6 of the 11 design criteria that were identified in the previous section. The variety of contributed Drupal modules (currently over 7,000), size of the userbase (including several Fortune 500 [31] companies, major universities and government agencies), and ready supply of developers thanks to the popularity of PHP, were all contributing factors in our decision. In principle the Scratchpad project could, however, be replicated in any of the top four CMS.

The Scratchpad project was founded on Drupal version 5 in March 2007, although an ongoing transition to Drupal 6 means that as of June 2009 sites are being upgraded. The Scratchpad server's operating system is Red Hat Enterprise Linux (RHEL 5) running Apache 2.2 with PHP 5.1.6 and the database backend is MySQL 5.0.45. The Scratchpad project can, however, be run on any operating system, web server and database that supports PHP and Drupal. We are running a single Apache virtual host for all Scratchpad instances with Drupal handling domain names in a standard Drupal multisite configuration. This means that all sites share a common codebase but have their own database and database user, controlling data access on a site-by-site basis. User uploads are also segregated to independent folders to improve site security and facilitate site-by-site mobility, archival and backup. Drupal provides the foundations for basic web publication including the ability to register and maintain individual user accounts, administration menus, RSS-feeds, customisable layout, flexible account privileges, logging, a blogging system, and an Internet forum. The Scratchpads build on this through a combination of 42 modules provided by the Drupal community and 29 modules developed within the Scratchpad project (See Additional file 1: Modules for a complete list). The latter provide a suite of specifically adapted tools to support natural history scientists and their data.

New sites are initiated at the behest of a potential user, who registers interest and accepts responsibility for the new site, through a form on the Scratchpad website http://scratchpads.eu/apply. Users supply their biographic details and information on their proposed use of the site, in addition to a Google Map API (Application Programming Interface) key that activates the Scratchpad geolocative features. Upon submission of the completed form, the site approval and creation process is initiated. Currently the Scratchpad project accepts all sites within the domain of natural history, even if their proposed subject overlaps that of a current site. Site creation is semi-automated and is controlled through a Drupal installation profile that specifies standardised Scratchpad settings, administrative users, modules, permissions, and establishes the site so that it is immediately accessible to the registrant. Registrants receive e-mail notification of their new site and are assigned the role of site maintainer, granting them administrative permissions that include the ability to assign new users as additional maintainers, editors or contributors. These roles control user actions through a cascading hierarchy of permissions. Contributors are restricted to authoring and editing their own content, editors can author and edit any content, while maintainers also have certain administrative privileges. Scratchpad project administrators have full administrative control from an account hidden to other users. A click through agreement ensures that each user agrees to a set of terms and conditions that outline their rights and responsibilities through use of the site http://scratchpads.eu/TermsAndConditions. These terms are ultimately arbitrated by the Scratchpad administrators who reserve the right to review, refuse, monitor, edit or remove any content.

Content is added to a Scratchpad through a flexible workflow (Figure 1) that is optimised if one or more ontologies (e.g. a biological taxonomic hierarchy) are imported into the site first. These ontologies provide a structure around which content can automatically be tagged (i.e. classified), and facilitate search and browse functionality. Scratchpad users do not have to follow this workflow when publishing content, although the absence of matching terms (e.g. taxon names) in an uploaded ontology will disable features that dynamically aggregate tagged content (e.g. Taxon pages), making this information harder to navigate.

[Figure 1 diagram. Panel labels: Nodes, Vocabularies, Views; Submitted Content; Third-Party Content; Ontologies (Host classification 1, Host classification 2, Host classification 3, Parasite classification 1, Country hierarchy, Taxonomic ranks list, Anatomical terms list, ...); Taxon Page; Semantic Enrichment; Automated node tagger.]

Figure 1. Scratchpad workflow. Users independently submit content in Drupal nodes and vocabularies of terms (e.g. biological classifications of taxon names) associated with content. Nodes are grouped into content types (e.g. images, DNA sequences, phylogenetic trees, GBIF maps etc), each of which has specific workflows for data entry and editing. Terms from a vocabulary (e.g. taxon names) common to nodes are used to semantically enrich (tag) content on submission. Tagged nodes can be represented in predefined views according to their content type, and are dynamically aggregated on term pages (e.g. taxon pages) that can be navigated through their vocabulary. Intuitive curation tools allow users to select which nodes to display on term pages. This model provides a highly generic solution to submitting, grouping, enriching, and navigating diverse data types.
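The tagging and aggregation steps summarised in Figure 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Scratchpads implementation (which is built from Drupal modules): the `Node` class, the `tag_node` and `build_term_pages` functions, and the simple whole-word, case-insensitive name matching are all hypothetical names and choices introduced here.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """A piece of submitted content (hypothetical stand-in for a Drupal node)."""
    node_id: int
    content_type: str          # e.g. "image", "bibliography", "blog"
    text: str
    terms: set = field(default_factory=set)


def tag_node(node, vocabulary):
    """Attach every vocabulary term whose name occurs as whole words in the node text."""
    for term in vocabulary:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, node.text, re.IGNORECASE):
            node.terms.add(term)
    return node


def build_term_pages(nodes):
    """Group tagged node ids by term, then by content type (a 'term page' per term)."""
    pages = {}
    for node in nodes:
        for term in sorted(node.terms):
            by_type = pages.setdefault(term, {})
            by_type.setdefault(node.content_type, []).append(node.node_id)
    return pages


if __name__ == "__main__":
    # Assumed example vocabulary: two taxon names from one classification.
    vocabulary = {"Pediculus", "Pediculus humanus"}
    nodes = [
        Node(1, "image", "Dorsal view of Pediculus humanus"),
        Node(2, "bibliography", "A revision of the genus Pediculus"),
        Node(3, "blog", "Field trip notes"),
    ]
    for n in nodes:
        tag_node(n, vocabulary)
    print(build_term_pages(nodes))
```

In this sketch a node that matches no term (node 3) is simply left off every term page, which mirrors the point made above that content without matching terms in an uploaded ontology is harder to navigate.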
Vocabularies

Drupal supports the use of multiple restricted vocabularies as flat lists of terms, single hierarchies, or multiple hierarchies. We have adapted this system to support the import, export and management of biological classifications through a series of modules. They allow Drupal to support very large hierarchies of a potentially unlimited number of terms; improve the term editing and management interface; and provide the means to store additional term metadata. By default all hierarchical classifications are stored as parent-child relations within the site database. However, performance issues make this very inefficient for classifications with large numbers of terms. In instances where classifications contain more than one thousand terms, the hierarchy is additionally stored using a nested set algorithm [32]. This is implemented in a module http://drupal.org/project/leftandright/ that has been tested with classifications exceeding two million terms without encountering notable performance degradation.

Drupal allows users to create multiple vocabularies that must be linked with appropriate content types before they can be used. This permits users to associate content with multiple, and potentially competing, classifications. Terms can be added to a vocabulary within a Scratchpad on a term-by-term basis through an editing interface, or en masse, either via a tab delimited text file that has been exported from a Spreadsheet template http://scratchpads.eu/taxonomytemplate, or through a web service provided by the Encyclopedia of Life (EOL) project [33]. The latter enables users to import all child terms within a hierarchy from one of a selection of published biological classifications. This service supplies terms and associated metadata (authority, rank and synonymy) in an RDF representation of the Taxonomic Concept Schema [34].

Hierarchy management and term (taxon name) modification is supported for biological classification through an editor that enables intuitive (drag and drop) manipulation of terms and sections of hierarchies. This was developed in partnership with the Encyclopedia of Life project [33] and allows users to build new classifications using sections of classifications that have been previously created (i.e. via drag and drop editing). The tool also allows users to associate term names with their protologue, i.e. the bibliographic reference of the original taxon description.

Content types

Drupal stores content in nodes which are instances of content types, each of which may have bespoke workflows for data import, export, editing and visualization. Certain generic content types (e.g. pages, blog entries, fora) are defined within the core of Drupal. More specialized content types are defined through modules that are tailored to the particular demands of different data. The Scratchpads use a combination of contributed modules (i.e. those provided by members of the Drupal developers community) and modules written by members of the Scratchpad developers group. When a module developed within the Scratchpad project is sufficiently robust and of generic use to other Drupal users, it is released to the Drupal community via the Drupal website under an open source licence.

Bibliography

A contributed module http://drupal.org/project/biblio that allows users to manage and display lists of scholarly publications. As part of the Scratchpad project we use this as a stand alone content type in addition to integrating this into the workflow for managing bibliographic metadata associated with taxonomic names and specimen records.

[...] editwebrevisions.info/biblio and fungus gnats http://sciaroidea.info/bibliography.

Blog entry

A blog entry is a single post to an online diary or journal. This is a core content type within Drupal. Users can create multiple blogs within a site that are linked to terms within a restricted vocabulary. A good example containing multi-authored blog entries can be found on the Wallace Fund Website http://wallacefund.info/blogs.

Character project

This node type allows users to build and manage a matrix of controlled, text or numeric characters associated with selected taxa. This matrix has the appearance of a spreadsheet, and can be used to build morphological or molecular datasets for phylogenetic analysis, identification keys and printed character lists or descriptions. On initiating a character project, users select taxa from one of any vocabulary present within their site using a hierarchical select list. Taxa may be hierarchically arranged within the matrix according to their classification, with an option for parent taxa to inherit the properties (character states) of their children, or arranged as a simple flat list. New characters are added to the character project at the behest of the user, who can choose between three character types. These are controlled characters, which limit character state choices to a restricted list provided by the user, e.g. molecular DNA bases; text characters, which allow unrestricted textual input such as a verbose description of a particular morphological feature; and numeric characters, which allow whole integer or decimal input, e.g. measurements. Characters can be added to character groups and can be reordered in a drag and drop fashion to facilitate the rapid creation and collection of character data. Users enter character states directly into the character grid, which are validated according to the type and controls specified when the character was created. Character projects use the SlickGrid jQuery plugin [35] that is incorporated into our character editor (nexus) module to provide the grid interface.
Features of the bibliography module include the ability to import reference lists in BibTeX, RIS, MARC, EndNote tagged and XML formats, and export lists in BibTeX, EndNote tagged and XML formats. The module allows users to format references in multiple styles and supports in-line citation of references. This function is currently one of the most popular features within the Scratchpads, supporting 37,204 nodes across 37 sites. Users can upload PDF files of articles to create discipline specific bibliographies relevant to natural history. Current Scratchpad bibliographies include examples on fossil insects http://fossilinsects.

Countries map

This node type is used for displaying maps of the world highlighting selected countries. It is intended for use in displaying the presence or absence of taxa from particular countries, when more precise geolocative data is unavailable. Users select geographic regions from a list hierarchically organised around the TDWG geographic region ontology (levels 1-4) [36] to produce distribution maps of taxa. This module uses polygons corresponding to country outlines generated through a service provided by EDIT [37], which are overlaid onto a world base map presented in the Mercator projection.

Custom content types

Users can customise content types to their bespoke needs using the contributed Content Construction Kit (CCK) http://drupal.org/project/cck. The module is not a content type in itself, but allows users to create content types. CCK provides an intuitive interface for users to augment predefined content types (i.e. add new fields) or create new content types. Users first define a set of fields that become part of a table within the Scratchpad site database. Data can then be selectively imported into the table from a tab-delimited spreadsheet. This is done with the aid of an intuitive interface that guides users through the process of matching column headers from the spreadsheet with fields in the content type. Data can then be imported en masse, and visualised in user-defined presentations with the views and taxon page interface (see below). To date Scratchpad users have created custom content types for diverse data sets that could not have been anticipated by the Scratchpad development team, or supported because of their niche relevance to a particular subject domain. In many cases predefined standards for these obscure natural history data types do not exist. Examples include a checklist of cockroach cultures currently being held in captivity by members of the Blattoidea culture group http://blattodea-culture-group.org/en/culturelist and descriptions of mosquito morphology (e.g. http://mosquito-taxonomic-inventory.info/en/genus-emlutziaem-theobald-1903).

Forum topic

A forum topic is the initial post to a new discussion thread within a forum and is a core content type within Drupal. The default profile of each Scratchpad includes a forum that users can customise. All users with login authentication can initiate and contribute to a forum. Fora can be linked to an e-mail account into which users can receive posted messages. An example of an active forum within the Scratchpads can be found on the Araceae Network Scratchpad (see http://scratchpad.cate-araceae.org/forum).

Group

Groups enable users to manage and control access to collections of content by adding content or other users to a group. [...] using the group home page as a focal point. Users can establish private groups that will not be displayed in the groups list, and groups may be selective, requiring users to be approved by the group administrator before becoming a member. Groups are defined through the Organic Groups contributed module http://drupal.org/project/og.

Image

The upload and display of images is defined through a contributed module http://drupal.org/project/image that we have incorporated into a workflow through a second module http://drupal.org/project/imagex. This supports the mass upload and display of images. Users can drag and drop collections of images in most formats including BMP, JPEG, TIFF, and PNG files, on to an open source applet [38] that will upload the pictures. Image thumbnails are dynamically created on upload. Users can select images on to which they can apply annotations en masse via the matrix editor (described below). All annotations are optional, and by default users can specify a title and gallery name from where the images will be accessible. Users can also apply keywords and details of preparation and imaging techniques. These are drawn from a restricted vocabulary defined within the Scratchpad that can be augmented by the user. For images of biological specimens users can optionally apply specimen and location information as defined by the Darwin Core standard [39]. Images can also be associated with publications through integration of this workflow with the bibliographic module. Licences specifying how images can be used can be applied through the Creative Commons module http://drupal.org/project/creativecommons_lite. Once uploaded, images and their annotations can be browsed through in an image gallery. This is a contributed module that allows users to group and organise collections of uploaded images. Galleries can be hierarchically arranged and include a weighting feature to control their placement on a page. Thumbnails of the first image within a gallery are dynamically created when an image is uploaded.

iSpecies Cache

This is an administrative content type that holds copies of third party content drawn from external services. It is
This enables sub-communities to exist within a used as part of the taxon page display where content is Scratchpad site, allowing site members to self-organise dynamically created around terms held in a site around public or private topics of interest, such as the vocabulary (see below) and is hidden to non-adminis- production of scientific content for a research project. A trative users. Many third party web services serving groupiscreatedbyasinglegroupownerthathasspecial natural history data are fragile or slow to load. Caching permissions, including the ability to delete the group. this content reduces load time for users. The module is Group subscribers communicate amongst themselves named in homage to Roderic D.M. Page’s iSpecies Page8of16 (pagenumbernotforcitationpurposes) BMCBioinformatics2009,10(Suppl14):S6 http://www.biomedcentral.com/1471-2105/10/S14/S6 website [40] that generates “on the fly” pages of species termites http://termites.myspecies.info/en/content/baye- information. sian-global-tree-termites. Location (DwC 1.2.1) Poll A location record conforms to Darwin Core 1.2.1 fields Polls are questions with a set of limited responses. [39] and can be used independently or in association Authorised users can create a poll and invite other users withthespecimenrecordcontenttype.Thisisintegrated to vote on the responses. Once created, polls automa- within the workflow for annotating specimen records. tically provide a running count of the number of votes Examples of Scratchpads that use this content type received foreach response. This is a Drupalcorecontent include sites on macrostomorph flatworms http:// type. macrostomorpha.info/ and flies http://diptera.myspe- cies.info/. The module allows users to specify a point Specimen (DwC 1.2.1) location by clicking on a Google Map and dragging the Specimen records are based on the Darwin Core version location marker. 
The interface also supports the textual 1.2.1fields[39].Thiscontenttypeallowsuserstorecord input of geolocative data matching the Darwin Core biological specimens through a tabbed workflow and is 1.2.1 standard, including information on elevation and linked with the location Darwin Core content type. An depth. exampleofasitethatusesthisfacilityisaScratchpadon Freeloader flies http://milichiidae.info/en/specimenlist. Newsletter issue This contributed module http://drupal.org/project/sim- Nodes plenews publishes and sends newsletters to lists of All content is stored in Drupal nodes which minimally subscribers. Newsletter issues are essentially e-mail have a unique URL, title, creator and last edited/created messages that are stored within the site and associated times, in addition to any supplementary fields defined with different newsletters through a list held by the by the content type. Each node is given a numeric taxonomy module. Users can edit this list to create new identifier that is addressable within the database and newsletters and users can subscribe to a newsletter formspartoftheURL.Thisidentifieralwayslinkstothe within their user account page. most recent version of a node if its content is modified. However,allpreviousversionsofcontentarestoredand Page canbeaccessedthroughapermanentURL.AURLaliasis A page is the simplest content type for creating and created fromthetitleofthenodethatis suppliedby the displaying information. Pages contain no predefined author.Thisalphanumericaliasisusuallymoreintuitive structure and are most appropriately used for content than the numeric node identifier and can be customised (i.e. text and images) that rarely change, such as an by authorised users. However, the numeric node “About us” section of a website. The main field of the identifier cannot be changed and is persistent and page can be optionally used with a WYSIWYG editor unique within a site. 
When appended to the sites provided through the contributed WYSIWYG module domain name this acts as a Globally Unique Identifier http://drupal.org/project/wysiwyg. This provides an API (GUID) that addresses node content. Optionally, each that supports third party editors as plugins, simplifying node can be tagged with one or more terms from a the administrative process of adding and changing the vocabulary allowing content to be searched and aggre- editor. Currently we use the TinyMCE editor [41] across gated through these terms. At the base of every node a the Scratchpads. range of options are presented to the user that include menu settings (to attach nodes to a menu block), Phylogenetic tree publishing options (to publish or unpublish a node, This module is based on Roderic D.M. Page’s tree promoteittothesitesfrontpageandallowittositatthe viewing widget (TVWidget) that displays very large top of a list), comment settings (to enable or disable phylogenetic trees to a constrained and predefined size. comments) and file attachments (to attach one or more The widget [42] has been wrapped within a Drupal files to the node). module called Tree and allows users to paste or upload a Newick tree description or Nexus file containing a All Drupal nodes can be translated through the Newickformattedphylogeny.Themodulecreatesapage contributed internationalisation package. This consists that displays the widget. Examples of very large of a collection of modules that provide a translation phylogenies displayed in Scratchpads through this interface for the creation of comprehensive multilingual module include trees on Dung beetles http://dungbee- sites including node content, taxonomies and menu tle.co.uk/en/content/adapted-monaghan-et-al-2007 and items. 
Working in conjunction with browser language Page9of16 (pagenumbernotforcitationpurposes) BMCBioinformatics2009,10(Suppl14):S6 http://www.biomedcentral.com/1471-2105/10/S14/S6 detection this will redirect users to content displayed in For example, bibliographic data are displayed in a table their preferred translation and includes a block for that can be sorted by author, year and publication title; language selection. At present only one site on stick images are displayed as a list of thumbnails; Wikipedia insects - the Phasmid Study Group http://phasmid- content is displayed as the first block of text before the study-group.org/en/ has made use of this facility. first header; and GBIF occurrence maps are simply embeddedwithintheview.UserscanalsocreateGoogle Map views displaying geolocative data with the GMaps Semantic enrichment (tagging) module. This allows nodes containing georeferenced Newormodifiednodesareautomaticallysearchedupon data to be filtered and plotted on a Google Map. By submissiontoidentifytermsthatmatchthosepresentin default Google Maps are created for all nodes with any vocabularies associated with the node content type. geolocative information, and all users who have speci- Ifoneormoretermsmatch,theuserispresentedwithan fied their location in their user profile. interface that enables them to select which term/s to associate with the node. These features are provided by A view of nodes and their associated fields is dynami- the autotag module http://drupal.org/project/autotag cally created for each content type (including user that supports terms from multiple independent voca- defined CCK content types) and works in conjunction bularies, even if the term name is common to two or withagridmatrixeditorwehavecreatedforspreadsheet- more vocabularies. Terms located in a node are pre- like data entry. 
This allows users to edit data en masse, selected in the interface to enable fast and efficient where each row represents a node (web page) and each tagging. The interface also facilitates tagging with terms column represents a field in the content type. Data in that are not present within the node. Term name auto- each field is validated according to the same controls completion, which includes references to the source that would be present if it were being edited within a vocabulary (i.e. classification), speeds up the input of single node. The matrix editor allows users to rapidly additional term name tags. edit large datasets in an intuitive spreadsheet like environment. Blocks and views Blocks and views are a means to create boxes of related/ grouped data that can be built to create aggregations of Taxon pages content from multiple node types. Blocks are normally Taxonomicnamesprovideacentrallinkbetweendiverse usedintheleftand/orrightsidebar(s)ofasiteandarea items of information about an organism. Given an core feature of the Drupal CMS. Views is a contributed organism’s scientific name a wide range of data can be modulethatactsasapowerfulquerybuilder.Thisallows drawn together, including content from third party users to fetch and present highly customised lists, tables databases that have a suitable API. Scratchpad taxon and other visualisations. Detailed descriptions of blocks pages are our attempt to allow users to dynamically and views can be found in the Drupal handbook http:// construct and curate pages of information (e.g. pheno- drupal.org/handbooks. Here we focus on how we have typic, genomic, images, specimens, geographic distribu- used these features within the Scratchpad project. tion)aboutanytaxonregardlessofthephysicallocation of the source data. 
These pages can include information TheScratchpadprofilespecifiesaseriesofblocksforthe contained within a Scratchpad and data drawn from left and right sidebars including a ClustrMap [43] that selected third party resources. showsthelocationofsitevisitors;a tabbed searchblock that allows users to switch between free text searches of Taxon pages are built around the biological classifica- the entire site, and filtered searches based on an auto tions that users have created within their sites, and can completed list of taxon names stored within the site be intuitively navigated through the taxon hierarchy vocabulary; an ‘about this site’ block providing brief block, created with the TinyTax module http://drupal. backgroundinformationsuppliedduringtheScratchpad org/project/tinytax. Because content may be inappropri- signup process; and a user log-on block. Once logged in ately tagged with a taxon name, especially if it is drawn thelog-onblockisreplacedwithanadministrativemenu from a third party source, taxon pages include curation and a ‘Create Content’ block providing direct links to tools allowing users to select which content to display, major site features. and how this is arranged on a page. This is achieved through an intuitive drag-and-drop interface that sup- The default data views created by the Scratchpad profile portsthecurationofthemajorcategoriesofinformation on creation of a site include views for each content type to be displayed, and the detailed content within these including data from third party sources. These are categories. By default, content is dynamically aggregated customised to the particular needs of each data source. on a taxon page in an order that represents the most Page10of16 (pagenumbernotforcitationpurposes)
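Content reaches a taxon page through the tags applied during semantic enrichment, so the matching step described under "Semantic enrichment (tagging)" is worth illustrating. The following is only a rough sketch in Python and is not the code of the autotag module, which is a Drupal module. The function name find_matching_terms, the TermMatch record, the whole-phrase case-insensitive matching rule and the example vocabularies are all assumptions made for illustration. It shows how a term that occurs in two vocabularies can be offered once per vocabulary, leaving the choice to the user.

```python
import re
from collections import namedtuple

# Hypothetical record: one candidate tag plus the vocabulary it came from.
TermMatch = namedtuple("TermMatch", ["vocabulary", "term"])


def find_matching_terms(node_text, vocabularies):
    """Return every vocabulary term that occurs as a whole phrase in node_text.

    `vocabularies` maps a vocabulary name to an iterable of term names.
    Matching is case-insensitive. A term held in two vocabularies yields
    one match per vocabulary, so both can be shown to the user.
    """
    matches = []
    for vocab_name, terms in vocabularies.items():
        for term in terms:
            # The lookarounds stop "Apis" from matching inside "Apistogramma".
            pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
            if re.search(pattern, node_text, flags=re.IGNORECASE):
                matches.append(TermMatch(vocab_name, term))
    # Sort by vocabulary, then term, for a stable presentation order.
    return sorted(matches)


if __name__ == "__main__":
    # Hypothetical vocabularies; "Apis" is deliberately placed in both.
    example_vocabularies = {
        "Insect classification": ["Phasmida", "Apis"],
        "Plant classification": ["Arum", "Apis"],
    }
    text = "Notes on Phasmida feeding on Arum leaves."
    for match in find_matching_terms(text, example_vocabularies):
        print(match.vocabulary, "->", match.term)
```

In a real site the candidate list would feed the selection interface, with the matched terms pre-selected; the sketch stops at producing that list.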