How to Scale a Code in the Human Dimension Matthew J. Turk ([email protected]) Columbia University 3 1 0 This is a re-telling of a talk given at Scientific Software Days in December, 2012, at 2 the Texas Advanced Computing Center. The slides and video of this talk can be found n at http://scisoftdays.org/meetings/2012/. a J Abstract: As scientists’ needs for computational techniques and tools grow, 9 they cease to be supportable by software developed in isolation. In many cases, 2 these needs are being met by communities of practice, where software is developed by domain scientists to reach pragmatic goals and satisfy distinct and enumerable ] M scientific goals. We present techniques that have been successful in growing and engaging communities of practice, specifically in the yt and Enzo communities. I . h p - o r t s 1 Why “Community?” a velopingacommunityofpractice,wherethegoalsand [ technologyaredrivenbyactiveparticipants,wehave 1 Astrophysics, and particularly computational astro- been able to expand the class of astrophysical prob- v physics, is dominated by vertically-integrated,small- lems to which the technologies have been applied. 4 population researchcollaborations. The concept of a This has led not to consolidation of research inter- 6 community of researchers– sharing physics modules, ests, but rather a broadening, with improvements, 0 7 improvements to simulation codes, analysis tech- technology, and techniques being directly shared be- . niques,technology–issomewhatforeign. Infact,this tween working domain scientists. 1 ideaof“community”isoftenviewedasadetriment to In this paper, I outline the infrastructureand the 0 3 the individual researcher,rather than as a benefit to techniques with which we have cultivated these two 1 the field. I participatein twovibrant,activecommu- complementary, but distinct, communities and the : nitiesincomputationalastrophysics,thosesurround- various conscious decisions we have made to ensure v i ing the simulation code Enzo and the analysis code their growth and sustainability. X yt. Within these two communities, we have found a r somewhat surprising result: the empowerment of a the community does not come at the expense 2 Introduction: yt and Enzo of individual success. In fact, with actively shep- herded and cultivated community participation and The yt Project (http://yt-project.org/) is an processes,the opposite istrue: the bettermentofthe open source, community-developed analysis code for community comes at the enrichment of the individ- simulated astrophysical data. Largely written in ual. Python, Cython and C, it is parallelized using MPI In addition to the concrete, measurable benefits and OpenMP and has been used to analyze datasets we receive fromcommunity-focuseddevelopment, we whose size range from small (tens of megabytes) to havealsorealizedthatcommunitydevelopmentis large (tens of terabytes). yt is designed to abstract essential to the continued health of the field of out underlying technical aspects of simulation data computational astrophysics. By focusing on de- suchasfileformat,units,geometricconventions,and 1 parameterstorageinsuchawaythatisneutraltothe challenges as well as taken steps to directly address underlying simulation platforms. The flagship simu- these challenges. lationplatforms,whereweconducteddetailedtesting I am the original author of yt, having created and support all functionality, are are Enzo, FLASH, it during my graduate school career to analyze and Orion,Castro,Nyx, Piernik andNMSU-ART. Inthe visualize data created by the simulation code Enzo currentdevelopmentbranch(describedbrieflybelow) (http://enzo-project.org/),anadaptivemeshre- thishasbeenexpandedtoincludelimitedsupportfor finement simulation code. Enzo itself has undergone SPHandN-bodycodessuchasGadget,aswellasfull a number of changes overits years,both in the man- supportforOctreecodessuchasRAMSESandART. ner in which it is developed and the problems to ytprovidesalanguagefordescribingphysicalregions which it can be applied. Enzo began as a code- and applying data processing techniques to those re- base largely developed by a single individual, but gions: ratherthanfocusingonselectinggridpatches, through the stewardship of the Laboratory for Com- particles or octs and then masking out overlapping putational Astrophysics (LCA) it grew into a large, regions,itdevelopsconceptsofgeometricregions,re- open source (but mostly not community-based) de- gions defined by fluid quantities, and processes that velopmentproject. A direct consequence of both the transformthoseregionsintoquantitativevalues. ytis technology and the prevailing mindsets led to diver- bestthoughtofnotasanapplication,butasalibrary gent lines of “public” and “internal” development, for asking and answering questions about data. This resulting in a highly fragmented community of users aspectof ytnaturallyencouragestechnicalcontribu- and developers and many different branches of the tions, as every user of the software package typically code itself. Over the course of a few years in 2009 writes analysis code that builds upon its underlying and 2010, through concerted efforts from the Enzo machinery. Although its missionhas expanded in re- community,ittransitionedfromacloseddevelopment cent years to include the application of microphysi- model with periodic open releases into a fully-open, cal solvers, standardized input/output data formats, transparent development model based around con- creating initial conditions, and even studying data tributions from community members. This included fromearthquakesimulations,atitscoreytremainsa resolving many divergent code bases and produced tool for interrogatingastrophysicaldata and answer- the Enzo 2.0 release. The code is now developed us- ing questions about the underlying physics of that ing many best practices of software development in- data. cluding testing, version control, peer review, infras- tructuredesignandinvestmentofstakeholdersinthe The defining characteristic of yt is not that it developmentroadmap. Thishasresultedincontribu- is written in Python, or the operations it does, but tions of both physics modules and infrastructure im- ratherthatitissupportedbyaparticipatorycommu- provements,and has even brought to light bugs that nity of scientists interested in both using and devel- have subsequently been fixed by community mem- oping it. The yt community draws its members pri- bers. This transition,fromsemi-opento community- marilyfromcomputationalastrophysicists: individu- driven, has brought an energy and excitement to the als who conduct and analyze simulations. The first development of Enzo that seems likely to sustain it version was written in late 2006 by and for a single for years to come. Many of the items discussed be- scientist,butoverthelastseveralyearshasattracted low,about how to shape and foster community, have 170 subscribers to the general mailing list and 50 for been applied to the Enzo community as well as yt. the development mailing list; in 2012 it was identi- fiedasone ofthe mosthighly usedcodes onthe NSF NICS analysis machine Nautilus [Szczepanski et al., 3 What is “Community”? 2012]. Over the course of its history, nearly 40 peo- plehavecontributedchangesets,rangingfromtinyto Open Source is often used as a synonym for com- very large, and the mailing lists are relatively active, munity development, but in practice they are better averaging between one and four messages a day. In thoughtofastwodifferent,butoverlapping,modesof 2012weheldourfirstworkshopattheFLASHcenter development. The growthof a community, where in- inChicago,andwe areholdinga secondworkshopin dividuals participate in discussions,reportproblems, March of 2013 at UC Santa Cruz. Despite this rela- provideenhancements,andsupportothercommunity tively high level of activity for a project in computa- members, is neither necessitated by software being tional astrophysics, the community has faced several open source, nor is it a foregone conclusion for open 2 source software. namics, gravity, and so on. But as the simulation For scientific software, this situation is slightly codebecomeslarger,moregeneral,andmoremature, more complex, as the adoption of open source meth- it becomes unwieldy for a single person to manage ods are often met with resistance. In addition to and develop all aspect of a code. Within Enzo we this, members of a given community related to soft- haveseenthisasthecodehasgrowntoincludemany ware are just as likely – if not more likely – to be differenthydrodynamicmodules(includingmagneto- workingonsimilarprojects,competingformindshare hydrodynamics),chemistrymodules,radiativetrans- among the academic public, and even competing for port, star particle implementations, and parallelism funding or jobs. Building communities that are able strategies. Theoverallgeneralizationofthecodebase to thrive despite these barriers can result in con- has resulted in a deep specialization in some areasof siderable, non-local benefits: the aphorism “a ris- the code. By investing in a community of individ- ing tide lifts all ships” is nowhere more true than uals, the overall management costs are reduced as in community-driven scientific codes. Even the pres- individuals specialize in different subsections of the ence of other, engaged community members means code. This pipeline for conducting research using a that there are more people able to answer questions code such as Enzo shares many characteristics with from newcomers and provide assistance and energy observational research; the data is constructed using toward solving problems. the simulation platform, relies on many pieces of di- verse infrastructure such as IO libraries, libraries for The concept of openness in science amongst aca- parallelismand librariesfor solvingdifferential equa- demics is something of a paradox. Often, utilizing tions, and is then passed on to the analysis platform commodity tools is viewed as a very positive trait, which processes the data to produce results. whereasreleasingsoftwareisoccasionallyviewedwith skepticism. While thisis changing,andinsomeways The third objection enumerated above is some- quiterapidly,theideaofsourcecodebeingsharedbe- what more subtle. The implicit question is not, tweenpotentialcompetitorsisstilloftenseenasdan- “What if someone finds a bug?” but really “What gerous or even anathema to scientific progress. This if someone sees my work and is able to show that typically breaks along three primary objections:1 it is flawed?” This objection is the most challeng- 1. Why should I give up my competitive advan- ing, as it neatly aligns with the conflict between per- tage? sonal advancement and the collective advancement 2. How can I manage supporting a code? of science. The glib, simple solution to this would 3. What if someone finds a bug in my work? be to prioritize an incremental, layered approach to science rather than preserving the professional bene- Answeringthese questions in detail is beyondthe fits of previous, potentially incorrect results. In this scope of this document. But, a few comments can be made. The first two objections to releasing approach,“bugs” and flaws in results are not hidden code are directlyaddressed bya communityof from view and defended from exposure, but rather practice. As noted above, the benefits from collab- found and corrected. Unfortunately, such prioriti- zation is a subtle form of the prisoner’s dilemma: orative, participatory communities can alleviate the it is maximally successful when all members of the burdens of support as well as provide advantages to communityparticipateinrevealingshortcomingsand scientificinquirythatwouldotherwisenotbepresent. problems. The personal solution is somewhat more InboththeEnzoandytcommunities,asthecommu- subtle,andnotentirelyclear. Fromapragmaticper- nity has scaledin size, with it has scaledthe number spective,identificationofbugsenableshigher-quality, of eager, helpful participants in contributing mod- longer-lasting results, less likely to be overturned by ules (which are then shared, collectively) as well as subsequent investigations by competitors. provide supportfor problems and issues on the mail- ing list. When discussing these aspects of competi- As computational science matures, and as com- tive advantage and support, it is also worthwhile to putationaltechniquespermeateeveryaspectofscien- frame the discussion in terms of generalization and tificinquiry,itisnaturalthatthe softwareutilizedin specialization. As an example, in the early stages of scientificinquirygrowsmorecomplex. Scientificsoft- developing a simulation code, it’s entirely reasonable wareinmanyrespects,particularlyinastrophysics,is that a single individual can manage every aspect of thoughtofassomething ofasecond-classcitizen– in the code: the IO framework, parallelism, hydrody- yearspast,theconceptofscientificsoftwarebeingde- 1Forabroaderandmorequantitative study,seeStodden [2010] 3 veloped in isolation, placed on a website and largely all – whether that be contributing code, documen- abandonedbecamepervasive. Thismethodologysim- tation or even reporting bugs. This cognitive block plywillnotscalewiththecomplexityofprojectsnec- comes from a fundamental misunderstanding of how essary for modern scientific inquiry; we have reached scientific codes are developed,one that we have been the ageofadvancedalgorithmsbeing appliedinnon- attempting to remedy within our own ranks. The trivialwaystocomplex,physically-richdatasets. The traditional view of open source scientific code devel- cyberinfrastructure necessary to address problems in opmenthasbeenthatoftwogroups–developersand computationalscienceisnolongertractablysolvedby users. This segregation of individuals results in an individualsworkinginisolation–communityprojects implicit, but difficult to overcome, feeling of bound- must become the new norm. aries between responsibilities. Often responsibilities such as verificationof results, inspecting results, and Part of the reason that communities developed tracking (and understanding) modifications to the around scientific codes can be beneficial is also a code base are left to “developers.” Unfortunately, component of the challenges within these communi- all of these responsibilities are essential components ties. An active community with participants shar- of the scientific method! Segregating responsibilities ing enhancements, features, and assistance relies on inthis wayleadstomisunderstandingsaboutthe na- the participants developing those enhancements and ture ofscientific codes,many ofwhichareconstantly understanding the code base well enough to provide under development and improvement, and the na- assistance to less experienced individuals. Scientific ture of the results that they can produce. This is code development is strongly driven by pragmatic, even evidenced in simple things such as using the short-term needs of scientific inquiry; as a result, word “user” to describe community members, fur- most improvements are motivated by the next pa- ther emphasizing a distinction between people who per, or the next talk, or advancing a particular line “use” the code and people who “make” the code. ofinquiry likely to resultin a publicationandthe af- In Pawlik et al. [2012], the authors thoughtfully de- filiated respect from peers for solving a challenging scribe the role of scientists in softwarenot as a black scientific problem. The common argument against andwhite,userordeveloperdistinction,butratheras sharingadvancements,techniquesandtoolsisthatit acontinuumofgreys. Often, the self-applicationofa undercuts competitive advantage. This form of in- term like “user,” with its connotations, results from dividualism results not only in researchers refraining self-assigned roles, or perceptions of individual abil- from sharing, but more importantly, prevents them ities. As peers in a global scientific community, the from benefiting from the work of others. A strong distinction between “users” and “developers” community of sharing results in a stronger technical is actively harmful, when in reality scientists are code base,even if it has the side effect of a moderate tasked with occupying that grey area in the middle. reduction in perceived intellectual priority for indi- vidual researchers. If I share a technique that can In both the yt and Enzo communities, we have be applied to Population III star formation simula- seen this behavior play out numerous times. Those tions, I can no longer be the only one to apply that members of the community who have been the most technique – and thus the only person who can an- giving of enhancements, assistance, and technology swer that type of problem. I lose a small amount of (scripts, modules, bug fixes) are typically those who intellectual priority. However, I am now able to re- are able to best take advantage of the technology to ceive improvements to the technique, suggestions for askcomplexquestionsaboutthephysicalworld. Fur- further enhancements, and am thus able to extend thermore, the process of opening up a module and my scientific inquiry further than before. The most contributing it to an open source software commu- importantaspectofthisisthataltruisticgoals(shar- nity provides the opportunity for receiving enhance- ing,reproducibility,openness)alignexactlywithnon- ments in functionality, bug reports, and synergistic altruistic goals. It does not matter if I share because applications of that module. I hope to gain something in return or because I be- In the yt project, there are a number of individ- lieve in openness; the end result is that the health of uals who are the primary contributors and a larger the community has been improved by participation. number of individuals who have provided occasional The strongest bias we have seen in yt is not bug fixes or individual, isolated modules. The influ- against releasing of code or contributing back, but ence of non-developing community members on code againstthe notion of conducting any development at changes, roadmap efforts, and development method- 4 ology is not only allowed, but encouraged and so- to computational astrophysics, but is also seen fre- licited. Furthermore,weactivelysolicitcontributions quently within the instrumentation community. A in the form of scripts, modules, and enhancements common pattern seen within yt is a spirited discus- or modifications to the yt codebase. This initiative sionofidealisticgoalsfortheproject,butthenpress- has resulted in several positive effects, all of which ing deadlines and other local concerns typically tem- have strengthened yt as a community, independent per enthusiasm and execution. I myself am not only of their effects onyt as software. Individuals who do guilty ofthis, but perhaps the mostcommoninstiga- not feel motivated to or capable of contributing in a tor! technical sense are more likely to contribute through The challenge to community building I find the helping other users, providing documentation, and mostworrisomeisthatpresentedbytheso-called“ci- even networking and advertising the project. But tation economy” in astrophysics. The influence of a the more prominent effect is a sense of investment in given piece of code is often measured by citations to theproject. Thissenseofinvestmentresultsinquan- a method paper, which is by definition a fixed and tifiable results (such as more mailing list activity, a archived document. In 2011, a method paper for yt largernumber of commits) as well as results that are was published. Enzo’s method paper is still under more difficult to quantify, such as positive feelings, preparation, but papers in 1997 and 2004 are often friendly discourse, excitement, word of mouth, and cited as references for Enzo. Because ADS provides so on. reverse-indexingcapabilities, the number of citations to these papers provides an immediate method of gauging the influence of the software that they de- 4 Challenges scribe. Intheinterveningtimebetweenpublicationof the methodpapers,new contributorshavejoinedthe Even beyond these concerns, community building collaborations and contributed substantial and non- can be challenging within the confines of the tra- trivialenhancementstothecodebase. Infact,thisis ditional realm of academia. The reward struc- notonlywhathasalreadyhappened, butanexplicit, ture in astrophysical academia, for funding, jobs named goal of the future of these two code bases! and mindshare, is highly-correlated with influence, Thisresultsinanunfortunateconflictofinterestsbe- high-impact papers, and citation counts[see also tween new contributors and existing contributors. It Howison and Herbsleb, 2011]. Often developing soft- is in the best interests of existing contributors, who ware, or participating in academic communities, is contributedto the code base prior to the publication seen as orthogonal to these goals, even though this of a method paper, to consolidate citations in the is not necessarily true. In light of this, soliciting de- canonical paper; unfortunately, it is in the best in- velopment contributions is particularly difficult. De- terests of new contributors to publish a new method veloping a tool in support of a publication (for in- paper, so that they can begin to “collect” citations. stance, a module for yt) is seen as a worthwhile use In whose interest is it to contribute to a code base of time. Despite the benefits that come from making without rewards mediated through citations, one of that tool available, the time needed to “clean it up” the most common metrics of success in academia? I and provide a modicum of support is not as immedi- believe this is among the greatest challenges to com- ately beneficial, as it does not result in an additional munitydevelopedcodesinastrophysics,onethatwill publication, additional citations, and so on. Often, continue to grow in importance as software projects those contributions that would be the most benefi- inevitably scale beyond a handful of contributors2. cialtotheprojectasawholearethemostdifficultto motivate. These secondary, yet important, goals in- clude improvements to documentation (particularly 5 Strategies narrative, instructive documentation), testing of in- dividual components (as opposed to large-scale in- Notallofthechallengesenumeratedabovecanbead- tegration testing), and infrastructure improvements dressed directly. However, in both the yt and Enzo suchasoptimizationandmaintainabilityrefactoring. communities, we have addressed the technical and As a result, tool builders are not always favored by social challenges through conscious development of the academic reward structure. This is not unique infrastructure and techniques. These have been de- 2Alternatemetricsofsoftwarecitationarebeingexploredby,e.g. theASCL[Allenetal.,2012],buttheyprimarilyaddress discoverabilityofcodes anddonotattempttoaddresstheissueofdistributingcredittocontributors. 5 signed to foster good will, encourage contributions, changes upstream. The unique versioning of lo- and build social capital between contributors. Be- cal copies of a code base also allow unique revision low, I list several steps we have taken over the last specifiers to be applied globally; rather than “ver- severalyearstowardthegoalofgrowingandengaging sion 4.2 with modifications (unspecified)” code that 3 communities of practice. generates a given result can be uniquely identified Mostimportant,however,theseneedtobeviewed with a globally-unique hash. Both yt and Enzo as the investment they are. These strategies, partic- are hosted on BitBucket, a code hosting provider ularly the social strategies, require not only focused (roughly functionally equivalent with others such as energy and time, but an emotional investment. You RhodeCode, GitHub, Gitorious and Savannah). It must design the community you want. This provides technical infrastructure for code review, ac- design extends far beyond designing software and al- cepting changes, tracking contributions and identi- gorithms; it includes thinking about the diversity of fying specific revisions of the code. By using this contributors and community members, the tone of systemwelowerthebarriertoentryforcontributors, discoursewithinthecommunity,theprojectedenthu- enablingsubmissionoflocally-developedchanges. By siasm within the project, and even the congeniality usingDVCS,weencouragescientistsusingytorEnzo withwhichfeedback–especiallycriticalfeedback–is to track their own changes to it. Even if this code is received. Thecultureseededinacommunitywillself- never submitted for inclusion in the primary reposi- propagate;whether this is a culture ofneglect, a cul- tory,thisenablesprovenancetrackinganddebugging. ture of homogeneity, a culture of kindness, a culture Development of scientific software also requires ofbrasharroganceorevenacultureofopenness,this several difference degrees of communication; as the willflowoutwardfromthecore[Bacon,2009,Trapani, software itself is not the only project contributors 2011, Allsopp, 2012]. Both Enzo and yt have at- areworkingon,wehavefoundthatcommunicationis tempted to build cultures that promote respect and naturally divided into three categories, based on la- excitement. tency and urgency. Immediate communication, such as in-person meetings (or “code sprints”) or Google Hangouts,areusefulforlow-latencyplanning,discus- 5.1 Technical Infrastructure sion,andcollaboration. Tools suchas InternetRelay Chat(IRC)provideameansofmedium-latencycom- WithinbothytandEnzo,weutilizeseveralpiecesof munication;ofteninthe#ytchannelonFreeNode,an technicalinfrastructurethateasetheprocessofgrow- IRCnetwork,between10and20peoplecanbefound ingcommunities[seealsoFogel,2005]. Theseinclude atanytime. Messageslefthereareusuallyresponded open and freely-joinable mailing lists, a completely torelativelyquickly,whichleadstoalow-latencydis- open development process based on the distributed cussion where problems can be resolved much more versioncontrolsystem(DVCS)mercurial,andacode quickly than through email. Finally, nearly all plan- reviewandmentoringprocesstoensurecontributions ning and detailed discussion happens through high- are high-quality, with a minimum of friction. Fur- latency mediums such as comments on code changes thermore, we utilize methods of communication that andmailing lists. The mailing lists for yt (which are enable participation and that are tuned for low- to titled the unfortunately-chosennames yt-usersand high-latency interaction. yt-dev)areopen,indexedbyGoogle,andfreelyjoin- Bothyt andEnzohaveseenanenormousgrowth able. Wehaveoften(successfully)encourageddiscus- in contributions following migration to DVCS; this sions here to turn into collaborations and code shar- is not unique to these projects, but is a hallmark of ing, and in fact we have had a number of success- the lowered barrier to entry presented by DVCS. In fulopportunities for long-termplanning andinvasive contrast to centralized version control systems such code changes here. They serve not only as broadcast as CVS and Subversion, DVCS systems are fully- media, but as venues for soliciting input and contri- distributed. Every clone, or checkout, of a reposi- butions. tory is a peer with every other clone; this enables anyone who checks out a repository to track their Nearly as important as providing mechanisms own changes to the repository, and provides the for communication is reducing the barrier to en- (technical) ability to much more easily contribute try for new contributors. This includes ensuring a 3A relatedworkisPrli´candProcter [2012] wherethe authors have compileda setof veryusefuland helpful guidelines for scientificsoftwaredevelopment. 6 smoothprocesstofindingtheappropriatelocationin techniques described in the book to software engi- the source code, making changes, submitting those neers, in many ways they can be applied directly changesforreview,andakind,thoughtfulmentoring to collaborations around scientific software projects. process for new code contributors. The most com- The authors enumerate three characteristics which mon path we have seen for new code contributors is they suggest applying to all communication and in- relatively straightforward: teraction: 1. Individual applies software package to meet • Humility own goals • Respect 2. Individual develops a modification to the code, • Trust for either an enhancement, a bug fix, or a con- Communicationbytext,inparticularabouttech- tribution of an example nical topics, is often stripped of the nuances and 3. This change is submitted for inclusion in the inflections that convey emotion. In textual media, main yt repository therefore, it is even more important to guide discus- 4. The change is read, reviewed and tested by sion in a way that encourages participation. Within community members to ensure that standards the ytcommunity, weareverycarefulto ensure that for coding, documentation and testing are all the tonesonthe mailing list,code reviewandinIRC met. shouldbeconductedinthisway: thehealthofacom- 5. Theindividualbeginstoparticipateinthecom- munitycanbejudgedbyhowittreatsnewcomersand munity novicesaswellashowittreatsexperienced,advanced The main codebase is not suitable for contribu- participants. Even in framing analysis interactions tions of all types; we have found individuals ea- betweenpeersinthisway,onecanseehowcommuni- ger to contribute scripts that exist as supplemen- cationcanservethegrowthofscientificcommunities. tal tools, or that were used in the writing of a pa- Inscientific softwarecommunities, an aspectthat of- per. To support this desire, we provide a location ten goes unmentioned is that we arguably want to (https://hub.yt-project.org/) to submit these foster discussions between peers, not discussions be- scripts as well, where links to external source code tweenanelite classandasubservientclass. By guid- repositoriesarecollectedalongsidedescriptionsofthe ing the discussion with a focus on humility, respect individual projects. In this way, by providing mech- and trust (HRT as abbreviated by Fitzpatrick and anisms and structures, as well as an open and active Collins-Sussman), discussions can become more con- solicitationfortechnicalcontributions,wehavefound genial, more productive, and can lead to a greater that we can foster a sense of community and accom- spirit of collective focus. plishment among individuals. We have found that even minor barriers to con- Humility can be well-characterized in how a tribution (or even software deployment!) can build communityrespondsto negativefeedback. Ina com- up a “technicalfriction” to participationthat results munity in which I participate, I witnessed a dis- in losing contributions and community members. In cussion between an experienced community member an effort to alleviate this, we provide easy access to and a newcomer. The newcomer reported what they thesourcecode,detailedinstructionsforcontributing thought was a bug in a particular routine. The ex- code, a mechanism for communication, and scripts perienced community member, who had made the that assist with the entire process. We have found change that introduced the potential bug, responded that by ensuring that we have a uniform review pro- with a very abrupt, dismissive response. “It’s like cess (where even code written by founding members that for a very good reason. Don’t touch it.” By of the community is reviewed) we ensure that the terminatingdiscussionatthatpoint,itsentthe clear code is of acceptable quality, tests are included, and signalthatadiscussionofthereasoningbehindacode that undocumented code is discouraged. change was simply not necessary; the message was notonlythatthereasoningwasbeyondreproach,but that a discussion was not worth the time it took to 5.2 Social Techniques have. Thebettersolution,ofanopenandhumbledis- In Fitzpatrick and Collins-Sussman [2012] the au- cussion of the background and reasoning behind the thorsdescribeastrategyforsocialinteractionsamong decision, would have been to engage the newcomer so-called “geek” teams. I will not provide a detailed and explain the reasoning behind a decision. This retelling of this, but despite focus on applying the takes an investment of time, but certainly no more 7 than the investment of time it would take to justify gard external contributions in a positive way, such a decision to a referee or journal reviewer, and rather than as threats to their accomplish- the potential gains in this case would be a new con- ments. tributorandanincreaseinsocialcapitalamongstthe HRT are not always easy attributes to focus on. community. Particularly in a competitive field such as astro- Respect can be seen in how peers contribute to physics, it can be difficult to spend time thinking about how words or conversations will be perceived, the development of the code. In principle, when de- particularly when these conversations are between velopingascientificcode,weareengaginginscientific potential competitors. But by remaining focused discourse,attemptingtopushthefieldofastrophysics on engaging community members as peers, treating forward in both our understanding of and our abil- them with HRT and spending time and energy on ity to ask questions of the natural world. One of the thoughtful communication, the community will be most common ways I’ve seen this ignored is when a more likely to grow and flourish. personwritestoamailinglistorappearsonIRCand sayssomevariationof“I’venoticedsomethingisact- ing strangely with ...” The all-too-common response 6 Conclusions is simply, “You’re doing it wrong,” or even worse, a dismissive instruction to “Just read the documenta- As computationalscience has grownboth more com- tion.” This is rarely even prefaced with the obvious plexandmorepervasive,communitiesofpracticesur- question of, “How did you expect it to act?” It ter- rounding scientific softwareprojects have become es- minates discussion and discourages individuals from sential to the health and vibrancy of computational returning. This has the direct effect of reducing par- scienceprojects. Unfortunately,theacademicreward ticipation, and disproportionately impacts new con- systemdoesnotalwaysalignwiththedevelopmentof tributors. A lack of respect can infect and transform softwareprojects,whichcanleadtostagnationorcor- a community into an unfriendly, unwelcoming envi- nercuttinginimportanttaskssuchasinfrastructure, ronment; the most valuable attribute a community documentationandtesting. Thislargelyresultsfrom of scientific software developers has is its ability to the overlap between the traditional roles of develop- engage in scientific practice, and that is obstructed ers and users in software development, a distinction by a lack of respect. that can in fact actively harm the scientific process. Trust within scientific software communities is To grow communities around any software perhaps more subtle. What we have seen with yt project, the structure and character of the commu- and Enzo is not that trust is an unwavering faith in nityitself mustbe carefullyconsidered;youmustde- someonetoproducehigh-quality,scientificallycorrect sign this community. This includes setting the tone code, but a trust in the process and the stewardship of discourse, consciously fostering a diverse set of of a code. Within communities of scientific software, participants, and being open and enthusiastic. This we have seen this evidenced when contributorslet go processcan be assisted with technical infrastructure. of a project. As a community matures, individuals This includes using distributed version control sys- cannotcontributetothebreadthofsub-projectsthat tems, conducting open communicationin media that theydidwhenitwasyoung;furthermore,asscientific suitthestyleofcommunication,andstreamliningthe interests shift, the leader of an individual project or participation process and providing a clear method aspect of the scientific code may move on to other of contributing. Technical infrastructure alone can- things. Trusting that the community will be able to not “solve” the development of a community; social steward and shepherd that project is a crucial ingre- infrastructureandstandardsofconductmustalsobe dient in addressing sustainability and burnout, and developed. Conducting business while remaining fo- an essential ingredient in expressing trust for peers cused on humility, respect and trust can help to en- and other community members. sure that newcomers feel welcomed, existing contrib- The motion of projects between individuals is a utorsandpeersfeelvalidatedandrespected,andthat difficult social situation. We attempt to address this ultimately as interests change over time projects do in yt by emphasizing pride over ownership. By notstagnateundertheweightofcodeorprojectown- changing the conversation from dominion to ership. stewardship, this changes how individuals ap- A vibrant, active community brings sustainable proach projects, and helps enable them to re- development, synergistic applications and develop- 8 ment of code, and considerable rewards in potential and N. P. F. Lorente, editors, Astronomical Data collaborations and discourse. I am grateful for the Analysis Software and Systems XXI, volume 461 opportunity to have participated in both the yt and of Astronomical Society of the Pacific Conference Enzo communities, and grateful for the concrete re- Series, page 627, September 2012. wards–scientific,socialandtechnical–thatmypar- John Allsopp. The proof of the ticipation has allowed me. pudding, November 2012. URL As a closing note, despite all of the successes we http://www.webdirections.org/blog/the-proof-of-the-pudding have seen in the yt and Enzo communities, we still have considerable roomto grow from both the social Jono Bacon. The Art of Community. Building the and technical standpoints. I hope that as communi- New Age of Participation. 2009. ties drivenboth bylong-reachinggoalsandthe prag- Brian W. Fitzpatrick and Ben Collins-Sussman. matic needs of scientific inquiry, we can continue to Teamgeek: asoftwaredeveloper’sguidetoworking remain healthy, vibrant and growing. I believe that well with others. O’Reilly, Sebastopol. CA, 2012. the single biggest problem within scientific software ISBN 978-1-449-30244-3. communitiesisensuringcreditforcommunitypartic- ipation can be shared with new contributors. This KarlFogel. Producing OpenSourceSoftware: Howto affects motivations at essentially every level, and be- Run a Successful Free Software Project. O’Reilly cause of its very nature can dramatically affect the Media, Inc., October 2005. ISBN 0596007590. health and stunt the growth of scientific software communities. James Howison and James D. Herbsleb. Scientific software production: incentives and collaboration. In Pamela J. Hinds, John C. Tang, Jian Wang, Acknowledgments Jakob E. Bardram, and Nicolas Ducheneaut, ed- itors, CSCW, pages 513–522. ACM, 2011. ISBN I would like to thank the organizers of the Texas 978-1-4503-0556-3. Scientific SoftwareDays conference,Victor Eijkhout, A. Pawlik, J. Segal, and M. Petre. Documentation SergeyFomel, Andy R.Terrel,andMichaelTobisfor practices in scientific software development. In a wonderful conference full of excellent talks. I’d Cooperative and Human Aspects of Software En- also like to thank the other attendees and speakers gineering (CHASE), 2012 5th International Work- for fascinating and enlightening conversations about shop on, pages 113 –119, june 2012. doi: 10.1109/ scientific software. I am supported by an NSF CI CHASE.2012.6223004. TraCSfellowship(OCI-1048505). Anyopinions,find- ings, and conclusions or recommendations expressed Andreas Prli´c and James B. Procter. Ten sim- inthispublicationaremyownanddonotnecessarily ple rules for the open development of scientific reflect the views of the National Science Foundation. software. PLoS Computational Biology, 8 I would also like to thank the members of both (12):e1002802+, December 2012. ISSN 1553- the yt and Enzo communities. Much of this talk – 7358. doi: 10.1371/journal.pcbi.1002802. URL andapproachtocommunitybuilding–wasdeveloped http://dx.doi.org/10.1371/journal.pcbi.1002802. incollaborationwiththem. I’mgratefulforfeedback Victoria Stodden. The scientific method in prac- providedby NathanGoldbaumandBrianO’Shea on tice: Reproducibility in the computational sci- a draft of this manuscript. In addition, I would like ences. Technical Report Working Paper 4773-10, to thank Britton Smith and Greg Bryanboth for ex- MIT SloanSchoolof Management,February 2010. tremelyvaluablefeedbackinpreparingthispaperand URL http://ssrn.com/abstract=1550193. forleadingbyexampleincommunitydevelopmentfor scientific codes. A. Szczepanski, T. Baer, Y. Mack, J. Huang, and S. Ahern. Usage of a hpc data analysis and visu- alizationsystem. Computer, PP(99):1,2012. ISSN References 0018-9162. doi: 10.1109/MC.2012.192. A. Allen, P. Teuben, R. J. Nemiroff, and L. Shamir. Gina Trapani. Your community is Practices in Code Discoverability: Astrophysics your best feature, April 2011. URL Source Code Library. In P. Ballester, D. Egret, http://smarterware.org/7819/my-codeconf-talk-your-communit 9