Table Of ContentHow to Scale a Code in the Human Dimension
Matthew J. Turk (matthewturk@gmail.com)
Columbia University
3
1
0 This is a re-telling of a talk given at Scientific Software Days in December, 2012, at
2 the Texas Advanced Computing Center. The slides and video of this talk can be found
n at http://scisoftdays.org/meetings/2012/.
a
J
Abstract: As scientists’ needs for computational techniques and tools grow,
9 they cease to be supportable by software developed in isolation. In many cases,
2
these needs are being met by communities of practice, where software is developed
by domain scientists to reach pragmatic goals and satisfy distinct and enumerable
]
M scientific goals. We present techniques that have been successful in growing and
engaging communities of practice, specifically in the yt and Enzo communities.
I
.
h
p
-
o
r
t
s
1 Why “Community?”
a velopingacommunityofpractice,wherethegoalsand
[
technologyaredrivenbyactiveparticipants,wehave
1 Astrophysics, and particularly computational astro- been able to expand the class of astrophysical prob-
v physics, is dominated by vertically-integrated,small- lems to which the technologies have been applied.
4
population researchcollaborations. The concept of a This has led not to consolidation of research inter-
6
community of researchers– sharing physics modules, ests, but rather a broadening, with improvements,
0
7 improvements to simulation codes, analysis tech- technology, and techniques being directly shared be-
. niques,technology–issomewhatforeign. Infact,this tween working domain scientists.
1
ideaof“community”isoftenviewedasadetriment to In this paper, I outline the infrastructureand the
0
3 the individual researcher,rather than as a benefit to techniques with which we have cultivated these two
1 the field. I participatein twovibrant,activecommu- complementary, but distinct, communities and the
: nitiesincomputationalastrophysics,thosesurround- various conscious decisions we have made to ensure
v
i ing the simulation code Enzo and the analysis code their growth and sustainability.
X
yt. Within these two communities, we have found a
r somewhat surprising result: the empowerment of
a
the community does not come at the expense 2 Introduction: yt and Enzo
of individual success. In fact, with actively shep-
herded and cultivated community participation and The yt Project (http://yt-project.org/) is an
processes,the opposite istrue: the bettermentofthe open source, community-developed analysis code for
community comes at the enrichment of the individ- simulated astrophysical data. Largely written in
ual. Python, Cython and C, it is parallelized using MPI
In addition to the concrete, measurable benefits and OpenMP and has been used to analyze datasets
we receive fromcommunity-focuseddevelopment, we whose size range from small (tens of megabytes) to
havealsorealizedthatcommunitydevelopmentis large (tens of terabytes). yt is designed to abstract
essential to the continued health of the field of out underlying technical aspects of simulation data
computational astrophysics. By focusing on de- suchasfileformat,units,geometricconventions,and
1
parameterstorageinsuchawaythatisneutraltothe challenges as well as taken steps to directly address
underlying simulation platforms. The flagship simu- these challenges.
lationplatforms,whereweconducteddetailedtesting I am the original author of yt, having created
and support all functionality, are are Enzo, FLASH, it during my graduate school career to analyze and
Orion,Castro,Nyx, Piernik andNMSU-ART. Inthe visualize data created by the simulation code Enzo
currentdevelopmentbranch(describedbrieflybelow) (http://enzo-project.org/),anadaptivemeshre-
thishasbeenexpandedtoincludelimitedsupportfor finement simulation code. Enzo itself has undergone
SPHandN-bodycodessuchasGadget,aswellasfull a number of changes overits years,both in the man-
supportforOctreecodessuchasRAMSESandART. ner in which it is developed and the problems to
ytprovidesalanguagefordescribingphysicalregions which it can be applied. Enzo began as a code-
and applying data processing techniques to those re- base largely developed by a single individual, but
gions: ratherthanfocusingonselectinggridpatches, through the stewardship of the Laboratory for Com-
particles or octs and then masking out overlapping putational Astrophysics (LCA) it grew into a large,
regions,itdevelopsconceptsofgeometricregions,re- open source (but mostly not community-based) de-
gions defined by fluid quantities, and processes that velopmentproject. A direct consequence of both the
transformthoseregionsintoquantitativevalues. ytis technology and the prevailing mindsets led to diver-
bestthoughtofnotasanapplication,butasalibrary gent lines of “public” and “internal” development,
for asking and answering questions about data. This resulting in a highly fragmented community of users
aspectof ytnaturallyencouragestechnicalcontribu- and developers and many different branches of the
tions, as every user of the software package typically code itself. Over the course of a few years in 2009
writes analysis code that builds upon its underlying and 2010, through concerted efforts from the Enzo
machinery. Although its missionhas expanded in re- community,ittransitionedfromacloseddevelopment
cent years to include the application of microphysi- model with periodic open releases into a fully-open,
cal solvers, standardized input/output data formats, transparent development model based around con-
creating initial conditions, and even studying data tributions from community members. This included
fromearthquakesimulations,atitscoreytremainsa resolving many divergent code bases and produced
tool for interrogatingastrophysicaldata and answer- the Enzo 2.0 release. The code is now developed us-
ing questions about the underlying physics of that ing many best practices of software development in-
data. cluding testing, version control, peer review, infras-
tructuredesignandinvestmentofstakeholdersinthe
The defining characteristic of yt is not that it developmentroadmap. Thishasresultedincontribu-
is written in Python, or the operations it does, but tions of both physics modules and infrastructure im-
ratherthatitissupportedbyaparticipatorycommu- provements,and has even brought to light bugs that
nity of scientists interested in both using and devel- have subsequently been fixed by community mem-
oping it. The yt community draws its members pri- bers. This transition,fromsemi-opento community-
marilyfromcomputationalastrophysicists: individu- driven, has brought an energy and excitement to the
als who conduct and analyze simulations. The first development of Enzo that seems likely to sustain it
version was written in late 2006 by and for a single for years to come. Many of the items discussed be-
scientist,butoverthelastseveralyearshasattracted low,about how to shape and foster community, have
170 subscribers to the general mailing list and 50 for been applied to the Enzo community as well as yt.
the development mailing list; in 2012 it was identi-
fiedasone ofthe mosthighly usedcodes onthe NSF
NICS analysis machine Nautilus [Szczepanski et al., 3 What is “Community”?
2012]. Over the course of its history, nearly 40 peo-
plehavecontributedchangesets,rangingfromtinyto Open Source is often used as a synonym for com-
very large, and the mailing lists are relatively active, munity development, but in practice they are better
averaging between one and four messages a day. In thoughtofastwodifferent,butoverlapping,modesof
2012weheldourfirstworkshopattheFLASHcenter development. The growthof a community, where in-
inChicago,andwe areholdinga secondworkshopin dividuals participate in discussions,reportproblems,
March of 2013 at UC Santa Cruz. Despite this rela- provideenhancements,andsupportothercommunity
tively high level of activity for a project in computa- members, is neither necessitated by software being
tional astrophysics, the community has faced several open source, nor is it a foregone conclusion for open
2
source software. namics, gravity, and so on. But as the simulation
For scientific software, this situation is slightly codebecomeslarger,moregeneral,andmoremature,
more complex, as the adoption of open source meth- it becomes unwieldy for a single person to manage
ods are often met with resistance. In addition to and develop all aspect of a code. Within Enzo we
this, members of a given community related to soft- haveseenthisasthecodehasgrowntoincludemany
ware are just as likely – if not more likely – to be differenthydrodynamicmodules(includingmagneto-
workingonsimilarprojects,competingformindshare hydrodynamics),chemistrymodules,radiativetrans-
among the academic public, and even competing for port, star particle implementations, and parallelism
funding or jobs. Building communities that are able strategies. Theoverallgeneralizationofthecodebase
to thrive despite these barriers can result in con- has resulted in a deep specialization in some areasof
siderable, non-local benefits: the aphorism “a ris- the code. By investing in a community of individ-
ing tide lifts all ships” is nowhere more true than uals, the overall management costs are reduced as
in community-driven scientific codes. Even the pres- individuals specialize in different subsections of the
ence of other, engaged community members means code. This pipeline for conducting research using a
that there are more people able to answer questions code such as Enzo shares many characteristics with
from newcomers and provide assistance and energy observational research; the data is constructed using
toward solving problems. the simulation platform, relies on many pieces of di-
verse infrastructure such as IO libraries, libraries for
The concept of openness in science amongst aca-
parallelismand librariesfor solvingdifferential equa-
demics is something of a paradox. Often, utilizing
tions, and is then passed on to the analysis platform
commodity tools is viewed as a very positive trait,
which processes the data to produce results.
whereasreleasingsoftwareisoccasionallyviewedwith
skepticism. While thisis changing,andinsomeways The third objection enumerated above is some-
quiterapidly,theideaofsourcecodebeingsharedbe- what more subtle. The implicit question is not,
tweenpotentialcompetitorsisstilloftenseenasdan- “What if someone finds a bug?” but really “What
gerous or even anathema to scientific progress. This if someone sees my work and is able to show that
typically breaks along three primary objections:1 it is flawed?” This objection is the most challeng-
1. Why should I give up my competitive advan- ing, as it neatly aligns with the conflict between per-
tage? sonal advancement and the collective advancement
2. How can I manage supporting a code? of science. The glib, simple solution to this would
3. What if someone finds a bug in my work? be to prioritize an incremental, layered approach to
science rather than preserving the professional bene-
Answeringthese questions in detail is beyondthe
fits of previous, potentially incorrect results. In this
scope of this document. But, a few comments can
be made. The first two objections to releasing approach,“bugs” and flaws in results are not hidden
code are directlyaddressed bya communityof from view and defended from exposure, but rather
practice. As noted above, the benefits from collab- found and corrected. Unfortunately, such prioriti-
zation is a subtle form of the prisoner’s dilemma:
orative, participatory communities can alleviate the
it is maximally successful when all members of the
burdens of support as well as provide advantages to
communityparticipateinrevealingshortcomingsand
scientificinquirythatwouldotherwisenotbepresent.
problems. The personal solution is somewhat more
InboththeEnzoandytcommunities,asthecommu-
subtle,andnotentirelyclear. Fromapragmaticper-
nity has scaledin size, with it has scaledthe number
spective,identificationofbugsenableshigher-quality,
of eager, helpful participants in contributing mod-
longer-lasting results, less likely to be overturned by
ules (which are then shared, collectively) as well as
subsequent investigations by competitors.
provide supportfor problems and issues on the mail-
ing list. When discussing these aspects of competi- As computational science matures, and as com-
tive advantage and support, it is also worthwhile to putationaltechniquespermeateeveryaspectofscien-
frame the discussion in terms of generalization and tificinquiry,itisnaturalthatthe softwareutilizedin
specialization. As an example, in the early stages of scientificinquirygrowsmorecomplex. Scientificsoft-
developing a simulation code, it’s entirely reasonable wareinmanyrespects,particularlyinastrophysics,is
that a single individual can manage every aspect of thoughtofassomething ofasecond-classcitizen– in
the code: the IO framework, parallelism, hydrody- yearspast,theconceptofscientificsoftwarebeingde-
1Forabroaderandmorequantitative study,seeStodden [2010]
3
veloped in isolation, placed on a website and largely all – whether that be contributing code, documen-
abandonedbecamepervasive. Thismethodologysim- tation or even reporting bugs. This cognitive block
plywillnotscalewiththecomplexityofprojectsnec- comes from a fundamental misunderstanding of how
essary for modern scientific inquiry; we have reached scientific codes are developed,one that we have been
the ageofadvancedalgorithmsbeing appliedinnon- attempting to remedy within our own ranks. The
trivialwaystocomplex,physically-richdatasets. The traditional view of open source scientific code devel-
cyberinfrastructure necessary to address problems in opmenthasbeenthatoftwogroups–developersand
computationalscienceisnolongertractablysolvedby users. This segregation of individuals results in an
individualsworkinginisolation–communityprojects implicit, but difficult to overcome, feeling of bound-
must become the new norm. aries between responsibilities. Often responsibilities
such as verificationof results, inspecting results, and
Part of the reason that communities developed
tracking (and understanding) modifications to the
around scientific codes can be beneficial is also a
code base are left to “developers.” Unfortunately,
component of the challenges within these communi-
all of these responsibilities are essential components
ties. An active community with participants shar-
of the scientific method! Segregating responsibilities
ing enhancements, features, and assistance relies on
inthis wayleadstomisunderstandingsaboutthe na-
the participants developing those enhancements and
ture ofscientific codes,many ofwhichareconstantly
understanding the code base well enough to provide
under development and improvement, and the na-
assistance to less experienced individuals. Scientific
ture of the results that they can produce. This is
code development is strongly driven by pragmatic,
even evidenced in simple things such as using the
short-term needs of scientific inquiry; as a result,
word “user” to describe community members, fur-
most improvements are motivated by the next pa-
ther emphasizing a distinction between people who
per, or the next talk, or advancing a particular line
“use” the code and people who “make” the code.
ofinquiry likely to resultin a publicationandthe af-
In Pawlik et al. [2012], the authors thoughtfully de-
filiated respect from peers for solving a challenging
scribe the role of scientists in softwarenot as a black
scientific problem. The common argument against
andwhite,userordeveloperdistinction,butratheras
sharingadvancements,techniquesandtoolsisthatit
acontinuumofgreys. Often, the self-applicationofa
undercuts competitive advantage. This form of in-
term like “user,” with its connotations, results from
dividualism results not only in researchers refraining
self-assigned roles, or perceptions of individual abil-
from sharing, but more importantly, prevents them
ities. As peers in a global scientific community, the
from benefiting from the work of others. A strong
distinction between “users” and “developers”
community of sharing results in a stronger technical
is actively harmful, when in reality scientists are
code base,even if it has the side effect of a moderate
tasked with occupying that grey area in the middle.
reduction in perceived intellectual priority for indi-
vidual researchers. If I share a technique that can In both the yt and Enzo communities, we have
be applied to Population III star formation simula- seen this behavior play out numerous times. Those
tions, I can no longer be the only one to apply that members of the community who have been the most
technique – and thus the only person who can an- giving of enhancements, assistance, and technology
swer that type of problem. I lose a small amount of (scripts, modules, bug fixes) are typically those who
intellectual priority. However, I am now able to re- are able to best take advantage of the technology to
ceive improvements to the technique, suggestions for askcomplexquestionsaboutthephysicalworld. Fur-
further enhancements, and am thus able to extend thermore, the process of opening up a module and
my scientific inquiry further than before. The most contributing it to an open source software commu-
importantaspectofthisisthataltruisticgoals(shar- nity provides the opportunity for receiving enhance-
ing,reproducibility,openness)alignexactlywithnon- ments in functionality, bug reports, and synergistic
altruistic goals. It does not matter if I share because applications of that module.
I hope to gain something in return or because I be-
In the yt project, there are a number of individ-
lieve in openness; the end result is that the health of
uals who are the primary contributors and a larger
the community has been improved by participation.
number of individuals who have provided occasional
The strongest bias we have seen in yt is not bug fixes or individual, isolated modules. The influ-
against releasing of code or contributing back, but ence of non-developing community members on code
againstthe notion of conducting any development at changes, roadmap efforts, and development method-
4
ology is not only allowed, but encouraged and so- to computational astrophysics, but is also seen fre-
licited. Furthermore,weactivelysolicitcontributions quently within the instrumentation community. A
in the form of scripts, modules, and enhancements common pattern seen within yt is a spirited discus-
or modifications to the yt codebase. This initiative sionofidealisticgoalsfortheproject,butthenpress-
has resulted in several positive effects, all of which ing deadlines and other local concerns typically tem-
have strengthened yt as a community, independent per enthusiasm and execution. I myself am not only
of their effects onyt as software. Individuals who do guilty ofthis, but perhaps the mostcommoninstiga-
not feel motivated to or capable of contributing in a tor!
technical sense are more likely to contribute through The challenge to community building I find the
helping other users, providing documentation, and mostworrisomeisthatpresentedbytheso-called“ci-
even networking and advertising the project. But tation economy” in astrophysics. The influence of a
the more prominent effect is a sense of investment in given piece of code is often measured by citations to
theproject. Thissenseofinvestmentresultsinquan- a method paper, which is by definition a fixed and
tifiable results (such as more mailing list activity, a archived document. In 2011, a method paper for yt
largernumber of commits) as well as results that are was published. Enzo’s method paper is still under
more difficult to quantify, such as positive feelings, preparation, but papers in 1997 and 2004 are often
friendly discourse, excitement, word of mouth, and cited as references for Enzo. Because ADS provides
so on. reverse-indexingcapabilities, the number of citations
to these papers provides an immediate method of
gauging the influence of the software that they de-
4 Challenges
scribe. Intheinterveningtimebetweenpublicationof
the methodpapers,new contributorshavejoinedthe
Even beyond these concerns, community building collaborations and contributed substantial and non-
can be challenging within the confines of the tra- trivialenhancementstothecodebase. Infact,thisis
ditional realm of academia. The reward struc- notonlywhathasalreadyhappened, butanexplicit,
ture in astrophysical academia, for funding, jobs named goal of the future of these two code bases!
and mindshare, is highly-correlated with influence,
Thisresultsinanunfortunateconflictofinterestsbe-
high-impact papers, and citation counts[see also tween new contributors and existing contributors. It
Howison and Herbsleb, 2011]. Often developing soft- is in the best interests of existing contributors, who
ware, or participating in academic communities, is contributedto the code base prior to the publication
seen as orthogonal to these goals, even though this of a method paper, to consolidate citations in the
is not necessarily true. In light of this, soliciting de- canonical paper; unfortunately, it is in the best in-
velopment contributions is particularly difficult. De- terests of new contributors to publish a new method
veloping a tool in support of a publication (for in- paper, so that they can begin to “collect” citations.
stance, a module for yt) is seen as a worthwhile use
In whose interest is it to contribute to a code base
of time. Despite the benefits that come from making
without rewards mediated through citations, one of
that tool available, the time needed to “clean it up”
the most common metrics of success in academia? I
and provide a modicum of support is not as immedi-
believe this is among the greatest challenges to com-
ately beneficial, as it does not result in an additional
munitydevelopedcodesinastrophysics,onethatwill
publication, additional citations, and so on. Often,
continue to grow in importance as software projects
those contributions that would be the most benefi- inevitably scale beyond a handful of contributors2.
cialtotheprojectasawholearethemostdifficultto
motivate. These secondary, yet important, goals in-
clude improvements to documentation (particularly 5 Strategies
narrative, instructive documentation), testing of in-
dividual components (as opposed to large-scale in- Notallofthechallengesenumeratedabovecanbead-
tegration testing), and infrastructure improvements dressed directly. However, in both the yt and Enzo
suchasoptimizationandmaintainabilityrefactoring. communities, we have addressed the technical and
As a result, tool builders are not always favored by social challenges through conscious development of
the academic reward structure. This is not unique infrastructure and techniques. These have been de-
2Alternatemetricsofsoftwarecitationarebeingexploredby,e.g. theASCL[Allenetal.,2012],buttheyprimarilyaddress
discoverabilityofcodes anddonotattempttoaddresstheissueofdistributingcredittocontributors.
5
signed to foster good will, encourage contributions, changes upstream. The unique versioning of lo-
and build social capital between contributors. Be- cal copies of a code base also allow unique revision
low, I list several steps we have taken over the last specifiers to be applied globally; rather than “ver-
severalyearstowardthegoalofgrowingandengaging sion 4.2 with modifications (unspecified)” code that
3
communities of practice. generates a given result can be uniquely identified
Mostimportant,however,theseneedtobeviewed with a globally-unique hash. Both yt and Enzo
as the investment they are. These strategies, partic- are hosted on BitBucket, a code hosting provider
ularly the social strategies, require not only focused (roughly functionally equivalent with others such as
energy and time, but an emotional investment. You RhodeCode, GitHub, Gitorious and Savannah). It
must design the community you want. This provides technical infrastructure for code review, ac-
design extends far beyond designing software and al- cepting changes, tracking contributions and identi-
gorithms; it includes thinking about the diversity of fying specific revisions of the code. By using this
contributors and community members, the tone of systemwelowerthebarriertoentryforcontributors,
discoursewithinthecommunity,theprojectedenthu- enablingsubmissionoflocally-developedchanges. By
siasm within the project, and even the congeniality usingDVCS,weencouragescientistsusingytorEnzo
withwhichfeedback–especiallycriticalfeedback–is to track their own changes to it. Even if this code is
received. Thecultureseededinacommunitywillself- never submitted for inclusion in the primary reposi-
propagate;whether this is a culture ofneglect, a cul- tory,thisenablesprovenancetrackinganddebugging.
ture of homogeneity, a culture of kindness, a culture
Development of scientific software also requires
ofbrasharroganceorevenacultureofopenness,this
several difference degrees of communication; as the
willflowoutwardfromthecore[Bacon,2009,Trapani,
software itself is not the only project contributors
2011, Allsopp, 2012]. Both Enzo and yt have at-
areworkingon,wehavefoundthatcommunicationis
tempted to build cultures that promote respect and
naturally divided into three categories, based on la-
excitement.
tency and urgency. Immediate communication, such
as in-person meetings (or “code sprints”) or Google
Hangouts,areusefulforlow-latencyplanning,discus-
5.1 Technical Infrastructure
sion,andcollaboration. Tools suchas InternetRelay
Chat(IRC)provideameansofmedium-latencycom-
WithinbothytandEnzo,weutilizeseveralpiecesof
munication;ofteninthe#ytchannelonFreeNode,an
technicalinfrastructurethateasetheprocessofgrow-
IRCnetwork,between10and20peoplecanbefound
ingcommunities[seealsoFogel,2005]. Theseinclude
atanytime. Messageslefthereareusuallyresponded
open and freely-joinable mailing lists, a completely
torelativelyquickly,whichleadstoalow-latencydis-
open development process based on the distributed
cussion where problems can be resolved much more
versioncontrolsystem(DVCS)mercurial,andacode
quickly than through email. Finally, nearly all plan-
reviewandmentoringprocesstoensurecontributions
ning and detailed discussion happens through high-
are high-quality, with a minimum of friction. Fur-
latency mediums such as comments on code changes
thermore, we utilize methods of communication that
andmailing lists. The mailing lists for yt (which are
enable participation and that are tuned for low- to
titled the unfortunately-chosennames yt-usersand
high-latency interaction.
yt-dev)areopen,indexedbyGoogle,andfreelyjoin-
Bothyt andEnzohaveseenanenormousgrowth
able. Wehaveoften(successfully)encourageddiscus-
in contributions following migration to DVCS; this
sions here to turn into collaborations and code shar-
is not unique to these projects, but is a hallmark of
ing, and in fact we have had a number of success-
the lowered barrier to entry presented by DVCS. In
fulopportunities for long-termplanning andinvasive
contrast to centralized version control systems such
code changes here. They serve not only as broadcast
as CVS and Subversion, DVCS systems are fully-
media, but as venues for soliciting input and contri-
distributed. Every clone, or checkout, of a reposi-
butions.
tory is a peer with every other clone; this enables
anyone who checks out a repository to track their Nearly as important as providing mechanisms
own changes to the repository, and provides the for communication is reducing the barrier to en-
(technical) ability to much more easily contribute try for new contributors. This includes ensuring a
3A relatedworkisPrli´candProcter [2012] wherethe authors have compileda setof veryusefuland helpful guidelines for
scientificsoftwaredevelopment.
6
smoothprocesstofindingtheappropriatelocationin techniques described in the book to software engi-
the source code, making changes, submitting those neers, in many ways they can be applied directly
changesforreview,andakind,thoughtfulmentoring to collaborations around scientific software projects.
process for new code contributors. The most com- The authors enumerate three characteristics which
mon path we have seen for new code contributors is they suggest applying to all communication and in-
relatively straightforward: teraction:
1. Individual applies software package to meet • Humility
own goals • Respect
2. Individual develops a modification to the code, • Trust
for either an enhancement, a bug fix, or a con-
Communicationbytext,inparticularabouttech-
tribution of an example
nical topics, is often stripped of the nuances and
3. This change is submitted for inclusion in the
inflections that convey emotion. In textual media,
main yt repository
therefore, it is even more important to guide discus-
4. The change is read, reviewed and tested by
sion in a way that encourages participation. Within
community members to ensure that standards
the ytcommunity, weareverycarefulto ensure that
for coding, documentation and testing are all
the tonesonthe mailing list,code reviewandinIRC
met.
shouldbeconductedinthisway: thehealthofacom-
5. Theindividualbeginstoparticipateinthecom-
munitycanbejudgedbyhowittreatsnewcomersand
munity
novicesaswellashowittreatsexperienced,advanced
The main codebase is not suitable for contribu-
participants. Even in framing analysis interactions
tions of all types; we have found individuals ea-
betweenpeersinthisway,onecanseehowcommuni-
ger to contribute scripts that exist as supplemen-
cationcanservethegrowthofscientificcommunities.
tal tools, or that were used in the writing of a pa-
Inscientific softwarecommunities, an aspectthat of-
per. To support this desire, we provide a location
ten goes unmentioned is that we arguably want to
(https://hub.yt-project.org/) to submit these
foster discussions between peers, not discussions be-
scripts as well, where links to external source code
tweenanelite classandasubservientclass. By guid-
repositoriesarecollectedalongsidedescriptionsofthe
ing the discussion with a focus on humility, respect
individual projects. In this way, by providing mech-
and trust (HRT as abbreviated by Fitzpatrick and
anisms and structures, as well as an open and active
Collins-Sussman), discussions can become more con-
solicitationfortechnicalcontributions,wehavefound
genial, more productive, and can lead to a greater
that we can foster a sense of community and accom-
spirit of collective focus.
plishment among individuals.
We have found that even minor barriers to con- Humility can be well-characterized in how a
tribution (or even software deployment!) can build communityrespondsto negativefeedback. Ina com-
up a “technicalfriction” to participationthat results munity in which I participate, I witnessed a dis-
in losing contributions and community members. In cussion between an experienced community member
an effort to alleviate this, we provide easy access to and a newcomer. The newcomer reported what they
thesourcecode,detailedinstructionsforcontributing thought was a bug in a particular routine. The ex-
code, a mechanism for communication, and scripts perienced community member, who had made the
that assist with the entire process. We have found change that introduced the potential bug, responded
that by ensuring that we have a uniform review pro- with a very abrupt, dismissive response. “It’s like
cess (where even code written by founding members that for a very good reason. Don’t touch it.” By
of the community is reviewed) we ensure that the terminatingdiscussionatthatpoint,itsentthe clear
code is of acceptable quality, tests are included, and signalthatadiscussionofthereasoningbehindacode
that undocumented code is discouraged. change was simply not necessary; the message was
notonlythatthereasoningwasbeyondreproach,but
that a discussion was not worth the time it took to
5.2 Social Techniques
have. Thebettersolution,ofanopenandhumbledis-
In Fitzpatrick and Collins-Sussman [2012] the au- cussion of the background and reasoning behind the
thorsdescribeastrategyforsocialinteractionsamong decision, would have been to engage the newcomer
so-called “geek” teams. I will not provide a detailed and explain the reasoning behind a decision. This
retelling of this, but despite focus on applying the takes an investment of time, but certainly no more
7
than the investment of time it would take to justify gard external contributions in a positive way,
such a decision to a referee or journal reviewer, and rather than as threats to their accomplish-
the potential gains in this case would be a new con- ments.
tributorandanincreaseinsocialcapitalamongstthe HRT are not always easy attributes to focus on.
community. Particularly in a competitive field such as astro-
Respect can be seen in how peers contribute to physics, it can be difficult to spend time thinking
about how words or conversations will be perceived,
the development of the code. In principle, when de-
particularly when these conversations are between
velopingascientificcode,weareengaginginscientific
potential competitors. But by remaining focused
discourse,attemptingtopushthefieldofastrophysics
on engaging community members as peers, treating
forward in both our understanding of and our abil-
them with HRT and spending time and energy on
ity to ask questions of the natural world. One of the
thoughtful communication, the community will be
most common ways I’ve seen this ignored is when a
more likely to grow and flourish.
personwritestoamailinglistorappearsonIRCand
sayssomevariationof“I’venoticedsomethingisact-
ing strangely with ...” The all-too-common response
6 Conclusions
is simply, “You’re doing it wrong,” or even worse, a
dismissive instruction to “Just read the documenta-
As computationalscience has grownboth more com-
tion.” This is rarely even prefaced with the obvious
plexandmorepervasive,communitiesofpracticesur-
question of, “How did you expect it to act?” It ter-
rounding scientific softwareprojects have become es-
minates discussion and discourages individuals from
sential to the health and vibrancy of computational
returning. This has the direct effect of reducing par-
scienceprojects. Unfortunately,theacademicreward
ticipation, and disproportionately impacts new con-
systemdoesnotalwaysalignwiththedevelopmentof
tributors. A lack of respect can infect and transform
softwareprojects,whichcanleadtostagnationorcor-
a community into an unfriendly, unwelcoming envi-
nercuttinginimportanttaskssuchasinfrastructure,
ronment; the most valuable attribute a community
documentationandtesting. Thislargelyresultsfrom
of scientific software developers has is its ability to
the overlap between the traditional roles of develop-
engage in scientific practice, and that is obstructed
ers and users in software development, a distinction
by a lack of respect.
that can in fact actively harm the scientific process.
Trust within scientific software communities is
To grow communities around any software
perhaps more subtle. What we have seen with yt
project, the structure and character of the commu-
and Enzo is not that trust is an unwavering faith in
nityitself mustbe carefullyconsidered;youmustde-
someonetoproducehigh-quality,scientificallycorrect
sign this community. This includes setting the tone
code, but a trust in the process and the stewardship
of discourse, consciously fostering a diverse set of
of a code. Within communities of scientific software,
participants, and being open and enthusiastic. This
we have seen this evidenced when contributorslet go
processcan be assisted with technical infrastructure.
of a project. As a community matures, individuals
This includes using distributed version control sys-
cannotcontributetothebreadthofsub-projectsthat
tems, conducting open communicationin media that
theydidwhenitwasyoung;furthermore,asscientific
suitthestyleofcommunication,andstreamliningthe
interests shift, the leader of an individual project or
participation process and providing a clear method
aspect of the scientific code may move on to other
of contributing. Technical infrastructure alone can-
things. Trusting that the community will be able to
not “solve” the development of a community; social
steward and shepherd that project is a crucial ingre-
infrastructureandstandardsofconductmustalsobe
dient in addressing sustainability and burnout, and
developed. Conducting business while remaining fo-
an essential ingredient in expressing trust for peers
cused on humility, respect and trust can help to en-
and other community members.
sure that newcomers feel welcomed, existing contrib-
The motion of projects between individuals is a utorsandpeersfeelvalidatedandrespected,andthat
difficult social situation. We attempt to address this ultimately as interests change over time projects do
in yt by emphasizing pride over ownership. By notstagnateundertheweightofcodeorprojectown-
changing the conversation from dominion to ership.
stewardship, this changes how individuals ap- A vibrant, active community brings sustainable
proach projects, and helps enable them to re- development, synergistic applications and develop-
8
ment of code, and considerable rewards in potential and N. P. F. Lorente, editors, Astronomical Data
collaborations and discourse. I am grateful for the Analysis Software and Systems XXI, volume 461
opportunity to have participated in both the yt and of Astronomical Society of the Pacific Conference
Enzo communities, and grateful for the concrete re- Series, page 627, September 2012.
wards–scientific,socialandtechnical–thatmypar-
John Allsopp. The proof of the
ticipation has allowed me.
pudding, November 2012. URL
As a closing note, despite all of the successes we
http://www.webdirections.org/blog/the-proof-of-the-pudding
have seen in the yt and Enzo communities, we still
have considerable roomto grow from both the social Jono Bacon. The Art of Community. Building the
and technical standpoints. I hope that as communi- New Age of Participation. 2009.
ties drivenboth bylong-reachinggoalsandthe prag-
Brian W. Fitzpatrick and Ben Collins-Sussman.
matic needs of scientific inquiry, we can continue to
Teamgeek: asoftwaredeveloper’sguidetoworking
remain healthy, vibrant and growing. I believe that
well with others. O’Reilly, Sebastopol. CA, 2012.
the single biggest problem within scientific software
ISBN 978-1-449-30244-3.
communitiesisensuringcreditforcommunitypartic-
ipation can be shared with new contributors. This KarlFogel. Producing OpenSourceSoftware: Howto
affects motivations at essentially every level, and be- Run a Successful Free Software Project. O’Reilly
cause of its very nature can dramatically affect the Media, Inc., October 2005. ISBN 0596007590.
health and stunt the growth of scientific software
communities. James Howison and James D. Herbsleb. Scientific
software production: incentives and collaboration.
In Pamela J. Hinds, John C. Tang, Jian Wang,
Acknowledgments Jakob E. Bardram, and Nicolas Ducheneaut, ed-
itors, CSCW, pages 513–522. ACM, 2011. ISBN
I would like to thank the organizers of the Texas 978-1-4503-0556-3.
Scientific SoftwareDays conference,Victor Eijkhout,
A. Pawlik, J. Segal, and M. Petre. Documentation
SergeyFomel, Andy R.Terrel,andMichaelTobisfor
practices in scientific software development. In
a wonderful conference full of excellent talks. I’d
Cooperative and Human Aspects of Software En-
also like to thank the other attendees and speakers
gineering (CHASE), 2012 5th International Work-
for fascinating and enlightening conversations about
shop on, pages 113 –119, june 2012. doi: 10.1109/
scientific software. I am supported by an NSF CI
CHASE.2012.6223004.
TraCSfellowship(OCI-1048505). Anyopinions,find-
ings, and conclusions or recommendations expressed Andreas Prli´c and James B. Procter. Ten sim-
inthispublicationaremyownanddonotnecessarily ple rules for the open development of scientific
reflect the views of the National Science Foundation. software. PLoS Computational Biology, 8
I would also like to thank the members of both (12):e1002802+, December 2012. ISSN 1553-
the yt and Enzo communities. Much of this talk – 7358. doi: 10.1371/journal.pcbi.1002802. URL
andapproachtocommunitybuilding–wasdeveloped http://dx.doi.org/10.1371/journal.pcbi.1002802.
incollaborationwiththem. I’mgratefulforfeedback
Victoria Stodden. The scientific method in prac-
providedby NathanGoldbaumandBrianO’Shea on
tice: Reproducibility in the computational sci-
a draft of this manuscript. In addition, I would like
ences. Technical Report Working Paper 4773-10,
to thank Britton Smith and Greg Bryanboth for ex-
MIT SloanSchoolof Management,February 2010.
tremelyvaluablefeedbackinpreparingthispaperand
URL http://ssrn.com/abstract=1550193.
forleadingbyexampleincommunitydevelopmentfor
scientific codes. A. Szczepanski, T. Baer, Y. Mack, J. Huang, and
S. Ahern. Usage of a hpc data analysis and visu-
alizationsystem. Computer, PP(99):1,2012. ISSN
References
0018-9162. doi: 10.1109/MC.2012.192.
A. Allen, P. Teuben, R. J. Nemiroff, and L. Shamir. Gina Trapani. Your community is
Practices in Code Discoverability: Astrophysics your best feature, April 2011. URL
Source Code Library. In P. Ballester, D. Egret, http://smarterware.org/7819/my-codeconf-talk-your-communit
9