research papers J Dummy-atom modelling of stacked and helical IUCr nanostructures from solution scattering data ISSN2052-2525 BIOLOGYjMEDICINE Max Burian and Heinz Amenitsch* InstituteofInorganicChemistry,GrazUniversityofTechnology,Stremayrgasse9/V,Graz8010,Austria.*Correspondence e-mail:[email protected] Received6February2018 Accepted9April2018 Theavailabilityofdummy-atommodellingprogramstodeterminetheshapeof monodisperse globular particles from small-angle solution scattering data has led to outstanding scientific advances. However, there is no equivalent EditedbyJ.Trewhella,UniversityofSydney, procedure that allows modelling of stacked, seemingly endless structures, such Australia as helical systems. This work presents a bead-modelling algorithm that reconstructsthestructuralmotif ofhelical androd-likesystems.Thealgorithm Keywords:SAXS;stackedstructures;helical isbasedona‘projectionscheme’:byexploitingtherecurrentnatureofstacked structures;shaperetrieval;SasHel;structure systems, such as helices, the full structure is reduced to a single building-block determination;solutionscattering; computationalmodelling;structuralbiology; motif.Thisbuildingblockisfittedbyallowingrandomdummy-atommovements nanoscience. without an underlying grid.The proposed method isverified using avariety of analytical models, and examples are presented of successful shape reconstruc- Supportinginformation:thisarticlehas tion from experimental data sets. To make the algorithm available to the supportinginformationatwww.iucrj.org scientific community, it is implemented in a graphical computer program that encourages user interaction during the fitting process and also includes an option for shape reconstruction ofglobular particles. 1. Introduction Small-angle X-ray scattering (SAXS) is an established tech- nique to study the structural aspects, such as the size and shape,ofmolecularsystemsinsolution(Lietal.,2016).Asthis structural information is not directly apparent from the recorded scattering intensity, one requires a fitting process that generally relies on an underlying mathematical model (Pedersen,1997).Whileavarietyofsuchanalyticalmodelsare available in the literature (Pedersen, 1997), each of them is boundtoagivenshape.Thefittingprocessofanexperimental datasetthereforerequirespreviousknowledgeofthesample suchthattheappropriatemodelcanbechosen.Dummy-atom (DA) modelling, which describes the particle shape as a variable bead assembly, bypasses this issue as the fitting processisnolongerconstrainedtoasinglegeometry(Chaco´n etal.,1998;Waltheretal.,2000;Svergunetal.,2001;Franke& Svergun, 2009; Koutsioubas et al., 2016). Consequently, no a priori knowledge regarding the solute’s shape is necessary. This advancement has helped to make SAXS an attractive technique to characterize stable molecules in solution, far beyond the community of scatteringexperts (Yang,2014). However,singlemoleculesinsolutionarenotalwaysstable. In fact, it is in their nature to interact and aggregate, often resulting in highly organized hierarchical systems stretching over several orders of magnitude (Palmer & Stupp, 2008; Wasielewski, 2009; Busseron et al., 2013; Praetorius & Dietz, 2017). Helical and chiral superstructures are a common yet spectacularclassofsuchsystems,e.g.thelengthofthehuman genome DNA double helix exceeds several centimetres (Venter et al., 2001) while the forces that give rise to the 390 https://doi.org/10.1107/S2052252518005493 IUCrJ (2018).5,390–401 research papers Figure1 Theprojectionscheme,visualizedusingtheexampleofasingle-strandedhelix.Forsuchseeminglyendlessgeometries,thereisabuildingblock(upper left-handcorner)thatisrecurrentalongthezaxis(seefullhelixontheright).ForafullbodyconsistingofMstackedbuildingblocks,onefindsdistinct structuralmotifs,suchasthebuildingblockitself,neighbouringbuildingblocks,single-spacedbuildingblocksandsoonupto(M(cid:3)1)-spacedbuilding blocks.Thescatteringintensityofthefullgeometrycanhencebecalculatedbysummingthecontributionsofthesestructuralmotifs,scaledbytheir recurrence.ThisbypassesthedemandforanactualDAmodelrepresentingthefullstructure,aswesolelyevaluateprojectionsofthebuildingblock insteadofactualstackedduplicates,hencethetermprojectionscheme. characteristic helical motif act on the molecular (nanometre) seemingly infinite nature of e.g. helical systems onto a single level (Dobbs, 2007). Amyloid fibrillation, i.e. the aggregation building-blockunit,representedbydummyatoms(DA).This ofproteinsintoinsoluble,oftenhelical,fibrils,hasbeenlinked building block is altered by random DA movements while to critical diseases such as type 2 diabetes or Alzheimer’s simultaneously fitting the corresponding scattering curve (Dobson,2003;vonBergenetal.,2000)aswellastounwanted againsttheexperimentalone.Theproposedmethodisverified drug degradation (Morozova-Roche & Malisauskas, 2007; using simulated data sets of various one-dimensional struc- Vestergaard et al., 2007). Consequently, a comprehensive turesandwesubsequentlyapplyittoexperimentallyobtained understanding of these systems requires structural character- SAXSdata.Thefullalgorithm isimplementedinacomputer ization at the (supra-)molecular scale. Small-angle X-ray program(SasHel),whichalsoincludesanoption forglobular diffraction of aligned fibres [e.g. the famous studies of the geometries. The reduction of the system’s complexity by structureofDNA(Wilkinsetal.,1953;Watson&Crick,1953)] symmetricprojectionandthefastcodeimplementationresult is a powerful technique for this purpose. However, the in a toolkit that allows a full shape retrieval from scattering evaluationofsolutionscatteringdatafromrandomlyoriented dataintheorderof3–90minonastandardworkstation(see helicalstructuresfacestwochallenges:(i)onlyafewanalytical Appendix A),depending on the level ofdetail. models are available from which structural information, such aspitchandtwist,canberetrieved(Schmidt,1970;Pringle& Schmidt,1971;Hamley,2008;Teixeiraetal.,2010);and(ii)the 2. Projection scheme endless nature of helices does not allow DA modelling using current programs (Volkov & Svergun, 2003; Gingras et al., Dummy-atom modelling is based on the idea that a given 2008). particleshapeisrepresentedbyanassemblyofNsmallbeads In this work, we present a bead-modelling algorithm to ofequalscatteringlengthdensity,calleddummyatoms(DA). determine the structural motif of monodisperse systems, AccordingtotheDebyeformula(Debye,1915),thescattering highly elongated in one dimension, in solution from SAXS intensityofthisassemblycanthenbecalculatedfromthebead data. Weuse symmetrical boundary conditions to project the assembly as 391 IUCrJ (2018).5,390–401 BurianandAmenitsch (cid:2) Dummy-atommodelling research papers (cid:2) (cid:3) IðqÞ¼f ðqÞ2XN XN sin qrij ; ð1Þ Fig.1(NBB=400,M=15),thisincreasesthecalculationspeed DA qr approximately seven fold. i¼1 j¼1 ij Thenotionofreducingagivenmodeltoitsstructuralmotifs where the norm of the scattering vector is denoted as q = raisesacriticalpointregardingtherequirednumberofstacks 4(cid:2)sin((cid:3))/(cid:4) (2(cid:3) is the scattering angle and (cid:4) is the radiation M necessary to represent a seemingly endless structure. This wavelength)andf (q)representstheDAformfactor(given number defines the length of the entire projected model L, DA inthisworkbyaGaussiansphereapproximation;Koutsioubas simplyviaL=MH .Accordingtoscatteringtheory,rod-like BB & Pe´rez, 2013; Svergun et al., 1995; Grudinin et al., 2017; structurespresentacharacteristicq(cid:3)1power-lawbehaviourin Fraseretal.,1978).ADAdiameterof0.2nmwasusedforall the intermediate Porod regime (Glatter & Kratky, 1982), reconstructions. As this formalism considers the distances whereas the transition from this Porod regime (q(cid:3)1) to the betweenallpossibleDApairsr =|r (cid:3)r|insidetheassembly, adjacentlower-angleGuinierregime(q(cid:3)0)occursataspecific ij j i its mathematical complexity is OðN2Þ. Hence, the computa- scatteringvectorq dependingonthelengthoftherod,which 1 tional effort increases drastically for large N. Adequate canbeestimatedusingq =181/2/Linthesimplifiedframework 1 modelling of seemingly infinite rod-like geometries requires of the Hammouda model (Hammouda, 2010). Thus, if the very large numbers of DAs (N > 10000), making the Debye experimental data present a q(cid:3)1power law at even the smal- formula appearinadequate in its standardnotation. lestaccessiblescatteringangleq ,thelengthofthestructure min Rod-likestructures,such ashelices,often possess acertain must be larger than L > 181/2/q . Similarly, we use this min structural motif, a building block, which recurs along the relationtodeterminetheminimumnumberofstacksrequired, elongation direction. Hence, only this building block is of according to M > 2 + 181/2/(H q ) (with the additional BB min interest for shape reconstruction. We define a building block term of+ 2to avoid truncationeffects). ofN dummyatomswithitselongationdirectionparallelto BB the z axis (see Fig. 1). The full rod-like structure is then 3. Determination of the stacking distance reproduced by duplicating the building block M times along thezdirection,withastacking distancecorresponding tothe The projection scheme is a faster alternative for calculating buildingblock’sheightH (seeFig.1).Theevaluationofthis the scattering intensity of a stacked structure compared with BB stackedmodelbymeansoftheDebyeformula[equation(1)] the standard Debye formula. As is apparent from equation now includes redundant terms, as distinct motifs inside the (2),theformalismrequiresanadditionalinputparameter,the structurearenowrecurrent.Forinstance,thesinglebuilding- stacking distance between the building blocks H . In the BB block motif would be evaluated at each repetition along z, following, we present a pathway to determine this using the such that the same computation is performed a total of M pairdistance distribution function (PDDF). times. It is therefore sufficient to calculate the scattering ThePDDFcorrespondstothesurface-weightedprobability intensity of the building block only once and scale it by the to find two points separated by a distance r inside a particle numberofstacks.Thesameprincipleappliestointer-building- (Glatter&Kratky,1982).Itisthusahistogramofalldistances blockcontributions.Forexample,asthemotifofneighbouring that appear inside a given particle, weighted by the corre- building blocks can be found (M (cid:3) 1) times in the full struc- spondingelectrondensities.InthecaseofDAassemblies,the ture,wecanevaluatethismotifonceand,again,multiplyitby PDDF can be also interpreted as the volume-weighted pair the corresponding scalar. correlation function p(r). Inmathematicalterms,theDebyeformulacanbeadjusted An analytically available example of this definition yields toneglectthese redundancies,resulting in thecase ofstacked spheresusingtheformalismbasedonthe workofGlatter(1980),fromwhichaseriesofconclusionscan IðqÞ¼fDAðqÞ2(MXNBB XNBB sinq(cid:2)(cid:4)(cid:4)qr(cid:4)(cid:4)r(cid:3)j(cid:3)rr(cid:4)(cid:4)i(cid:4)(cid:4)(cid:3) bsyestdermawonf(tseene Fstiagc.kSe1dinsphtheeressu).ppEovritdinegntliyn,foarsmathtieonsyfsotrema i¼1 j¼1 j i þ2MX(cid:3)1ðM(cid:3)kÞXNBB XNBB sin(cid:5)(cid:4)q(cid:2)(cid:4)(cid:4)(cid:2)rjþk(cid:4)HBBe(cid:3)z(cid:3)(cid:3)r(cid:4)i(cid:4)(cid:4)(cid:6)); cpoenaktasi.nTshteefinrsstppheeareks(bthlueectorarrceesipnoFnidgi.nSg1)PrDelDatFesptroetsheentmseteann q(cid:4) r þk(cid:4)H e (cid:3)r(cid:4) shape of all the spheres involved – it is hence an intra- k¼1 i¼1 j¼1 j BB z i building-blockPDDF.Thefollowingninepeaks(redtracesin ð2Þ Fig. S1) are caused by the repetitive nature of the system, as wheree denotestheunitvectoralongthezaxis(elongation/ for each possible sphere-to-sphere distance a new peak is z stacking direction). This formalismreducesthecalculationof found – they are hence inter-building-block PDDFs. These the scattering intensity to the sum of structural motifs found inter-building-block PDDFs, in particular the distances inside the stacked model (see scheme in Fig. 1). It further between them, therefore hold information on the stacking bypasses the demand for a DA model to represent the full distance between the building blocks. structure,asweevaluateonlyprojectionsofthebuildingblock In order to determine the stacking distance of a structure instead of all stacked duplicates, hence the term ‘projection fromagivenPDDF,itwouldbeintuitivetomeasurethepeak scheme’. Its benefit, compared with the standard Debye distances. However, the exact peak shapes and therefore formula, is a reduction in the mathematical complexity from positionsaredistorteddueto(i)thelinearhigh-rdecayinthe O½ðMN Þ2(cid:5)toOðMN2 Þ.Inthecaseoftheexampleshownin PDDF(seedashedlinesinFigs.S1andS2)and(ii)theoverlap BB BB 392 BurianandAmenitsch (cid:2) Dummy-atommodelling IUCrJ (2018).5,390–401 research papers of neighbouring peak contributions (Glatter, 1980; Feigin & needs to be minimized, where (cid:6) (q ) denotes the experi- exp m Svergun,1987).Adirectmeasurementofthepeakpositionsin mental error and N the number ofdata points. exp the PDDF might therefore result in misdetermination of the In DA modelling, the principle idea behind this minimiza- stackingdistance.Astableapproachtocircumventthisissueis tionprocedureisstraightforward:agiveninitialconfiguration to calculate the derivative of the PDDF (dPDDF). This is gradually altered until the best agreement between the numerically fast and easy operation suppresses the above- corresponding scattering curves is found. Here, we randomly mentioned decay distortion (see dPDDF in Fig. S1). The fillaninitialbuilding-blockvolumewithDAsthataresetfree resulting dPDDF can then be fitted by e.g. a damped sinu- at the start of the fitting procedure (500–1500 DAs per soidal function in which the period is directly related to the building block are suggested, depending on the experimental mean stacking distance (see Fig. S1 and Section S1 in the resolution – see Section S2). In order to optimize the target supporting information). function (cid:5)2, various metaheuristic methods exist such as Amorecomplexexampleillustratingthenameddistortion SimulatedAnnealing(Kirkpatricketal.,1983)ortheGenetic effects is the case of a torus (a ring with a circular cross Algorithm (Mitchell, 1996). In our case, we employ a meta- section; Kawaguchi, 2001). Such a torus presents two char- heuristic fitting procedure (Boussaı¨d et al., 2013; Gogna & acteristic dimensions: the ring–centre diameter and the ring Tayal, 2013) with an antifragile implementation (Taleb & thickness.Asaresult,thePDDFofsuchasingletorusexhibits Douady, 2013) and the algorithm improves (cid:5)2 by randomly twodistinctpeakscorrelatingtothesestructuralfeatures(see moving single DAs. In contrast with current DA modelling Fig.S2;themodelscatteringdata,includinganartificialerror programs,theDAsdonotmoveonanunderlyinggrid.Asthe band,werecalculatedaccordingtoAppendixA).Considering magnitudeofthemovementisscaledbyatemperaturefactor, now the case of multiple stacked rings (Kornmueller et al., we force the system to freeze eventually in a given condition 2015),foreachringthatisaddedontopoftheother(s)wefind andthe algorithm toconverge. anewpeakinthecorrespondingPDDF(seeFigs.S2andS3). DA modelling faces the general problem of uniqueness: as However, in this case, the radial size of the tori (diameter modelsconsistof>103DAs,theinformationcontentgivenby 20nm) was selected to be larger than the stacking distance the scattering data is highly overdetermined by a given between them (7nm), resulting in an increased distortion of configuration.Fittingofthescatteringdatawithoutrestraints the inter-building-block peak positions [see distortion canthereforeleadtophysicallyunfeasibleresults,inparticular phenomenon(ii)above,aswellasFig.S2].AsshowninFig.S2, with regard to the model homogeneity and compactness. the determination of the stacking distance by fitting of the Currentalgorithmscontrolandoptimizethesetwoproperties dPDDF yields astable result. during the fitting process by means of a ‘looseness penalty’ Weemploythisapproachtodeterminethestackingdistance regularization term as a quantitative measure of the DAs’ from all repetitive structures presented in this work. The local vicinity: by counting and maximizing the number of advantage of this choice is that we avoid the use of a contacting neighbours of each DA, a compact and homo- complementary characterization technique, as the underlying geneous configuration is achieved (Svergun, 1999; Franke & PDDFcanbedeterminedfromagivendatasetusingavariety Svergun,2009;Koutsioubasetal.,2016).Weadaptthisproven ofavailablesoftwarepackages.Amoredetaileddiscussionof technique to our grid-free algorithm in two ways. First, we thelimitationsandpossiblecausesofmisdeterminationcanbe introducetheparameter d ,denotingthedistancebetween N12 found in Sections S1.2and S1.3 and Figs.S1–S6. agivenDAandits12(close-packedlimit)nearestneighbours. Similar to the looseness penalty, d quantifies the local N12 vicinityofeachDA,actingasahomogeneityclassifierforthe algorithmtodecideifagivenDApositionisacceptedornot. 4. Fitting algorithm Second, we apply the idea of a regularization term and According to the presented projection scheme, we reduce a introduce a ‘radial compactness’ parameter RC(X) into the seeminglyinfinitestructuretoabuilding-blockmotiftogether minimizationprocedure,keepingtheDAclosetothe(radial) with two boundary conditions,the stacking distance H and BB centre ofmass of the configuration. the number of necessary stacks M. As these two values are As the choice of a grid-free DA algorithm is a departure determined directly from either the PDDF or the scattering from current implementations, we face challenges for which data, the modelling occurs within the single building-block no readily available solutions exist. In particular, we must volume. We thus represent the building-block motif by a introduce new concepts, (i) to obtain a hard-contact limit for dummy-atom configuration X such that the scattering inten- neighbouring DAs, (ii) to allow model scalability over sity I (q ) (where q denotes the q value of the measured calc m m different length scales and (iii) to generate a random move- data points) can be calculated using equation (2). To find a mentdependingonthecurrentannealingtemperature.Inthe configuration that best fits the experimental scattering inten- following subsections we discuss these topics in more detail, sity I (q ), the relation exp m startingwiththeintroductionofthescalingparameterD and X the random-movement generator, and then moving to the 1 XNexp"I ðq Þ(cid:3)I ðq Þ#2 mathematical definitions of d and RC(X) regarding (cid:5)2 ¼ exp m calc m ; ð3Þ N12 N (cid:6) ðq Þ homogeneity and compactness. A general overview of these exp m exp m conceptsandtheirparametersisgiveninTable1.Attheend 393 IUCrJ (2018).5,390–401 BurianandAmenitsch (cid:2) Dummy-atommodelling research papers Table1 Main concepts of the proposed grid-free fitting algorithm, the corresponding mathematical parameters, their function and their implementation comparedwithcurrentsolutionsinfixed-gridDAmodellingprograms. Fordetailsofcurrentfixed-gridmethods,seeSvergun(1999),Franke&Svergun(2009)andKoutsioubas&Pe´rez(2013). Concept Fixed-gridmethods Parameter Function Implementation Scalability Gridsize D Scalingparameter d ,RC(X),randommovement X N12 generator Homogeneity Fixedgrid/loosenesspenalty d AvoidDAclustering/disconnected Movementclassifier/forcedrecon- N12 DAs figuration Compactness Loosenesspenalty/limitedsearch RC(X)/D KeepDAscloseto(radial)centre Regularizationterm/forcedrecon- X,crit volume ofmass/avoidmodelexplosion figuration of this section, we further address the topic of model Alongthezaxisweintroduceacontinuitycondition(seethe uniqueness and repeatability and give a conclusive overview limitedbuilding-blockheightinFig.1)suchthatDAsleaving ofthe algorithm’simplementationin the programSasHel. the building block in the vertical direction are re-projected back inside the building-blockvolume. 4.1. Scalability – scaling parameter D X Allowing DAs to move randomly without an underlying 4.2. Random-movement generator grid brings an advantage in terms of the model volume and As we pursue the concept of grid-free random DA move- therefore the radial size. In a fixed grid system, the final DA mentstofindanoptimalconfiguration,werequirearandom- densityispredefinedbythegridresolution,whichisrelatedto movement generator that depends on a series of parameters, theresolutionofthescatteringpatterninreciprocalspace.In including (i) the dimensions of the current DA configuration the case of reconstruction of a highly elongated structure, (H and D ), (ii) the current temperature T of the fitting BB X radial expansion of the DA configuration during the fitting process(1>T>0)and(iii)thehelicalchiralityofthesample process is only possible by the addition of new DAs, hence (helicalfieldbias(cid:7)).Inevery numericaliterationthroughout increasing the numerical overhead (N ’ D2 for radial DA X thefittingprocess,achosenDAismovedalongtheunitaxes growth).Radialcompression,ontheotherhand,islimitedby e , e and e by the random vector r (T), where x y z rand the grid resolution, such that at a certain point the grid type (e.g. hexagonal packing of the DAs) may impose artificial 0 h (cid:7) (cid:8)i 1 D T Rand þ(cid:7)Tcos 2(cid:2)zi e shtotirgubhceltyumeroalorlenfgecaoattaeurdrseesstt.rhuTachntuistrheiess,ewpxaphreetrriciemutlehanerltygarlitldrimureeitsfoaolsruattihoreenasicsoanlsiekabeollyef rrandðTÞ¼BB@DXXThRandxyþ(cid:7)Tsin(cid:7)H2H(cid:2)BBzBBi(cid:8)ieyxCCA: ð5Þ (cid:8)H TRand e BB z z computationaloverhead(N <10000)mustbemaintained. DA Our grid-free approach does not present such limitations: as Here, a random number generator Rand returns any value thenumberofDAsperbuildingblock(mainlydeterminedby between(cid:3)1<Rand <1everytimeitiscalled.Inorderto x,y,z theexperimentalresolution)iskeptconstant,thealgorithmon adjust the random movement by the size of the current its own adapts to a change in DA density while at the same configuration, the radial and longitudal contributions are time maintaining the relative model resolution and computa- scaledbythecurrentdiameterD orthebuilding-blockheight X tional resources (see Section S2.2 for a discussion of model H ,respectively(movementalongthezdirectionisscaledby BB resolution).Thisinparticularbenefitsuserswhohavechosen (cid:8) to ensure axial homogeneity, with a standard value of (cid:8) = the wrong radial size for the starting configuration, as the 0.1). Further, each movement along the x, y or z direction is algorithm adapts according to the scattering curve without scaled by the current temperature T. In the case of radial computational drawbacks. We exploit this adaptive potential movement along thexorydirection,weadded ahelical bias of our algorithm using an estimated diameter D of a given X term (depending on the DA’s current e position z; the z i configurationX,whichactsasascalingparameterthroughout building-blockmotifextendsfromz=0toz=H ).Thisbias BB the algorithm. term is scaled by the helical field bias (cid:7) (user defined with WequantitativelytrackthegrowingorshrinkingoftheDA valuesoftheorderof0<(cid:7)<1,defaultvalue0.3)aswellasby configurationXthroughoutthefittingprocessviatheradiusof the current temperature T, such that it favours counter- gyration R . For elongated cases, we assume a cylindrical G,X clockwise solutions motivatedby thefirst structural model of referencegeometry,suchthatweestimateD aftereveryDA X DNA (Watson & Crick, 1953). It is important to note that, movementaccording to mathematically, the helical bias term correlates quadratically withthe current temperature T.Therefore,itscontribution is 1 XNBB !1=2 only significant in the early stages of the fitting process (see D ¼2ð21=2ÞR ¼2ð21=2Þ x2þy2 : ð4Þ X G;X N i i Fig. S11 for its influence on the reconstruction) such that, BB i¼1 using its default value of 0.3, it does not influence the 394 BurianandAmenitsch (cid:2) Dummy-atommodelling IUCrJ (2018).5,390–401 research papers modellingofnon-helical systems (Fig.S15) but facilitates the 4.4.Compactness–theradialcompactnessparameterRC(X) reconstruction ofhigh-aspect helical motifs. In order to prevent unphysical disassembly of the DA In the case of fitting globular structures (M = 1), the configurationduring thefittingprocedure,weretaintheDAs random-movement vector r (T) is significantly simplified rand close to the radial centre of mass (RCOM) of a given accordingto configuration.Weachievethisbyintroducingapotentialwell such that the DAs are weighted as a function of their radial 0 1 D TRand e X x x distance from the centre of mass rCOM. DAs inside the rrandðTÞ¼@DXTRandyeyA; ð6Þ potential well should be unaffected, soithat they move freely D TRand e X z z regardless of their position. Once DAs drift towards the well border,a force should push them back towards the centre of wherethecurrentdiameterD isnowevaluatedoverallthree X mass, hence avoiding radial disassembly. To find an adequate directions, such that equation (4) now also includes the mathematicaldescriptionofthispotentialfield’wespecified contribution ofthe DAs’ current zpositions z2. i the following criteria: (i) radial continuity to avoid non- linearities; (ii) asymptotic behaviour, such that ’ðrCOM !0Þ!0 and ’ðrCOM !1Þ!1; (iii) scalability in 4.3. Homogeneity – the d parameter i i N12 magnitude (0 < ’ < 1); and (iv) scalable potential-well size The random nature of DA movements causes a side effect andsteepness.Basedon theserequirements (and inspired by withregardtotheuniquenessofthefittedmodel:asthereare the potential field for a Gaussian distribution of electrostatic infinite possibilities for describing a given structure by charges;Schlick,2010),weusethemathematicalpropertiesof randomly filling it with point scatterers, the terminology of a theerrorfunctionerfðxÞtodefinearadial-symmetricpotential unique model is not reasonable in this context. However, well acting on each DA. As a result, we obtain the radial certain criteria related to the homogeneity of the DA config- compactness parameter RC(X) according to uration make a fitted DA reconstruction, in a physical sense, veryunlikely.Thesescenariosare(i)high-densityDAclusters 1 XNBB 1(cid:11) (cid:9)(cid:2)rCOM (cid:2)þ1(cid:10)(cid:12) and(ii)singledisconnectedDAsfarawayfromtheremaining RCðXÞ¼ 1þerf i (cid:3) ; ð7Þ N 2 D 2 configuration. BB i¼1 X We optimize the homogeneity throughout the fitting process by analogy with the established looseness parameter whererCOMrepresentstheradialdistancebetweentheRCOM i infixed-gridsystems(Svergun,1999;Franke&Svergun,2009; of configuration X and the ith DA (in the case of globular Koutsioubas & Pe´rez, 2013), namely by evaluating the local geometries, rCOM represents the distance between the centre i vicinityaroundeachDA.Inthesefixed-gridimplementations, ofmassofconfigurationXandtheithDA).Withregardtothe the distance between neighbouring DAs is known such that specifications defined above, this formalism fulfils the above- onlythenumberofcontactsneedstobecounted.Inourgrid- mentioned points (i)–(iv) by means of the scaling parameter freecase,weusethesameprincipleinaninvertedmanner:we D (which is evaluated after every DA movement). In parti- X assume an ideal close-packed condition with 12 neighbours cular,forpoint(iv),scalablepotential-wellsizeandsteepness, andcalculatetheirmeandistancetothecentralDA,resulting the potential-well boundary (half-height position) will be in the parameter d . In the extreme case of a DAwithin a foundatðrCOMÞ=D =((cid:2)+1)/2(cid:2)=0.66,whereitsderivativeis N12 i X high-density cluster [scenario (i) above], d will be very 1/(cid:2).Inabsolute terms,foraverycompactstructure suchasa N12 small.Ontheotherhand,forasinglefree-floatingDAthatis cylinderwithasmoothsurfaceoraninfinitelythincylindrical far away from the core DA assembly [scenario (ii)], d will shell, RC(X) will be approximately 0.06 or 0.24, respectively. N12 be very large. The magnitude of d is hence inversely Each single atom moving further away from D will cause N12 X proportional to the DA density around a given DA. Equally, RC(X) toincrease towards 1. theaveraged overthefullDAconfigurationd denotes We account for this radial compactness throughout the N12 N12,X an inverse measure of the mean DA density of the config- fitting process by using RC(X) scaled by the compactness uration. weight (cid:9) as a regularization term. Thus, the algorithm in fact Weusethed parameterforatwofoldpurpose.Inorder minimizes the function N12 toavoidscenario(i),wedonotallowDAstocomecloserthan 0.1hd i. This acts as a hard-contact limit so that we fðXÞ¼(cid:5)2þj(cid:9)jRCðXÞ; ð8Þ N12,X circumvent DA clustering and therefore unfeasible singula- ritiesthroughoutthefittingprocess.Inordertoavoidscenario instead ofonly (cid:5)2alone. (ii), we repeatedly force DAs with d > 2hd i that are As an ultimate radial boundary, similar to the search- N12 N12,X outside the mean radial distance h|r|i to move towards the volume diameter in DAMMIN (Svergun, 1999), we apply a X centreofmassoftheclosestfractionofDAs.Thisavoidsfree criticaldiameterD =2D suchthatsingleDAsmovingfar X,crit X floatingofasingleDAandthereforeforcestheDAstoremain away from the RCOM are repeatedly forced into a more inacompactconfiguration.Adetaileddescriptionofhowthe compact configuration (see Section S3 and Fig. S27 for a d parameterisimplementedinthefittingalgorithmcanbe detailed description of how the radial compactness is imple- N12 found in Section S3 and Fig.S27. mented in the algorithm). 395 IUCrJ (2018).5,390–401 BurianandAmenitsch (cid:2) Dummy-atommodelling research papers 4.5. Model resolution and uniqueness starting configuration according to a user-defined diameter, the building-block stacking distance and the number of DAs The most obvious visual side effect of using random per building block. Once started, the fitting algorithm under- movementsisrelatedtotheoutersurfaceofthefittedmodel. goesN iterations:startingfromatemperatureT ,thesystem For example, if DAs can only move on an artificial grid, the k 0 coolsdownasdefinedbythequenchingcoefficientq (0<q fittedconfigurationofaglobularparticlewillpresentasurface T T <1)suchthatthecurrenttemperatureatagiveniterationkis smoothness according to the lattice planes of the grid. As T =T qk(werecommendthedefaultvaluesN =100,T =1 already discussed above, allowing random DA movements k 0 T k 0 andq =0.99asaninitialsetofparametersforconvergence). resultsininfinitepossibilitiesforrepresentingagivenparticle T Ineachkiteration,alltheDAsarerandomlymovedusingthe volumeandthusalsoitssurface.Consequently,thesurfaceof random-movement generator (see Section 4.2) under the a random-movement fitted configuration will be significantly restraints explained in Section 4.3, such that a movement is rougherthanacorrespondingfixed-gridmodel.However,this only accepted if (i) the new DA position complies with the increased surface roughness is just a visual symptom of the hard-contact limit 0.1hd i (if not, up to 100 new move- information content provided by the scattering curve, as we N12,X mentsareconsidered)and(ii)animprovementinthefunction discuss in the following. f(X) is found [see equation (8)]. After each k iteration the In absolute terms, we can expect to end up with a fitted sequence of DAs is randomly mixed to avoid sequential configuration that presents structural features, both on the biasing ofthe algorithm. surface and within the volume, that are below the resolution Throughoutthefittingprocedure,wepursuetheconceptof limitoftheexperimentaldata(d =(cid:2)/q ,whereq isthe min max max antifragility (Taleb & Douady, 2013), a concept applicable to upper angular range of the experimental data). This implies metaheuristic optimization (Boussaı¨d et al., 2013; Gogna & that e.g. thin helical tapes require a large accessible angular Tayal,2013):theconvergingsystemisrepeatedlyforcedoutof range in order to be resolved properly.If this is not the case, itslocalminimumsuchthataglobalminimumismorelikelyto the retrieved model is at high risk of being over-interpreted. On a less obvious note, a low angular resolution can further lead to artefacts within the configuration that might not be seen by a common surface representation (we address this topic in moredetailin Section S2). A common technique to avoid misinterpretation of such artefacts is to test the reproducibility of the reconstruction (Volkov & Svergun, 2003). This approach brings a series of advantages. First, the overall stability and reliability of the reconstruction from a given data set are assessed. Second, a consecutive averaging process of all reconstructions projects the DAs onto an artificial occupancy-weighted grid, which provides a straightforward procedure to determine DA validityandvolumeinhomogeneity.Third,thisoccupancymap helps to identify structural artefacts in single reconstructions caused by insufficient information content of the scattering data, thus avoiding over interpretation. Fourth, the recon- struction of an arbitrary shape from scattering data is a (highly) underdetermined problem so it is not guaranteed to obtain a unique result from a given fitting algorithm. Performing a reproducibility analysis when reconstructing a structural motif from experimental scattering data, in parti- cular when the information content and validity of the data are questionable, is hence highly recommended. In quantita- tive terms, the reproducibility of a reconstruction may be judgedbythemeannormalizedspatialdiscrepancyhNSDiof parallel reconstructions, which represents an measure of dissimilarity (hNSDi = 0 for identical models) for the inde- pendentruns (Volkov &Svergun, 2003). 4.6. Implementation Figure 2 We implemented the fitting algorithm in the computer Scattering curves computed from the seemingly endless model geome- triesA–GshowninFig.3(redcircles;seeAppendixAformodeldetails) program SasHel, a Qt graphical user interface (GUI) written andfitsfromthereconstruction(blacklines).Forbettervisualization,the in C++ that allows user interaction. When starting a fitting scatteringpatternsareshiftedverticallyanderrorbarshavebeenomitted procedure, the program generates a random cylindrical (seeFig.S14forhigh-qmagnificationsofthesecurves). 396 BurianandAmenitsch (cid:2) Dummy-atommodelling IUCrJ (2018).5,390–401 research papers Figure3 Theoreticalandreconstructedthree-dimensionalmodelsoftheseeminglyendlessgeometriesusedhere.Thenumbersbelowthelabelsdenotethemean normalized spatial discrepancy hNSDi, obtained from eight independent reconstructions (random artificial cylinder: hNSDi = 1.03 (cid:7) 0.01, random artificialsingle-strandhelix:hNSDi=1.06(cid:7)0.01).ThePDDFsanddPDDFsusedtodeterminethestackingdistancebetweenthebuildingblockscanbe foundinFigs.S12andS13.SeeFig.2forthecorrespondingscatteringintensitiesandFig.S15forpointrepresentations,asdenotedbythelettersA–G. be found. These non-optimal moves are forced onto the oscillations in the higher-q regime 1 < q < 2nm(cid:3)1 (see configuration without consideration of the damage caused to Fig. S14), which is a resonance effect caused by the stacking themodel,makingthealgorithmlesspronetodistortionofthe nature ofidentical building blocks (see Section S2.3). solutionspacebyregularizationterms.Asummaryofthefull Fig.3presentsthecorrespondingthree-dimensionalmodels algorithm, including an example implementation in pseudo- and reconstructions (see Fig. S15 for point representations). code,can be found in Section S3 andFig.S27. Models A and B show a circular cross section in agreement TheprogramSasHelfurtherincludesa‘parallelmode’that with the model, while in the case of the cylindrical shell the runs a unique reconstruction on each available CPU core, so emptycoreispresent.Further,therectangularcrosssectionof the model’s validity and reproducibility can easily be tested model C is visible in the reconstructed model. However, the using e.g.DAMAVER (Volkov & Svergun,2003). sharp corners are not fully resolved, which is most likely the effectofinsufficientresolutionfromthescatteringdata(d ’ res 1.6nmcomparedwiththerectangularcrosssectiona(cid:6)b=4 5. Model examples (cid:6) 20nm). In cases D–G, the helical fingerprints (single- or Totesttheimplementedalgorithm,wesimulatedanumberof double-strandednature)arewellresolvedinthereconstructed scattering intensities from seemingly endless bodies (see models. Cases D and E, and F and G, present noticeable Appendix A for detailed dimensions). All reconstructions differences in their cross sections, helping to distinguish wereperformedusingthedefaultfittingparametersasfollows: between a helical filament (empty core) and a helical tape starting temperature T = 1, quenching coefficient q = 0.99, (filledcore).Further,variationsinthethicknessofthehelical 0 T numberofiterationsN =200,helicalbiasparameter(cid:7) =0.3, strands can be observed. The overall agreement between k compactness weight (cid:9) = 1, DA form factor diameter D = modelandreconstructionforthemainfeaturesdemonstrates DA 0.2nm and number of stacked building blocks M = 10. The thefunctionalityofthealgorithmforsimilarseeminglyendless dimensions of the initial random configuration, including the bodies. stackingdistanceofthebuildingblocks,weredeterminedfrom Inordertoevaluatethestabilityandreproducibilityofthe the relative dPDDFs (see Figs. S12 and S13). For all recon- reconstruction algorithm, eight independent runs for each structions,weused500–800DAsperbuildingblock,resulting model geometry were analysed using DAMAVER according incomputationtimesforeachrunbetween20and60minona to the literature (Volkov & Svergun, 2003). Accordingly, we standardworkstation, respectively. obtainedthemeannormalizedspatialdiscrepancyhNSDiasa The scattering intensities of the theoretical models are measure of the similarity of independent reconstructions of shown in Fig. 2, along with the fits from the reconstructions. each model (hNSDi = 0 for identical configurations). As For all geometries used, we find agreement between the showninFig.3,thehNSDiofmodelsA–Ggraduallyincreases theoreticalandfittedscatteringintensities(the(cid:5)2valueofall from 1.01 to 1.37 with the complexity of the model. On an fitsisbelowthethresholdof0.1).Yet,casesA–Dshowslight absolutescale,thesehNSDivaluesarehigherthanthoseinthe 397 IUCrJ (2018).5,390–401 BurianandAmenitsch (cid:2) Dummy-atommodelling research papers literature (Volkov & Svergun, 2003), where stable recon- structionswerelinkedtoanhNSDiof0.4–0.7.Thereasonfor thisdifferenceisfoundinthegrid-freenatureofourapproach: NSD values for grid-free programs such as GASBOR are generally higher (Svergun et al., 2001). To validate our find- ings, we constructed eight randomly filled artificial cylinders and single-strand helices (N = 700), yielding hNSDi values DA of 1.03 (cid:7) 0.01 and 1.06 (cid:7) 0.01, respectively. These values, in contextwiththehNSDioftheabovereconstructions,provide a reference for grid-free models where an hNSDi between 1 and 1.4 isobserved. ItisalsopossibletorunthefittingalgorithmusingonlyM=1 stacks, which corresponds to the case of the building block alone.Thus,thesameprogramcanalsobeusedtofitglobular particles.Wetestedthisoptioninasimilarwaytothemethod outlined above by simulating a number of scattering inten- sities from globular bodies (see Appendix A for detailed dimensions).Forallreconstructions,weusedthesamedefault fittingparametersasmentionedpreviously.Thedimensionsof the initial random configurations were determined from the maximum dimension found in the relative PDDFs (see Fig.S12).However,weincreasedthenumberofDAsto800– 1200, now resulting in computation times between 5 and 10min per run on a standardworkstation, respectively. Thescatteringintensitiesofthetheoreticalmodelsandthe fits from the reconstructions are shown in Fig. 4(a). For all geometries used, we find agreement between the theoretical andfittedscatteringintensities(the(cid:5)2valueofallfitsisbelow the threshold of 0.1). Evidently, the oscillations previously foundfortherepeatingmotifinthehigher-angleregime (see Fig. 2) are not present. The three-dimensional models and reconstructions corresponding to the scattering patterns are shown in Fig. 4(b) (see Fig. S16 for point representations). Alsointhiscase,thefittedmorphologiesclearlyrepresentthe corresponding models,includingthehollowgeometries Iand L.ThehNSDivaluesforeightindependentreconstructionsof each geometry are within the range 0.96–1.28, as denoted in Fig. 4(b) [in relative terms, the hNSDi values for eight Figure 4 randomlyfilledartificialspheresandcubes(NBB=1500)were (a) Scattering curves computed from globular model geometries (red 1.04 (cid:7) 0.01 and 1.06 (cid:7) 0.01, respectively]. These findings circles;seeAppendixAformodeldetails)andfitsfromthereconstruc- validate the use of the algorithm for globular bodies as well. tion (black lines). For better visualization, the scattering patterns are shifted vertically and error bars have been omitted. The PDDFs of all modelscanbefoundinFig.S12.(b)Theoreticalandreconstructedthree- dimensionalmodelsoftheglobulargeometriesusedhere(seeFig.S16for 6. Experimental examples point representations of the reconstructions). The numbers below the As final example, we applied the described reconstruction labelsdenotethemeannormalizedspatialdiscrepancyhNSDi,obtained fromeightindependentreconstructions(randomartificialsphere:hNSDi procedure to experimental data for a self-assembled peptide =1.04(cid:7)0.01,randomartificialcube:hNSDi=1.06(cid:7)0.01). double-strand helix (Kornmueller et al., 2015). Prior to the fitting process, we determined a building-block stack spacing of53nmfromthecorrespondingPDDF(seeinsetinFig.5and independenttapeswithinthebuildingblock(seeFig.S17afor Fig.S13). The scatteringdata were then fittedusing 800 DAs pointrepresentations).However,thetwotapesdonotappear over the angular range 0.08 < q < 2.14nm(cid:3)1, resulting in a to be symmetric along the z direction, suggesting a displace- real-spaceresolutionofapproximately(cid:2)/q =1.5nm,where mentangleof’6¼180(cid:8)betweenthem.Thecrosssectionofthe max the default fitting parameters according to Section 4 were helical tapes presents a rather rough surface that does not used. allow more detailed interpretation, as is typical for such AsshowninFig.5(a),weagainfindagreementbetweenthe random-movementDAmodels.Nevertheless,acomparisonof experimental and fitted scattering curves ((cid:5)2 = 0.08). The thereconstructionwiththemodelaccordingtothepreviously correspondingreal-spacereconstruction(Fig.5b)presentstwo published dimensions (Kornmueller et al., 2015) gives good 398 BurianandAmenitsch (cid:2) Dummy-atommodelling IUCrJ (2018).5,390–401 research papers Figure5 Verificationofthereconstructionalgorithmusingexperimentalscatteringdata.(a)Experimentalscatteringdata(errorbarsomittedforclarity)and fittedcurvefromthereconstructionoftheself-assembledpeptidedoublehelix.ThecorrespondingPDDFisshownintheinset,whereasthedPDDFused todeterminethestackingdistancecanbefoundinFig.S13.(b)Orthogonalviewsofthemodelreconstructedfromthescatteringpatterninpanel(a) (green)comparedwiththepreviouslypublishedstructure(blue).(c)Experimentaldata,fittedcurveandPDDFforalcoholdehydrogenase1(ADH).(d) Orthogonalviewsofthemodelreconstructedfromthescatteringpatterninpanel(c)(green)comparedwiththecrystalstructuremodel(blue).SeeFig. S17forpointrepresentationsofthereconstructionsshowninpanels(b)and(d),andFig.S18foranumericalstabilityanalysisofbothreconstructions. agreement (see the red model in Fig. 5c). A movie of the showninFig.3,thehNSDiwasfoundtobe1.27(cid:7)0.02,hence rotatinghelicescanbefoundinthesupportinginformation.To confirming the reproducibility of the reconstruction. obtain a measure of the uniqueness of the final model, we Analogous to the above, we further applied the recon- repeatedthefittingproceduretoendupwith16independent struction procedure to experimental data for the globular reconstructions. The average of all 16 models (Volkov & protein alcohol dehydrogenase 1 (ADH) in phosphate- Svergun,2003),asshowninFig.S18(a),isconsistentwiththe buffered saline at pH 7.5. The data set was taken from the singlereconstruction.Inagreementwiththereferencevalues SASBDB database (Valentini et al., 2015; date of download 399 IUCrJ (2018).5,390–401 BurianandAmenitsch (cid:2) Dummy-atommodelling
Description: