Self-assembly, modularity and physical complexity

S. E. Ahnert,1 I. G. Johnston,2 T. M. A. Fink,3,4,5 J. P. K. Doye,6 and A. A. Louis2

1 Theory of Condensed Matter, Cavendish Laboratory, University of Cambridge, JJ Thomson Avenue, Cambridge CB3 0HE, UK
2 Rudolf Peierls Centre for Theoretical Physics, University of Oxford, 1 Keble Road, Oxford OX1 3NP, UK
3 CNRS UMR144, INSERM U900, Institut Curie, 26 rue d'Ulm, Paris, F-75248 France
4 Mines ParisTech, Fontainebleau, F-77300 France
5 London Institute for Mathematical Sciences, 22 South Audley St, London W1K 2NY, UK
6 Physical & Theoretical Chemistry Laboratory, Department of Chemistry, University of Oxford, South Parks Road, Oxford OX1 3QZ, UK

arXiv:0912.3464v2 [cond-mat.stat-mech] 12 Jan 2010

We present a quantitative measure of physical complexity, based on the amount of information required to build a given physical structure through self-assembly. Our procedure can be adapted to any given geometry, and thus to any given type of physical system. We illustrate our approach using self-assembling polyominoes, and demonstrate the breadth of its potential applications by quantifying the physical complexity of molecules and protein complexes. This measure is particularly well suited for the detection of symmetry and modularity in the underlying structure, and allows for a quantitative definition of structural modularity. Furthermore we use our approach to show that symmetric and modular structures are favoured in biological self-assembly, for example of protein complexes. Lastly, we also introduce the notions of joint, mutual and conditional complexity, which provide a useful distance measure between physical structures.

ALGORITHMIC COMPLEXITY

More than forty years ago, Kolmogorov [1] and Chaitin [2] laid the foundations of algorithmic information theory, by introducing the concept of algorithmic information content, or Kolmogorov complexity, for a given string of information [3]. This measure of complexity is defined as the length of the shortest possible program on a universal computer that will output the string in question. Here we propose a conceptually analogous measure of the complexity of any connected physical structure. Instead of a universal computer which translates a program into a string of information, we consider a general framework of self-assembly rules, which act together to create a physical object. The 'program' now is our set of self-assembly building blocks and rules, the 'computer' is given by the physical interactions of the self-assembling building blocks, and the 'output' is the final structure.

Using this approach we investigate the physical complexity of shapes in two and three dimensions, including polyominoes, molecules and protein complexes. Our work generalizes ideas first explored in [4, 5], and opens them up to a wide range of applications. Furthermore, in the context of protein complexes it offers the kind of biological application of information-theoretic concepts demanded in [6].

SELF-ASSEMBLY KIT

There are many examples of self-assembling structures in physics, chemistry and biology [7]. Examples include thin films [8], micelles [9], viruses [10, 11] and DNA [12–17]. Our aim is to introduce a general framework for the theoretical study of self-assembling structures. This framework can be used to study the properties of real self-assembling systems, but, more generally, it can also be used to measure the physical complexity of any construct, self-assembling or not. The exact nature of the self-assembly framework depends on the underlying physical system, but it always contains two basic ingredients: a set of building blocks and a set of rules. We shall call this combination an assembly kit $S$. Each building block $i$ has $f_i$ interfaces, which typically are subject to geometric constraints (depending on the physical system). Attached to each interface $j$ of a given building block $i$ is an integer $\chi_{ij} \in [1,\ldots,c]$. The $c$ possible values of these integers are the colours of these interfaces. The number of distinct colourings of the building blocks depends entirely on the geometry of the problem. The second ingredient of the assembly kit is the set of rules, which takes the form of an interaction matrix between colours. In the simplest case this matrix is binary, where 1 signifies attraction and 0 signifies no interaction at all. Many more sophisticated interaction matrices involving repulsion and a continuous spectrum of energies are easily imaginable.

For any system of self-assembling particles we also need to specify a model for the actual assembly process. A convenient choice is a model assuming a single nucleus in solution [4], which makes the assumption that each disjoint object has one fixed nucleus building block which is surrounded by a solution containing a freely moving population with many copies of each type of building block. At each time step (i) a fixed building block, (ii) a site adjacent to it, (iii) a random rotational orientation, and (iv) a building block from the solution are chosen at random, and the new, randomly rotated building block becomes fixed to its position if the rules allow it. Note that some assembly kits always assemble into the same shape - these we call 'deterministic' - while ones which contain ambiguous rules are 'non-deterministic'. See Figure 1 for an example of a deterministic and a non-deterministic self-assembly kit.

FIG. 1: [Panels: deterministic (left), non-deterministic (right).] An example of deterministic and non-deterministic self-assembly kits, using simple 2D lattice structures (polyominoes). In both cases, colours A and B attract each other, but C attracts neither A nor B. No colour attracts itself. The kit on the left will always assemble into the cross shape while that on the right will assemble into an irregular cluster, as there are several ways in which the two blocks can attach.

As a simple example of a self-assembling system, we will consider self-assembling polyominoes. A polyomino (also known as a lattice animal) is a set of connected sites on a (typically square) lattice [18]. These connected sites are our self-assembly building blocks. Every building block has four sides (so that $f_i = 4$ for all $i$), which are painted with one of $c$ colours. These colours can attract each other or not, as encoded in a $c \times c$ binary interaction matrix. Each distinct way of colouring a building block corresponds to a different building block type. We do not regard rotated colourings as distinct. The geometry of the 2D lattice gives rise to a particular set of building block colourings in the context of self-assembly. If we have $c$ colours, the total number of such colourings is [19]:

$$N_c = (c^4 + c^2 + 2c)/4$$

These particular colourings are also known as necklaces, which can be defined as equivalence classes of strings under rotation [19]. The definition of necklaces used here assumes that the building blocks have a fixed chirality - in other words that the necklaces which the colours form on the building blocks are fixed.[43]

THE MINIMUM KIT

Every deterministic assembly kit $S_A$, which always assembles into a structure $A$, requires a certain amount of information $I(S_A)$ to describe it in some given language $L$. Our aim is to minimize this quantity, as we define the length of the description of the minimum assembly kit $\tilde{S}_A$ as the complexity $K(A)$ of structure $A$:

$$K(A) = I(\tilde{S}_A) = \min_{S_A} I(S_A)$$

in analogy to the concept of Kolmogorov complexity. Any symmetry or modularity which the structure $A$ contains decreases the amount of information required to describe the structure and will therefore be reflected in its minimum assembly kit $\tilde{S}_A$, and by extension in the value of $K(A)$.

If a minimum assembly kit is deterministic, an interaction matrix $A$ (with elements $a_{ij}$) between a total of $c$ colours, of which $c_s$ self-interact, can be rewritten as:

$$a_{ij} = [1 - (i \bmod 2)]\,\delta_{i(j+1)} + (i \bmod 2)\,\delta_{i(j-1)}$$

for $i \le c - c_s$, and $a_{ij} = \delta_{ij}$ otherwise, so that one colour always only interacts with one other colour. With this constraint, the amount of information, in bits, required to describe a self-assembly kit $S_A$, with $b$ building block types, is:

$$I(S_A) = \log_2(c_s + 1) + \sum_{i=1}^{b} \left( c_i \log_2 c + \log_2 F_i \right) \qquad (1)$$

The first term relates to the number of self-interacting colours, the second measures the information required to describe which $c_i$ colours out of the total of $c$ colours appear on building block $i$, and the third term $\log_2 F_i$ measures the information describing the distinct arrangement of the $c_i$ colours on the $f_i$ faces of building block $i$. For a general building block with $f_i$ labelled faces, $F_i$ takes the form of:

$$F(c_i, f_i) = \sum_{k_1=1}^{f_i-c_i+1} \; \sum_{k_2=1}^{f_i-c_i+2-k_1} \cdots \sum_{k_{c_i-1}=1}^{f_i-c_i+(c_i-1)-\Sigma'} \frac{f_i!}{\prod_{m=1}^{c_i} k_m!}$$

where $\Sigma' = \sum_{j=1}^{c_i-2} k_j^{(i)}$, and the $k_j^{(i)}$ (written $k_j$ in the sums) signify the number of times colour $j$ occurs on block $i$.

For polyominoes $F_i = F(c_i) = N'_{c_i}$, where $N'_{c_i}$ is the number of necklaces with exactly $c_i$ colours, given by

$$N'_{c_i} = N_{c_i} - \sum_{k=1}^{c_i-1} \binom{c_i}{k} N'_k$$

with $N'_1 = 1$. It follows that $N'_2 = 4$, $N'_3 = 9$, and $N'_4 = 6$. As before, the complexity $K(A)$ of polyomino $A$ is the minimum of $I(S_A)$ over all possible assembly kits $S_A$. Note that Wang tiles [20] are a special case of self-assembling polyominoes. The tile system described in [5] is also similar to our framework for the case of polyominoes, but (like Wang tiles) only considers self-interacting colours, and treats rotated tiles as distinct. As a result our encoding, based on necklaces, makes symmetry and modularity in the structure more directly measurable.

If the faces are geometrically unconstrained - as one would imagine for a node with a set of freely moving links - and hence unlabelled, we would only need to specify how much there is of each colour. This can be written using $F_i = \prod_j^{c_i} k_j^{(i)}$, so that $\log_2 F_i = \sum_j^{c_i} \log_2 k_j^{(i)}$. However, this only works under the condition that multiple connections between the same pair of building blocks are prohibited.

The general algorithm we use to find the minimum assembly kit $\tilde{S}$, and thus the complexity $K$, for polyominoes and other structures is described in the following section.

A GENERAL MEASURE OF STRUCTURAL COMPLEXITY

Below we describe a general algorithm for minimizing the assembly kit size for a connected physical structure without relying on steric effects. Taking these into account can minimize the assembly kit even further, but their computation is highly dependent on the geometry of the system and in most cases non-trivial (see Discussion). Note also that in some structures, such as polyominoes, some edges of the contact graph can be redundant in the context of the assembly process. Whether contact graph edges in general can be redundant or not depends on the nature of the structure and the assumptions connected to the self-assembly of that structure (see Discussion). Similarly, when interfaces are defined by geometry, as for the four sides of a polyomino building block, it makes sense to introduce a neutral colour ($\nu = 1$ below). In systems with a varying number of interfaces on the building blocks, neutral colours are usually not required ($\nu = 0$). To minimize the assembly kit we take the following steps:

1. Divide the structure into building blocks (usually a natural division). The number of building blocks is the size of the structure, denoted $z$.

2. Determine the equivalence of these units in terms of any additional criteria (e.g. types of atoms, proteins). This categorization is the species of building block.

3. Establish a contact graph $a_{ij}$ for the units (in some cases, such as molecules, this may require setting a distance cutoff).

4. If edges can be redundant: Consider the space of all spanning subgraphs of this graph.

5. For the contact graph (in the case of no redundant edges) or each subgraph (if redundant edges exist):

(a) Classify the (sub)graph according to the number of connections and (depending on the geometry) the arrangement of connections.

(b) Label all nodes which are not yet labelled and which have exactly one unlabelled node among their neighbours. The new labels distinguish nodes according to their species as well as the topologically distinct label distributions among their neighbours.

(c) Repeat step 5b until all nodes are labelled or no more nodes can be labelled.

(d) All labelled nodes we define as category 1 nodes and any remaining unlabelled nodes (i.e. nodes with at least two unlabelled neighbours) are defined as category 2 nodes.

(e) Label all category 2 nodes simultaneously according to their neighbourhoods.

(f) Repeat step 5e, using the previous labellings to distinguish neighbourhoods, until labellings are stable.

(g) These final labels, for nodes in both categories, denote the building block types. The number of final labels, or types, is $b$. These can be subdivided into $b_1$ category 1 building block types and $b_2$ category 2 building block types. The category 2 type of block $i$ is denoted $t_i$.

(h) The degree of each building block type $i$ in the contact graph (or subgraph) is the number of its interfaces $f_i$.

(i) The total number of colours, including $\nu \in \{0,1\}$ neutral colours, is

$$c = 2(b_1 - 1) + \nu + \sum_{i,j=1}^{b_2} \left( 1 - \prod_{k,l=1}^{z} \left( 1 - a_{kl}\,\delta_{i t_k}\,\delta_{j t_l} \right) \right).$$

The sum expression gives the number of different types of interfaces which occur between category 2 building block types[44]. The number of colours $c_i$ on building block $i$ is equal to the number of building block types in its contact graph neighbour set.

(j) Using $b$, $c$, $\{f_i\}$ and $\{c_i\}$ in equation (1), calculate the information $I$ required to specify this assembly kit, and thus the complexity $K$ of the structure.

6. If edges can be redundant: Minimize this quantity over all spanning subgraphs.

Figure 2 illustrates the crucial steps 5b to 5j for a polyomino. Figure 3 illustrates how the complexity value $K$ reflects symmetry and modularity present in the structure.
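The polyomino quantities that enter equation (1) are small enough to check by brute force. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, colours are represented as integers, and the kit is assumed to be described only by the total colour count c, the number of self-interacting colours c_s, and the list of per-block colour counts c_i. It enumerates rotation classes of 4-face colourings, compares them with the closed form for N_c and the recursion for N'_c, and evaluates equation (1) with F_i = N'_{c_i}.

```python
from itertools import product
from math import comb, log2

FACES = 4  # a square polyomino block has four sides


def canonical(colouring):
    """Return the smallest rotation of a colouring (its necklace representative)."""
    n = len(colouring)
    return min(colouring[r:] + colouring[:r] for r in range(n))


def necklaces_brute(c, exactly=False):
    """Count rotation classes of 4-face colourings using up to (or exactly) c colours."""
    seen = set()
    for colouring in product(range(c), repeat=FACES):
        if exactly and len(set(colouring)) != c:
            continue
        seen.add(canonical(colouring))
    return len(seen)


def necklaces_closed(c):
    """Closed form N_c = (c^4 + c^2 + 2c) / 4."""
    return (c**4 + c**2 + 2 * c) // 4


def necklaces_exact(c):
    """Recursion N'_c = N_c - sum_{k<c} C(c, k) N'_k, with N'_1 = 1."""
    if c == 1:
        return 1
    return necklaces_closed(c) - sum(
        comb(c, k) * necklaces_exact(k) for k in range(1, c)
    )


def kit_information(c, c_s, colours_per_block):
    """Equation (1) for polyomino kits, in bits.

    colours_per_block holds c_i for each of the b building block types.
    """
    bits = log2(c_s + 1)
    for c_i in colours_per_block:
        if not 1 <= c_i <= FACES:
            raise ValueError("a four-sided block carries between 1 and 4 colours")
        bits += c_i * log2(c) + log2(necklaces_exact(c_i))
    return bits


if __name__ == "__main__":
    print([necklaces_exact(c) for c in range(1, 5)])
    print(kit_information(c=9, c_s=0, colours_per_block=[2]))
```

For c = 1 to 4 the recursion reproduces the values 1, 4, 9 and 6 quoted in the text. A single block with two of nine colours contributes 2 log2 9 + log2 4 bits under this reading of equation (1). Whether c should include the neutral colour is left to the caller here.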
FIG. 2: [Panels A) to G), in columns "category 1 labelling", "new labels", "category 2 labelling", "new labels".] An illustration of the crucial steps 5b to 5j of the algorithm for minimizing the assembly kit size, in this case for a polyomino. In every iteration of category 1 labellings (LEFT), all unlabelled nodes with exactly one unlabelled neighbour are given labels which distinguish them according to their topologically distinct neighbourhoods of unlabelled and labelled tiles. This procedure is repeated until no more blocks can be labelled in this way. The remaining blocks are given category 2 labellings (RIGHT) which are applied simultaneously, with each label distinguishing the topological neighbourhoods of the tiles in the previous iteration. Note that in the last iteration the labellings have stabilized, and only the interfaces of the building block types are updated. For structures in which edges can be redundant, this operation can be performed for all spanning subgraphs of the structure's connectivity graph, which further reduces the complexity. (In polyominoes, edges can be redundant, but there are no spanning subgraphs in the above example.)

FIG. 3: [Panels: A) size 17, complexity 228.7 bits: large size and low symmetry results in high complexity. B) size 17, complexity 74.9 bits: large size but high modularity results in medium complexity. C) size 9, complexity 79.5 bits: small size but low symmetry results in medium complexity. D) size 9, complexity 19.1 bits: small size and high symmetry results in low complexity.] The complexity values of these four polyomino shapes illustrate why the self-assembly approach is an effective way of measuring symmetry and modularity without requiring prior assumptions. If two shapes are of equal size, the one with more symmetry and modularity has a lower complexity value - compare A with B, and C with D. If on the other hand, two shapes are of similar complexity, but of different size, the larger one will be more symmetric or modular (compare B and C).

APPLICATIONS

The self-assembly approach can be used to calculate complexity values for any physical structure. In order to demonstrate the broad range of potential applications we determine the complexity of (a) molecules and (b) protein complexes.

The problem of molecular complexity has been studied extensively over the past seventy years, starting with work by Pólya [21] and Rashevsky among others [22, 23], and culminating in a seminal paper by Bertz [24]. These approaches are based on Shannon entropy rather than algorithmic information theory and focus on symmetries rather than the more general concept of modularity. In molecules, we take atoms to be the building blocks and chemical bonds to be their interfaces. Simple molecules, such as those in Figure 4, for which we are only interested in the bond connectivity, are an example of a structure in which none of the edges can be regarded as redundant. This is because, unlike for polyominoes, we are not assuming any inherent geometry for the building blocks. If two atoms play the same self-assembly role but represent atoms of different atomic species, they must be differentiated. This also goes for atoms connected by different bond types. For example, in glutamine (see Figure 4), the oxygen atom connected with a double bond is a leaf of the self-assembly tree just like any of the (implicit) hydrogen atoms, but it requires a separate building block. The two molecules in our example of Figure 4 are the amino acid glutamine and the explosive nitroglycerine, which both consist of 20 atoms. Nitroglycerine however exhibits a much higher degree of modularity, with its three NO3 groups, and therefore has a much lower complexity of K = 55.3 bits than glutamine, for which the value is K = 94.7 bits. Note that nitroglycerine does not exhibit simple three-fold symmetry, but a more subtle, hierarchical modularity. Such structural features would be harder to discover using traditional approaches to the measurement of molecular complexity [22–24], which do not take a self-assembly perspective and rely on Shannon entropy rather than Kolmogorov complexity as a measure of complexity.

FIG. 4: [Panels: a) Nitroglycerine, annotated with "module", "copy of module" and "subset of module"; b) Glutamine, annotated with "module 1", "copy of module 1", "module 2" and "copy of module 2".] Measuring the complexity of molecules – The explosive nitroglycerine (top) and the amino acid glutamine (bottom) both consist of 20 atoms, but differ greatly in complexity. The highly modular structure of nitroglycerine with its three NO3 groups means that its complexity value K, at 52.2 bits, is little more than half that of glutamine (K = 91.0 bits). Note that nitroglycerine does not have simple three-fold symmetry, but a more subtle modular structure, which the self-assembly approach fully reveals. Note that we do not consider neutral colours in this structure ($\nu = 0$).

Many important biochemical structures are protein complexes, consisting of several individually formed and folded protein subunits bound together to produce functional cellular machinery. These subunits may include different types of protein and several copies of the same protein. The physical structure of protein complexes, as with proteins themselves, is important in determining the functionality of the complex. The manner in which the subunits bond to form the final complex is known as the quaternary structure of the complex. The 3DComplex database [25] contains a description of the quaternary structures of thousands of protein complexes, in terms of subunit type and inter-subunit bonding. If we have two proteins which play the same role in the self-assembling structure but are different proteins, we can choose to count them as two different building blocks (analogous to the aforementioned distinction between atomic species in molecules). In the following analyses we are only interested in the connectivity of proteins (equivalent to the QS Topology level in the 3DComplex database), and therefore do not distinguish between different proteins. The two protein complexes in our example of Figure 5 are a chaperonin complex (E. coli chaperonin GroEL; PDB identifier: 1oel) and an allergen complex (P. pratense allergen PHL P 6; PDB identifier: 1nlx). Both consist of 14 proteins, but the former displays a much higher degree of symmetry and a much lower complexity value of K = 31.5 bits, versus K = 50.2 bits for the allergen (which is still somewhat modular).

FIG. 5: [Panels: a) 1oel (E. coli chaperonin GroEL), annotated "(double) symmetry"; b) 1nlx (P. pratense allergen PHL P 6), annotated "module" and "subset of module".] We measure the complexity of two protein complexes, with PDB identifiers 1oel (chaperonin, top) and 1nlx (allergen, bottom), which have 14 proteins each. The symmetry of the chaperonin complex means that it has a much lower complexity value of K = 31.5 bits, compared to K = 50.2 bits for the allergen complex. Note that we are assuming non-redundant edges in this calculation, so that all building blocks of the chaperonin complex are category 2 and all building blocks of the allergen complex are category 1. Furthermore we do not consider neutral colours ($\nu = 0$), and in the case of the chaperonin complex we have three self-interacting colours ($c_s = 3$). Note also that both complexes are homomers, i.e. they only have one type of subunit.

More complex protein structures require more unique inter-subunit bond types, compared to less complex structures which can re-use bonds and be constructed through simple repetition of subunits. As an increase in bond types corresponds biologically to the presence of more unique bonding sites on subunit proteins, more complex protein structures can be thought of as requiring more evolutionary innovation to produce and would therefore be expected to occur less frequently in biological organisms [26, 27]. This hypothesis is confirmed by Figure 6, which shows a histogram of complexity values – normalized by the size of the protein complex, to avoid size effects – for the 15733 protein complexes in the 3DComplex database [25]. The distribution closely ($R^2 = 0.93$) follows a power-law decay.

In both of these cases - molecules and protein complexes - we assume geometrically unconstrained faces for the building blocks; in other words, we use $F_i = \prod_j^{c_i} k_j$. While the chemical bonds of atoms and the interfaces of proteins are in fact usually constrained, this information is not part of the structural formula of the molecule or the contact graph of the protein complex. If this additional level of resolution is required, a more realistic self-assembly model can be constructed, based on the exact three-dimensional characteristics of the atoms or proteins, and using the $F(c_i, f_i)$ term specified above.

FIG. 6: [Log-log plot; horizontal axis: Complexity / Size; vertical axis: Number of Proteins: Frequency.] (Colour online) Histogram of protein quaternary structure assembly complexity with frequency of occurrence in the 3DComplex database. Insets illustrate two pairs of equally sized structures with high and low complexity values. 1geh, 1i3q, 1q2v, and 1ohh are the PDB identifiers of the complexes. The plot has an $R^2 = 0.93$ correlation with a power law decay. Note that in this case we do not distinguish between different types of subunit.

FIG. 7: [Log-log plot; horizontal axis: Size; vertical axis: Number of Required Blocks; guide lines b/z = 1, b/z = 0.5 and b/z = 0.1.] (Colour online) The position of the 15733 protein complexes from [25] in the space of $b$ (number of building block types) and $z$ (size of the complex). Many protein complexes are highly modular, and this is true across a wide range of sizes. In this plot complexes of equal modularity $m = z/b$ lie on a diagonal line with positive gradient. The lines are shown for m = 1, 2, and 10 (b/z = 1, 0.5, and 0.1). The sizes of the circles show how many complexes lie at a given position (z, b). The insets show two examples (with PDB identifiers 1kyo and 1b5s), with high and low modularities.

MODULARITY

The self-assembly perspective provides an intuitive definition of the modularity of a structure: If part of the structure appears several times, it still only needs to be encoded once. This is why modularity and symmetry (being a special case of modularity) lead to more efficient self-assembly kits and a lower value of the complexity measure $K$. Formally we can define the modularity $m$ of a structure of size $z$ as the average number of times one of the $b$ different building block types in the minimum assembly kit is used in the structure, which is simply:

$$m = \frac{z}{b}$$

We can furthermore define a module formally as a connected set of building blocks which appears more than once in a given structure. Note that modules can overlap: A subset of a module could form another module, appearing a different number of times than the whole module. The molecule in Figure 4a illustrates such a case.

The majority of protein complexes in the 3DComplex database show high modularity values (Figure 7) with a common trend observable along the b/z = 0.5 line, indicating many proteins consist of structures involving two copies of all constituent subunits.

To further illustrate how the complexity $K$ and the modularity $m$ measure the physical complexity of protein complexes, we consider two of the outliers in the complexity and modularity histograms, the high-complexity 1ohh (Figure 6) and high-modularity 1b5s (Figure 7). 1ohh consists of two copies of bovine F1-ATPase (itself a protein complex) in complex with its regulatory protein IF1 [39]. The regulatory protein binds simultaneously to both copies of the main complex, but slightly asymmetrically, leading to asymmetric interactions being recorded in the 3DComplex database. This asymmetry results in extra information being required to describe the combined quaternary structure, and the observed high complexity value. 1b5s is a multienzyme complex consisting of multiple copies of dihydrolipoyl acetyltransferase (E2p) [40]. The E2p protein has the potential to occupy quasi-equivalent positions, as seen in virus structures [41], and is also observed to form cubic complexes. The highly modular, dodecahedral structure exhibited in 1b5s is an efficient way of grouping many copies of an active protein in a geometry that facilitates enzymatic activity: the large windows in the structure allow passage of the substrate and product between the inner cavity and the substrate. The structure of the protein subunits allows this structure to be realised with just one building block type, resulting in high modularity.

JOINT, CONDITIONAL, AND MUTUAL COMPLEXITY

If we have two structures $A$ and $B$ with minimum assembly kits $\tilde{S}_A$ and $\tilde{S}_B$, then the joint minimum assembly kit $\tilde{S}_{A,B}$ is the minimum kit which can assemble both structures if an appropriate subset of building blocks is chosen. The amount of information required to describe this kit is the joint complexity $K(A,B)$ of $A$ and $B$. This definition can easily be generalized to more than two structures.
B) D) O Let us define S˜′ as the subset of S˜ which forms 3 2 1 A A,B structure A, and S˜′ as the subset of S˜ which forms 4 H N B A,B 2 structureB (notethate.g. S˜ isnotnecessarilyequalto 6 2 1 OH A S˜′ duetothecolourminimization),sothatS˜ =S˜′ ∪ 2 A A,B A S˜′ . Furthermore,let us define the conditional minimum 1 B assembly kit S˜ as the set of building blocks we need A|B O NH in additionto S˜′ inorder to formstructure A. Then we 2 B can write: FIG. 8: POLYOMINOES (left): The two polyominoes share S˜ =S˜ \S˜′ A|B A,B B manybuildingblocktypes,withtheonlytwouniqueonesbe- ing blocks 5 and 6 (marked in grey). Hence, the joint set is where \ denotes the set theoretic difference operation. ThedefinitionofS˜ followsaccordingly. Hencewecan S˜A,B = {1,2,3,4,5,6}, the mutual set is S˜A:B = {1,2,3,4} B|A and the conditional sets are: S˜ = {5} and S˜ = {6}. alsodefineaconditional complexityK(A|B),whichisthe A|B B|A Buildingblock5contributesK(A|B)=2log 9+2=8.4bits 2 amount of information needed to describe the building to the complexity K′(A) of the A shape, while block 6 con- blocks in S˜A|B. Because the way we describe the assem- tributes K(B|A) = 4log29 = 12.7 bits to K′(B). It follows bly kit is additive in the number of building blocks, we thereforethatthejointcomplexityisK(A,B)=67.4bitsand can write the mutual complexity is K(A : B) = 46.4 bits, compared to the standalone values of K(A) = K′(A) = 54.7 bits and K(A|B)=K(A,B)−K′(B) K(B) = K′(B) = 59.1 bits (see Figure 3). AMINO ACIDS (right): The two amino acid molecules asparagine (top, C) since K′(B) is the information required to describe the and glutamine (bottom, D) share the amino (NH2) and car- building blocks in S˜B′ . The relationship between K(B) boxyl (CO2H) groups common to all amino acids, as well as and K′(B) is given by the carboxamide group (CONH2). In a self-assembly frame- workthesetwostructureshavecomplexitiesofK(Asn)=74.3 c K′(B)=K(B)+ c log A,B bits and K(Gln) = 91 bits. 
While K′(Gln) = K(Gln), Xi i 2 cB we have K′(Asn) = 78.0 bits. Because the two molecules share three groups, their joint complexity is not much larger where cA,B is the total number of colours in S˜A,B and than their individual complexities, at K(Asn,Gln) = 104.0 c is the total number of colours in S˜ . Because of the bits, and their mutual complexity is not much smaller, at B B K(Asn:Gln)=65bits,thanthecomplexitiesoftheindivid- minimization of colours, c = max(c ,c ). Hence, if A,B A B ualmolecules. Theirconditionalcomplexitiesarecorrespond- c ≥c , then K′(B)=K(B). B A ingly low, at K(Asn|Gln) = 13 bits and K(Gln|Asn) = 26 Similarly, we can define a mutual minimum assembly bits. The conditional complexities give the amount of infor- kit S˜A:B, which corresponds to the intersection mationrequiredtodescribethebuildingblocks(atoms)which are unique (in their self-assembly role) to the given amino S˜ =S˜′ ∩S˜′ =S˜′ \S˜ =S˜′ \S˜ A:B A B A A|B B B|A acid. These atoms are marked with grey circles. From this follows the mutual complexity K(A:B) = K′(A)−K(A|B)=K′(B)−K(B|A) and the relative mutual complexity = K′(A)+K′(B)−K(A,B) In order to account for the relative sizes of the struc- K(A:B) Krel(A:B)= tures we compare using these measures, we can define K(A,B) relative versions of the above quantities. These are rela- tive conditional complexity: NotethatthelattermeasureresemblestheJaccardindex K(A|B) [42]. For an illustration of joint, mutual and conditional Krel(A|B)= K′(B) complexity, see Figure 8. 8 DISCUSSION 1 2 1 2 2 2 2 1 2 1 2 1 2 1 2 Stericeffects–Forstructureswhichcontainloopstruc- 1 2 1 tures formed by repeating units, it is possible to exploit stericeffectsinordertoreducethesizeoftheassemblykit A below the minimum size found by our algorithm (which A 1 B 2 B A 1 A B 2 B explicitly excluded such effects in its definition). An ex- ample of a steric effect would be a polyomino which is self-limitinginadeterministicway,purelybecauseofthe FIG. 
9: A simple example of a steric effect. The two blocks 1 and 2 have colours A and B on their interfaces. These geometric constraints of the building blocks. As long as coloursattracteachother. Allotherfacesareneutral. Certain eachdistincttypeofloopstructureisformedbybuilding arrangementsofcolourswillleadtoself-delimitingstructures blocksofadistinctspecies(orsetofspecies),theamount purely because of the geometry of the building blocks. The of information required to describe this structure can be complexity of such structures can be taken to be the same taken to be the same as that required to describe an in- asthatofaninfinitechainconsistingofthesamesequenceof finite chain consisting of the same elements. A simple blocks,butonlyifeachloopstructureinsideabiggerstructure example is given in Figure 9. The crucial assumption has a distinct (set of) species of buildingblocks. which has to hold for this simplification to work is that the geometryofthe loopis specified by the species (and, by extension, the geometry) of the building block. For proteinsasbuildingblocksofproteincomplexes,thisisa * * very reasonable assumption. In the case of molecules 4 3 2 1 5 6 1 it would furthermore be possible to simplify the self- 2 3 assembly kit by introducing building blocks representing common small loop structures, such as carbon rings. 3 2 4 1 Multiple nuclei – In principle one could consider be- 12 3 4 ginning the self-assembly with multiple nuclei in place. Multiple nuclei may, through steric hindrance or mod- ular repetition, be used to achieve certain structures in a more efficient way, using fewer building blocks than a 1 2 3 4 5 6 7 11 singlenucleus wouldrequire. Thisreductionincomplex- 8 12 ity may however be countered in practical applications by the difficulty of achieving the required precise rela- 9 5 tive displacements of nucleus particles. 
It is for these reasons that we have concentrated on a single-nucleus model, as the positioning of multiple nuclei makes it much more difficult to construct a general measure of complexity.

Within the single-nucleus category, we further distinguish between structures with a specified nucleus block and those with general nucleus blocks. The former case encompasses those assembly kits which are guaranteed to produce a given output structure if and only if a specified block is used as the nucleus (in other words, this block is placed on the substrate before other blocks are introduced to the system). General-nucleus assembly kits by contrast will form the same output structure regardless of which block is placed first. See Figure 10 for an illustration of how specifying a nucleus can reduce the complexity of an assembly kit.

FIG. 10: Illustration of nuclei placement. (Top:) If we specify either of the two starred blocks as nuclei, deterministic bonding will result. However, if any other block is used as the nucleus, bonding will be non-deterministic, as both the {1,0,0,4} and {1,0,5,0} blocks can join the open 2 edges that will form. This self-assembly kit has a complexity of K = 42.4 bits. (Bottom:) A general-nucleus system to produce the same structure, illustrating the required increase in complexity (K = 98.1 bits).

Which of these classes to employ in a study depends on the motivating context of the self-assembling system under consideration. If modelling assembly in a diffusion-dominated environment, for example, the order in which interacting particles meet cannot be specified, so the general-nucleus model is more appropriate. In a controlled environment where a nucleus can be placed to initiate assembly, the single-nucleus model is applicable. The two cases correspond to different 'languages' being used to measure complexity, and so care must be taken in comparative studies to only compare numerical complexity values from within one class.

Kolmogorov complexity – Our approach to measuring physical complexity is motivated by the concept of Kolmogorov complexity. It is however important to note that while Kolmogorov complexity itself is uncomputable due to the Halting problem [3], our minimum is not. This is because the runtime of a finite computer program with finite output can be infinite, while the assembly time of a finite shape is always finite [4]. It is possible to define the actual Kolmogorov complexity of a shape [5], but this is uncomputable. Our computable complexity measure K(A) forms a bound on this unattainable quantity, and is dependent on the way in which we encode the description of the assembly kit. It therefore is useful for the analysis, classification and comparison of physical structures, as long as we use a consistent encoding.

CONCLUSION

We present a general approach for measuring the physical complexity of any connected structure, using the language of self-assembly. This approach is capable of detecting symmetry and modularity in a given structure, because these features significantly decrease the size of the required self-assembly instruction set. It therefore provides a powerful tool for automated classification and categorization of physical structures. In addition, the connection between self-assembly and complexity is an argument for the ubiquity of modular and symmetric features in biological systems: since many such systems self-assemble, evolving sets of self-assembly instructions are likely to yield symmetric and modular structures, as the instructions for these are more efficient to evolve.

[1] A. N. Kolmogorov, Prob. Inform. Transmission 1, 4 (1965).
[2] G. J. Chaitin, J. Assoc. Comput. Mach. 13, 547 (1966).
[3] T. M. Cover and J. A. Thomas, Elements of Information Theory (Wiley-Interscience, 1991).
[4] P. W. K. Rothemund and E. Winfree, STOC '00: Proceedings of the thirty-second annual ACM symposium on Theory of computing, pp. 459-468 (2000).
[5] D. Soloveichik and E. Winfree, SIAM J. Comp. 36, 1544 (2006).
[6] C. Adami, Phys. Life Rev. 1, 3 (2004).
[7] G. M. Whitesides and M. Boncheva, Proc. Natl. Acad. Sci. U.S.A. 99, 4769 (2002).
[8] G. Krausch and R. Magerle, Adv. Mater. 14, 1579 (2002).
[9] J. Israelachvili, Langmuir 10, 3774 (1994).
[10] H. Fraenkel-Conrat and R. C. Williams, Proc. Natl. Acad. Sci. U.S.A. 41, 690 (1955).
[11] A. Zlotnick, J. Mol. Biol. 241, 59 (1994).
[12] E. Winfree, F. Liu, L. A. Wenzler, and N. C. Seeman, Nature 394, 539 (1998).
[13] C. Mao, T. H. LaBean, J. H. Reif, and N. C. Seeman, Nature 407, 493 (2000).
[14] A. Chworos, I. Severcan, A. Y. Koyfman, P. Weinkam, E. Oroudyev, H. G. Hansma, and L. Jaeger, Science 306, 2068 (2004).
[15] R. P. Goodman et al., Science 310, 1661 (2005).
[16] P. W. K. Rothemund, Nature 440, 297 (2006).
[17] K. Fujibayashi, R. Hariadi, S. H. Park, E. Winfree, and S. Murata, Nano Lett. 8, 1791 (2008).
[18] E. W. Weisstein, "Polyomino." from MathWorld - A Wolfram Web Resource. http://mathworld.wolfram.com/Polyomino.html
[19] E. W. Weisstein, "Necklace." from MathWorld - A Wolfram Web Resource. http://mathworld.wolfram.com/Necklace.html
[20] H. Wang, Bell Systems Tech. J. 40, 1 (1961).
[21] G. Pólya, Acta Math-Djursholm 68, 145 (1937).
[22] N. Rashevsky, Bull. Math. Biophys. 17, 229 (1955).
[23] E. Trucco, Bull. Math. Biophys. 18, 129 (1956).
[24] S. H. Bertz, J. Am. Chem. Soc. 103, 3599 (1981).
[25] E. D. Levy, J. B. Pereira-Leal, C. Chotia, and S. A. Teichmann, PLoS Comp. Biol. 2, e155 (2006).
[26] Gabriel Villar, et al., Phys. Rev. Lett. 102, 118106 (2009).
[27] E. D. Levy, E. B. Erba, C. V. Robinson, S. A. Teichmann, Nature 453, 1262 (2008).
[28] B. Goodwin, How the leopard changed its spots: The evolution of complexity (Princeton University Press, 2001).
[29] J. Bronowski, Synthese 21, 228 (1970).
[30] D. W. McShea, Biology and Philosophy 6, 303 (1991).
[31] C. Bennett, Complexity, entropy, and the physics of information, pp. 137-148 (Westview Press, 1990, ed. W. H. Zurek).
[32] J. S. Wicken, J Theor. Biol. 77, 349 (1979).
[33] O. Toussaint and E. D. Schneider, Comparative Biochemistry and Physiology, Part A 120, 3 (1998).
[34] C. Adami, C. Ofria and T. C. Collier, Proc. Natl. Acad. Sci. U.S.A. 97, 4463 (2000).
[35] M. Mitchell, An introduction to genetic algorithms (Bradford Books, 1996).
[36] M. Mitchell and S. Forrest, Artificial Life 1, 267 (1994).
[37] J. H. Holland, Scientific American 267, 66 (1992).
[38] S. Forrest, Science 261, 872 (1993).
[39] E. Cabezón, M. G. Montgomery, A. G. W. Leslie, and J. E. Walker, Nature Structural & Molecular Biology 10, 744 (2003).
[40] T. Izard, A. Ævarsson, M. D. Allen, A. H. Westphal, R. N. Perham, A. de Kok, and W. G. J. Hol, Proc. Natl. Acad. Sci. U.S.A. 96, 1240 (1999).
[41] D. Caspar and A. Klug, Cold Spring Harbor Symp. Quant. Biol. 27(1) (1962).
[42] P. Jaccard, Bulletin de la Societe Vaudoise des Sciences Naturelles 37, 241 (1901).
[43] For free necklaces, which represent building blocks with no fixed chirality, there are $M_c = (c^4 + 2c^3 + 3c^2 + 2c)/8$ necklaces [19]. In general we will assume fixed chirality.
[44] Heterogeneous interfaces are double-counted as, unlike homogeneous interfaces, they require two colours.
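The closed form quoted in note [43] can be checked by direct enumeration. The sketch below assumes that a building block is a ring of four faces, each carrying one of c colours, and that two blocks without fixed chirality are the same if one can be turned into the other by rotation or reflection. The function names `free_necklace_count` and `closed_form` are illustrative and not part of the paper.

```python
from itertools import product


def free_necklace_count(c: int, n: int = 4) -> int:
    """Count colourings of an n-face ring with c colours, up to rotation and reflection."""
    canonical_forms = set()
    for colouring in product(range(c), repeat=n):
        # All cyclic rotations of this colouring.
        rotations = [colouring[i:] + colouring[:i] for i in range(n)]
        # Reversing each rotation yields all mirror images.
        reflections = [r[::-1] for r in rotations]
        # The lexicographically smallest variant labels the equivalence class.
        canonical_forms.add(min(rotations + reflections))
    return len(canonical_forms)


def closed_form(c: int) -> int:
    """M_c = (c^4 + 2c^3 + 3c^2 + 2c) / 8, as quoted in note [43]."""
    return (c**4 + 2 * c**3 + 3 * c**2 + 2 * c) // 8


# Compare brute-force enumeration with the closed form for small c.
for c in range(1, 6):
    assert free_necklace_count(c) == closed_form(c)
```

Brute force is adequate here because only c^4 colourings have to be visited; for larger rings a Burnside-type count would be the more natural choice.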

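As a consistency check on the identities for mutual complexity, the values quoted in the caption of Figure 8 for asparagine and glutamine can be inserted into both forms of $K(A:B)$. This is a minimal sketch: the variable names are illustrative, the quoted values may be rounded, and $K'(\mathrm{Gln}) = 91$ bits is inferred here from the identity $K(A:B) = K'(B) - K(B|A)$ rather than stated in the text.

```python
# Values quoted in the Figure 8 caption, in bits.
K_PRIME_ASN = 78.0        # K'(Asn)
K_JOINT = 104.0           # K(Asn, Gln)
K_MUTUAL_QUOTED = 65.0    # K(Asn : Gln)
K_ASN_GIVEN_GLN = 13.0    # K(Asn | Gln)
K_GLN_GIVEN_ASN = 26.0    # K(Gln | Asn)

# First form: K(A:B) = K'(A) - K(A|B).
mutual = K_PRIME_ASN - K_ASN_GIVEN_GLN
assert mutual == K_MUTUAL_QUOTED

# Inferred, not quoted in the caption: K'(B) = K(A:B) + K(B|A).
k_prime_gln = mutual + K_GLN_GIVEN_ASN

# Second form: K(A:B) = K'(A) + K'(B) - K(A,B) must give the same value.
assert K_PRIME_ASN + k_prime_gln - K_JOINT == mutual

# Relative measures as defined in the text.
k_rel_mutual = mutual / K_JOINT                      # K_rel(Asn : Gln)
k_rel_asn_given_gln = K_ASN_GIVEN_GLN / k_prime_gln  # K_rel(Asn | Gln)

print(f"K'(Gln) inferred  = {k_prime_gln} bits")
print(f"K_rel(Asn:Gln)    = {k_rel_mutual:.3f}")
print(f"K_rel(Asn|Gln)    = {k_rel_asn_given_gln:.3f}")
```

Both forms of the mutual complexity agree at 65 bits for these numbers, which is what the chain of equalities requires.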