Interaction of proteins in solution from small angle scattering: a perturbative approach

Francesco Spinozzi (1), Domenico Gazzillo (2), Achille Giacometti (2), Paolo Mariani (1) and Flavio Carsughi (3)

(1) Istituto di Scienze Fisiche, Università di Ancona, and INFM Unità di Ancona, Via Brecce Bianche, I-60131 Ancona, Italy
(2) Dipartimento di Chimica Fisica, Università di Venezia, and INFM Unità di Venezia, S. Marta 2137, I-30123 Venezia, Italy
(3) Facoltà di Agraria, Università di Ancona, and INFM Unità di Ancona, Via Brecce Bianche, I-60131 Ancona, Italy

(February 6, 2008)

arXiv:cond-mat/0201370v2 [cond-mat.stat-mech] 22 Jan 2002

In this work, an improved methodology for studying interactions of proteins in solution by small-angle scattering is presented. Unlike the most common approach, where the protein-protein correlation functions $g_{ij}(r)$ are approximated by their zero-density limit (i.e. the Boltzmann factor), we propose a more accurate representation of $g_{ij}(r)$ which takes into account terms up to the first order in the density expansion of the mean-force potential. This improvement is expected to be particularly effective in the case of strong protein-protein interactions at intermediate concentrations. The method is applied to analyse small angle X-ray scattering data obtained as a function of the ionic strength (from 7 to 507 mM) from acidic solutions of β-Lactoglobulin at the fixed concentration of 10 g L⁻¹. The results are compared with those obtained using the zero-density approximation and show a significant improvement, particularly in the more demanding case of low ionic strength.

Running Title: Interaction of proteins by SAS

Keywords: long-range interactions, mean-force potential, density expansion, pair correlation functions, structure factor, β-Lactoglobulin

I. INTRODUCTION

The study of protein-protein interactions in solution and the determination of both the physical origin of long-range interactions and the geometry and energetics of molecular recognition can provide the most effective way of correlating structure and biological functions of proteins. In recent years, a large effort has been devoted to improving the understanding of interactions between macromolecules in solution. In particular, it has been widely recognized that the evaluation of electrostatic potentials can produce quantitative predictions and that factors such as self-energy, polarizability and local polarity can be biologically crucial (Halgren and Damm, 2001; Sheinerman et al., 2000). Nevertheless, major conceptual and practical problems still exist, and concern, for instance, the experimental techniques required to measure interaction potentials under physiologically relevant conditions, as well as the clarification of the role of the solvent and of the protein shape and charge anisotropy.

Several biophysical methods can be used for extracting quantitative data on protein-protein interactions, even if a detailed analysis of the long-range interactions has so far been limited to a few associating colloids (Chen and Lin, 1987; Itri and Amaral, 1991) and has usually been based on light scattering or osmotic stress methods (Parsegian and Evans, 1996). However, small angle scattering (SAS) is certainly the most appropriate tool for studying the whole structure of protein solutions, because of the small perturbing effects on the system and the possibility of deriving information on the structural properties and interactions under very different experimental conditions (pH, ionic strength, temperature, presence of cosolvents, ligands, denaturing agents and so on).

In most analyses of SAS data, particle interactions are however disregarded, assuming either large separation or weak interaction forces. The interactions among macromolecules determine their spatial arrangement, which can be described by correlation functions. These functions may be related, for instance via integral equations, to the direct pair potentials, describing the interaction between two particles. When the average distance among particles is large or the interaction potentials are weak, the influence of the average structure factor of the system (i.e. the Fourier transform of the average correlation function) may be negligible inside the considered experimental angular window, and the particles can be reckoned as completely uncorrelated. Under these conditions, the SAS intensity appears to depend only upon the average form factor. Note that this approximation of neglecting all intermolecular forces is used in most applications of X-ray or neutron SAS (Kozin et al., 1997; Chacón et al., 1998).

When the above conditions are not verified, particles cannot be considered uncorrelated, and the average structure factor cannot be neglected in the expression of the SAS intensity. In this case data analysis is far more complicated. In principle, asymptotic behaviors could be used to separate the SAS intensity into (average) form and structure factors (Abis et al., 1990). If the particle form factors are known, an experimental average structure factor can be extracted by dividing the intensity by the average form factor. Then, some insight into the intermolecular forces may be obtained by comparison with the theoretical structure factor calculated from some interaction model, by using analytical or numerical methods from the statistical mechanical theory of liquids (Hansen and McDonald, 1986).

Unfortunately, the most powerful and accurate techniques provided by this theory (Monte Carlo and molecular dynamics computer simulations, as well as integral equations) can hardly be included in a typical best-fit procedure for analysing experimental data. Working at very low concentrations, a first possibility of improving over the crude recipe of neglecting the average structure factor is to evaluate that quantity by approximating the pair correlation functions $g_{ij}(r)$ with their zero-density limit, given by the Boltzmann factor (Velev et al., 1997). In the present paper, we shall show that this zero-density approximation becomes quite unusable at the usual protein concentrations when the ionic strength is low, i.e., in the presence of strong electrostatic interactions. Clearly, it would be desirable to find an alternative, simple but reasonably accurate, way of computing the average structure factor of globular proteins at low or moderate concentrations. This is the major aim of our paper.

Although the new proposal is methodological and thus applicable, in principle, to a wide class of spherically symmetric interaction models, it will be illustrated on a concrete case, as a part of a more general study on structural properties of a particular protein in solution, β-Lactoglobulin (βLG).

In a previous paper (Baldini et al., 1999), which provides a natural introduction to the present work, all long-range protein-protein interactions were neglected and the average structure factor was assumed to be unity. That investigation reported experimental data concerning structural properties of βLG acidic solutions (pH 2.3), at several values of ionic strength in the range 7-507 mM (Baldini et al., 1999). Photon correlation spectroscopy and small angle X-ray scattering (SAXS) experiments gave clear evidence of a monomer-dimer equilibrium affected by the ionic strength. In the angular region where SAXS experiments were performed, the contribution of long-range protein-protein interactions was expected to be rather small. Accordingly, SAXS data were analysed only in terms of βLG monomer and dimer form factors, which were calculated very accurately. Short-range forces responsible for protein aggregation were taken into account only implicitly through a chemical association equilibrium, employed to evaluate the dimerization fraction. A global fit procedure allowed the determination of the monomer effective charge, as well as of the protein dissociation free energy, within a wide range of ionic strength (Baldini et al., 1999).

In the present paper, we shall investigate, within the same physical system, the long-range protein-protein interactions, which can strongly influence the small-angle scattering at low ionic strength. To this aim, two issues have to be addressed. First, one needs to extend the experimental SAXS angular region to lower values of the scattering vector, where long-range forces play an important role. Second, one has to select an accurate and tractable theoretical scheme for calculating the average structure factor to be used in the fit of experimental data. Both tasks have been accomplished in this work.

We first report a new set of SAXS measurements on βLG performed under the same experimental conditions as Baldini and coworkers (Baldini et al., 1999), but at smaller angles. These data unambiguously display a lowering of the scattering intensity at small angles, with a progressive development of an interference peak, when the ionic strength is low. This occurrence is a clear signal of strong protein-protein interactions, and we shall show that it can be simply interpreted in terms of screened electrostatic repulsions among charged macroions.

Next, we shall propose an improvement for the calculation of the theoretical average structure factor, based upon a new approximation to the protein-protein correlation functions $g_{ij}(r)$. Starting from the density expansion of the corresponding mean-force potentials, we shall show that the simple addition of the 1st-order perturbative correction to the direct pair potentials leads to a marked progress with respect to the use of the Boltzmann factor, while retaining the same level of simplicity. The new approximation is indeed able to predict, at low ionic strength, the interference peak observed in the experimental scattering intensity, and consequently it leads to a significantly improved fit.

We stress, in advance, that a check of the unavoidable limits of validity of the proposed approach will not be treated here. A further study involving a comparison with more accurate theoretical results (from Monte Carlo or molecular dynamics, as well as from integral equations) is, of course, desirable, but goes beyond the scope of the present paper, and will be left for future work.

II. BASIC THEORY

Because of the presence of an aggregation equilibrium, a βLG solution contains two different forms of macroions (protein monomers and dimers) embedded in a suspending fluid and in a sea of microions, which include both counter-ions neutralizing all protein charges and small ions originating from the addition of electrolyte salts. To represent such a system, we shall employ a simple "two-component macroion model", which effectively takes into account only protein particles. Within this scheme, which is usually referred to as the Derjaguin-Landau-Verwey-Overbeek (DLVO) model (Verwey and Overbeek, 1948), the suspending fluid (solvent) is represented as a uniform dielectric continuum and all microions are treated as point-like particles. The presence of both solvent and microions appears only in the macroion-macroion effective potentials. A further simplification follows from the assumption of spherically symmetric interactions. We note that in our model, components 1 and 2 correspond to monomers and dimers, respectively.

Before addressing the specific system under investigation, it is convenient to recall some basic points of the general theory.

A. Scattering functions

The macroscopic differential coherent scattering cross section $d\Sigma/d\Omega$, obtained from a SAS experiment, is related to the presence of scattering centers, i.e. density and/or structural inhomogeneities, and can yield quantitative information about their dimensions, concentration as well as shape and interaction potentials. The cross section is proportional to the "contrast", namely the difference of electron density multiplied by the classical electron radius (or scattering length density in the neutron case) between the scattering centers and the surrounding medium; in the case of biological samples, this quantity can also be tuned in order to obtain more detailed information about the scattering structures (contrast variation technique; Jacrot, 1976). Proteins in solution represent an excellent example of inhomogeneities for SAS measurements, due to their high contrast with X-rays (as well as with neutrons). The general equation for the SAS intensity is

$$\frac{d\Sigma}{d\Omega}(Q) = \frac{1}{V} \left\langle \left| \int_V d\mathbf{r}\, \delta\rho(\mathbf{r})\, e^{i\mathbf{Q}\cdot\mathbf{r}} \right|^2 \right\rangle, \tag{1}$$

$\mathbf{Q}$ being the exchanged wave vector, with magnitude $Q = (4\pi/\lambda)\sin\theta$, where $\lambda$ represents the incident radiation wavelength and $2\theta$ is the full scattering angle. The integral in Eq. 1 is extended over the sample volume $V$, with $\mathbf{r}$ being the position vector and $\delta\rho(\mathbf{r})$ the fluctuation with respect to a uniform value, $\rho_0$, of the local electron density multiplied by the classical electron radius (or simply the scattering length density in the case of neutrons). Angular brackets represent an ensemble average over all possible configurations of the proteins in the sample.

Eq. 1 can be reduced to a simpler form when the interactions are spherically symmetric. Using a "two-phase" representation of the fluid (only one type of homogeneous scattering material with scattering density $\rho_P$ inside proteins, embedded in a homogeneous solvent phase with density $\rho_0$) yields

$$\frac{d\Sigma}{d\Omega}(Q) = (\Delta\rho)^2 \left\{ \sum_{i=1}^{p} n_i V_i^2 \left[ \langle F_i^2(Q) \rangle_{\omega_Q} - \langle F_i(Q) \rangle_{\omega_Q}^2 \right] + \sum_{i,j=1}^{p} (n_i n_j)^{1/2}\, V_i V_j\, \langle F_i(Q) \rangle_{\omega_Q} \langle F_j(Q) \rangle_{\omega_Q}\, S_{ij}(Q) \right\} \tag{2}$$

where $\Delta\rho \equiv \rho_P - \rho_0$ represents the contrast, $p$ the number of protein species (2 for our solutions with monomers and dimers), $n_i$ the number density of species $i$, $V_i$ the volume, $F_i(Q)$ the form factor, $S_{ij}(Q)$ the Ashcroft-Langreth partial structure factor and $\langle \ldots \rangle_{\omega_Q}$ denotes an orientational average.

The partial structure factors (Ashcroft and Langreth, 1967) are defined as

$$S_{ij}(Q) = \delta_{ij} + 4\pi (n_i n_j)^{1/2} \int_0^\infty dr\, r^2 \left[ g_{ij}(r) - 1 \right] \frac{\sin(Qr)}{Qr}, \tag{3}$$

in terms of the three-dimensional Fourier transform of $g_{ij}(r) - 1$, where $g_{ij}(r)$ is the pair correlation function (or radial distribution function) between particles of species $i$ and $j$.

Finally, the average form and structure factors, $P(Q)$ and $S_M(Q)$, are

$$P(Q) = (\Delta\rho)^2 \sum_{i=1}^{p} n_i V_i^2 \langle F_i^2(Q) \rangle_{\omega_Q}, \tag{4}$$

$$S_M(Q) = \frac{d\Sigma}{d\Omega}(Q) \Big/ P(Q). \tag{5}$$

B. Protein form factors

The angular averaged form factor of species $i$ can be written as

$$\langle F_i(Q) \rangle_{\omega_Q} = \int_0^\infty dr\, p_i^{(1)}(r)\, \frac{\sin(Qr)}{Qr}, \tag{6}$$

where $p_i^{(1)}(r)$ represents the probability for the $i$-th species that a point at distance $r$ from the protein center of mass lies inside the macromolecule. Similarly, the angular averaged squared form factor is given by (Guinier and Fournet, 1955)

$$\langle F_i^2(Q) \rangle_{\omega_Q} = \int_0^\infty dr\, p_i^{(2)}(r)\, \frac{\sin(Qr)}{Qr} \tag{7}$$

where $p_i^{(2)}(r)$ represents the probability for the $i$-th species to find a segment of length $r$ with both ends inside the macromolecule. Both integrals of $p_i^{(1)}(r)$ and $p_i^{(2)}(r)$ are normalized to unity. These distribution functions have been calculated from the crystallographic structures of both the monomer and dimer forms of the protein, as described in Refs. (Baldini et al., 1999; Mariani et al., 2000), briefly recalled in Appendix A, and discussed in Subsection III C.

C. Protein-protein interaction potentials

The choice of the proper potential is a rather delicate matter and depends on the investigated system. For instance, in a study on lysozyme (Kuehner et al., 1997) the protein-protein interaction was assumed to be the sum of four contributions, namely a hard-sphere term, an electrostatic repulsion, an attractive dispersion potential and a short-range attraction. In a different study, on lysozyme and chymotrypsinogen (Velev et al., 1997), five contributions were instead considered: charge-charge repulsion, charge-dipole, dipole-dipole and van der Waals attraction, along with further complex short-range interactions. In this paper we follow a different route, motivated by the fact that the presence of several interaction terms may obscure the relative importance of each of them. Moreover, the choice of a very refined potential would be in striking contrast with the very crude approximations used in calculating the RDFs. On this basis we shall search for the simplest possible model potential which is still capable of capturing the essential features of the system. It will be the sum of two repulsive contributions:

$$u_{ij}(r) = u_{ij}^{HS}(r) + u_{ij}^{C}(r) \tag{8}$$

where

$$u_{ij}^{HS}(r) = \begin{cases} +\infty & 0 \le r < R_i + R_j \\ 0 & r \ge R_i + R_j \end{cases} \tag{9}$$

is a hard-sphere (HS) term which accounts for the excluded-volume effects ($R_i$ being the radius of species $i$) and

$$u_{ij}^{C}(r) = \frac{Z_i Z_j e^2}{\varepsilon\, (1 + \kappa_D R_i)(1 + \kappa_D R_j)}\, \frac{\exp\left[ -\kappa_D (r - R_i - R_j) \right]}{r} \tag{10}$$

represents a screened Coulomb repulsion between the macroion charges, which are of the same sign. This term has the same Yukawa form as in the Debye-Hückel theory of electrolytes, but the coupling coefficients are of DLVO type (Verwey and Overbeek, 1948). Here, $e$ is the elementary charge, $\varepsilon$ the dielectric constant of the solvent, and the effective valency of species $i$, $Z_i$, may depend on the pH. The inverse Debye screening length $\kappa_D$, defined as

$$\kappa_D = \left[ \frac{8\pi\beta e^2 N_A}{\varepsilon}\, (I_S + I_c) \right]^{1/2}, \tag{11}$$

depends on temperature ($\beta = (k_B T)^{-1}$) and on the ionic strength of all microions. $I_S$ and $I_c$ represent the ionic strength of all added salts (S) and of the counterions (c), respectively. Both these terms are of the form $(1/2) \sum_i c_i^{micro} (Z_i^{micro})^2$, with $c_i^{micro} = n_i^{micro}/N_A$ being the molar concentration of micro-species $i$ ($N_A$ is Avogadro's number). $I_c$ is related to the macroion number densities $n_1$ and $n_2$ (1 = monomer, 2 = dimer) through the electroneutrality condition, according to which the counterions must neutralize all macroion charges, i.e. $n_c |Z_c| = n_1 |Z_1| + n_2 |Z_2|$. Notice that the dependence of $\kappa_D$ on $I_S$ implies that the strength of the effective potential $u_{ij}^{C}(r)$ can be varied widely by adding an electrolyte to the solution.

We have explicitly checked that the addition of an attractive term with the form of a Hamaker potential $u_{ij}^{H}(r)$ (Israelachvili, 1992) does not alter our final conclusions. The basic reason for this can be traced back to the fact that van der Waals attractions may be completely masked by $u_{ij}^{C}(r)$ when the electrostatic repulsion is strong, and are also negligible for moderately charged particles with diameter smaller than 50 nm (Nägele, 1996). Moreover, $u_{ij}^{H}(r)$ diverges at $r = R_i + R_j$, so that its applicability could be preserved only by the addition of a non-interpenetrating hydration/Stern layer (Baldini et al., 1999; Kuehner et al., 1997).

We stress the fact that some attractive interactions must, however, be present in the system, since they are responsible for the aggregation of monomers into dimers, and determine the value of the monomer molar fraction $x_1$, which is required to complete the definition of our model. However, due to the complexity of these interactions (including hydrogen bonding), a clear understanding of their explicit functional forms is still lacking. Therefore, following Baldini et al. (1999), we will account for them indirectly, by using a chemical association equilibrium to fix $x_1$. The dissociation free energy, which determines the equilibrium constant, is written as a sum of two contributions, i.e.

$$\Delta G_{dis} = \Delta G_{el} + \Delta G_{nel}, \tag{12}$$

where $\Delta G_{el}$ is an electrostatic term calculated within a Debye-Hückel theory, and $\Delta G_{nel}$ is an unknown non-electrostatic contribution, which will be left as a free parameter in the best-fit analysis.

D. Radial distribution functions

Given a model potential, one has to calculate the corresponding radial distribution functions (RDF) $g_{ij}(r)$, which can be expressed by the exact relation

$$g_{ij}(r) = \exp\left[ -\beta W_{ij}(r) \right], \tag{13}$$

$$-\beta W_{ij}(r) = -\beta u_{ij}(r) + \omega_{ij}(r) \tag{14}$$

where $W_{ij}(r)$ is the potential of mean force, which includes the direct pair potential $u_{ij}(r)$ as well as $-\beta^{-1}\omega_{ij}(r)$, i.e. the indirect interaction between $i$ and $j$ due to their interaction with all remaining macroparticles of the fluid. In the zero-density limit, $\omega_{ij}(r)$ vanishes and $g_{ij}(r)$ reduces to the Boltzmann factor, i.e.

$$g_{ij}(r) = \exp\left[ -\beta u_{ij}(r) \right] \quad \text{as } n \to 0, \tag{15}$$

which represents a 0th-order approximation, frequently used in the analysis of experimental scattering data ($n \equiv \sum_m n_m$ is the total number density).

The most common procedure for determining an accurate $g_{ij}(r)$ or, equivalently, the correction term $\omega_{ij}(r)$, would be to solve the Ornstein-Zernike (OZ) integral equations of the liquid state theory, within some approximate closure relation (Hansen and McDonald, 1986). This can typically be done numerically, with the exception of a few simple cases (for some potentials and peculiar closures) where the solution can be worked out analytically.

For our hard-sphere-Yukawa potential (neglecting the Hamaker term), the OZ equations do admit an analytical solution, when coupled with the "mean spherical approximation" (MSA) (Blum and Hoye, 1978; Ginoza, 1990; Hayter and Penfold, 1981). Nevertheless, at low density and for strong repulsion the MSA RDFs may assume unphysical negative values close to interparticle contact (Nägele, 1996). To overcome this difficulty, it would be possible to utilize an analytical "rescaled MSA" (Nägele, 1996; Hansen and Hayter, 1982; Ruiz-Estrada et al., 1990), or to resort to different closures (Rogers-Young approximation or "hypernetted chain" closure), which require a numerical solution (Rogers and Young, 1984; Zerah and Hansen, 1986; Wagner et al., 1991; Krause et al., 1991; D'Aguanno and Klein, 1992; D'Aguanno et al., 1992; Nägele et al., 1993).

More generally, when only numerical solutions are feasible, integral equation algorithms can hardly be included in a best-fit program for the analysis of SAS results. The use of analytical solutions, or simple approximations requiring only a minor computational effort, is clearly much more advantageous when fitting experimental data. The 0th-order approximation given in Eq. 15 avoids the problem of solving the OZ equations, but is largely inaccurate except, perhaps, at very low densities.

In order to improve over this 0th-order approximation to the RDFs, the basic idea put forward in the present work hinges upon the expansion of the potential of mean force into a power series of the total number density $n$ (Meeron, 1958). Neglecting all terms beyond the first order, Eq. 13 then becomes

$$g_{ij}(r) = \exp\left[ -\beta u_{ij}(r) + \omega_{ij}^{(1)}(r)\, n \right]. \tag{16}$$

By construction, this expression is never negative, thus avoiding the major drawback of MSA. The explicit expression for the perturbative correction $\omega_{ij}^{(1)}(r)$ is given in Appendix B. The considered 1st-order approximation substantially improves the accuracy of the RDFs with respect to Eq. 15, while remaining at nearly the same level of simplicity (see Appendix B). Moreover, it is to be stressed that the usage of the new approximation is not restricted to the model of this paper: the proposed calculation scheme can be equally well applied to different spherically symmetric potentials.

III. MATERIALS AND METHODS

A. Samples

A bovine milk βLG B stock solution (concentration 40 g L⁻¹) was obtained by ionic exchange of protein samples against a 12 mM phosphate buffer (ionic strength $I_S$ = 7 mM and pH = 2.3) (Baldini et al., 1999). Nine samples at ionic strength 7, 17, 27, 47, 67, 87, 107, 207, 507 mM were then prepared by adding appropriate amounts of NaCl. The final protein concentrations were about 10 g L⁻¹.

The monomeric βLG unit is composed of 162 amino acid residues and has a molecular weight of 18400 Da. The excluded protein volume has been calculated from the amino acid volumes, as reported by Jacrot and Zaccai (Jacrot, 1976; Jacrot and Zaccai, 1981). The monomer volume is $V_1$ = 23400 Å³; hence, the βLG electron density is $\rho_P$ = 0.418 e Å⁻³. By considering the basicity of the amino acids, at pH = 2.3 the monomer charge would be near 20e. This result is confirmed by the Gasteiger-Marsili method (Gasteiger and Marsili, 1980), assuming that all amino groups NH₂ are protonated at pH = 2.3. The crystallographic structure of βLG both in monomer and in dimer form can be found in the Protein Data Bank, entry 1QG5 (Oliveira et al., 2001). A sketch of the βLG dimer structure can be found in Fig. 1 of Ref. (Baldini et al., 1999). It can be observed that all 20 basic amino acids are on the protein surface, but two of them are at the monomer-monomer interface; therefore at pH = 2.3 the ratio $Z_2/Z_1$ between dimer and monomer charges could be about 1.8.

B. SAXS experiments

SAXS measurements were collected at the Physik Department of the Technische Universität München (Germany) using a rotating-anode generator. The radiation wavelength was $\lambda$ = 0.71 Å and the temperature 20°C. The Q range was 0.035-0.1 Å⁻¹. βLG samples were measured in quartz capillaries with a diameter of 2 mm and a thickness of 10 µm (Hilgenberg, Malsfeld, D). X-ray patterns were collected by a two-dimensional detector and radially averaged. The scattering from a solvent capillary was subtracted from the data after correction for transmission, capillary thickness and detector efficiency.

C. Best-Fit analysis

A previous analysis of SAXS data for similar samples in the range Q = 0.07-0.3 Å⁻¹ has been recently reported by some of us (Baldini et al., 1999). In the present work we have extended these experiments to the range Q = 0.035-0.1 Å⁻¹, where protein-protein interactions are expected to play a major role. The two sets have then been combined into a single set of measurements with Q ranging from 0.035 to 0.3 Å⁻¹.

As regards the calculation of the monomer and dimer form factors, it is well known that the scattering form factor of a biomolecule in solution depends on the crystallographic coordinates and the form factors of all constituent atoms, as well as on the hydration shell of the resulting macroparticle. Computer programs such as CRYSOL (Svergun et al., 1995) are able to calculate such a form factor, taking all the above-mentioned variables into account. It is also widely accepted that the SAS technique is a low-resolution one, and approximating the βLG protein by a homogeneous scattering particle yields comparable results up to Q = 0.4 Å⁻¹, as we have tested by checking our method against the results of the CRYSOL software. The equivalent homogeneous scattering particle has a shape defined by the envelope of the van der Waals spheres centered on each atom. The SAS community often exploits the Monte Carlo method to calculate the form factor of a given shape (Henderson, 1996). We have modeled the hydration shell with a semigaussian function, instead of the linear one proposed by Svergun (Svergun et al., 1997). Our simple and efficient method has already been applied with success in previous works (Baldini et al., 1999; Mariani et al., 2000).

The Monte Carlo method used to calculate the distribution functions $p_i^{(1)}(r)$ and $p_i^{(2)}(r)$ of both monomers ($i = 1$) and dimers ($i = 2$) from their crystallographic structures is outlined in Appendix A. Then the form factors $\langle F_i(Q) \rangle_{\omega_Q}$ and $\langle F_i^2(Q) \rangle_{\omega_Q}$ have been obtained through Eqs. 6 and 7, by calculating the radial integrals with a grid size of 1 Å up to a maximum $r$ corresponding to $p^{(i)}(r) = 0$ ($i = 1, 2$).

According to the dissociation free energy model described in Ref. (Baldini et al., 1999), the monomer molar fraction $x_1$ is a function of the ionic strength $I_S$. This suggests the possibility of a simultaneous fit for all SAXS intensity curves, using just a few parameters, all independent of $I_S$. In particular, as in Baldini et al. (1999), the following parameters have been fixed: the dielectric constant of the solvent, $\varepsilon$ = 78.5; the experimental temperature, $T$ = 293 K; the ratio between the effective charges of dimer and monomer, $Z_2/Z_1$ = 1.8; the monomer and dimer "bare" radii, $R_1$ = 19.15 Å and $R_2 = 2^{1/3} R_1$. The choice for $R_2$ is easily understood if we recall that our model of long-range interactions involves the approximation of considering a dimer as a sphere with volume twice as large as the monomer one. This introduction of an equivalent sphere is a simplifying approximation often used by the SAS community. On the other hand, we have calculated the form factor of the dimer from its exact, rather elongated form.

In the global fit the only free parameters are therefore $Z_1$ and $\Delta G_{nel}$, the non-electrostatic free energy. The merit functional to be minimized was defined as

$$\chi^2 = \frac{1}{N_S} \sum_{m=1}^{N_S} \bar{\chi}_m^2, \qquad \bar{\chi}_m^2 = \frac{1}{N_{Q,m}} \sum_{i=1}^{N_{Q,m}} \left\{ \frac{ [d\Sigma/d\Omega]_m^{exp}(Q_i) - \kappa_m [d\Sigma/d\Omega]_m^{fit}(Q_i) - B_m }{ \sigma_m(Q_i) } \right\}^2 \tag{17}$$

where $N_S$ is the number of scattering curves under analysis, $N_{Q,m}$ is the number of experimental points in the $m$-th curve, and $\sigma_m(Q_i)$ is the experimental uncertainty on the intensity value at $Q_i$. $[d\Sigma/d\Omega]_m^{fit}(Q_i)$ is the corresponding cross section predicted by the model by using Eq. 2; for each experiment, the calibration factor $\kappa_m$ and the flat background $B_m$ have been adjusted from a linear least-squares fit of $[d\Sigma/d\Omega]_m^{exp}(Q)$. The partial structure factors, Eq. 3, have been calculated with an integration upper limit of $r$ = 500 Å and a grid size of 1 Å.

The physical meaning of the "flat background" requires a comment, since constant subtraction is usually accepted for neutron scattering, but not for X-ray scattering. Introducing these backgrounds is suggested by observing that one of the major experimental problems with X-rays is the exact determination of the transmission factor. An inexact value would result in an imperfect subtraction of the background due to the electronic noise. However, as shown later in Table II, the low values obtained for $B_m$, as compared to the values of the scaling factors, indicate that these parameters play a minor role in the data analysis.

Typical calculation times for the best-fit on a Digital Alpha 433 are a few minutes for the 0th-order approximation and ≃ 20 hours for the 1st-order one. The effect of experimental errors on the fitting parameters has been determined using a sampling method. For each scattering curve, we start from the $N_{Q,m}$ intensities $[d\Sigma/d\Omega]_m^{exp}(Q_i)$ with their experimental standard deviations and we generate $N_I$ new data sets (for βLG we used $N_I$ = 15) by sampling from $N_{Q,m}$ gaussians of width $\sigma_m(Q_i)$ centred at the observed values. Each data set generated for all curves is then analyzed with the global fit algorithm described earlier. The errors on the fitting parameters, $Z_1$ and $\Delta G_{nel}$, and on the scaling parameters, $\kappa_m$ and $B_m$, are obtained by calculating their values from each data set and, finally, their standard deviation from the first value.

IV. RESULTS AND DISCUSSION

Fig. 1 depicts the experimental results for the X-ray intensity $[d\Sigma/d\Omega](Q)$ as a function of the transferred momentum Q at several values of ionic strength. Here, instead of the usual logarithmic scale, we have preferred the use of a linear scale, in order to let the reader appreciate more easily the small differences between experimental data and theoretical curves. On a log scale these differences would be hardly visible.

Our measurements clearly show the formation and evolution of an interference peak at small angles, as the ionic strength decreases. The appearance of such a peak is evidently due to increasing protein-protein interactions. In the same figure, the performance of our 1st-order approximation is compared with that of the commonly used 0th-order one. The 1st-order approximation yields a fit of rather good quality through the whole measured Q range. The development of the interference peak, underestimated by the 0th-order approximation, is now well reproduced, indicating that the main physical features of the βLG solution are indeed taken into account by our simple interaction model.

In Fig. 2 the theoretical results for the average structure factor $S_M(Q)$ are shown along with the experimental data. While at high $I_S$ (i.e. at weak effective interactions) the two approximations are practically indistinguishable, for $I_S \le 27$ mM the 1st-order results outperform the 0th-order ones, mainly in the low-Q region.

A more transparent comparison between the two approximations is carried out in Fig. 3 at the level of RDFs. As $I_S$ decreases, the 1st-order $g_{ij}(r)$ ($i,j = 1,2$) become strongly different from the 0th-order ones, exhibiting a peak of increasing height. In terms of potentials of mean force, $g_{ij}(r) > 1$ in some regions (mainly for $I_S \le 27$ mM) implies that $W_{ij}(r) < 0$, although $u_{ij}(r)$ always remains positive. The first-order correction $\omega_{ij}^{(1)}(r)\, n$ therefore corresponds to an attractive contribution, due to an "osmotic depletion" effect (Asakura and Oosawa, 1954) exerted on two given macroparticles by the remaining ones. This many-body effect is clearly lacking in the 0th-order approximation, as depicted in Fig. 3. Depletion forces arise when two protein molecules are close together. In this case the pressure exerted on these molecules by all other macroparticles becomes anisotropic, leading to a strong indirect protein-protein attraction, even though all direct interactions are repulsive.

It is worth stressing that the behavior of the 1st-order $g_{ij}(r)$ at low ionic strength could be reproduced even by the 0th-order approximation, but only at the cost of adding some unnecessary, and somewhat misleading, density-dependent attractive term to the direct pair potentials. Our model, based only on the physically sound repulsive part of the DLVO potential, turns out to be rather accurate for the purposes of the present paper. We have also performed some calculations including a Hamaker term in our perturbative scheme, without finding any significant change in the 1st-order results with respect to the previous ones.

The 1st-order RDFs shown in Fig. 3 are undoubtedly correctly shaped, although the peak heights might be modified by the neglected second- and higher-order corrections to the potentials of mean force. Unfortunately, an estimate for the magnitude of the successive perturbative terms (depending on both concentration and charge of the protein molecules) is a far more complicated task and goes beyond the scope of the present paper. Since the resulting protein charges (see Table I) are relatively large, it is reasonable to expect that the contribution of the higher-order terms might be appreciable. As the protein concentration increases, this correction becomes more and more significant, and eventually the rather good performance of our 1st-order approximation must break down.

Since a direct computation of even the second-order corrections demands a high computational effort, the accuracy of the 1st-order approximation may alternatively be investigated by checking our RDF results against exact Monte Carlo or molecular dynamics simulation data relevant to the same model. A simpler indication about the limits of validity of our scheme may come from a systematic comparison with integral-equation predictions based upon more accurate closures. One could use, for instance, the multi-component version of the "rescaled MSA" approach (Ruiz-Estrada et al., 1990), which has the advantage of being nearly fully analytical. On the other hand, if more accurate results are required, then the Rogers-Young closure (Rogers and Young, 1984) is preferable for our potential, but in this case the corresponding integral equations must be solved numerically. We have planned some investigations in this sense, and their results will be reported elsewhere. However, we believe that, at the considered protein concentration, the 1st-order approximation does yield the correct trend of the RDFs. It is our opinion that the inclusion of the neglected terms cannot alter the qualitative (or semi-quantitative) picture of βLG interactions supported by our model, even if slightly different values for the best-fit parameters should be expected.

The parameter values resulting from the global best-fit procedure, using the 0th-order and 1st-order approximations, are reported in Tabs. I and II.

The improved quality of the fit corresponding to the first-order approximation can clearly be appreciated by comparing not only the global $\chi^2$ value (Table I), but above all the partial $\bar{\chi}_m^2$ ones (Table II), in particular for $I_S \le 27$ mM. Although the change of global $\chi^2$ is not so large, if one considers the relative variation of the $\bar{\chi}_m^2$'s (last column of Table II), then the improvement is rather evident for the low ionic strength samples, while it becomes less and less important with increasing ionic strength. The proposed method is able to improve the goodness of the fit by about 43% for the first sample (where the interference peak is more pronounced). The decrease of the relative variation, as the ionic strength increases, is in agreement with the expected progressive weakening of protein-protein repulsions.

Note that the values of both fitting parameters, i.e. $Z_1$ and $\Delta G_{nel}$, turn out to be very similar for both approximations. The scaling factors, $\kappa_m$, and the flat backgrounds, $B_m$, are also similar for all samples and for both approximations, confirming that no other effects, like denaturation or larger aggregation, are really present.

V. CONCLUSIONS

In this paper we have presented a novel methodological approach to the study of protein-protein interactions using SAXS techniques. Our work builds upon a previous investigation by some of us (Baldini et al., 1999). As widely discussed by Baldini et al. (1999), the structural properties of βLG in acidic solution, studied by light and X-ray scattering over a wide range of ionic strength and concentration, are consistent with the existence of monomers and dimers, and cannot be ascribed to a denaturation process.

[...] dissociation free energy and the monomer charge. This finding means that our simple interaction model is already able to describe the main structural features of the examined βLG solutions. Satisfactory results obtained by many other structural studies on colloidal or protein solutions, based upon similar very simplified models (Wagner et al., 1991; Krause et al., 1991; D'Aguanno and Klein, 1992; D'Aguanno et al., 1992; Nägele et al., 1993; Wanderlingh et al., 1994), suggest that the use of very refined potentials, containing a large number of different contributions, is often unnecessary, at least at the first stages of a research.
Using sophisticated interac- Since the form factors of both the species are eas- tion models may even be a nonsense, when coupled with ily known, the so-called “measured” or average struc- a simultaneous very rough treatment of the correlation ture factor S (Q) can be obtained from the ratio be- functions, as is often the case with the widely employed M tween experimental intensity and average form factor 0th-order approximation, in spite of the fact that the in- P(Q) at a certain monomer fraction x . S (Q) is re- troduction of a larger number of parameters can clearly 1 M latedto the protein-proteineffective interactions. Short- improvetheactualfittingofthedata. Moreover,wehave range attractive interactions like hydrogen bonds, re- pointedoutthat,eveninmodelswithpurelyrepulsivein- sponsibleofthe dimerformationandstronglydepending teractions,attractiveeffects(dueto“osmoticdepletion”) onthemonomer-monomerorientation,aretakenintoac- are predicted by every sufficiently accurate theory. On count using a quasi-chemical description of the thermo- the contrary, within the zero-density approximation for dynamic equilibriumbetweenmonomeranddimer forms theRDFs,thesameattractiveeffectsmaybereproduced of βLG. Thus, in addition to the hard core repulsions, only at the cost of adding artificial contributions to the the effective potentials of mean force only describe long- potentials. range monomer-monomer, monomer-dimer and dimer- Second, the proposed 1st-order approximation to the dimer electrostatic repulsions, which can be reduced to RDFs is really able to yield accurate predictions for the theirorientationalaverages,dependingonlyontheinter- average structure factor of weakly-concentrated protein molecular distance r. solutions,inarathersimplebutphysicallysoundway. 
It In the work by Baldini et al., 1999 all long-range isworthstressingthattheunderlyingcalculationscheme protein-protein forces were neglected, because the mea- isnotrestrictedtotheparticularmodelconsideredinthis suredSAXSintensitywasspanningaQ-rangewheresuch paper, but may be easily applied to different spherically interactions are essentially negligible. On the contrary, symmetric potentials. Although the limit of validity of we have explicitly addressed this issue in the present the 1st- order approximation is still an open question, work. To this aim, i) we have extended the range of which we are planning to investigate in future work, we measuredintensities to lowerQ values in orderto exper- thinkthatitmayrepresentanewusefultoolfortheanal- imentally probe these long-rangeinteractions,and ii) we ysis of experimental SAS data of globular protein solu- haveproposedasimplebutefficientperturbativescheme, tions, when their concentration is not too high and the whose first terms are able to yield reasonably accurate strengthoftheirinteractionforcesisnottoolarge. When RDFs for dilute or moderately concentrate solutions of these two conditions fail, then it is unavoidable to com- globular proteins, with a rather little computational ef- pute the correlation functions by exploiting some more fort. Inparticular,wehaveexplicitlycomputedthe0th− powerful method from the statistical mechanical theory and1st-orderapproximationsandcomparedtheirresults. ofliquids(HansenandMcDonald,1986). Wehope,how- The improvement in the quality of the fit for S (Q), ever,thatthispaperwillstimulatethe applicationofthe M obtainedwiththefirst-ordercorrectionforthepotentials proposed 1st-order approximationto different sets of ex- ofmeanforcecorrespondingtotheRDFs,withrespectto perimental data on proteins, as well as new theoretical the standard zero-density approximation, is particularly workon the quality and limit of this calculation scheme. 
visible at low ionic strength, where Coulomb repulsions arepoorly screened. In this case,the new representation ACKNOWLEDGEMENT of the RDFs is able to reproduce the interference peak present in the experimental S (Q), whereas the com- M monly used zero-density approximation turns out to be Thisworkhasbeenpartiallysupportedbythegrantfor quite inadequate at low ionic strength. the Advanced Research Project on Protein Crystalliza- Finally, two points are particularly noteworthy. tion“Procry”fromtheitalianIstitutoNazionalediFisica First, the adopted model allows a simultaneous fit of della Materia(INFM). We alsothank BrunoD’Aguanno nine SAS curves with only two free parameters, inde- and Giorgio Pastore for useful discussions. pendent of the ionic strength, i.e., the non-electrostatic 8 APPENDIX A: CALCULATION OF PROTEIN −βW (r)=−βu (r)+ω(1)(r)n+ω(2)(r)n2+..., ij ij ij ij FORM FACTORS (B1) In detail, the scattering particle is assumed to be ho- the exact power coefficients ω(k)(r) ( k = 1,2,...) can mogeneous and its size and shape are described by the ij becomputedbyusingstandarddiagrammatictechniques function s(r), which gives the probability that the point (Meeron,1958),whichyieldtheresultsintermsofappro- r ≡ (r,ω ) (where ω indicates the polar angles α and r r r priate multi- dimensional integrals of products of Mayer β ) lies within the particle. For compact particles, like r functions globularproteins,thisfunctioncanbewrittenintermsof auniquetwo-dimensionalangularshapefunctionF(ω ), r f ( r)=exp[−βu (r)]−1 (B2) as ij ij Within our approximation, we are only required to 1 r ≤F(ω ) s(r)= r (A1) computethefirstterm,whichinvolvesaconvolutionand exp{−[r−F(ω )]2/2σ2} r >F(ω ) r r (cid:26) turns out to be where σ is the width of the gaussian that accounts for the particle surface mobility (Svergun et al., 1998). 
The ωi(j1)(r)= xkγi(j1,)k(r)= xk dr′ fik(r′) fkj(|r−r′|), shapefunctionF(ω )isevaluatedbyfixingtheaxisorigin k k Z r X X onthemeanvalueoftheatomiccoordinatesandrunning (B3) over each atom m and taking the maximum distance r betweentheoriginandtheintersection,ifany,ofthevan where x =n /n is the molar fraction of species k. The k k der Waals sphere centered in m with the direction ωr. evaluation of the convolution integral γ(1)(r) is not a Assuming homogeneous particles belonging to species i, ij,k difficult taskinbipolarcoordinates. Integrationoveran- Mi randompoints are generatedfrom polar coordinates. gles is easily performed and γ(1)(r) reduces to a double The sampling is made for the variables α , cosβ and r3 ij,k r r integral, which can be written as in the ranges [0,2π], [−1,1] and [0,r3 ], respectively. max Following Eq. A1, if r ≤ F(ωr), the point is accepted, 2π ∞ x+r otherwise the probability P = exp{−[r−F(ωr)]2/2σ2} γi(j1,)k(r)= r dx [xfik(x)] dy [yfkj(y)]. (B4) is calculated. A random number y between 0 and 1 is Z0 Z|x−r| extracted and if y < P the point is accepted, otherwise (1) (1) We have evaluated all these γ (r) terms at the points is rejected. The p (r) histogram is then determined by ij,k taking into accounit the distances between the Mi points ri = i∆r (i = 1,...,500), with ∆r = 1˚A. At each ri and the centre, while the p(2)(r) histogram depends on value, the double integral has been carried out numeri- i cally, simply by using the trapezoidal rule for both x− the distances between all possible pairs of M points, i andy-integration. Forthex-integration,wehavechosen 1 Mi as upper limit the value xmax =max(xcut,R2+r), with p(i1)(r)= ∆rM H(∆r/2−|r−rn|), x(Acu2t)= R2 +12/κD (depending on the ionic strength), i and as grid size ∆x = x /200. For the y-integration, n=1 cut X 2 Mi−1 Mi ∆y =∆x. 
p(2)(r)= H(∆r/2−|r−r |), (A3) i ∆rM (M −1) nm i i n=1 m=n+1 X X where ∆r is the grid amplitude in the space of radial distance, r the distance between the centre and the n- n th point. Here r is the distance between the points n nm andm, andH(x) isthe Heavisidestepfunction (H(x)= 0 if x < 0 and H(x) = 1 if x ≥ 0). The number of random scattering centres was M = 2000, the grid size i was ∆r = 1˚A, while the width of the surface mobility was fixed to σ =2 ˚A. APPENDIX B: FIRST-ORDER PERTURBATIVE CORRECTIONS Inthedensityexpansionofthepotentialsofmeanforce W (r) ij 9 FIG. 2. Comparison between the measured structure fac- 6 tors SM(Q) for the βLG at pH=2.3 and concentration 10 gL−1 in different ionic strength conditions (as indicated aboveeachcurve). Thebestfitlinesresultingfromthesimul- 5 taneous analysis of the corresponding SAXS curves (Fig. 1) usingthe0th-order(dashed)and1st-order(solid)approxima- 507mM tions of the pair correlation functions are reported. Data for 4 207mM Q>0.12˚A−1 are not shown for clarity. u:) (a: 107mM Q) 3 (cid:10)]( 87mM d = (cid:6) 67mM d [ 2 47mM 27mM 1 17mM 7mM 507mM 8 207mM 0.05 0.1 0.15 0.2 0.25 0.3 107mM Q ((cid:23)A(cid:0)1) 6 FIG. 1. SAXS linear profiles for the βLG at pH=2.3 and concentration 10 gL−1 in different ionic strength conditions 87mM r) (as indicated above each curve). Points are experimental re- ( sults,whereasthedashedandthesolidlinesrepresentthebest gij 67mM 4 fitsobtainedbyapplyingthe0th-orderand1st-orderapprox- imations of the pair correlation functions, respectivley. The 47mM curvesare scaled for clarity bya factor 0.5. 27mM 2 17mM 7mM 0 0 100 200 300 400 0 100 200 300 400 r ((cid:23)A) 507mM FIG. 3. Partial correlation functions gij(r) resulting from the simultaneous analysis of the nine SAXS curves of Fig. 
1 (the ionic strength, $I_S$, is indicated near each set of curves) by applying the 0th-order (left column) and 1st-order (right column) approximation in the density expansion of the mean-force potential. Depicted are the monomer-monomer $g_{11}(r)$ (dotted lines), the monomer-dimer $g_{12}(r)$ (dashed lines) and the dimer-dimer $g_{22}(r)$ (solid line) correlation functions.
[Figure 2 plot: $S_M(Q)$ versus $Q$ (Å$^{-1}$), one curve per ionic strength.]
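The accept/reject sampling and the two distance histograms described in Appendix A (Eqs. A1 to A3) can be illustrated with a short sketch. This is only a minimal illustration under stated assumptions, not the authors' code: the function names, the `shape_f` callable standing in for the angular shape function $F(\omega_r)$, and the binning on centres $k\Delta r$ are all hypothetical choices. In the paper $F(\omega_r)$ is built from atomic coordinates and van der Waals spheres, which is not reproduced here.

```python
import math
import random


def sample_particle_points(shape_f, r_max, sigma, n_points, rng):
    """Draw n_points scattering centres with the accept/reject rule of Eq. (A1).

    shape_f(alpha, cos_beta) is a hypothetical callable returning F(omega_r).
    """
    points = []
    while len(points) < n_points:
        # Uniform sampling in alpha, cos(beta) and r^3 is uniform in volume.
        alpha = rng.uniform(0.0, 2.0 * math.pi)
        cos_beta = rng.uniform(-1.0, 1.0)
        r = rng.uniform(0.0, r_max ** 3) ** (1.0 / 3.0)
        big_f = shape_f(alpha, cos_beta)
        if r > big_f:
            # Outside the core: accept only with the Gaussian probability P.
            p = math.exp(-((r - big_f) ** 2) / (2.0 * sigma ** 2))
            if rng.random() >= p:
                continue  # rejected, draw a new point
        sin_beta = math.sqrt(1.0 - cos_beta ** 2)
        points.append((r * sin_beta * math.cos(alpha),
                       r * sin_beta * math.sin(alpha),
                       r * cos_beta))
    return points


def _distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def _histogram(distances, dr, n_bins, norm):
    """Count distances in bins centred at k*dr, then scale by 1/(dr*norm)."""
    hist = [0.0] * n_bins
    for d in distances:
        k = int(d / dr + 0.5)  # nearest bin centre, i.e. within dr/2 of k*dr
        if k < n_bins:
            hist[k] += 1.0
    return [h / (dr * norm) for h in hist]


def p1_histogram(points, dr, n_bins):
    """Eq. (A2): distribution of the distances from the centre (origin)."""
    origin = (0.0, 0.0, 0.0)
    dists = [_distance(p, origin) for p in points]
    return _histogram(dists, dr, n_bins, len(points))


def p2_histogram(points, dr, n_bins):
    """Eq. (A3): distribution of the distances between all distinct pairs."""
    m = len(points)
    dists = [_distance(points[a], points[b])
             for a in range(m - 1) for b in range(a + 1, m)]
    return _histogram(dists, dr, n_bins, m * (m - 1) / 2.0)
```

As a usage example, a sphere of radius 10 Å can be mimicked with `shape_f = lambda alpha, cos_beta: 10.0`, using `sigma = 2.0`, `dr = 1.0` and a seeded `random.Random` instance. With enough bins to cover all distances, both histograms integrate to one, which is the normalisation implied by the prefactors in Eqs. (A2) and (A3).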
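The numerical evaluation of the first-order coefficient in Appendix B (Eqs. B1, B3 and B4) can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the names `trapz`, `gamma1`, `omega1` and `g_first_order` are hypothetical, the Mayer functions are passed in as plain callables, and the grid is set by a fixed number of sub-intervals instead of the $\Delta x = x_{cut}/200$ spacing quoted in the text.

```python
import math


def trapz(f, a, b, n):
    """Composite trapezoidal rule with n sub-intervals on [a, b]."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def gamma1(r, f_ik, f_kj, x_max, n_x=400, n_y=400):
    """Eq. (B4): (2*pi/r) int_0^x_max dx x f_ik(x) int_{|x-r|}^{x+r} dy y f_kj(y).

    The infinite upper limit is truncated at x_max, which must lie beyond
    the range where f_ik is appreciably different from zero.
    """
    def inner(x):
        return trapz(lambda y: y * f_kj(y), abs(x - r), x + r, n_y)

    outer = trapz(lambda x: x * f_ik(x) * inner(x), 0.0, x_max, n_x)
    return 2.0 * math.pi / r * outer


def omega1(r, i, j, mayer, x, x_max):
    """Eq. (B3): sum over species k of x_k * gamma^(1)_{ij,k}(r).

    mayer[a][b] is a callable for f_ab(r); x[k] is the molar fraction of k.
    """
    return sum(x[k] * gamma1(r, mayer[i][k], mayer[k][j], x_max)
               for k in range(len(x)))


def g_first_order(beta_u, omega_1, n):
    """RDF from Eq. (B1) truncated after the term linear in the density n."""
    return math.exp(-beta_u + omega_1 * n)
```

A convenient check uses a smooth test function, $f(r) = -\exp(-r^2)$, for which the three-dimensional convolution is known in closed form, $(\pi/2)^{3/2}\exp(-r^2/2)$. This test function is chosen only for validation and is not the DLVO-based Mayer function of the paper. Setting `n = 0` in `g_first_order` recovers the Boltzmann factor of the 0th-order approximation, while a positive `omega_1` raises $g_{ij}(r)$ above it, which is the depletion-like attraction discussed in the text.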