Measuring the sequence-affinity landscape of antibodies with massively parallel titration curves

Rhys M. Adams1,2,3, Thierry Mora4,∗, Aleksandra M. Walczak1,∗, and Justin B. Kinney2,∗

1Laboratoire de Physique Théorique, UMR8549, CNRS and École Normale Supérieure, 24, rue Lhomond, 75005 Paris, France; 2Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, 1 Bungtown Rd., Cold Spring Harbor, NY, 11724, USA; 3Francis Crick Institute, 1 Midland Rd, London NW1 1AT, United Kingdom; and 4Laboratoire de Physique Statistique, UMR8550, CNRS and École Normale Supérieure, 24, rue Lhomond, 75005 Paris, France. ∗Equal contribution.

(Dated: November 16, 2016)

arXiv:1601.02160v2 [q-bio.QM] 15 Nov 2016

Despite the central role that antibodies play in the adaptive immune system and in biotechnology, much remains unknown about the quantitative relationship between an antibody's amino acid sequence and its antigen binding affinity. Here we describe a new experimental approach, called Tite-Seq, that is capable of measuring binding titration curves and corresponding affinities for thousands of variant antibodies in parallel. The measurement of titration curves eliminates the confounding effects of antibody expression and stability that arise in standard deep mutational scanning assays. We demonstrate Tite-Seq on the CDR1H and CDR3H regions of a well-studied scFv antibody. Our data shed light on the structural basis for antigen binding affinity and suggest a role for secondary CDR loops in establishing antibody stability. Tite-Seq fills a large gap in the ability to measure critical aspects of the adaptive immune system, and can be readily used for studying sequence-affinity landscapes in other protein systems.

I. INTRODUCTION

During an infection, the immune system must recognize and neutralize invading pathogens. B-cells contribute to immune defense by producing antibodies, proteins that bind specifically to foreign antigens. The astonishing capability of antibodies to recognize virtually any foreign molecule has been repurposed by scientists in a wide variety of experimental techniques (immunofluorescence, western blots, ELISA, ChIP-Seq, etc.). Antibody-based therapeutic drugs have also been developed for treating many different diseases, including cancer [1].

Much is known about the qualitative mechanisms of antibody generation and function [2]. The antigenic specificity of antibodies in humans, mice, and most jawed vertebrates is primarily governed by six complementarity determining regions (CDRs), each roughly 10 amino acids (aa) long. Three CDRs (denoted CDR1H, CDR2H, and CDR3H) are located on the antibody heavy chain, and three are on the light chain. During B-cell differentiation, these six sequences are randomized through V(D)J recombination, then selected for functionality as well as against the ability to recognize host antigens. Upon participation in an immune response, CDR regions can further undergo somatic hypermutation and selection, yielding higher-affinity antibodies for specific antigens. Among the CDRs, CDR3H is the most highly variable and typically contributes the most to antigen specificity; less clear are the functional roles of the other CDRs, which often do not interact with the target antigen directly.

Many high-throughput techniques, including phage display [3–5], ribosome display [6], yeast display [7, 8], and mammalian cell display [9], have been developed for optimizing antibodies ex vivo. Advances in DNA sequencing technology have also made it possible to effectively monitor both antibody and T-cell receptor diversity within immune repertoires, e.g. in healthy individuals [10–21], in specific tissues [22], in individuals with diseases [23] or following vaccination [24–28]. Yet many questions remain about basic aspects of the quantitative relationship between antibody sequence and antigen binding affinity. How many different antibodies will bind a given antigen with specified affinity? How large a role do epistatic interactions between amino acid positions within the CDRs play in antigen binding affinity? How is this sequence-affinity landscape navigated by the V(D)J recombination process, or by somatic hypermutation? Answering these and related questions is likely to prove critical for developing a systems-level understanding of the adaptive immune system, as well as for using antibody repertoire sequencing to diagnose and monitor disease.

Recently developed "deep mutational scanning" (DMS) assays [29] provide one potential method for measuring binding affinities with high enough throughput to effectively explore antibody sequence-affinity landscapes. In DMS experiments, one begins with a library of variants of a specific protein. Proteins that have high levels of a particular activity of interest are then enriched via one or more rounds of selection, which can be carried out in a variety of ways. The set of enriched sequences is then compared to the initial library, and protein sequences (or mutations within these sequences) are scored according to how much this enrichment procedure increases their prevalence.

Multiple DMS assays have been described for investigating protein-ligand binding affinity. But no DMS assay has yet been shown to provide quantitative binding affinity measurements, i.e., dissociation constants in molar units. For example, one of the first DMS experiments [30] used phage display technology to measure how mutations in a WW domain affect the affinity of this domain for its peptide ligand. These data were sufficient to compute enrichment ratios and corresponding sequence logos, but they did not yield quantitative affinities. Analogous experiments have since been performed on antibodies using yeast display [31, 32] and mammalian cell display [9], but these approaches do not provide quantitative affinity values either. SORTCERY [31, 33], a DMS assay that combines yeast display and quantitative modeling, has been shown to provide approximate rank-order values for the affinity of a specific protein for short unstructured peptides of varying sequence. Determining quantitative affinities from SORTCERY data, however, requires separate low-throughput calibration measurements [31]. Moreover, it is unclear how well SORTCERY, if applied to a library of folded proteins rather than unstructured peptides, can distinguish sequence-dependent effects on affinity from sequence-dependent effects on protein expression and stability.

To enable massively parallel measurements of absolute binding affinities for antibodies and other structured proteins, we have developed an assay called "Tite-Seq." Tite-Seq, like SORTCERY, builds on the capabilities of Sort-Seq, an experimental strategy that was first developed for studying transcriptional regulatory sequences in bacteria [34]. Sort-Seq combines fluorescence-activated cell sorting (FACS) with high-throughput sequencing to provide massively parallel measurements of cellular fluorescence. In the Tite-Seq assay, Sort-Seq is applied to antibodies displayed on the surface of yeast cells and incubated with antigen at a wide range of concentrations. From the resulting sequence data, thousands of antibody-antigen binding titration curves and their corresponding absolute dissociation constants (here denoted K_D) can be inferred. By assaying full binding curves, Tite-Seq is able to measure affinities over many orders of magnitude [35]. Moreover, the resulting sequence-dependent affinity values are not confounded by potential sequence-dependent variation in protein expression or stability, as is the case with SORTCERY and other DMS assays.

We demonstrated Tite-Seq on a protein library derived from a well-studied single-chain variable fragment (scFv) antibody specific to the small molecule fluorescein [7, 36]. Mutations were restricted to the CDR1H and CDR3H regions, which are known to play an important role in the antigen recognition of this scFv [36, 37]. The resulting affinity measurements were validated with binding curves for a handful of clones measured using standard low-throughput flow cytometry. Our Tite-Seq measurements reveal both expected and unexpected differences between the effects of mutations in CDR1H and CDR3H. These data also shed light on structural aspects of antigen recognition that are independent of effects on antibody stability.

II. RESULTS

A. Overview of Tite-Seq

Our general strategy is illustrated in Fig. 1. First, a library of variant antibodies is displayed on the surface of yeast cells (Fig. 1A). The composition of this library is such that each cell displays a single antibody variant, and each variant is expressed on the surface of multiple cells. Cells are then incubated with the antigen of interest, bound antigen is fluorescently labeled, and fluorescence-activated cell sorting (FACS) is used to sort cells one-by-one into multiple "bins" based on this fluorescent readout (Fig. 1B). Deep sequencing is then used to survey the antibody variants present in each bin. Because each variant antibody is sorted multiple times, it will be associated with a histogram of counts spread across one or more bins (Fig. 1C). The spread in each histogram is due to cell-to-cell variability in antibody expression, and to the inherent noisiness of flow cytometry measurements. Finally, the histogram corresponding to each antibody variant is used to compute an "average bin number" (Fig. 1C, dots), which serves as a proxy measurement for the average amount of bound antigen per cell.

FIG. 1. Schematic illustration of Tite-Seq. (A) A library of variant antibodies (various colors) is displayed on the surface of yeast cells (tan). (B) The library is exposed to antigen (green triangles) at a defined concentration, cell-bound antigen is fluorescently labeled, and FACS is used to sort cells into bins according to measured fluorescence. (C) The antibody variants in each bin are sequenced and the distribution of each variant across bins is computed (histograms; colors correspond to specific variants). The mean bin number (dot) is then used to quantify the typical amount of bound antigen per cell. (D) Binding titration curves (solid lines) and corresponding K_D values (vertical lines) can be inferred for individual antibody sequences by using the mean fluorescence values (dots) obtained from flow cytometry experiments performed on clonal populations of antibody-displaying yeast. (E) Tite-Seq consists of performing the Sort-Seq experiment in panels A-C at multiple antigen concentrations, then inferring binding curves using mean bin number as a proxy for mean cellular fluorescence. This enables K_D measurements for thousands of variant antibodies in parallel. We note that the Tite-Seq results illustrated in panel E were simulated using three bins under idealized experimental conditions, as described in Appendix A. The inference of binding curves from real Tite-Seq data is more involved than this panel might suggest, due to the multiple sources of experimental noise that must be accounted for.

It has previously been shown that K_D values can be accurately measured using yeast-displayed antibodies by taking binding titration curves, i.e., by measuring the average amount of bound antigen as a function of antigen concentration [8, 38]. The median fluorescence f of labeled cells is expected to be related to antigen concentration via

f = A \frac{c}{c + K_D} + B    (1)

where A is proportional to the number of functional antibodies displayed on the cell surface, B accounts for background fluorescence, and c is the concentration of free antigen in solution. Fig. 1D illustrates the shape of curves having this form. By using flow cytometry to measure f on clonal populations of yeast at different antigen concentrations c, one can infer curves having the sigmoidal form shown in Eq. 1 and thereby learn K_D. Such measurements, however, can only be performed in a low-throughput manner.

Tite-Seq allows thousands of binding titration curves to be measured in parallel. The Sort-Seq procedure illustrated in Fig. 1A-C is performed at multiple antigen concentrations, and the resulting average bin number for each variant antibody is plotted against concentration. Sigmoidal curves are then fit to these proxy measurements, enabling K_D values to be inferred for each variant.

We emphasize that K_D values cannot, in general, be accurately inferred from Sort-Seq experiments performed at a single antigen concentration. Because the relationship between binding and K_D is sigmoidal, the amount of bound antigen provides a quantitative readout of K_D only when the concentration of antigen used in the labeling procedure is comparable in magnitude to K_D. However, single mutations within a protein binding domain often change K_D by multiple orders of magnitude. Sort-Seq experiments used to measure sequence-affinity landscapes must therefore be carried out over a range of concentrations large enough to encompass this variation.

Furthermore, as illustrated in Figs. 1C and 1D, different antibody variants often lead to different levels of functional antibody expression on the yeast cell surface. If one performs Sort-Seq at a single antigen concentration, high affinity (low K_D) variants with low expression (blue variant) may bind less antigen than low affinity (high K_D) variants with high expression (orange variant). Only by measuring full titration curves can the effect that sequence has on affinity be deconvolved from sequence-dependent effects on functional protein expression.

B. Proof-of-principle Tite-Seq experiments

To test the feasibility of Tite-Seq, we used a well-characterized antibody-antigen system: the 4-4-20 single chain variable fragment (scFv) antibody [7], which binds the small molecule fluorescein with K_D = 1.2 nM [8]. This system was used in early work to establish the capabilities of yeast display [7], and a high resolution co-crystal structure of the 4-4-20 antibody bound to fluorescein, shown in Fig. 2A, has been determined [39]. An ultra-high-affinity (K_D = 270 fM) variant of this scFv, called 4m5.3, has also been found [36]. In what follows, we refer to the 4-4-20 scFv from [7] as WT, and the 4m5.3 variant from [36] as OPT.

The scFv was expressed on the surface of yeast as part of the multi-domain construct illustrated in Fig. 2B and previously described in [7]. Following [36], we used fluorescein-biotin as the antigen and labeled scFv-bound antigen with streptavidin-RPE (PE). The amount of surface-expressed protein was separately quantified by labeling the C-terminal c-Myc tag using anti-c-Myc primary antibodies, followed by secondary antibodies conjugated to Brilliant Violet 421 (BV). See Appendix B for details on this labeling procedure.

Two different scFv libraries were assayed simultaneously. In the "1H" library, a 10 aa region encompassing the CDR1H region of the WT scFv (see Fig. 2C) was mutagenized using microarray-synthesized oligos (see Appendix C for details on library generation). The resulting 1H library consisted of all 600 single-codon variants of this 10 aa region, 1100 randomly chosen 2-codon variants, and 150 random 3-codon variants (Fig. 2D). An analogous "3H" library was generated for a 10 aa region containing the CDR3H region of this scFv. In all of the Tite-Seq experiments described below, these two libraries were pooled together and supplemented with WT and OPT scFvs, as well as with a nonfunctional scFv referred to as ∆.

FIG. 2. Yeast display construct and antibody libraries. (A) Co-crystal structure of the 4-4-20 (WT) antibody from [39] (PDB code 1FLR). The CDR1H and CDR3H regions are colored blue and red, respectively. (B) The yeast display scFv construct from [7] that was used in this study. Antibody-bound antigen (fluorescein) was visualized using PE dye. The amount of surface-expressed protein was separately visualized using BV dye. The approximate locations of the CDR1H (blue) and CDR3H (red) regions within the scFv are illustrated. (C) The gene coding for this scFv construct, with the six CDR regions indicated. The WT sequences of the two 10 aa variable regions are also shown (1H: TFSDYWMNWV; 3H: GSYYGMDYWG). (D) The number of 1-, 2-, and 3-codon variants present in the 1H and 3H scFv libraries (600, 1100, and 150, respectively, in each library). Fig. S1 shows the cloning vector used to construct the CDR1H and CDR3H libraries, as well as the form of the resulting expression plasmids.

Tite-Seq was carried out as follows. Yeast cells expressing scFv from the mixed library were incubated with fluorescein-biotin at one of eleven concentrations: 0 M, 10^-9.5 M, 10^-9 M, 10^-8.5 M, 10^-8 M, 10^-7.5 M, 10^-7 M, 10^-6.5 M, 10^-6 M, 10^-5.5 M, and 10^-5 M. After subsequent PE labeling of bound antigen, cells were sorted into four bins using FACS (Fig. 3A). Separately, BV-labeled cells were sorted according to measured scFv expression levels (Fig. 3B). The number of cells sorted into each bin is shown in Fig. 3C. Each bin of cells was regrown and bulk DNA was extracted. The 1H and 3H variable regions were then PCR amplified and sequenced using paired-end Illumina sequencing, as described in Appendix D. The final data set consisted of an average of 2.6 × 10^6 sequences per bin across all 48 bins (Fig. 3D). Three independent replicates of this experiment were performed on three different days.

FIG. 3. Details of our Tite-Seq experiments. (A) Gates used to sort cells based on PE fluorescence, which provides a readout of bound antigen. Cells were labeled at eleven different antigen concentrations. Shades of red indicate the four fluorescence gates used to sort cells; these correspond to bins 0, 1, 2, and 3 (from left to right). (B) Gates, indicated in shades of purple, used to sort cells based on BV fluorescence, which provides a readout of antibody expression. (C) The number of cells sorted into each bin. (D) The number of Illumina reads obtained from each bin of sorted cells after quality control measures were applied. The data shown in this figure correspond to a single Tite-Seq experiment. Fig. S2 and Fig. S3 show data for two independent replicates of this experiment.

For each variant scFv gene, a K_D value was inferred by fitting a binding curve to the resulting Tite-Seq data, with separate curves independently fit to data from each Tite-Seq experiment (Fig. 4A). As illustrated in Fig. 1E, this fitting procedure uses the sigmoidal function in Eq. 1 to model mean bin number as a function of antigen concentration. However, the need to account for multiple sources of noise in the Tite-Seq experiment necessitates a more complex procedure than Fig. 1E might suggest; the details of this inference procedure are described in Appendix E.
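As a rough illustration of the idea sketched in Fig. 1E, the following Python fragment computes a mean bin number from per-bin read counts and then fits the sigmoidal form of Eq. 1 to noise-free synthetic data. It is a minimal sketch under stated assumptions, not the inference procedure of Appendix E: the function names (`mean_bin`, `fit_binding_curve`), the grid search over log10 K_D, and the synthetic values of A, B, and K_D are all hypothetical choices made for illustration, and no experimental noise model is included.

```python
def mean_bin(counts):
    """Average bin number for one variant, given its read counts per bin
    (index 0 is the lowest-fluorescence bin)."""
    total = sum(counts)
    return sum(b * n for b, n in enumerate(counts)) / total


def fit_binding_curve(concs, signal, log10_kd_grid):
    """Fit signal = A * c / (c + K_D) + B (the form of Eq. 1).

    Illustrative approach (an assumption, not the paper's method): scan
    K_D over a grid; for each fixed K_D the model is linear in A and B,
    so they are obtained by ordinary least squares in closed form.
    Returns (log10_kd, A, B) for the grid point with the smallest
    sum of squared errors.
    """
    n = len(concs)
    mean_y = sum(signal) / n
    best = None
    for log10_kd in log10_kd_grid:
        kd = 10.0 ** log10_kd
        x = [c / (c + kd) for c in concs]  # fraction bound at each c
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0.0:
            continue  # degenerate case: no variation in x
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, signal))
        a = sxy / sxx
        b = mean_y - a * mean_x
        sse = sum((a * xi + b - yi) ** 2 for xi, yi in zip(x, signal))
        if best is None or sse < best[0]:
            best = (sse, log10_kd, a, b)
    return best[1], best[2], best[3]


# The eleven antigen concentrations listed in the text (in M).
concs = [0.0] + [10.0 ** (-9.5 + 0.5 * i) for i in range(10)]

# Hypothetical, noise-free "mean bin" titration curve for one variant.
TRUE_LOG10_KD, TRUE_A, TRUE_B = -8.0, 2.5, 0.2
signal = [TRUE_A * c / (c + 10.0 ** TRUE_LOG10_KD) + TRUE_B for c in concs]

# Candidate log10 K_D values spanning the assay's sensitivity range.
grid = [-9.5 + 0.05 * i for i in range(91)]
fit_log10_kd, fit_a, fit_b = fit_binding_curve(concs, signal, grid)
print(fit_log10_kd, fit_a, fit_b)
```

Because A and B are refit for every candidate K_D, the fitted K_D in this sketch does not depend on the overall amplitude of the curve, which mirrors the point made above that full titration curves separate affinity from expression level.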
The Sort-Seq data obtained by sorting the BV-labeled libraries were used to separately determine the expression level of each scFv. In what follows we use E to denote the mean bin computed for each scFv gene in the library. These E values are scaled so that the mean of such measurements for all synonymous WT scFv gene variants is 1.0.

C. Low-throughput validation experiments

To judge the accuracy of Tite-Seq, we separately measured binding curves for individual scFv clones as described for Fig. 1D. In addition to the WT, OPT, and ∆ scFvs, we assayed eight clones from the 1H library (named C3, C5, C7, C18, C22, C132, C133 and C144) and eight clones from the 3H library (C39, C45, C93, C94, C102, C103, C107, C112). Each clone underwent the same labeling procedure as in the Tite-Seq experiment, after which median fluorescence values were measured using standard flow cytometry. K_D values were then inferred by fitting binding curves of the form in Eq. 1 using the procedure described in Appendix F. These curves, which can be directly compared to the Tite-Seq measurement (Fig. 4A), are plotted in Fig. 4B; at least three replicate binding curves were measured for each clone. See Fig. S4 for the titration curves of all the tested clones.

FIG. 4. Accuracy and precision of Tite-Seq. (A) Binding curves and K_D measurements inferred from Tite-Seq data. (B) Mean fluorescence values (dots) and corresponding inferred binding curves (lines) obtained by flow cytometry measurements for five selected scFvs (WT, OPT, C5, C45, and C107). In (A,B), values corresponding to 0 M fluorescein are plotted on the left-most edge of the plot, dotted lines show the upper (10^-5 M) and lower (10^-9.5 M) limits on K_D sensitivity, vertical lines show inferred K_D values, and different shades correspond to different replicate experiments. (C) Comparison of the Tite-Seq-measured and flow-cytometry-measured K_D values for all clones tested (panel annotation: R^2 = 0.89, P = 1.0 × 10^-9). Colors indicate different scFv protein sequences as follows: WT (purple), OPT (green), ∆ (black), 1H clones (blue), and 3H clones (red). Each K_D value plotted along either axis indicates the mean log10 K_D value obtained across all replicates, with error bars indicating standard error. Clones with K_D outside of the affinity range are drawn on the boundaries of this range, which are indicated with dotted lines. The amino acid sequences and measured K_D values for all clones tested are provided in Table I. Fig. S4 provides plots, analogous to those in panels A and B, for all of the assayed clones. Fig. S5 compares the K_D and E values obtained across all three Tite-Seq replicates. Fig. S6 provides additional information about the composition of the libraries assayed in each replicate experiment. Fig. S7 illustrates the poor correlation between the enrichment of scFvs in the high-PE bins and the Tite-Seq-measured K_D values. Fig. S8 shows differences in the fraction of displayed receptors that are functional in OPT and WT clones. Fig. S9 illustrates the simulations we used to test our analysis pipeline, and Fig. S10 illustrates the ability of our pipeline to correctly infer K_D values from these simulated data.

D. Tite-Seq can measure dissociation constants

Fig. 4C reveals a strong correspondence between the K_D values measured by Tite-Seq and those measured using low-throughput flow cytometry. The robustness of Tite-Seq is further illustrated by the consistency of K_D values measured for the WT scFv. Using Tite-Seq, and averaging the results from the 33 synonymous variants and over all three replicates, we determined K_D = 10^(-8.87 ± 0.02) M for the WT scFv. These measurements are consistent with the measurement of K_D = 10^(-8.61 ± 0.07) M obtained by averaging low-throughput flow cytometry measurements across 10 replicates, and coincide with the previously measured value of 1.2 nM = 10^-8.9 M reported in [8]. The three independent replicate Tite-Seq experiments give reproducible results (Figs. S5 and S6), with Pearson coefficients ranging from r = 0.82 to r = 0.89 for all the measured K_D values. The error bars for K_D values in Fig. 4C, calculated from the variability of the fits to different replicates, support the reproducibility of the experiment.

The necessity of performing K_D measurements over a wide range of antigen concentrations is illustrated in Fig. S7. At each antigen concentration used in our Tite-Seq experiments, the enrichment of scFvs in the high-PE bins correlated poorly with the K_D values inferred from full titration curves. Moreover, at each antigen concentration used, a detectable correlation between K_D and enrichment was found only for scFvs with K_D values close to that concentration.

Fig. S8 suggests a possible reason for the weak correlation between K_D values and enrichment in high-PE bins. We found that, at saturating concentrations of fluorescein (2 µM), cells expressing the OPT scFv bound twice as much fluorescein as cells expressing the WT scFv. This difference was not due to variation in the total amount of displayed scFv, which one might control for by labeling the c-Myc epitope as in [33]. Rather, this difference in binding reflects a difference in the fraction of displayed scFvs that are competent to bind antigen. Yeast display experiments performed at a single antigen concentration cannot distinguish such differences in the fraction of displayed scFv molecules that function properly from differences in scFv affinity.

To further test the capability of Tite-Seq to infer dissociation constants from sequencing data over a wide range of values, as well as to validate our analysis procedures, we simulated Tite-Seq data in silico and analyzed the results using the same analysis pipeline that we used for our experiments. Details about the simulations are given in Appendix G. The simulated data are illustrated in Fig. S9. K_D values inferred from these simulated data agreed to high accuracy with the K_D values used in the simulation (Fig. S10), thus validating our analysis pipeline.

E. Properties of the affinity and expression landscapes

Fig. 5 shows the effect that every single-amino-acid substitution mutation within the 1H and 3H variable regions has on affinity and on expression; histograms of these effects are provided in Fig. S11. In both regions, the large majority of mutations weaken antigen binding (1H: 88%; 3H: 93%), with many mutations increasing K_D above our detection threshold of 10^-5 M (1H: 36%; 3H: 52%). Far fewer mutations reduced K_D (1H: 12%; 3H: 7%), and very few dropped K_D below our detection limit of 10^-9.5 M (1H: 0%; 3H: 3%). Histograms of the effect of 2 or 3 amino acid changes relative to WT, shown in Fig. S12A, show that multiple random mutations tend to further deteriorate affinity. We also observed that mutations within the 3H variable region have a larger effect on affinity than do mutations in the 1H variable region. Specifically, single amino acid mutations in 3H were seen to increase K_D more than mutations in 1H (1H median K_D = 10^-6.84 M; 3H median K_D ≳ 10^-5.0 M; P = 4.7 × 10^-4, one-sided Mann-Whitney U test). This result suggests that binding affinity is more sensitive to variation in CDR3H than to variation in CDR1H, a finding that is consistent with the conventional understanding of these antibody CDR regions [40, 41].

FIG. 5. Effects of substitution mutations on affinity and expression. Heatmaps show the measured effects on affinity (A,B) and expression (C,D) of all single amino acid substitutions within the variable regions of the 1H (A,C) and 3H (B,D) libraries. Purple dots indicate residues of the WT scFv. Green dots indicate non-WT residues in the OPT scFv. Fig. S11 provides histograms of the non-WT values displayed in panels A-D. Fig. S12 compares the effects on K_D of both single-point and multi-point mutations.

Our observations are thus fully consistent with the hypothesis that the amino acid sequences of the CDR1H and CDR3H regions of the WT scFv have been selected for high affinity binding to fluorescein. We know this to be true, of course; still, this result provides an important validation of our Tite-Seq measurements.

To further validate our Tite-Seq affinity measurements, we examined positions in the high affinity OPT scFv (from [36]) that differ from WT and that lie within the 1H and 3H variable regions. As illustrated in Figs. 5A and 5B, five of the six OPT-specific mutations reduce K_D or are nearly neutral. Previous structural analysis [37] has suggested that D106E, the only OPT mutation that we find significantly increases K_D, may indeed disrupt antigen binding on its own while still increasing affinity in the presence of the S101A mutation.

Next, we used our measurements to build a "matrix model" [42] (also known as a "position-specific affinity matrix," or PSAM [43]) describing the sequence-affinity landscape of these two regions. Our model assumed that the log10 K_D value for an arbitrary amino acid sequence could be computed from the log10 K_D value of the WT scFv, plus the measured change in log10 K_D produced by each amino acid substitution away from WT. We evaluated our matrix models on the 1H and 3H variable regions of OPT, finding an affinity of 10^-9.16 M. Our simple model for the sequence-affinity landscape of this scFv therefore correctly predicts that OPT has higher affinity than WT. The quantitative affinity predicted by our model does not match the known affinity of the OPT scFv (K_D = 10^-12.6 M), but this is unsurprising for three reasons. First, the OPT scFv differs from WT in 14 residues, only 6 of which are inside the 1H and 3H variable regions assayed here. Second, one of the OPT mutations (W108L) reduces K_D below our detection threshold of 10^-9.5 M; in building our matrix model, we set this value equal to 10^-9.5 M, knowing it would likely underestimate the affinity-increasing effect of the mutation. Third, our additive model ignores potential epistatic interactions.

Still, we thought it worth asking how likely it would be for 6 random mutations within the 1H and 3H variable regions to reduce K_D as much as our model predicts for OPT. We therefore simulated a large number (10^7) of variants having a total of 6 substitution mutations randomly scattered across the 1H and 3H variable regions. The fraction of these random sequences that had a predicted K_D at or below our predicted K_D for OPT was 4.7 × 10^-5. This finding is fully consistent with the fact that the mutations in OPT relative to WT were selected for increased affinity, an additional confirmation of the validity of our Tite-Seq measurements.

The sequence-expression landscape measured in our separate Sort-Seq experiment yielded qualitatively different results (Figs. 5C and 5D). We observed no significant difference in the median effect that mutations in the variable regions of 1H (median E = 0.826) versus 3H (median E = 0.822) have on expression (P = 0.96, two-sided Mann-Whitney U test); see also Fig. S11. The variance in these effects, however, was larger in 3H than in 1H (P = 9.9 × 10^-16, Levene's test). These results suggest two things. First, the 3H variable region appears to have a larger effect on scFv expression than the 1H variable region has. Second, since we observe fewer beneficial mutations in 1H (Fig. 5C) than in 3H (Fig. 5D), the WT sequence appears to be more highly optimized for expression in CDR1H than in CDR3H. Double and triple mutations further reduced expression in both CDRs (Fig. S12B), similar to what was observed for affinity.

F. Structural correlates of the sequence-affinity landscape

We asked if the sensitivity of the antibody to mutations could be understood from a structural perspective. To quantify the sensitivity of affinity and expression at each position i, we computed two quantities:

S_K^i = \sqrt{ \left\langle \left( \log_{10} K_D^{ia} - \log_{10} K_D^{WT} \right)^2 \right\rangle_{a|i} },    (2)

S_E^i = \sqrt{ \left\langle \left( E^{ia} - E^{WT} \right)^2 \right\rangle_{a|i} }.    (3)

Here, K_D^{WT} and E^{WT} respectively denote the dissociation constant and expression level measured for the WT scFv, K_D^{ia} and E^{ia} denote analogous quantities for the scFv with a single substitution mutation of amino acid a at position i, and \langle \cdot \rangle_{a|i} denotes an average computed over the 19 non-WT amino acids at that position.

Fig. 6A shows the known structure [39] of the 1H and 3H variable regions of the WT scFv in complex with fluorescein. Each residue is colored according to the S_K and S_E values computed for its position. To get a better understanding of what aspects of the structure might govern affinity, we plotted S_K values against two other quantities: the number of amino acid contacts made by the WT residue within the antibody structure (Fig. 6B), and the distance between the WT residue and the antigen (Fig. 6C). We found a strong correlation between S_K and the number of contacts, but no significant correlation between S_K and distance to antigen. By contrast, S_E did not correlate significantly with either of these structural quantities (Figs. 6D and 6E).

FIG. 6. Structural context of mutational effects. (A) Crystal structure [39] of the CDR1H and CDR3H variable regions of the WT scFv in complex with fluorescein (green). Each residue (CDR1H: positions 28-37; CDR3H: positions 100-109) is colored according to the S_K and S_E values computed for that position. These variables, S_K and S_E, respectively quantify the sensitivity of K_D and E to amino acid substitutions at each position, with larger values indicating greater sensitivity; see Eqs. 2 and 3 for definitions of these quantities. (B,C) For each position in the CDR1H and CDR3H variable regions, S_K is plotted against either (B) the number of contacts the WT residue makes within the protein structure (R^2 = 0.46, P = 0.001), or (C) the distance of the WT residue to the fluorescein molecule in Å (R^2 = 0.01, P = 0.647). (D,E) Similarly, S_E is plotted against either (D) the number of contacts (R^2 = 0.10, P = 0.170) or (E) the distance to the antigen (R^2 = 0.19, P = 0.056).

III. DISCUSSION

We have described a massively parallel assay, called Tite-Seq, for measuring the sequence-affinity landscape of antibodies. The range of affinities measured in our Tite-Seq experiments (10^-9.5 M to 10^-5.0 M) includes a large fraction of the physiological range relevant to affinity maturation (approximately 10^-10 M to 10^-6 M) [44–46]. Expanding the measured range of affinities below 10^-9.5 M might require larger volume labeling reactions, but would be straightforward.

More generally, changing a protein's amino acid sequence can be expected to change multiple biochemical properties of that protein. Our work emphasizes the importance of designing massively parallel assays that can disentangle these multiple effects so that measurements of a specific activity of interest can be obtained. Tite-Seq provides a general solution to this problem for massively parallel studies of protein-ligand binding. Indeed, the Tite-Seq procedure described here can be readily applied to any protein binding assay that is compatible with yeast display and FACS. Many such assays have been developed [47]. We expect that Tite-Seq can also be readily adapted for use with other expression platforms, such as mammalian cell display [9].

Our Tite-Seq measurements reveal interesting distinctions between the effects of mutations in the CDR1H and CDR3H regions of the anti-fluorescein scFv antibody studied here. As expected, we found that variation in and around CDR3H had a larger effect on affinity than variation in and around CDR1H. We also found that CDR1H is more optimized for protein expression than is CDR3H, an unexpected finding that appears to be novel. Yeast display expression levels are known to correlate with thermostability [48]. Our data are limited in scope, and we remain cautious about generalizing our observations to arbitrary antibody-antigen interactions. Still, this finding suggests the possibility that secondary CDR regions (such as CDR1H) might be evolutionarily optimized to help ensure antibody stability, thereby freeing up CDR3H to encode antigen specificity. If this hypothesis holds, it could provide a biochemical rationale
Tite-Seqthereforeprovidesapoten- for why CDR3H is more likely than CDR1H to be mu- tiallypowerfulmethodformappingthesequence-affinity tated in functioning receptors [41] and why variation in trajectories of antibodies during the affinity maturation CDR3H is often sufficient to establish antigen specificity process,aswellasforstudyingotheraspectsoftheadap- [40]. tive immune response. Tite-Seq can also potentially shed light on the struc- Tite-Seq fundamentally differs from prior DMS exper- turalbasisforantibody-antigenrecognition. Bycompar- iments in that full binding titration curves, not enrich- ing the effects of mutations with the known antibody- ment statistics, are used to determine binding affinities. fluoresceinco-crystalstructure[39],weidentifiedastrong The measurement of binding curves provides three ma- correlationbetweentheeffectthatapositionhasonaffin- jor advantages. First, binding curves provide absolute ityandthenumberofmolecularcontactsthattheresidue K values in molar units, not just rank-order affinities D atthatpositionmakeswithintheantibody. Bycontrast, like those provided by SORTCERY [31]. Second, be- no such correlation of expression with this number of cause ligand binding is a sigmoidal function of affinity, contacts is observed. Again, we are cautious about gen- DMS experiments performed at a single ligand concen- eralizingfromobservationsmadeonasingleantibody. If tration are insensitive to receptor K s that differ sub- D our observation were to hold for other antibodies, how- stantially from this ligand concentration. Yet mutations ever, it would suggest that paratopes might represent within a protein’s binding domain often change K by D protein “sectors” [49], with antigen specificity (but not multiple orders of magnitude. Binding curves, by con- antibody stability) governed by tightly interacting net- trast, integrate measurements over a wide range of con- works of residues. centrationsandarethereforesensitivetoawiderangeof K s. 
Third,proteinsequencedeterminesnotjustligand- D bindingaffinity,butalsotheamountofsurface-displayed IV. METHODS protein,aswellasthefractionofthisproteinthatisfunc- tional. Our data demonstrate that these confounding ef- fects can strongly distort yeast display affinity measure- Tite-Seq was performed as follows. Variant 3H and ments made at a single antigen concentration, and that 1H regions were generated using microarray-synthesized thebindingcurvesmeasuredbyTite-Seqsuccessfullyde- oligos (LC Biosciences, Houston TX. USA). These were convolve these effects from affinity measurements. inserted into the 4-4-20 scFv of [7] using cassette- 9 replacement restriction cloning as in [34]; see Appendix Appendix B: Yeast display C. Yeast display experiments were performed as previ- ously described [36] with modifications; see Appendix B. To help ensure consistency across samples, the yeast Sorted cells were regrown and bulk DNA was extracted display cultures used in our low-throughput flow cytom- usingstandardtechniques, andampliconscontainingthe etrymeasurementsandinourTite-Seqexperimentswere 1H and 3H variable regions were amplified using PCR inoculated with carefully prepared frozen liquid culture and sequenced using the Illumina NextSeq platform; see inocula. Specifically, inoculation cultures were grown at AppendixD.Threereplicateexperimentswereperformed 30◦CinSC-trp+2%glucosetoanOD600valuebetween on different days. Raw sequencing data has been posted 0.9 and 1.1, then stored at 80◦ in aliquots containing on the Sequence Read Archive under BioProject ID PR- − 10%glycerolandeither0.4mlODofcells(forclones)or JNA344711. Low-throughput flow cytometry measure- · 1 mlOD of cells (for libraries). ments were performed on clones randomly picked from · Theexpressionofyeast-displayedscFvswasinducedas the Tite-Seq library. Sequence data and flow cytometry follows. 
Liquid cultures of SC-trp + 2% glucose were in- data were analyzed using custom Python scripts, as de- oculated using single frozen inocula, yielding an approx- scribedinAppendicesEandF.Processeddataandanal- imatestartingODof0.05. Theseculturesweregrownat ysis scripts are available at github.com/jbkinney/ 30◦C for 8 hours; the final OD of these cultures was ap- 16_titeseq. proximately0.7. Cellswerethenspundownat1932gfor Acknowledgements. WewouldliketothankJacklyn 8minutesat4◦C,resuspendedinSC-trp+2%galactose Jansen, AmyKeating, LotherReich, andBruceStillman + 0.1% glucose at 0.2 OD, and incubated for 16 hr at forhelpfuldiscussions. WewouldalsoliketothankDane 20◦C. We note that adding 0.1% glucose to these galac- Wittrup for sharing plasmids and yeast strains. RMA, toseinductioncultureswasessentialforreliablyachieving TM and AMW were supported by European Research scFv expression in a large fraction of yeast cells. Council Starting Grant n. 306312. JBK was supported Induced yeast were fluorescently labeled as follows. by the Simons Center for Quantitative Biology at Cold Galactoseinductioncultureswerespundownandwashed Spring Harbor Laboratory. with ice cold TBS-BSA (0.2 mg/ml BSA, 50 mM Tris, The authors declare that they have no conflict of in- 25 mM NaCl, pH 8). This yielded approximately 5.3 terests. mlODofcellsfortite-seqFACS.Forantigenbindingre- · actions, cells were then resuspended in a primary label- ingreactioncontaining40mlTBS-BSAandbiotinylated fluorescein(ThermoFisherB1370)ataconcentrationbe- tween0Mand10−5M,thenincubatedwithshakingfor1 Appendix A: Schematic simulations hour at room temperature. Reaction volumes were large enough to ensure that (cid:38) 10 antigen molecules per scFv For panels D and E of Fig. 1, data was simulated (us- were present, assuming 105 scFvs per cell [7]. Cells ∼ ing Eq. 
1) for two hypothetical scFvs: one similar to were then washed twice with 40 ml ice cold TBS-BSA, WT, with K = 1.2 10−9 M, A = 300, and B = 10, suspended in a secondary labeling reaction containing 1 D and one similar to a×typical mutant, with K = 10−6 ml ice-cold TBS-BSA and 7 µg/ml streptavidin R-PE D M, A = 1000, B = 10. Simulated sorts were performed (ThermoFisher S866), and incubated for 30 min at 4◦C at the eleven antigen concentrations used in our exper- while shaking. Cells were then spun down and resus- iments (c = 0 M, 10−9.5 M, 10−9.0 M, ..., 10−5.0 M). pended in ice cold TBS-BSA and saved for FACS later For each clone at each antigen concentration, fluores- that day. Expression labeling reactions proceeded in the cencesignalsweresimulatedfor1000cellsbymultiplying same manner, except that the primary labeling reaction the f quantity in Eq. 1 by a factor of exp(η) where η is contained1.4µg/mlrabbitanti-c-Mycantibody(Sigma- a normally distributed random number. Fig. 1D shows AldrichC3956)inplaceoftheantigen,andthesecondary the mean values of these simulated fluorescence signals. labelingreactioncontained0.8µg/mlBV421-conjugated CurvesoftheforminEq.1werefittothesedatabymin- donkeyanti-rabbitantibody(BioLegend406410)inplace imizing the square deviation between predicted log f ofstreptavidinR-PE.Thelabelingreactionsusedtofilter 10 values and the log mean of the simulated fluoresence out improperly cloned scFvs (as described in Appendix 10 values. The Tite-Seq measurements illustrated in Fig. 
C)proceededinthesamemannerastheexpressionlabel- 1E were simulated by sorting 1000 cells, using fluores- ing reaction, except that 0.8 µg/ml mouse anti-HA an- cencevaluesgeneratedinthesamemannerasabove,into tibody (Roche 11583816001) was added to the primary three bins defined by the following fluorescence bound- labeling reaction, while 0.4 µg/ml APC-conjugated anti- aries: (0,30) for bin 0, (30,300) for bin 1, and (300, ) mouse antibody (BD Biosciences 550826) was added to ∞ for bin 2. The mean bin number for each clone at each the secondary labeling reaction. For clonal flow cytome- antigen concentration was then computed. Curves hav- try measurements, excluding secondary labeling we kept ing the form in Eq. 1 were then fit to these data by min- reagent and cell concentrations the same as described imizing the square deviation of predicted log f values above, but reduced reaction volumes 27-fold. Secondary 10 from these mean bin values. labeling reactions with streptavidin R-PE were done at 10 4 µg/ml 112.5 µl to facilitate mixing. Secondary label- then chosen for low-throughput K measurements. D ing reactions with 0.8 µg/ml BV421-conjugated donkey anti-rabbit antibody were performed in 60 µl. Appendix D: Tite-Seq procedure Appendix C: Cloning strategy The inocula used for our Tite-Seq experiments com- prisedyeastharboringthe1Hand3HpRA11plasmidli- Amplicons containing variable CDR1H or CDR3H re- braries,mixedinequalproportions,andspikedat0.625% gions were generated as follows. An oligonucleotide li- withOPT-containingyeast(asapositivecontrol)andat brary containing mutagenized 1H and 3H variable re- 0.625%with∆-containingyeast(negativecontrol). Cells gions (see Table II) was generated by LC Sciences were then grown, induced, and labeled with antigen at using microarray-based synthesis. The specific oli- elevendifferentconcentrations(0M,10−9.5 M,10−9.0 M, gosusedareprovidedatgithub.com/jbkinney/16_ ..., 10−5.0 M) as described in Appendix B. titeseq. 
1H and 3H library oligos were separately am- Each batch of labeled cells was then sorted, using plifiedviaPCRusingprimersoRAL10andoRAR10(for FACS, over a period of approximately 20 min. During 1H)oroRAL11andoRAR11(for3H).Oligoscontaining FACS, cells were first filtered based on forward scatter the WT sequence were amplified from plasmid pCT302 and side scatter to help ensure exactly one live cell per [7] using primers 1H2F and 1H1R (for the 1H region) or droplet. Cells passing this criterion were sorted into 4 3H1F and 3H2R (for the 3H region). Overlap-extension bins based on R-PE fluorescence. The fluorescence gates PCR using primers oRA10 and oRA11, one oligo library used in these sorts were kept the same across all antigen (1Hor3H)andthecomplementaryWToligo(3Hor1H, concentrations(seeFigs.3,S2,andS3). Cellsweresorted respectively), and plasmid pCT302, were then used to into a rounded 5 ml polypropylene tube containing 1 ml create the iRA11 amplicon library (Fig. S1A). Note that 2X YPAD media. In our separate Sort-Seq experiments each amplicon in this library has mutations only in the assaying scFv expression levels, cells were prepared and 1H variable region or in the 3H variable region, but not sortedinthesameway,saveforthechangestothelabel- in both of these regions. ingreactiondescribedinAppendixBandtheuseofgates The pRA10 cloning vector (Fig. S1B) was assem- on BV421 fluorescence instead of R-PE fluorescence. bled using Gibson cloning [50] with template plasmids Each of the 48 bins of sorted cells, as well as a sample pCT302 [7] and pJK14 [34]. pCT302 is the yeast dis- of unsorted cells, were then deposited in 5 ml of SC-trp playexpressionplasmidcontainingtheWTscFv. pJK14 + 2% glucose and regrown overnight at 30◦C. Approxi- contains a ccdB cloning cassette flanked by outward- mately 25 mlOD of cells were spun down, resuspended · facing BsmBI restriction sites. 
pRA10 closely resembles in a lysis reaction containing 200 µl 0.5 mm glass beads, pCT302, except that it contains the ccdB cassette from 200 µl of Phenol/chloroform/isoamyl alcohol and 200 µl pJK14 in place of the region of the scFv gene that we of yeast lysis buffer (10 mM NaCl, 1 mM Tris, 0.1 mM aimed to mutagenize. Multiple spurious BsmBI restric- EDTA, 0.2% Triton X-100, 0.1% SDS), and vortexed for tion sites present pCT302 were also removed in pRA10. 30min. 200µlofwaterwasadded,cellswerespundown, pRA10waspropagatedinEscherichia coli strainDB3.1, and the aqueous layer was extracted. Four subsequent which is resistant to the CcdB toxin. extractions were performed, the first two using 200 µl of The pRA11 plasmid library (Fig. S1C) was gener- Phenol/chloraform/isoamyl alcohol, the second two us- ated by digesting pRA10 with BsmBI, digesting the ing 200 µl for chloroform/isoamyl alcohol. Bulk nucleic iRA11 amplicon library with BsaI, and subsequent lig- acid was then ethanol precipitated and resuspended in ation with T4 DNA ligase. Ligation reactions were de- 100 µl of IDTE (Integrated DNA Technologies). salted and transformed into DH10B E. coli via electro- Two rounds of PCR were then performed on each of poration, yielding (cid:38)108 transformants. The 1H and 3H the 49 samples of bulk nucleic acid. In the first round of libraries were cloned separately. PCR,primersL1AF XXandL2AF XXwereusedtoam- ThepRA11librarieswereintroducedintotheEBY100 plify the 1H-to-3H region and to add a bin-specific bar- strain of Saccharomyces cerevisiae using high-efficiency code(numberedXX=01,02,...,64)oneitherendofthe LiAc transformation [51]. This yielded (cid:38) 105 transfor- 1H-to-3Hregion;seeFig.S1. TokeepPCRcrossovertoa mants. To filter out yeast containing improperly cloned minimum,only15PCRcycleswereused. 
These49PCR scFvs, we induced scFv expression, immuno-affinity la- reactions were then pooled, purified using a QIAquick beled the HA and c-Myc epitopes on the scFv, and used PCR purification kit (Qiagen), and used as template for FACStorecover8 105 2 106 cellsthatregisteredpos- secondroundofPCRwithprimersPE1v3extandPE2v3. × − × itive for both epitopes. The scFv induction and labeling Again,tokeepcrossovertoaminimum,only25PCRcy- procedures used to do this are described in Appendix B. cles were used. This PCR reaction was again purified, 144 yeast clones were picked at random from this library mixedwithPhiXDNA(at 25%molarity)andsubmit- ∼ and submitted for low-throughput Sanger sequencing of ted for sequencing using the Illumina NextSeq platform. the 1H and 3H variable regions of the scFv. Based on Analysis of the resulting sequence data revealed some preliminaryTite-Seqexperiments,19ofthesecloneswere of the 49 FACS bins to be highly under-sampled. PCR
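The per-position sensitivities defined in Eqs. 2 and 3 are root-mean-square deviations from WT, averaged over the 19 non-WT amino acids at a position. A minimal sketch of that calculation follows. The function name `position_sensitivity` and the input layout (one dictionary of single-mutant measurements per position) are illustrative assumptions, not the authors' analysis code, and the sketch assumes every single mutant has a measured value.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def position_sensitivity(values, wt_seq, wt_value):
    """RMS deviation from WT over the 19 non-WT amino acids at each position.

    values   : list with one dict per position; values[i][a] is the measurement
               (log10 K_D for S_K, or E for S_E) of the single mutant carrying
               amino acid a at position i (hypothetical input layout).
    wt_seq   : string giving the WT amino acid at each position.
    wt_value : the same measurement for the WT scFv.
    """
    sensitivities = []
    for i, wt_aa in enumerate(wt_seq):
        # Squared deviations from WT, skipping the WT amino acid itself.
        sq_devs = [(values[i][aa] - wt_value) ** 2
                   for aa in AMINO_ACIDS if aa != wt_aa]
        sensitivities.append(math.sqrt(sum(sq_devs) / len(sq_devs)))
    return sensitivities
```

Passing $\log_{10} K_D$ values gives $S_K$, and passing expression values gives $S_E$; the same function serves both because Eqs. 2 and 3 differ only in the measured quantity.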

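The additive matrix model and the random-mutation control used to assess the OPT prediction can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the six substitutions are assumed to fall at six distinct positions, and the matrix `delta` of single-mutant shifts in $\log_{10} K_D$ must be supplied by the caller (for example, from measured single-mutant affinities).

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def predict_log10_kd(seq, wt_seq, wt_log10_kd, delta):
    """Additive model: WT log10 K_D plus the single-mutant shift at each mutated position."""
    total = wt_log10_kd
    for i, (aa, wt_aa) in enumerate(zip(seq, wt_seq)):
        if aa != wt_aa:
            total += delta[i][aa]  # delta[i][aa] = log10 K_D(mutant) - log10 K_D(WT)
    return total

def random_variant(wt_seq, n_mut, rng):
    """Substitute n_mut distinct, randomly chosen positions with random non-WT amino acids."""
    seq = list(wt_seq)
    for i in rng.sample(range(len(wt_seq)), n_mut):
        seq[i] = rng.choice([aa for aa in AMINO_ACIDS if aa != wt_seq[i]])
    return "".join(seq)

def fraction_at_or_below(wt_seq, wt_log10_kd, delta, threshold,
                         n_mut=6, n_samples=10000, seed=0):
    """Fraction of random n_mut-fold variants with predicted log10 K_D <= threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        seq = random_variant(wt_seq, n_mut, rng)
        if predict_log10_kd(seq, wt_seq, wt_log10_kd, delta) <= threshold:
            hits += 1
    return hits / n_samples
```

With `threshold` set to the model's prediction for OPT ($-9.16$ in $\log_{10}$ molar units), the returned fraction plays the role of the $4.7 \times 10^{-5}$ reported in the text; reproducing that number would require the measured `delta` matrix and the $10^7$ samples the authors used.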

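The schematic simulation of Appendix A can be sketched in a few lines. Two assumptions are made here and should be checked against the paper's Eq. 1 and code: Eq. 1 is taken to have the single-site binding form $f = A\,c/(c + K_D) + B$, and the noise term $\eta$ is taken to be standard normal (the appendix does not state its variance). The curve-fitting step is omitted, and all function names are illustrative.

```python
import math
import random

def mean_fluorescence(c, kd, amp, bg):
    """Assumed form of Eq. 1: saturating binding curve plus background."""
    return amp * c / (c + kd) + bg

def simulate_cells(c, kd, amp, bg, n_cells, rng, sigma=1.0):
    """Per-cell signals: f multiplied by exp(eta), with eta ~ Normal(0, sigma)."""
    f = mean_fluorescence(c, kd, amp, bg)
    return [f * math.exp(rng.gauss(0.0, sigma)) for _ in range(n_cells)]

def bin_index(signal, boundaries=(30.0, 300.0)):
    """Bin 0 is below 30, bin 1 is 30 to 300, bin 2 is 300 and above."""
    return sum(signal >= b for b in boundaries)

def mean_bin(signals):
    """Mean bin number across the simulated cells."""
    return sum(bin_index(s) for s in signals) / len(signals)

# Eleven antigen concentrations: 0 M, then 10^-9.5 to 10^-5.0 M in half-decade steps.
CONCENTRATIONS = [0.0] + [10 ** (-9.5 + 0.5 * k) for k in range(10)]

rng = random.Random(0)
# WT-like clone: K_D = 1.2e-9 M, A = 300, B = 10.
wt_like = [mean_bin(simulate_cells(c, 1.2e-9, 300, 10, 1000, rng))
           for c in CONCENTRATIONS]
# Mutant-like clone: K_D = 1e-6 M, A = 1000, B = 10.
mutant_like = [mean_bin(simulate_cells(c, 1e-6, 1000, 10, 1000, rng))
               for c in CONCENTRATIONS]
```

Plotting `wt_like` and `mutant_like` against `CONCENTRATIONS` on a logarithmic axis gives mean-bin titration curves of the kind described for Fig. 1E, with the WT-like curve rising at lower antigen concentration than the mutant-like curve.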