Table Of Content
JWST252-fm JWST252-Eidhammer Printer: Yet to Come October 31, 2012 10:44 Trim: 229mm × 152mm
Computational and Statistical
Methods for Protein Quantification
by Mass Spectrometry
Ingvar Eidhammer
Department of Informatics, University of Bergen, Norway
Harald Barsnes
Department of Biomedicine, University of Bergen, Norway
Geir Egil Eide
Centre for Clinical Research, Haukeland University Hospital and Department of Public Health and Primary Health Care, University of Bergen, Norway
Lennart Martens
Department of Biochemistry, Faculty of Medicine and Health Sciences, Ghent University, Belgium
A John Wiley & Sons, Ltd., Publication
This edition first published 2013
© 2013 John Wiley & Sons, Ltd
Registered office
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom
For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.
The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.
Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.
Library of Congress Cataloging-in-Publication Data applied for.
A catalogue record for this book is available from the British Library.
ISBN: 978-1-119-96400-1
Typeset in 10/12pt Times by Aptara Inc., New Delhi, India
Contents
Preface
Terminology
Acknowledgements
1 Introduction
1.1 The composition of an organism
1.1.1 A simple model of an organism
1.1.2 Composition of cells
1.2 Homeostasis, physiology, and pathology
1.3 Protein synthesis
1.4 Site, sample, state, and environment
1.5 Abundance and expression – protein and proteome profiles
1.5.1 The protein dynamic range
1.6 The importance of exact specification of sites and states
1.6.1 Biological features
1.6.2 Physiological and pathological features
1.6.3 Input features
1.6.4 External features
1.6.5 Activity features
1.6.6 The cell cycle
1.7 Relative and absolute quantification
1.7.1 Relative quantification
1.7.2 Absolute quantification
1.8 In vivo and in vitro experiments
1.9 Goals for quantitative protein experiments
1.10 Exercises
2 Correlations of mRNA and protein abundances
2.1 Investigating the correlation
2.2 Codon bias
2.3 Main results from experiments
2.4 The ideal case for mRNA-protein comparison
2.5 Exploring correlation across genes
2.6 Exploring correlation within one gene
2.7 Correlation across subsets
2.8 Comparing mRNA and protein abundances across genes from two situations
2.9 Exercises
2.10 Bibliographic notes
3 Protein level quantification
3.1 Two-dimensional gels
3.1.1 Comparing results from different experiments – DIGE
3.2 Protein arrays
3.2.1 Forward arrays
3.2.2 Reverse arrays
3.2.3 Detection of binding molecules
3.2.4 Analysis of protein array readouts
3.3 Western blotting
3.4 ELISA – Enzyme-Linked Immunosorbent Assay
3.5 Bibliographic notes
4 Mass spectrometry and protein identification
4.1 Mass spectrometry
4.1.1 Peptide mass fingerprinting (PMF)
4.1.2 MS/MS – tandem MS
4.1.3 Mass spectrometers
4.2 Isotope composition of peptides
4.2.1 Predicting the isotope intensity distribution
4.2.2 Estimating the charge
4.2.3 Revealing isotope patterns
4.3 Presenting the intensities – the spectra
4.4 Peak intensity calculation
4.5 Peptide identification by MS/MS spectra
4.5.1 Spectral comparison
4.5.2 Sequential comparison
4.5.3 Scoring
4.5.4 Statistical significance
4.6 The protein inference problem
4.6.1 Determining maximal explanatory sets
4.6.2 Determining minimal explanatory sets
4.7 False discovery rate for the identifications
4.7.1 Constructing the decoy database
4.7.2 Separate or composite search
4.8 Exercises
4.9 Bibliographic notes
5 Protein quantification by mass spectrometry
5.1 Situations, protein, and peptide variants
5.1.1 Situation
5.1.2 Protein variants – peptide variants
5.2 Replicates
5.3 Run – experiment – project
5.3.1 LC-MS/MS run
5.3.2 Quantification run
5.3.3 Quantification experiment
5.3.4 Quantification project
5.3.5 Planning quantification experiments
5.4 Comparing quantification approaches/methods
5.4.1 Accuracy
5.4.2 Precision
5.4.3 Repeatability and reproducibility
5.4.4 Dynamic range and linear dynamic range
5.4.5 Limit of blank – LOB
5.4.6 Limit of detection – LOD
5.4.7 Limit of quantification – LOQ
5.4.8 Sensitivity
5.4.9 Selectivity
5.5 Classification of approaches for quantification using LC-MS/MS
5.5.1 Discovery or targeted protein quantification
5.5.2 Label based vs. label free quantification
5.5.3 Abundance determination – ion current vs. peptide identification
5.5.4 Classification
5.6 The peptide (occurrence) space
5.7 Ion chromatograms
5.8 From peptides to protein abundances
5.8.1 Combined single abundance from single abundances
5.8.2 Relative abundance from single abundances
5.8.3 Combined relative abundance from relative abundances
5.9 Protein inference and protein abundance calculation
5.9.1 Use of the peptides in protein abundance calculation
5.9.2 Classifying the proteins
5.9.3 Can shared peptides be used for quantification?
5.10 Peptide tables
5.11 Assumptions for relative quantification
5.12 Analysis for differentially abundant proteins
5.13 Normalization of data
5.14 Exercises
5.15 Bibliographic notes
6 Statistical normalization
6.1 Some illustrative examples
6.2 Non-normally distributed populations
6.2.1 Skewed distributions
6.2.2 Measures of skewness
6.2.3 Steepness of the peak – kurtosis
6.3 Testing for normality
6.3.1 Normal probability plot
6.3.2 Some test statistics for normality testing
6.4 Outliers
6.4.1 Test statistics for the identification of a single outlier
6.4.2 Testing for more than one outlier
6.4.3 Robust statistics for mean and standard deviation
6.4.4 Outliers in regression
6.5 Variance inequality
6.6 Normalization and logarithmic transformation
6.6.1 The logarithmic function
6.6.2 Choosing the base
6.6.3 Logarithmic normalization of peptide/protein ratios
6.6.4 Pitfalls of logarithmic transformations
6.6.5 Variance stabilization by logarithmic transformation
6.6.6 Logarithmic scale for presentation
6.7 Exercises
6.8 Bibliographic notes
7 Experimental normalization
7.1 Sources of variation and level of normalization
7.2 Spectral normalization
7.2.1 Scale based normalization
7.2.2 Rank based normalization
7.2.3 Combining scale based and rank based normalization
7.2.4 Reproducibility of the normalization methods
7.3 Normalization at the peptide and protein level
7.4 Normalizing using sum, mean, and median
7.5 MA-plot for normalization
7.5.1 Global intensity normalization
7.5.2 Linear regression normalization
7.6 Local regression normalization – LOWESS
7.7 Quantile normalization
7.8 Overfitting
7.9 Exercises
7.10 Bibliographic notes
8 Statistical analysis
8.1 Use of replicates for statistical analysis
8.2 Using a set of proteins for statistical analysis
8.2.1 Z-variable
8.2.2 G-statistic
8.2.3 Fisher–Irwin exact test
8.3 Missing values
8.3.1 Reasons for missing values
8.3.2 Handling missing values
8.4 Prediction and hypothesis testing
8.4.1 Prediction errors
8.4.2 Hypothesis testing
8.5 Statistical significance for multiple testing
8.5.1 False positive rate control
8.5.2 False discovery rate control
8.6 Exercises
8.7 Bibliographic notes
9 Label based quantification
9.1 Labeling techniques for label based quantification
9.2 Label requirements
9.3 Labels and labeling properties
9.3.1 Quantification level
9.3.2 Label incorporation
9.3.3 Incorporation level
9.3.4 Number of compared samples
9.3.5 Common labels
9.4 Experimental requirements
9.5 Recognizing corresponding peptide variants
9.5.1 Recognizing peptide variants in MS spectra
9.5.2 Recognizing peptide variants in MS/MS spectra
9.6 Reference free vs. reference based
9.6.1 Reference free quantification
9.6.2 Reference based quantification
9.7 Labeling considerations
9.8 Exercises
9.9 Bibliographic notes
10 Reporter based MS/MS quantification
10.1 Isobaric labels
10.2 iTRAQ
10.2.1 Fragmentation
10.2.2 Reporter ion intensities
10.2.3 iTRAQ 8-plex
10.3 TMT – Tandem Mass Tag
10.4 Reporter based quantification runs
10.5 Identification and quantification
10.6 Peptide table
10.7 Reporter based quantification experiments
10.7.1 Normalization across LC-MS/MS runs – use of a reference sample
10.7.2 Normalizing within an LC-MS/MS run
10.7.3 From reporter intensities to protein abundances
10.7.4 Finding differentially abundant proteins
10.7.5 Distributing the replicates on the quantification runs
10.7.6 Protocols
10.8 Exercises
10.9 Bibliographic notes
11 Fragment based MS/MS quantification
11.1 The label masses
11.2 Identification
11.3 Peptide and protein quantification
11.4 Exercises
11.5 Bibliographic notes
12 Label based quantification by MS spectra
12.1 Different labeling techniques
12.1.1 Metabolic labeling – SILAC
12.1.2 Chemical labeling
12.1.3 Enzymatic labeling – 18O
12.2 Experimental setup
12.3 MaxQuant as a model
12.3.1 HL-pairs
12.3.2 Reliability of HL-pairs
12.3.3 Reliable protein results
12.4 The MaxQuant procedure
12.4.1 Recognize HL-pairs
12.4.2 Estimate HL-ratios
12.4.3 Identify HL-pairs by database search
12.4.4 Infer protein data
12.5 Exercises
12.6 Bibliographic notes
13 Label free quantification by MS spectra
13.1 An ideal case – two protein samples
13.2 The real world
13.2.1 Multiple samples
13.3 Experimental setup
13.4 Forms
13.5 The quantification process
13.6 Form detection
13.7 Pair-wise retention time correction
13.7.1 Determining potentially corresponding forms
13.7.2 Linear corrections
13.7.3 Nonlinear corrections
13.8 Approaches for form tuple detection
13.9 Pair-wise alignment
13.9.1 Distance between forms
13.9.2 Finding an optimal alignment
13.10 Using a reference run for alignment
13.11 Complete pair-wise alignment
13.12 Hierarchical progressive alignment
13.12.1 Measuring the similarity or the distance of two runs
13.12.2 Constructing static guide trees
13.12.3 Constructing dynamic guide trees
13.12.4 Aligning subalignments
13.12.5 SuperHirn
13.13 Simultaneous iterative alignment
13.13.1 Constructing the initial alignment in XCMS
13.13.2 Changing the initial alignment
13.14 The end result and further analysis
13.15 Exercises
13.16 Bibliographic notes
14 Label free quantification by MS/MS spectra
14.1 Abundance measurements
14.2 Normalization
14.3 Proposed methods
14.4 Methods for single abundance calculation
14.4.1 emPAI
14.4.2 PMSS
14.4.3 NSAF
14.4.4 SI
14.5 Methods for relative abundance calculation
14.5.1 PASC
14.5.2 RIBAR
14.5.3 xRIBAR
14.6 Comparing methods
14.6.1 An analysis by Griffin
14.6.2 An analysis by Colaert
14.7 Improving the reliability of spectral count quantification