Table Of ContentGo Optimizations 101
TapirLiu
Contents
0.1 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1 AboutGoOptimizations101 7
1.1 Abouttheauthor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2 Feedback . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2 ValuePartsandValueSizes 9
2.1 Valuesandvalueparts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Value/typesizes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3 Detailedtypesizes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.4 Memoryalignments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.5 Structpadding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.6 Valuecopycostsandsmall-sizetypes/values . . . . . . . . . . . . . . . . . . . . 13
2.7 Valuecopyscenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3 MemoryAllocations 22
3.1 Memoryblocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2 Memoryallocationplaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.3 Memoryallocationscenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.4 Memorywastingcausedbyallocatedmemoryblockslargerthanneeded . . . . . . 23
3.5 Reducememoryallocationsandsavememory . . . . . . . . . . . . . . . . . . . 25
3.6 Avoidunnecessaryallocationsbyallocatingenoughinadvance . . . . . . . . . . 25
3.7 Avoidallocationsifpossible . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.8 Savememoryandreduceallocationsbycombiningmemoryblocks . . . . . . . . 29
3.9 Usevaluecachepooltoavoidsomeallocations . . . . . . . . . . . . . . . . . . 31
4 StackandEscapeAnalysis 33
4.1 Goroutinestacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2 Escapeanalysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2.1 Escapeanalysisexamples . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.3 Stackframesoffunctioncalls . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.4 Stacksgrowthandshrinkage . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.5 Memorywastingcausedbyunusedstackspaces . . . . . . . . . . . . . . . . . . 39
4.6 Forallkindsofreasons,avalue(part)willescapetoheapevenifitisonlyusedin
onegoroutine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.6.1 Alocalvariablesdeclaredinaloopwillescapetoheapifitisreferenced
byavalueoutoftheloop . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.6.2 Thevaluepartsreferencedbyanargumentwillescapetoheapiftheargu-
mentispassedtointerfacemethodcalls . . . . . . . . . . . . . . . . . . 40
1
4.6.3 Areflect.ValueOffunctioncallmakesthevaluesreferencedbyitsar-
gumentescapetoheap . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.6.4 Thevaluesreferencedbyfunctionreturnresultswillescape . . . . . . . . 41
4.7 Functioninlinemightaffectescapeanalysisresults. . . . . . . . . . . . . . . . . 42
4.7.1 Functioninliningisnotalwayshelpfulforescapeanalysis . . . . . . . . . 43
4.8 Controlmemoryblockallocationplaces . . . . . . . . . . . . . . . . . . . . . . 43
4.8.1 Ensureavalueisallocatedonheap . . . . . . . . . . . . . . . . . . . . . 44
4.8.2 Useexplicitvaluecopiestohelpcompilersdetectsomevaluesdon’tescape 45
4.8.3 Memorysizethresholdsusedbythecompilertomakeallocationplacement
decisions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.8.4 Usesmallerthresholds . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.8.5 Allocatethebackingarrayofasliceonstackevenifitssizeislargerthan
orequalto64K(butnotlargerthan10M) . . . . . . . . . . . . . . . . . 51
4.8.6 Allocatethebackingarrayofaslicewithanarbitrarylengthonstack . . . 52
4.8.7 Moretrickstoallocatearbitrary-sizevaluesonstack . . . . . . . . . . . . 53
4.9 Growstackinlesstimes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5 GarbageCollection 56
5.1 GCpacer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.2 AutomaticGCmightaffectGoprogramexecutionperformance . . . . . . . . . . 57
5.3 HowtoreduceGCpressure? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.4 Memoryfragments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.5 Memorywastingcausedbysharingmemoryblocks . . . . . . . . . . . . . . . . 58
5.6 Trytogeneratelessshort-livedmemoryblockstolowerautomaticGCfrequency . 59
5.7 UsenewheapmemorypercentagestrategytocontrolautomaticGCfrequency. . . 59
5.8 SinceGotoolchain1.18,thelargerGCroots,thelargerGCcycleintervals. . . . . 62
5.9 UsememoryballaststoavoidfrequentGCcycles . . . . . . . . . . . . . . . . . 65
5.10 UseGotoolchain1.19introducedmemorylimitstrategytoavoidfrequentGCcycles 66
6 Pointers 68
6.1 Avoidunnecessarynilarraypointerchecksinaloop . . . . . . . . . . . . . . . . 68
6.1.1 Thecaseinwhichanarraypointerisastructfield . . . . . . . . . . . . . 69
6.2 Avoidunnecessarypointerdereferencesinaloop . . . . . . . . . . . . . . . . . 71
7 Structs 74
7.1 Avoidaccessingfieldsofastructinaloopthoughpointerstothestruct . . . . . . 74
7.2 Small-sizestructsareoptimizedspecially . . . . . . . . . . . . . . . . . . . . . 75
7.3 Makestructsizesmallerbyadjustingfieldorders . . . . . . . . . . . . . . . . . 75
8 ArraysandSlices 76
8.1 Avoidusingliteralsoflarge-sizearraytypesascomparisonoperands . . . . . . . 76
8.2 Usingslice-to-array-pointerconversionsintroducedinGo1.17tocopyslices . . . 77
8.3 Themakeandappendbuiltinfunctionimplementations . . . . . . . . . . . . . . 78
8.4 Trytoclipthefirstargumentofanappendcallifweknowthecallwillallocate . . 81
8.5 Growslices(enlargeslicecapacities) . . . . . . . . . . . . . . . . . . . . . . . . 81
8.6 Trytogrowasliceinonestep . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
8.7 Cloneslices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
8.8 Mergetwoslices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
8.9 Mergemorethantwoslices(intoanewslice) . . . . . . . . . . . . . . . . . . . 83
8.10 Insertasliceintoanotherone . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
8.11 Don’tusetheseconditerationvariableinafor-rangeloopifhighperformance
isdemanded . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
2
8.12 Resetallelementsofanarrayorslice . . . . . . . . . . . . . . . . . . . . . . . 87
8.13 Specifycapacityexplicitlyinsubsliceexpression . . . . . . . . . . . . . . . . . 88
8.14 Useindextablestosavesomecomparisons . . . . . . . . . . . . . . . . . . . . 89
9 StringandByteSlices 90
9.1 Conversionsbetweenstringsandbyteslices . . . . . . . . . . . . . . . . . . . . 90
9.1.1 Astring-to-byte-sliceconversionfollowingtherangekeyworddoesn’tal-
locate. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
9.1.2 A byte-slice-to-string conversion appearing as a comparison operand
doesn’tallocate. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
9.1.3 Abyte-slice-to-stringconversionappearingastheindexkeyofamapele-
mentretrievalexpressiondoesn’tallocate . . . . . . . . . . . . . . . . . 92
9.1.4 A byte-slice-to-string conversion appearing as an operand in a string
concatenation expression doesn’t allocate if at least one of concatenated
operandsisanon-blankstringconstant . . . . . . . . . . . . . . . . . . . 94
9.1.5 Iftheresultofanoperationisastirngorbyteslice, andthelengthofthe
resultislargerthan32,thenthebyteelementsoftheresultwillbealways
allocatedonheap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
9.2 Efficientwaystoconcatenatestrings . . . . . . . . . . . . . . . . . . . . . . . . 96
9.2.1 Thestrings.Builderwaymightwastesomememorysometimes . . . . 98
9.2.2 Usebyteslicetoconcatenatestrings . . . . . . . . . . . . . . . . . . . . 98
9.3 Mergeastringandabytesliceintoanewbyteslice . . . . . . . . . . . . . . . . 99
9.4 Thestrings.Comparefunctionisnotveryperformantnow . . . . . . . . . . . 100
9.5 Avoidallocationsifpossible . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
10 BCE(BoundCheckEliminate) 106
10.1 Example1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
10.2 Example2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
10.3 Example3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
10.4 Example4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
10.5 Sometimes,thecompilerneedssomehintstoremovesomeboundchecks . . . . . 110
10.6 WritecodeinBCE-friendlyways. . . . . . . . . . . . . . . . . . . . . . . . . . 113
10.7 The current official standard Go compiler fails to eliminate some unnecessary
boundchecks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
11 Maps 118
11.1 Clearmapentries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
11.2 aMap[key]++ismoreefficientthanaMap[key] = aMap[key] + 1 . . . . . . . 119
11.3 Pointersinmaps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
11.4 Usingbytearraysinsteadofshortstringsaskeys . . . . . . . . . . . . . . . . . . 119
11.5 Lowermapelementmodificationfrequency . . . . . . . . . . . . . . . . . . . . 120
11.6 Trytogrowamapinonestep . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
11.7 Useindextablesinsteadofmapswhichkeytypeshaveonlyasmallsetofpossible
values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
12 Channels 125
12.1 Programmingwithchannelsisfunbutchannelsarenotthemostperformantway
forsomeusecases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
12.2 Useonechannelinsteadofseveralonestoavoidusingselectblocks . . . . . . . 126
12.3 Try-sendandtry-receiveselectcodeblocksarespeciallyoptimized . . . . . . . 128
13 Functions 130
13.1 Functioninlining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
3
13.1.1 Whichfunctionsareinline-able? . . . . . . . . . . . . . . . . . . . . . . 131
13.1.2 Acalltoafunctionvalueisnotinline-ableifthevalueishardtobedeter-
minedatcompiletime . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
13.1.3 Thego:noinlinecommentdirective . . . . . . . . . . . . . . . . . . . 134
13.1.4 Writecodeinthewayswhicharelessinlinecostly . . . . . . . . . . . . . 134
13.1.5 Makehotpathsinline-able . . . . . . . . . . . . . . . . . . . . . . . . . 138
13.1.6 Manualinliningisoftenmoreperformancethanauto-inlining . . . . . . . 139
13.1.7 Inliningmightdonegativeimpactonperformance . . . . . . . . . . . . . 140
13.2 Pointerparameters/resultsvs. non-pointerparameters/results . . . . . . . . . . . . 141
13.3 Namedresultsvs. anonymousresults . . . . . . . . . . . . . . . . . . . . . . . . 143
13.4 Trytostoreintermediatecalculationresultsinlocalvariables . . . . . . . . . . . 145
13.5 Avoiddeferredcallsinloops . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
13.6 Avoidusingdeferredcallsifextremehighperformanceisdemanded . . . . . . . . 147
13.7 Theargumentsofafunctioncallwillbealwaysevaluatedwhenthecallisinvoked 147
13.8 Trytomakelessvaluesescapetoheapinthehotpaths . . . . . . . . . . . . . . . 148
14 Interfaces 150
14.1 Boxvaluesintoandunboxvaluesfrominterfaces . . . . . . . . . . . . . . . . . 150
14.2 Trytovoidmemoryallocationsbyassigninginterfacetointerface . . . . . . . . . 157
14.3 Callinginterfacemethodsneedsalittleextracost . . . . . . . . . . . . . . . . . 158
14.4 Avoid using interface parameters and results in small functions which are called
frequently . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
4
5
0.1 Acknowledgments
Firstly,thankstothewholeGocommunity. Anactiveandresponsivecommunityguaranteesthis
bookisfinishedintime.
Specially, I want to give thanks to the following people who helped me understand some imple-
mentationdetailsintheofficialstandardcompilerandruntime: KeithRandall,IanLanceTaylor,
AxelWagner,CuongManhLe,MichaelPratt,JanMercl,MatthewDempsky,MartinMöhrmann,
etc. I’msorryifIforgotmentioningsomebodyinabovelists. Therearesomanykindandcreative
gophersintheGocommunitythatImusthavemissedoutonsomeone.
IalsowouldliketothankallgopherswhoevermadeinfluencesontheGo101book,beitdirectly
orindirectly,intentionallyorunintentionally.
ThankstoOlexandrShalakhinforthepermissiontouseoneofthewonderfulgophericondesigns
asthecoverimage. AndthankstoReneeFrenchfordesigningthelovelygophercartooncharacter.
Thankstotheauthorsofthefollowingopensourcesoftwareandlibrariesusedinbuildingthisbook:
• golang,https://go.dev/
• gomarkdown,https://github.com/gomarkdown/markdown
• goini,https://github.com/zieckey/goini
• go-epub,https://github.com/bmaupin/go-epub
• pandoc,https://pandoc.org
• calibre,https://calibre-ebook.com/
• GIMP,https://www.gimp.org
Thanks the gophers who ever reported mistakes in this book or made corrections for this book:
yingzewen,ivanburak,cortes-,skeeto@reddit,etc.
6
Chapter 1
About Go Optimizations 101
Thisbookprovidessomecodeperformanceoptimizationtricks,tips,andsuggestions. Mostofthe
contentsinthisbookaremadebasedontheofficialstandardGocompilerandruntimeimplemen-
tation.
Lifeisfulloftrade-offs,theprogrammingworldis,too. Inprogramming,weoftenneedtomake
trade-offs between code readability, maintainability, development efficiency, and program effi-
ciency, etc. Even for one of the aspects, there are also trade-offs needing to be made. Taking
program efficiency for an example, we might need to make trade-offs between memory saving,
codeexecutionspeed,andimplementationdifficulty,etc.
Inpractice, mostpartsofthecodebaseofaprojectdon’tneedtobeimplementedwithhighper-
formances. Keepingthemmaintainableandreadableisoftenmoreimportant(thanmakingthem
memorysavingandrunveryfast). Thesuggestionsmadeinthisbookarejustforthecodeparts
whichimplementationsreallyneedtobehighperformant. Someofthesuggestionsoftenleadto
more verbose code. And please note that some of the suggested implementations might be only
performantatsomecertainscenarios.
Thecontentsinthisbookinclude:
• howtoconsumelessCPUresources.
• howtoconsumelessmemory.
• howtomakelessmemoryallocations.
• howtocontrolmemoryallocationplaces.
• howtoreducegarbagecollectionpressure.
Thisbookneitherexplainshowtouseperformanceanalysistools,suchaspprof,nortriestostudy
deeply on compiler and runtime implementation details. None of the contents provided in this
bookmakeuseofunsafepointersandcgo. Andthebookdoesn’ttalkaboutalgorithms. Inother
words,thisbooktriestoprovidesomeoptimizationsuggestionsinawaywhichisclearandeasy
tounderstand,fordailygeneralGoprogramming.
Without particularly indicated, the code examples provided in this book are tested and run on a
notebookwiththefollowingenvironmentsetup:
go version go1.19 linux/amd64
goos: linux
goarch: amd64
cpu: Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz
7
Somebenchmarktimesinformationisremovedfrombenchmarkresultstokeepbenchmarklines
short.
Pleasenotethat:
• someofthetalkedsuggestionsinthisbookworkonanyplatformandforanyCPUmodels,
butsomeothersmightonlyworkonspecifiedplatformsandforspecifiedCPUmodels. So
pleasebenchmarkthemonthesameenvironmentsasyourproductionenvironmentsbefore
adoptinganyofthem.
• someimplementationdetailsoftheofficialstandardGocompilerandruntimemightchange
from version to version, which means some of the talked suggestions might not work for
futureGotoolchainversions.
• the book will be open sourced eventually since a later undetermined time point (about
2023/Jun),inachapterbychapterway.
1.1 About the author
Tapiristheauthorofthisbook. HealsowrotetheGo101book. Heisplanningtowritesomeother
Go101seriesbooks. Pleaselookforwardto.
Tapir was ever (maybe will be again) an indie game developer. You can find his games here:
tapirgames.com.
1.2 Feedback
WelcometoimprovethisbookbysubmittingcorrectionstoGo101issuelist(https://github.com/g
o101/go101)forallkindsofmistakes,suchastypos,grammarerrors,wordinginaccuracies,wrong
explanations,descriptionflaws,codebugs,etc.
ItisalsowelcometosendyourfeedbacktotheGo101twitteraccount: @go100and1(https://twit
ter.com/go100and1).
8
Chapter 2
Value Parts and Value Sizes
2.1 Values and value parts
InGo,avalueofsomekindsoftypesalwayscontainsonlyonepart(inmemory),whereasavalue
ofotherkindsoftypesmightcontainmorethanonepart. Ifavaluecontainsmorethanonepart,
thenoneofthepartsiscalledthedirectpartandtheothersarecalledindirectparts. Thedirectpart
referencestheindirectparts.
Ifavaluealwayscontainsonlyonepart,thenthepartmaybealsocalledasthedirectpartofthe
value,andwesaythevaluehasnoindirectparts.
In the official standard Go compiler implementation, each value of the following kinds of types
alwayscontainsonlyonepart:
• booleantypes
• numerictypes(int8,uint8,int16,uint16,int32,uint32,int64,uint64,int,uint,uintptr,float32,
float64,complex64,complex128)
• pointertypes
• unsafepointertypes
• structtypes
• arraytypes
Andavalueofthefollowingkindsoftypesalwaysmaycontainoneormoreindirectparts:
• slicetypes
• maptypes
• channeltypes
• functiontypes
• interfacetypes
• stringtypes
When assigning/copying a value, only the direct part of the value is copied. After copying, the
directpartsofthedestinationandsourcevaluesbotharereferencingtheindirectpartsofthesource
value(iftheindirectpartsexist).
Atruntime,eachvaluepartiscarriedononememoryblock(memoryblockswillbeexplainedina
followingchapter). So,ifavaluecontainstwoparts,thevalueisverypossiblydistributedontwo
memoryblocks.
(Note: The terminology value part was invented by the Go 101 series books. It is not widely
9