Purely Functional Data Structures
Chris Okasaki
September 1996
CMU-CS-96-177
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
Submitted in partial fulfillment of the requirements
for the degree of Doctor of Philosophy.
Thesis Committee:
Peter Lee, Chair
Robert Harper
Daniel Sleator
Robert Tarjan, Princeton University
Copyright © 1996 Chris Okasaki
This research was sponsored by the Advanced Research Projects Agency (ARPA) under Contract No.
F19628-95-C-0050.
The views and conclusions contained in this document are those of the author and should not be interpreted as
representing the official policies, either expressed or implied, of ARPA or the U.S. Government.
Keywords: functional programming, data structures, lazy evaluation, amortization
For Maria
Abstract
When a C programmer needs an efficient data structure for a particular
problem, he or she can often simply look one up in any of a number of good
textbooks or handbooks. Unfortunately, programmers in functional languages such
as Standard ML or Haskell do not have this luxury. Although some data
structures designed for imperative languages such as C can be quite easily adapted to a
functional setting, most cannot, usually because they depend in crucial ways on
assignments, which are disallowed, or at least discouraged, in functional languages.
To address this imbalance, we describe several techniques for designing functional
data structures, and numerous original data structures based on these techniques,
including multiple variations of lists, queues, double-ended queues, and heaps,
many supporting more exotic features such as random access or efficient
catenation.
In addition, we expose the fundamental role of lazy evaluation in amortized
functional data structures. Traditional methods of amortization break down when
old versions of a data structure, not just the most recent, are available for further
processing. This property is known as persistence, and is taken for granted in
functional languages. On the surface, persistence and amortization appear to be
incompatible, but we show how lazy evaluation can be used to resolve this conflict,
yielding amortized data structures that are efficient even when used persistently.
Turning this relationship between lazy evaluation and amortization around, the
notion of amortization also provides the first practical techniques for analyzing the
time requirements of non-trivial lazy programs.
Finally, our data structures offer numerous hints to programming language
designers, illustrating the utility of combining strict and lazy evaluation in a single
language, and providing non-trivial examples using polymorphic recursion and
higher-order, recursive modules.
Acknowledgments
Without the faith and support of my advisor, Peter Lee, I probably wouldn’t
even be a graduate student, much less a graduate student on the eve of finishing.
Thanks for giving me the freedom to turn my hobby into a thesis.
I am continually amazed by the global nature of modern research and how
e-mail allows me to interact as easily with colleagues in Aarhus, Denmark and York,
England as with fellow students down the hall. In the case of one such colleague,
Gerth Brodal, we have co-authored a paper without ever having met. In fact, sorry
Gerth, but I don’t even know how to pronounce your name!
I was extremely fortunate to have had excellent English teachers in high school.
Lori Huenink deserves special recognition; her writing and public speaking classes
are undoubtedly the most valuable classes I have ever taken, in any subject. In the
same vein, I was lucky enough to read my wife’s copy of Lyn Dupré’s BUGS in
Writing just as I was starting my thesis. If your career involves writing in any form,
you owe it to yourself to buy a copy of this book.
Thanks to Maria and Colin for always reminding me that there is more to life
than grad school. And to Amy and Mark, for uncountable dinners and other
outings. We’ll miss you. Special thanks to Amy for reading a draft of this thesis.
And to my parents: who would have thought on that first day of kindergarten
that I’d still be in school 24 years later?
Contents
Abstract
Acknowledgments
1 Introduction
1.1 Functional vs. Imperative Data Structures
1.2 Strict vs. Lazy Evaluation
1.3 Contributions
1.4 Source Language
1.5 Terminology
1.6 Overview
2 Lazy Evaluation and $-Notation
2.1 Streams
2.2 Historical Notes
3 Amortization and Persistence via Lazy Evaluation
3.1 Traditional Amortization
3.1.1 Example: Queues
3.2 Persistence: The Problem of Multiple Futures
3.2.1 Execution Traces and Logical Time
3.3 Reconciling Amortization and Persistence
3.3.1 The Role of Lazy Evaluation
3.3.2 A Framework for Analyzing Lazy Data Structures
3.4 The Banker’s Method
3.4.1 Justifying the Banker’s Method
3.4.2 Example: Queues
3.5 The Physicist’s Method
3.5.1 Example: Queues
3.5.2 Example: Bottom-Up Mergesort with Sharing
3.6 Related Work
4 Eliminating Amortization
4.1 Scheduling
4.2 Real-Time Queues
4.3 Bottom-Up Mergesort with Sharing
4.4 Related Work
5 Lazy Rebuilding
5.1 Batched Rebuilding
5.2 Global Rebuilding
5.3 Lazy Rebuilding
5.4 Double-Ended Queues
5.4.1 Output-restricted Deques
5.4.2 Banker’s Deques
5.4.3 Real-Time Deques
5.5 Related Work
6 Numerical Representations
6.1 Positional Number Systems
6.2 Binary Representations
6.2.1 Binary Random-Access Lists
6.2.2 Binomial Heaps
6.3 Segmented Binary Numbers
6.3.1 Segmented Binomial Random-Access Lists and Heaps