Introduction to Compiler Design PDF

219 Pages·2011·1.469 MB·English
Preview Introduction to Compiler Design

Undergraduate Topics in Computer Science

Undergraduate Topics in Computer Science (UTiCS) delivers high-quality instructional content for undergraduates studying in all areas of computing and information science. From core foundational and theoretical material to final-year topics and applications, UTiCS books take a fresh, concise, and modern approach and are ideal for self-study or for a one- or two-semester course. The texts are all authored by established experts in their fields, reviewed by an international advisory board, and contain numerous examples and problems. Many include fully worked solutions.

For further volumes: www.springer.com/series/7592

Torben Ægidius Mogensen

Introduction to Compiler Design

Torben Ægidius Mogensen
Department of Computer Science
University of Copenhagen
Copenhagen, Denmark
[email protected]
url: http://www.diku.dk/~torbenm

Series editor
Ian Mackie

Advisory board
Samson Abramsky, University of Oxford, Oxford, UK
Chris Hankin, Imperial College London, London, UK
Dexter Kozen, Cornell University, Ithaca, USA
Andrew Pitts, University of Cambridge, Cambridge, UK
Hanne Riis Nielson, Technical University of Denmark, Lungby, Denmark
Steven Skiena, Stony Brook University, Stony Brooks, USA
Iain Stewart, University of Durham, Durham, UK

ISSN 1863-7310
ISBN 978-0-85729-828-7
e-ISBN 978-0-85729-829-4
DOI 10.1007/978-0-85729-829-4
Springer London Dordrecht Heidelberg New York

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Control Number: 2011933601

© Springer-Verlag London Limited 2011

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms of licenses issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers.

The use of registered names, trademarks, etc., in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant laws and regulations and therefore free for general use.

The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made.

Cover design: VTeX UAB, Lithuania

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Preface

“Language is a process of free creation; its laws and principles are fixed, but the manner in which the principles of generation are used is free and infinitely varied. Even the interpretation and use of words involves a process of free creation.”
Noam Chomsky (1928–)

In order to reduce the complexity of designing and building computers, nearly all of these are made to execute relatively simple commands (but do so very quickly). A program for a computer must be built by combining these very simple commands into a program in what is called machine language. Since this is a tedious and error-prone process, most programming is, instead, done using a high-level programming language. This language can be very different from the machine language that the computer can execute, so some means of bridging the gap is required. This is where the compiler comes in.

A compiler translates (or compiles) a program written in a high-level programming language that is suitable for human programmers into the low-level machine language that is required by computers. During this process, the compiler will also attempt to spot and report obvious programmer mistakes.

Using a high-level language for programming has a large impact on how fast programs can be developed. The main reasons for this are:

• Compared to machine language, the notation used by programming languages is closer to the way humans think about problems.
• The compiler can spot some obvious programming mistakes.
• Programs written in a high-level language tend to be shorter than equivalent programs written in machine language.

Another advantage of using a high-level language is that the same program can be compiled to many different machine languages and, hence, be brought to run on many different machines.

On the other hand, programs that are written in a high-level language and automatically translated to machine language may run somewhat slower than programs that are hand-coded in machine language. Hence, some time-critical programs are still written partly in machine language. A good compiler will, however, be able to get very close to the speed of hand-written machine code when translating well-structured programs.

The Phases of a Compiler

Since writing a compiler is a nontrivial task, it is a good idea to structure the work. A typical way of doing this is to split the compilation into several phases with well-defined interfaces. Conceptually, these phases operate in sequence (though in practice, they are often interleaved), each phase (except the first) taking the output from the previous phase as its input. It is common to let each phase be handled by a separate module. Some of these modules are written by hand, while others may be generated from specifications. Often, some of the modules can be shared between several compilers.

A common division into phases is described below. In some compilers, the ordering of phases may differ slightly, some phases may be combined or split into several phases or some extra phases may be inserted between those mentioned below.

Lexical analysis. This is the initial part of reading and analysing the program text: The text is read and divided into tokens, each of which corresponds to a symbol in the programming language, e.g., a variable name, keyword or number.
Syntax analysis. This phase takes the list of tokens produced by the lexical analysis and arranges these in a tree-structure (called the syntax tree) that reflects the structure of the program. This phase is often called parsing.
Type checking. This phase analyses the syntax tree to determine if the program violates certain consistency requirements, e.g., if a variable is used but not declared or if it is used in a context that does not make sense given the type of the variable, such as trying to use a boolean value as a function pointer.
Intermediate code generation. The program is translated to a simple machine-independent intermediate language.
Register allocation. The symbolic variable names used in the intermediate code are translated to numbers, each of which corresponds to a register in the target machine code.
Machine code generation. The intermediate language is translated to assembly language (a textual representation of machine code) for a specific machine architecture.
Assembly and linking. The assembly-language code is translated into binary representation and addresses of variables, functions, etc., are determined.

The first three phases are collectively called the frontend of the compiler and the last three phases are collectively called the backend. The middle part of the compiler is in this context only the intermediate code generation, but this often includes various optimisations and transformations on the intermediate code.

Each phase, through checking and transformation, establishes stronger invariants on the things it passes on to the next, so that writing each subsequent phase is easier than if these have to take all the preceding into account. For example, the type checker can assume absence of syntax errors and the code generation can assume absence of type errors.

Assembly and linking are typically done by programs supplied by the machine or operating system vendor, and are hence not part of the compiler itself, so we will not further discuss these phases in this book.

Interpreters

An interpreter is another way of implementing a programming language. Interpretation shares many aspects with compiling. Lexing, parsing and type-checking are in an interpreter done just as in a compiler. But instead of generating code from the syntax tree, the syntax tree is processed directly to evaluate expressions and execute statements, and so on.
An interpreter may need to process the same piece of the syntax tree (for example, the body of a loop) many times and, hence, interpretation is typically slower than executing a compiled program. But writing an interpreter is often simpler than writing a compiler and the interpreter is easier to move to a different machine, so for applications where speed is not of the essence, interpreters are often used.

Compilation and interpretation may be combined to implement a programming language: The compiler may produce intermediate-level code which is then interpreted rather than compiled to machine code. In some systems, there may even be parts of a program that are compiled to machine code, some parts that are compiled to intermediate code, which is interpreted at runtime, while other parts may be kept as a syntax tree and interpreted directly. Each choice is a compromise between speed and space: Compiled code tends to be bigger than intermediate code, which tends to be bigger than syntax, but each step of translation improves running speed.

Using an interpreter is also useful during program development, where it is more important to be able to test a program modification quickly rather than run the program efficiently. And since interpreters do less work on the program before execution starts, they are able to start running the program more quickly. Furthermore, since an interpreter works on a representation that is closer to the source code than is compiled code, error messages can be more precise and informative.

We will discuss interpreters briefly in Chap. 4, but they are not the main focus of this book.

Why Learn About Compilers?

Few people will ever be required to write a compiler for a general-purpose language like C, Java or SML. So why do most computer science institutions offer compiler courses and often make these mandatory?

Some typical reasons are:

(a) It is considered a topic that you should know in order to be “well-cultured” in computer science.
(b) A good craftsman should know his tools, and compilers are important tools for programmers and computer scientists.
(c) The techniques used for constructing a compiler are useful for other purposes as well.
(d) There is a good chance that a programmer or computer scientist will need to write a compiler or interpreter for a domain-specific language.

The first of these reasons is somewhat dubious, though something can be said for “knowing your roots”, even in such a hastily changing field as computer science.

Reason “b” is more convincing: Understanding how a compiler is built will allow programmers to get an intuition about what their high-level programs will look like when compiled and use this intuition to tune programs for better efficiency. Furthermore, the error reports that compilers provide are often easier to understand when one knows about and understands the different phases of compilation, such as knowing the difference between lexical errors, syntax errors, type errors and so on.

The third reason is also quite valid. In particular, the techniques used for reading (lexing and parsing) the text of a program and converting this into a form (abstract syntax) that is easily manipulated by a computer can be used to read and manipulate any kind of structured text such as XML documents, address lists, etc.

Reason “d” is becoming more and more important as domain-specific languages (DSLs) are gaining in popularity. A DSL is a (typically small) language designed for a narrow class of problems. Examples are database query languages, text-formatting languages, scene description languages for ray-tracers and languages for setting up economic simulations. The target language for a compiler for a DSL may be traditional machine code, but it can also be another high-level language for which compilers already exist, a sequence of control signals for a machine, or formatted text and graphics in some printer-control language (e.g. PostScript). Even so, all DSL compilers will share similar front-ends for reading and analysing the program text.

Hence, the methods needed to make a compiler front-end are more widely applicable than the methods needed to make a compiler back-end, but the latter is more important for understanding how a program is executed on a machine.

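As a rough illustration of how the front-end phases fit together, the following Python sketch runs lexical analysis, syntax analysis and direct interpretation of the syntax tree on a toy language of integer arithmetic. The book itself is language neutral, so the choice of Python, the toy language, and all names used here (Token, Num, BinOp, lex, parse, evaluate) are assumptions made for this sketch and are not taken from the book.

```python
import re
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Token:            # hypothetical token type for this sketch
    kind: str           # "NUM", "OP", "LPAR" or "RPAR"
    text: str

@dataclass
class Num:              # syntax tree leaf: an integer literal
    value: int

@dataclass
class BinOp:            # syntax tree node: a binary operation
    op: str
    left: "Expr"
    right: "Expr"

Expr = Union[Num, BinOp]
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([-+*/])|(\()|(\)))")

def lex(text: str) -> List[Token]:
    """Lexical analysis: divide the input text into a list of tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"lexical error at position {pos}")
        kind = ("NUM", "OP", "LPAR", "RPAR")[m.lastindex - 1]
        tokens.append(Token(kind, m.group(m.lastindex)))
        pos = m.end()
    return tokens

def parse(tokens: List[Token]) -> Expr:
    """Syntax analysis: arrange the tokens in a syntax tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def factor() -> Expr:
        tok = peek()
        if tok is not None and tok.kind == "NUM":
            return Num(int(advance().text))
        if tok is not None and tok.kind == "LPAR":
            advance()
            inner = expr()
            closing = peek()
            if closing is None or closing.kind != "RPAR":
                raise SyntaxError("syntax error: expected ')'")
            advance()
            return inner
        raise SyntaxError("syntax error: expected a number or '('")

    def binary(operand, ops) -> Expr:
        # Left-associative chain of operands separated by operators in ops.
        left = operand()
        while (tok := peek()) is not None and tok.kind == "OP" and tok.text in ops:
            advance()
            left = BinOp(tok.text, left, operand())
        return left

    def term() -> Expr:
        return binary(factor, "*/")

    def expr() -> Expr:
        return binary(term, "+-")

    tree = expr()
    if peek() is not None:
        raise SyntaxError("syntax error: unexpected input after expression")
    return tree

def evaluate(tree: Expr) -> int:
    """Interpretation: process the syntax tree directly, generating no code."""
    if isinstance(tree, Num):
        return tree.value
    left, right = evaluate(tree.left), evaluate(tree.right)
    if tree.op == "+":
        return left + right
    if tree.op == "-":
        return left - right
    if tree.op == "*":
        return left * right
    return left // right  # integer division, an arbitrary choice for this sketch

print(evaluate(parse(lex("2 * (3 + 4) - 5"))))  # prints 9
```

Each function consumes only the output of the previous one, which mirrors the idea of phases with well-defined interfaces. A compiler would keep lex and parse as they are and replace evaluate with phases that translate the tree to intermediate code and then to machine code.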
The Structure of This Book

The first chapters of the book describe the methods and tools required to read program text and convert it into a form suitable for computer manipulation. This process is carried out in two stages: a lexical analysis stage that basically divides the input text into a list of “words”, followed by a syntax analysis (or parsing) stage that analyses the way these words form structures and converts the text into a data structure that reflects the textual structure. Lexical analysis is covered in Chap. 1 and syntactical analysis in Chap. 2.

The remainder of the book (Chaps. 3–9) covers the middle part and back-end of interpreters and compilers. Chapter 3 covers how definitions and uses of names (identifiers) are connected through symbol tables. Chapter 4 shows how you can implement a simple programming language by writing an interpreter and notes that this gives a considerable overhead that can be reduced by doing more things before executing the program, which leads to the following chapters about static type checking (Chap. 5) and compilation (Chaps. 6–9). In Chap. 6, it is shown how expressions and statements can be compiled into an intermediate language, a language that is close to machine language but hides machine-specific details. In Chap. 7, it is discussed how the intermediate language can be converted into “real” machine code. Doing this well requires that the registers in the processor are used to store the values of variables, which is achieved by a register allocation process, as described in Chap. 8. Up to this point, a “program” has been what corresponds to the body of a single procedure. Procedure calls add some issues, which are discussed in Chap. 9.

The book uses standard set notation and equations over sets. The appendix contains a short summary of these, which may be helpful to those that need these concepts refreshed.

To the Lecturer

This book was written for use in the introductory compiler course at DIKU, the department of computer science at the University of Copenhagen, Denmark.

At times, standard techniques from compiler construction have been simplified for presentation in this book. In such cases, references are made to books or articles where the full version of the techniques can be found.

The book aims at being “language neutral”. This means two things:

• Little detail is given about how the methods in the book can be implemented in any specific language. Rather, the description of the methods is given in the form of algorithm sketches and textual suggestions of how these can be implemented in various types of languages, in particular imperative and functional languages.
• There is no single through-going example of a language to be compiled. Instead, different small (sub-)languages are used in various places to cover exactly the points that the text needs. This is done to avoid drowning in detail, hopefully allowing the readers to “see the wood for the trees”.

Each chapter has a section on further reading, which suggests additional reading material for interested students. Each chapter has a set of exercises. Few of these require access to a computer; most can be solved on paper or blackboard. After some of the sections in the book, a few easy exercises are listed as suggested exercises. It is recommended that the student attempts to solve these exercises before continuing reading, as the exercises support understanding of the previous sections.

Teaching with this book can be supplemented with project work, where students write simple compilers. Since the book is language neutral, no specific project is given. Instead, the teacher must choose relevant tools and select a project that fits the level of the students and the time available. Depending on the amount of project work and supplementary material, the book can support course sizes ranging from 5 to 7.5 ECTS points.
