THE COMPUTER SCIENCE OF TeX AND LaTeX;
COMPUTER SCIENCE COURSE 594, FALL 2004

VICTOR EIJKHOUT
DEPARTMENT OF COMPUTER SCIENCE, UNIVERSITY OF TENNESSEE, KNOXVILLE TN 37996

DRAFT LECTURE NOTES

About this book.

These are the lecture notes of a course I taught in the fall of 2004. This was the first time I taught the course, and the first time this course was taught, period. These lecture notes, therefore, are probably full of inaccuracies, mild fibs, and gross errors. Ok, make that ‘are definitely full of &c’, because I know of several errors that time has prevented me from addressing.

However, I would be interested in hearing any comments and suggestions you have, or errors you would like to point out.

Victor Eijkhout
[email protected]
Knoxville, TN, December 2004.

Enjoy!

Contents

About this book
1 TeX and LaTeX
  LaTeX
    1.1 Document markup
    1.2 The absolute basics of LaTeX
    1.3 The TeX conceptual model of typesetting
    1.4 Text elements
    1.5 Tables and figures
    1.6 Math
    1.7 References
    1.8 Some TeXnical issues
    1.9 Customizing LaTeX
    1.10 Extensions to LaTeX
  TeX programming
  TeX visuals
  Projects for this chapter
2 Parsing
  Parsing theory
    2.1 Levels of parsing
    2.2 Very short introduction
  Lexical analysis
    2.3 Finite state automata and regular languages
    2.4 Lexical analysis with FSAs
  Syntax parsing
    2.5 Context-free languages
    2.6 Parsing context-free languages
  Lex
    2.7 Introduction
    2.8 Structure of a lex file
    2.9 Definitions section
    2.10 Rules section
    2.11 Regular expressions
    2.12 Remarks
    2.13 Examples
  Yacc
    2.14 Introduction
    2.15 Structure of a yacc file
    2.16 Motivating example
    2.17 Definitions section
    2.18 Lex Yacc interaction
    2.19 Rules section
    2.20 Operators; precedence and associativity
    2.21 Further remarks
    2.22 Examples
  Hashing
    2.23 Introduction
    2.24 Hash functions
    2.25 Collisions
    2.26 Other applications of hashing
    2.27 Discussion
  Projects for this chapter
3 Breaking things into pieces
  Dynamic Programming
    3.1 Some examples
    3.2 Discussion
  TeX paragraph breaking
    3.3 The elements of a paragraph
    3.4 TeX’s line breaking algorithm
  NP completeness
    3.5 Introduction
    3.6 Basics
    3.7 Complexity classes
    3.8 NP-completeness
  Page breaking
    3.9 Introduction
    3.10 TeX’s page breaking algorithm
    3.11 Theory of page breaking
  Projects for this chapter
4 Fonts
  Bezier curves
    4.1 Introduction to curve approximation
    4.2 Parametric curves
    4.3 Practical use
  Curve plotting with gnuplot
    4.4 Introduction
    4.5 Plotting
  Raster graphics
    4.6 Vector graphics and raster graphics
    4.7 Basic raster graphics
    4.8 Rasterizing type
    4.9 Anti-aliasing
  Projects for this chapter
5 TeX’s macro language – unfinished chapter
  Lambda calculus in TeX
    5.1 Logic with TeX
6 Character encoding
  Input file encoding
    6.1 History and context
    6.2 Unicode
    6.3 More about character sets and encodings
    6.4 Character issues in TeX / LaTeX
  Font encoding
    6.5 Basic terminology
    6.6 Æsthetics
    6.7 Font technologies
    6.8 Font handling in TeX and LaTeX
  Input and output encoding in LaTeX
    6.9 The fontenc package
  Projects for this chapter
7 Software engineering
  Literate programming
    7.1 The Web system
    7.2 Knuth’s philosophy of program development
  Software engineering
    7.3 Extremely brief history of TeX
    7.4 TeX’s development
  Markup
    7.5 History
  Projects for this chapter

Chapter 1
TeX and LaTeX

In this chapter we will learn
• The use of LaTeX for document preparation,
• LaTeX style file programming,
• TeX programming.

Handouts and further reading for this chapter

For LaTeX use the ‘Not so short introduction to LaTeX’ by Oetiker et al. For further reading and future reference, it is highly recommended that you get ‘Guide to LaTeX’ by Kopka and Daly [15]. The original reference is the book by Lamport [16]. While it is a fine book, it has not kept up with developments around LaTeX, such as contributed graphics and other packages. A book that does discuss extensions to LaTeX in great detail is the ‘LaTeX Companion’ by Mittelbach et al. [17].

For the TeX system itself, consult ‘TeX by Topic’. The original reference is the book by Knuth [12], and the ultimate reference is the published source [11].

LaTeX

1.1 Document markup

If you are used to ‘wysiwyg’ (what you see is what you get) text processors, LaTeX may seem like a strange beast, primitive, and probably out-dated. While it is true that there is a long history behind TeX and LaTeX, and the ideas are indeed based on much more primitive technology than what we have these days, these ideas have regained surprising validity in recent times.

1.1.1 A little bit of history

Document markup dates back to the earliest days of computer typesetting. In those days, terminals were strictly character-based: they could only render mono-spaced built-in fonts. Graphics terminals were very expensive. (Some terminals could switch to a graphical character set, to get at least a semblance of graphics.) As a result, compositors had to key in text on a terminal – or using punched cards in even earlier days – and only saw the result when it would come out of the printer.

Any control of the layout, therefore, also had to be through character sequences. To set text in bold face, you may have had to surround it with <B> .. the text .. </B>. Doesn’t that look like something you still encounter every day?

Such ‘control sequences’ had a second use: they could serve a template function, expanding to often used bits of text. For instance, you could imagine $ADAM$ expanding to ‘From our correspondent in Amsterdam:’.

LaTeX works exactly the same. There are command control sequences; for instance, you get bold type by specifying \bf, et cetera. There are also control sequences that expand to bits of text: you have to type \LaTeX to get the characters ‘LaTeX’ plus the control codes for all that shifting up and down and changes in font size.

    \TeX => T\kern -.1667em\lower .5ex\hbox {E}\kern -.125emX
    \LaTeX => L\kern -.36em {\sbox \z@ T\vbox to\ht \z@ {\hbox
        {\check@mathfonts \fontsize \sf@size \z@ \math@fontsfalse
        \selectfont A} \vss }}\kern -.15em\TeX

1.1.2 Macro packages

The old typesetting systems were limited in their control sequences: they had a fixed repertoire of commands related to typesetting, and there usually was some mechanism for defining ‘macros’ with replacement text. Formally, a macro is a piece of the input that gets replaced by its definition text, which can be a combination of literal text and more macros or typesetting commands.

    An important feature of many composition programs is the ability to
    designate by suitable input instructions the use of specified formats.
    Previously stored sequences of commands or text replace the
    instructions, and the expanded input is then processed. In more
    sophisticated systems, formats may summon other formats, including
    themselves [“System/360 Text Processor Pagination/360, Application
    Description Manual,” Form No. GE20-0328, IBM Corp., White Plains,
    New York.].

That was the situation with commercial systems by manufacturers of typesetting equipment such as Linotype. Systems developed by (and for!) computer scientists, such as Scribe or nroff/troff, were much more customizable. In fact, they sometimes would have the equivalent of a complete programming language on board. This makes it possible to take the basic language, and design a new language of commands on top of it. Such a repertoire of commands is called a macro package.

In our case, TeX is the basic package with the strange macro programming language, and LaTeX is the macro package¹.
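
As a small illustration of a macro being replaced by its definition text, here is a minimal sketch in LaTeX. The macro names \adam and \dateline are hypothetical, chosen only to echo the $ADAM$ template from the history section; neither is predefined in LaTeX, and the sample sentence is made up.

```latex
\documentclass{article}

% Hypothetical macros, for illustration only; not part of LaTeX itself.
% A macro whose replacement text is pure literal text:
\newcommand{\adam}{From our correspondent in Amsterdam:}
% A macro whose definition mixes another macro (\adam), a typesetting
% command (\textbf), and its own argument (#1):
\newcommand{\dateline}[1]{\textbf{\adam} #1}

\begin{document}
\dateline{The canals froze over last night.}
% After expansion, TeX processes the equivalent of
%   \textbf{From our correspondent in Amsterdam:} The canals froze over last night.
\end{document}
```

The second definition shows the point made above: a definition text can itself contain macros, so expansion proceeds in several steps before any characters are typeset.
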
LaTeX was designed for typesetting scientific articles and books: it offers a number of styles, each with slightly different commands (for instance, there are no chapters in the article style) and slightly different layout (books need a title page, articles merely a title on the first page of the text). Styles can also easily be customized. For different purposes (art books with fancy designs) it is often better to write new macros in TeX, rather than to bend the existing LaTeX styles.

¹ In this tutorial I will say ‘TeX’ when a statement applies to the basic system, and ‘LaTeX’ if it only applies to the macro package.

However, if you use an existing LaTeX style, the whole of the underlying TeX programming language is still available, so many extensions to LaTeX have been written. The best place to find them is through CTAN, http://www.ctan.org/.

Exercise 1. Discuss the difference between a macro and a function or procedure in a normal programming language. In a procedural language, looping is implemented with goto instructions. How would you do that in a macro language? Is there a difference in efficiency?

1.1.3 Logical markup

Macro packages are initially motivated as a labour-saving device: a macro abbreviates a commonly used sequence of commands. However, they have another important use: a well designed macro package lets you use commands that indicate the structure of a document rather than the formatting. This means that you would write \section{Introduction} and not worry about the layout. The layout would be determined by a statement elsewhere as to what macros to load². In fact, you could take the same input and format it two different ways. This is convenient in cases such as an article being reprinted as part of a collection, or a book being written before the final design is commissioned.

² Compare this to the use of CSS versus HTML in web pages.

In a well written document, there will be few explicit typesetting commands. Almost all macros should be of the type that indicates the structure, and any typesetting is taken care
of in the definition of these. Further control of the layout of the document should be done through global parameter settings in the preamble.

1.2 The absolute basics of LaTeX

Here is the absolute minimum you need to know to use LaTeX.

1.2.1 Different kinds of characters

A TeX input file mostly contains characters that you want to typeset. That is, TeX passes them on from input to output without any action other than placement on the page and font choice. Now, in your text there can be commands of different sorts. So TeX has to know how to recognize commands. It does that by making a number of characters special. In this section you will learn which characters have special meaning.

• Anything that starts with a backslash is a command or ‘control sequence’. A control sequence consists of the backslash and the following sequence of letters – no digits, no underscores allowed either – or one single non-letter character.
• Spaces at the beginning and end of a line are ignored. Multiple spaces count as one space.
• Spaces are also ignored after control sequences, so writing \LaTeX is fun comes out as ‘LaTeXis fun’. To force a space there, write \LaTeX{} is fun or \LaTeX\ is fun. Spaces are not ignored after control symbols such as \$, but they are ignored again after the ‘control space’ \␣³.
• A single newline or return in the input file has no meaning, other than giving a space in the input. You can use newlines to improve legibility of the input. Two newlines (leading to one empty line) or more cause a paragraph to end. You can also attain this paragraph end by the \par command.
• Braces {,} are mostly used for delimiting the arguments of a control sequence. The other use is for grouping. Above you saw an example of the use of an empty group; similarly \TeX{}ing is fun comes out as ‘TeXing is fun’.
• Letters, digits, and most punctuation can be typed normally. However, a bunch of characters mean something special to LaTeX: %$&^_#~{}. Here are their functions:
    %      comment: anything to the end of line is ignored.
    $,_,^  inline math (toggle), subscript, superscript. See section 1.6.
    &      column separator in tables.
    ~      nonbreaking space. (This is called an ‘active character’.)
    {}     macro arguments and grouping.
  In order to type these characters, you need to precede them with a backslash, for instance \% to get ‘%’. This is called a ‘control symbol’. Exception: use $\backslash$ to get ‘\’.
• Some letters do not exist in all styles. As the most commonly encountered example, angle brackets <> do not exist in the roman text font (note that they are in the typewriter style here, in roman you would get ‘¡¿’), so you need to write, somewhat laboriously, $\langle$S$\rangle$ to get ‘⟨S⟩’⁴.

³ The funny bucket character here is how we visualize the space character.
⁴ That’s what macros are good for.

Exercise 2. You read in a document ‘This happens only in 90rest of the time it works fine.’ What happened here? There are articles in print where the word ‘From’ has an upside down question mark in front of it. Try to think of an explanation.

1.2.2 LaTeX document structure

Every LaTeX document has the following structure:

    \documentclass[ <class options> ]{ <class name> }
    <preamble>
    \begin{document}
    <text>
    \end{document}

Typical document classes are article, report, book, and letter. As you may expect, that last one has rather different commands from the others. The class options are optional; examples would be a4paper, twoside, or 11pt.

The preamble is where additional packages get loaded, for instance

    \usepackage{times}

switches the whole document to the Times Roman typeface. This is also the place to define new commands yourself (section 1.9.2).

1.2.2.1 Title

To list title and author, a document would start with

    \title{My story}
    \author{B.C. Dull}
    \date{2004} %leave this line out to get today's date
    \maketitle

After the title of an article and such, there is often an abstract. This can be specified with

    \begin{abstract}
    ... The abstract text ...
    \end{abstract}

The stretch of input from \begin to \end is called an ‘environment’; see section 1.4.1.2.

1.2.2.2 Sectioning

The document text is usually segmented by calls