THE COMPUTER SCIENCE OF TeX AND LaTeX;
COMPUTER SCIENCE COURSE 594, FALL 2004
VICTOR EIJKHOUT
DEPARTMENT OF COMPUTER SCIENCE,
UNIVERSITY OF TENNESSEE, KNOXVILLE TN 37996
DRAFT LECTURE NOTES
About this book.
These are the lecture notes of a course I taught in the fall of 2004. This was the first time I taught the course, and the first time this course was taught, period. These lecture notes, therefore, are probably full of inaccuracies, mild fibs, and gross errors. Ok, make that ‘are definitely full of &c’, because I know of several errors that time has prevented me from addressing.
However, I would be interested in hearing any comments and suggestions you have, or errors you would like to point out.
Victor Eijkhout
eijkhout@cs.utk.edu
Knoxville, TN, December 2004.
Enjoy!
Contents
About this book
1 TeX and LaTeX
  LaTeX
    1.1 Document markup
    1.2 The absolute basics of LaTeX
    1.3 The TeX conceptual model of typesetting
    1.4 Text elements
    1.5 Tables and figures
    1.6 Math
    1.7 References
    1.8 Some TeXnical issues
    1.9 Customizing LaTeX
    1.10 Extensions to LaTeX
  TeX programming
  TeX visuals
  Projects for this chapter
2 Parsing
  Parsing theory
    2.1 Levels of parsing
    2.2 Very short introduction
  Lexical analysis
    2.3 Finite state automata and regular languages
    2.4 Lexical analysis with FSAs
  Syntax parsing
    2.5 Context-free languages
    2.6 Parsing context-free languages
  Lex
    2.7 Introduction
    2.8 Structure of a lex file
    2.9 Definitions section
    2.10 Rules section
    2.11 Regular expressions
    2.12 Remarks
    2.13 Examples
  Yacc
    2.14 Introduction
    2.15 Structure of a yacc file
    2.16 Motivating example
    2.17 Definitions section
    2.18 Lex Yacc interaction
    2.19 Rules section
    2.20 Operators; precedence and associativity
    2.21 Further remarks
    2.22 Examples
  Hashing
    2.23 Introduction
    2.24 Hash functions
    2.25 Collisions
    2.26 Other applications of hashing
    2.27 Discussion
  Projects for this chapter
3 Breaking things into pieces
  Dynamic Programming
    3.1 Some examples
    3.2 Discussion
  TeX paragraph breaking
    3.3 The elements of a paragraph
    3.4 TeX’s line breaking algorithm
  NP completeness
    3.5 Introduction
    3.6 Basics
    3.7 Complexity classes
    3.8 NP-completeness
  Page breaking
    3.9 Introduction
    3.10 TeX’s page breaking algorithm
    3.11 Theory of page breaking
  Projects for this chapter
4 Fonts
  Bezier curves
    4.1 Introduction to curve approximation
    4.2 Parametric curves
    4.3 Practical use
  Curve plotting with gnuplot
    4.4 Introduction
    4.5 Plotting
  Raster graphics
    4.6 Vector graphics and raster graphics
    4.7 Basic raster graphics
    4.8 Rasterizing type
    4.9 Anti-aliasing
  Projects for this chapter
5 TeX’s macro language – unfinished chapter
  Lambda calculus in TeX
    5.1 Logic with TeX
6 Character encoding
  Input file encoding
    6.1 History and context
    6.2 Unicode
    6.3 More about character sets and encodings
    6.4 Character issues in TeX/LaTeX
  Font encoding
    6.5 Basic terminology
    6.6 Æsthetics
    6.7 Font technologies
    6.8 Font handling in TeX and LaTeX
  Input and output encoding in LaTeX
    6.9 The fontenc package
  Projects for this chapter
7 Software engineering
  Literate programming
    7.1 The Web system
    7.2 Knuth’s philosophy of program development
  Software engineering
    7.3 Extremely brief history of TeX
    7.4 TeX’s development
  Markup
    7.5 History
  Projects for this chapter
TeX–LaTeX–CS594
Chapter 1
TeX and LaTeX
In this chapter we will learn
• The use of LaTeX for document preparation,
• LaTeX style file programming,
• TeX programming.
Handouts and further reading for this chapter
For LaTeX use the ‘Not so short introduction to LaTeX’ by Oetiker et al. For further reading and future reference, it is highly recommended that you get ‘Guide to LaTeX’ by Kopka and Daly [15]. The original reference is the book by Lamport [16]. While it is a fine book, it has not kept up with developments around LaTeX, such as contributed graphics and other packages. A book that does discuss extensions to LaTeX in great detail is the ‘LaTeX Companion’ by Mittelbach et al. [17].
For the TeX system itself, consult ‘TeX by Topic’. The original reference is the book by Knuth [12], and the ultimate reference is the published source [11].
LaTeX
1.1 Document markup
If you are used to ‘wysiwyg’ (what you see is what you get) text processors, LaTeX may seem like a strange beast, primitive, and probably out-dated. While it is true that there is a long history behind TeX and LaTeX, and the ideas are indeed based on much more primitive technology than what we have these days, these ideas have regained surprising validity in recent times.
1.1.1 A little bit of history
Document markup dates back to the earliest days of computer typesetting. In those days, terminals were strictly character-based: they could only render mono-spaced built-in fonts. Graphics terminals were very expensive. (Some terminals could switch to a graphical character set, to get at least a semblance of graphics.) As a result, compositors had to key in text on a terminal – or using punched cards in even earlier days – and only saw the result when it came out of the printer.
Any control of the layout, therefore, also had to be through character sequences. To set text in boldface, you may have had to surround it with <B> .. the text .. </B>. Doesn’t that look like something you still encounter every day?
Such ‘control sequences’ had a second use: they could serve a template function, expanding to often used bits of text. For instance, you could imagine $ADAM$ expanding to ‘From our correspondent in Amsterdam:’.
LaTeX works exactly the same way. There are command control sequences; for instance, you get bold type by specifying \bf (or, in current LaTeX, \textbf{...}), et cetera. There are also control sequences that expand to bits of text: you have to type \LaTeX to get the characters ‘LaTeX’ plus the control codes for all that shifting up and down and changes in font size. Here is what \TeX and \LaTeX expand to:
\TeX => T\kern -.1667em\lower .5ex\hbox {E}\kern -.125emX
\LaTeX => L\kern -.36em {\sbox \z@ T\vbox to\ht \z@ {\hbox
{\check@mathfonts \fontsize \sf@size \z@ \math@fontsfalse
\selectfont A} \vss }}\kern -.15em\TeX
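The ‘template’ use of control sequences carries over directly to LaTeX. As a minimal sketch, the $ADAM$ abbreviation from the history above could be defined with LaTeX’s \newcommand; the name \adam and the sample sentence are made up for this illustration.

```latex
\documentclass{article}
% Hypothetical abbreviation, for illustration only: \adam expands to a
% stock phrase, like the $ADAM$ control sequence of older systems.
\newcommand{\adam}{From our correspondent in Amsterdam:}
\begin{document}
% The empty group {} keeps the space after \adam (see section 1.2.1).
\adam{} The canals froze over last night.
\end{document}
```

Wherever \adam appears, TeX substitutes the phrase before typesetting, so the text is only keyed in once.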
1.1.2 Macro packages
The old typesetting systems were limited in their control sequences: they had a fixed repertoire of commands related to typesetting, and there usually was some mechanism for defining ‘macros’ with replacement text. Formally, a macro is a piece of the input that gets replaced by its definition text, which can be a combination of literal text and more macros or typesetting commands.
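As a small illustration of replacement text that mixes literal text, other macros, and typesetting commands, consider the following LaTeX sketch. The names \course and \coursetitle are hypothetical; \emph is a standard LaTeX typesetting command, and \TeX and \LaTeX are the macros whose expansions were shown above.

```latex
\documentclass{article}
% A hypothetical macro whose replacement text is plain literal text.
\newcommand{\course}{CS 594}
% A hypothetical macro whose replacement text mixes a typesetting
% command (\emph), literal text, and other macros (\TeX, \LaTeX,
% \course). When \coursetitle is expanded, the macros in its
% replacement text are expanded in turn.
\newcommand{\coursetitle}{\emph{The Computer Science of \TeX\ and \LaTeX}, \course}
\begin{document}
This semester we teach \coursetitle.
\end{document}
```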
An important feature of many composition programs is the ability to designate by suitable input instructions the use of specified formats. Previously stored sequences of commands or text replace the instructions, and the expanded input is then processed. In more sophisticated systems, formats may summon other formats, including themselves ["System/360 Text Processor Pagination/360, Application Description Manual," Form No. GE20-0328, IBM Corp., White Plains, New York.].
That was the situation with commercial systems by manufacturers of typesetting equipment such as Linotype. Systems developed by (and for!) computer scientists, such as Scribe or nroff/troff, were much more customizable. In fact, they sometimes would have the equivalent of a complete programming language on board. This makes it possible to take the basic language, and design a new language of commands on top of it. Such a repertoire of commands is called a macro package.
In our case, TeX is the basic package with the strange macro programming language, and LaTeX is the macro package¹. LaTeX was designed for typesetting scientific articles and books: it offers a number of styles, each with slightly different commands (for instance, there are no chapters in the article style) and slightly different layout (books need a title page, articles merely a title on the first page of the text). Styles can also easily be customized. For different purposes (art books with fancy designs) it is often better to write new macros in TeX, rather than to bend the existing LaTeX styles.
Even if you use an existing LaTeX style, the whole of the underlying TeX programming language is still available; this is why so many extensions to LaTeX have been written. The best place to find them is through CTAN http://www.ctan.org/.
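To give an idea of what such an extension looks like, here is a sketch of a tiny LaTeX package. The file name mynotes.sty and the commands \ctan and \exercise are hypothetical; \NeedsTeXFormat and \ProvidesPackage are the standard LaTeX2e commands for identifying a package file.

```latex
% mynotes.sty: a hypothetical, minimal macro package (illustration only).
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{mynotes}[2004/12/01 Example macros for lecture notes]
% A parameterless macro: expands to fixed text in a sans serif font.
\newcommand{\ctan}{\textsf{CTAN}}
% A macro with one parameter (#1), built from existing LaTeX commands.
\newcommand{\exercise}[1]{\par\medskip\noindent\textbf{Exercise.} #1\par\medskip}
\endinput
```

A document would load it with \usepackage{mynotes}, provided that mynotes.sty is somewhere TeX can find it, for instance in the same directory as the document.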
Exercise 1. Discuss the difference between a macro and a function or procedure in a normal programming language. In a procedural language, looping can be implemented with goto instructions. How would you do that in a macro language? Is there a difference in efficiency?
1.1.3 Logical markup
Macro packages were initially motivated as a labour-saving device: a macro abbreviates a commonly used sequence of commands. However, they have another important use: a well-designed macro package lets you use commands that indicate the structure of a document rather than the formatting. This means that you would write \section{Introduction} and not worry about the layout. The layout would be determined by a statement elsewhere as to what macros to load². In fact, you could take the same input and format it two different ways. This is convenient in cases such as an article being reprinted as part of a collection, or a book being written before the final design is commissioned.
In a well-written document, there will be few explicit typesetting commands. Almost all macros should be of the type that indicates the structure, and any typesetting is taken care of in the definition of these. Further control of the layout of the document should be done through global parameter settings in the preamble.

1. In this tutorial I will say ‘TeX’ when a statement applies to the basic system, and ‘LaTeX’ if it only applies to the macro package.
2. Compare this to the use of CSS versus HTML in web pages.
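To make the contrast concrete, here is a sketch of logical markup in LaTeX, assuming a hypothetical structural command \keyword that is defined once in the preamble. Changing that single definition changes the appearance of every keyword, without touching the text.

```latex
\documentclass{article}
% Logical markup: \keyword (a hypothetical command) says what the text
% is, not how it should look. Its layout is decided here, in the preamble.
\newcommand{\keyword}[1]{\emph{#1}}
% To show every keyword in bold instead, only this line would change:
% \renewcommand{\keyword}[1]{\textbf{#1}}
\begin{document}
\section{Introduction}
A \keyword{macro} is replaced by its definition text.
% Visual markup would instead hard-code the layout at every use:
% A \textbf{macro} is replaced by its definition text.
\end{document}
```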
1.2 The absolute basics of LaTeX
Here is the absolute minimum you need to know to use LaTeX.
1.2.1 Different kinds of characters
A TeX input file mostly contains characters that you want to typeset. TeX passes these on from input to output without any action other than placement on the page and font choice. However, your text can also contain commands of various sorts, so TeX has to know how to recognize them. It does that by making a number of characters special. In this section you will learn which characters have a special meaning.
• Anything that starts with a backslash is a command or ‘control sequence’. A control sequence consists of the backslash and the following sequence of letters – no digits, no underscores allowed either – or one single non-letter character.
• Spaces at the beginning and end of a line are ignored. Multiple spaces count as one space.
• Spaces are also ignored after control sequences whose name consists of letters, so writing \LaTeX is fun comes out as ‘LaTeXis fun’. To force a space there, write \LaTeX{} is fun or \LaTeX\ is fun. Spaces are not ignored after control symbols such as \$, but they are ignored again after the ‘control space’ \␣³.
• A single newline or return in the input file has no meaning, other than giving a space in the input. You can use newlines to improve the legibility of the input. Two newlines (leading to one empty line) or more cause a paragraph to end. You can also attain this paragraph end by the \par command.
• Braces {, } are mostly used for delimiting the arguments of a control sequence. Their other use is for grouping. Above you saw an example of the use of an empty group; similarly, \TeX{}ing is fun comes out as ‘TeXing is fun’.
• Letters, digits, and most punctuation can be typed normally. However, a number of characters mean something special to LaTeX: % $ & ^ _ # ~ { }. Here are their functions:
  % comment: anything to the end of the line is ignored.
  $, _, ^ inline math (toggle), subscript, superscript. See section 1.6.
  & column separator in tables.
  # parameter marker in macro definitions (#1 stands for the first argument, and so on).
  ~ nonbreaking space. (This is called an ‘active character’.)
  { } macro arguments and grouping.
  In order to type these characters, you need to precede them with a backslash, for instance \% to get ‘%’. This is called a ‘control symbol’. Exceptions: use $\backslash$ to get ‘\’; and \^ and \~ are accent commands, so they do not give plain ‘^’ and ‘~’ characters.
• Some characters do not exist in all fonts. As the most commonly encountered example, angle brackets <> do not exist in the roman text font (note that they are in the typewriter style here; in roman you would get ‘¡¿’), so you need to write, somewhat laboriously, $\langle$S$\rangle$ to get ‘⟨S⟩’⁴.

3. The funny bucket character here is how we visualize the space character.
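The rules above can be tried out in a small test document. This is a sketch; the sample sentences are made up, and the comments describe what each paragraph should produce.

```latex
\documentclass{article}
\begin{document}
% Spaces after a control word are skipped, so this prints "LaTeXis fun."
\LaTeX is fun.

% Two ways to keep the space: an empty group, or a control space.
\LaTeX{} is fun. \LaTeX\ is fun.

% Control symbols print the special characters themselves.
A 10\% discount on \$5 at R\&D lab \#3, for item x\_1 of the set \{x\}.

% A tie (~) is a non-breaking space; a single newline is just a space.
See section~1.6.
This line continues the same paragraph.

% The backslash and angle brackets come from math mode.
$\backslash$ and $\langle$S$\rangle$
\end{document}
```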
Exercise 2. You read in a document ‘This happens only in 90rest of the time it works fine.’ What happened here? There are articles in print where the word ‘From’ has an upside down question mark in front of it. Try to think of an explanation.
1.2.2 LaTeX document structure
Every LaTeX document has the following structure:
\documentclass[ <class options> ]{ <class name> }
<preamble>
\begin{document}
<text>
\end{document}
Typical document classes are article, report, book, and letter. As you may expect, that last one has rather different commands from the others. The class options in square brackets can be omitted; examples are a4paper, twoside, and 11pt.
The preamble is where additional packages get loaded; for instance,
\usepackage{times}
switches the text font of the document to the Times Roman typeface. This is also the place to define new commands yourself (section 1.9.2).
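Filling in the skeleton above gives a minimal complete document. This sketch uses the class options and the times package mentioned in the text; the command \semester is a hypothetical example of a definition placed in the preamble.

```latex
% Class options go in square brackets, the class name in braces.
\documentclass[a4paper,twoside,11pt]{article}
% Preamble: load packages and define your own commands here.
\usepackage{times}
\newcommand{\semester}{Fall 2004}  % hypothetical example command
\begin{document}
This text, written in \semester, is set in Times Roman.
\end{document}
```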
1.2.2.1 Title
To list the title and author, a document would start with
\title{My story}
\author{B.C. Dull}
\date{2004} %leave this line out to get today’s date
\maketitle
After the title of an article and such, there is often an abstract. This can be specified with
\begin{abstract}
... The abstract text ...
\end{abstract}
The stretch of input from \begin to \end is called an ‘environment’; see section 1.4.1.2.
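Putting the pieces of this section together, a sketch of a complete article with a title block and an abstract might look as follows. The abstract text, section name, and story line are placeholders. \title, \author, and \date may be given in the preamble as here, while \maketitle and the abstract environment go after \begin{document}.

```latex
\documentclass{article}
% Title information can be given in the preamble...
\title{My story}
\author{B.C. Dull}
\date{2004}  % leave this line out to get today's date
\begin{document}
% ...but the title is only typeset by \maketitle, inside the document.
\maketitle
\begin{abstract}
A short summary of the story.
\end{abstract}
\section{The beginning}
It was a dark and stormy night.
\end{document}
```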
1.2.2.2 Sectioning
The document text is usually segmented by calls
4. That’s what macros are good for.