SYNTAX ANALYSIS BY A PRODUCTION LANGUAGE

by Arthur Evans, Jr.

Submitted to the Carnegie Institute of Technology in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Pittsburgh, Pennsylvania, 1965

Acknowledgements

I am deeply indebted to Professor Alan J. Perlis for his guidance in this work. He has made available to me much of his valuable time to provide the counsel and advice needed to bring this project to fruition.

Much of the programming of the QWERT system, which played a valuable part in checking algorithms before an attempt was made to prove them, has been done by Mrs. Carol H. Thompson, to whom I express my gratitude. Further, I am grateful to the "ALGOL crew" at the Computation Center, who have implemented in a working translator many of the ideas which developed in connection with this research, particularly to Mrs. Janet W. Fierst, the leader of the group, and Mr. David M. Blocher, who implemented the production interpreter. I am also grateful to the entire staff of the Computation Center for providing smooth use of the computer, without which the work could not have been done.

The final typing has been done quickly and accurately by Mrs. Edythe Simmons, to whom I express appreciation.

Finally, I am particularly grateful to my wife, Betty, who, along with my children, has been most patient during a trying time.

The research reported here was supported by the Advanced Research Projects Agency of the Department of Defense under Grant SD-146 to Carnegie Institute of Technology.

... translation rules) and the A-productions. We show that the algorithm defined by the productions produces precisely the translation given by the translation rules. In Chapter 5 we carry out the program of Chapter 4 for the B-productions and the B-Grammar. In Chapter 6 we show that the A-productions are equivalent to another set of productions, in that both accept the same set of input strings and produce the same output. The new productions are in a form more useful for certain applications. Chapter 7 contains a summary of the results and a discussion of the relation of this work to other work in the field. The appendix contains a very brief discussion of a programming system for the production language. Sample computer outputs are included.

INTRODUCTION

Over the years, users of computers have been writing programs whose correctness could only be taken as a matter of faith. If the study of programming is to become more of a science and less of an art, however, it becomes necessary that algorithms be accompanied by proofs of correct operation. To quote McCarthy (1965), "The prize to be won ... is the elimination of debugging. Instead a programmer will present a computer-checked proof that a program has the desired properties." The present work is one of the many small steps which will have to be taken before this utopian goal can be achieved.

The history of proving computer algorithms is rather sparse. Chapter 7 contains a brief discussion of some of the work that has been done, accompanied by a comparison with the present work. It appears safe to say that, to this author's knowledge, no previous work has attacked a programming algorithm of the complexity of that treated here.

The present effort is concerned with proving the correctness of a specific and practical translation algorithm which translates a segment of an ALGOL-like language to postfix, or reverse Polish, form. It should be emphasized that specific algorithms are introduced and their properties analyzed; it is not the goal of this research to develop general proof schemes which can be applied to a class of translation algorithms.

It would be desirable, of course, if it were possible to prove that an entire ALGOL translator was correct. Indeed, the initial goal of this research was to prove the correctness of the algorithm used in the ALGOL translator running at the Carnegie Tech Computation Center. Unfortunately, this proved impractical for several reasons, not the least of which was the fact that the algorithm is not correct. (In the Carnegie Tech system, meanings are assigned to such non-ALGOLic constructions as A+B∧C. One of the properties of a "correct" algorithm, as will be discussed later in some detail, is that it must reject any string which is not legal. Thus the Carnegie Tech ALGOL translator is not correct in this sense, since it accepts non-ALGOLic strings. Of course, this deficiency does not keep it from being useful.)

Another reason for not considering the ALGOL productions is the sheer size of the effort involved: there are over 650 productions in the ALGOL translator. At the present stage of development of these techniques, it was not felt practical to undertake a proof of this magnitude. Instead, it seemed more appropriate to consider a smaller body of productions which could be handled more easily, in the hope that techniques could be developed which might eventually be applicable to larger tasks. An approach which might well be fruitful would be to merge the present techniques with those of London (1964), with the possible result of mechanically proving an entire ALGOL translator. This point will be discussed in further detail in the Summary in Chapter 7.

For the reasons given, it seemed appropriate to consider a subset of ALGOL assignment statements. Actually, two languages are considered in detail: the A-language and the B-language. The first is a very simple language used as an example while the techniques are developed in the first four chapters; the second is then covered in Chapter 5. The B-language includes assignment statements with multiple left parts, both arithmetic and Boolean expressions on the right, and expressional parentheses. Not included are procedures with parameters, subscripted variables, or the ALGOL construction "if ... then ... else ...".
Further, the only arithmetic operators are plus and times; subtraction, division and exponentiation are not permitted. A few words of comment about these exclusions are in order, after some preliminary remarks.

The proof techniques to be presented involve lengthy and complex case analysis. It thus seemed appropriate to select a grammar which was representative of the general problem but for which the proofs would not be excessively tedious. With the exception of the "if ... then ... else ..." construction, all of the omissions just listed are aspects of ALGOL which are not felt to be critical. The techniques used to handle expressional parentheses could easily be adapted to subscripted variables and to procedure calls. Further, adding more operators would require no new techniques, although it would lengthen the proofs. Thus these omissions seemed consistent with the purpose of developing techniques. More important, certain complications were deliberately included. These include the use of "+" both as a binary and as a unary operator, and permitting mixed arithmetic and Boolean expressions, as in the statement

a _ b A d _ e * f ;

The "if ..." construction was excluded to reduce the case analysis. Although this construction seems to be essentially different from any of the constructs which are presently accepted, it is felt that the bracketing technique introduced in Chapter 1 could be expanded to handle it. This point will be discussed further in the Summary in Chapter 7.

One other simplification has been made in this work. It is assumed that identifiers and constants, as used in ALGOL and other programming languages, have been "taken care of" in some earlier part of the processing. Thus the only operand treated in this work is the symbol "I" (mnemonic for identifier).
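The earlier pass assumed above is not specified in this work. The following is a minimal Python sketch of what such a pass might look like. It assumes ALGOL-style identifiers (a letter followed by letters or digits) and unsigned decimal constants, and it reduces every operand to the symbol I so that only operators and parentheses survive. The token set and the function name reduce_operands are hypothetical, and Boolean operators are deliberately left out.

```python
import re

# Hypothetical token pattern (an assumption, not from this work):
# identifiers, unsigned constants, and the symbols + * ( ).
# Leading whitespace before each token is skipped by the \s* prefix.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<ident>[A-Za-z][A-Za-z0-9]*)"
    r"|(?P<const>\d+(?:\.\d+)?)"
    r"|(?P<sym>[+*()]))"
)


def reduce_operands(source: str) -> list[str]:
    """Replace each identifier or constant by 'I'; keep the other symbols."""
    source = source.rstrip()  # trailing blanks carry no tokens
    tokens: list[str] = []
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"unrecognized character at position {pos}")
        # Operators and parentheses pass through; operands collapse to 'I'.
        tokens.append(m.group("sym") if m.group("sym") else "I")
        pos = m.end()
    return tokens
```

For example, reduce_operands("alpha + beta1 * (3 + gamma)") yields the tokens I + I * ( I + I ). A string containing an unknown character, such as "a ? b", is rejected with a ValueError.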
Treating ALGOL-like identifiers introduces the problems of scanning, concatenation and internal machine representation, and these did not seem to be the linguistically important problems. Instead, the emphasis in this research is on the syntactic analysis of source code and its translation into another form. Floyd (1961-b) has shown a production scheme which processes identifiers and (some) constants, and it is presently planned to use such a scheme in the next version of the Carnegie Tech ALGOL Translator. This point will not be pursued further.

It is clear that we cannot go about proving an algorithm unless we are able to say what the algorithm is to do. If, for example, we were setting out to prove a square root algorithm, we could say, "The algorithm delivers a number with the property that its square differs from the input by less than epsilon." For a translation algorithm, however, it is necessary first to define what translation the algorithm is to perform. It is not enough to say that the algorithm is to translate assignment statements into postfix, since the term "postfix" may mean different things to different people. Instead, we must define explicitly what translation is to be produced for any given input.

But we must do more than tell what output is to be produced when the translator is supplied legal input. We want the translator to give an appropriate error signal if the input is invalid. Further, we want to be sure that for any finite input the translator does not loop forever or otherwise act in a pathological manner. We will make a claim somewhat like the following: Any legal input sentence will be translated into the proper postfix, and any other input will be rejected as being invalid. Thus we have two tasks: we must specify just which strings, out of all possible strings over an alphabet, are to be considered "legal input", and we must define for each such string what translation is to be produced.
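As a rough illustration of the two tasks, here is a minimal Python sketch under loudly stated assumptions. The grammar is hypothetical: it only resembles the flavor of the languages discussed here (the operand I, "+" as both a binary and a leading unary operator, "*", and expressional parentheses) and is not the A-grammar of this work. Each BNF alternative is paired with a translation rule T. A hand-written recursive-descent parser, which is a swapped-in technique rather than the Floyd-Evans productions studied here, either returns the postfix string or rejects the input. Every loop iteration and every nested call to expr consumes at least one token, so the sketch halts on any finite input.

```python
# Hypothetical grammar and translation rules T(.) (illustration only):
#   <expr>   ::= <term>              T = T(term)
#             |  + <term>            T = T(term)   (unary plus dropped)
#             |  <expr> + <term>     T = T(expr) T(term) +
#   <term>   ::= <factor>            T = T(factor)
#             |  <term> * <factor>   T = T(term) T(factor) *
#   <factor> ::= I                   T = I
#             |  ( <expr> )          T = T(expr)


class Rejected(Exception):
    """Raised when the input is not a legal sentence of the toy grammar."""


def to_postfix(tokens: list[str]) -> list[str]:
    out: list[str] = []
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(sym: str) -> None:
        nonlocal pos
        if peek() != sym:
            raise Rejected(f"expected {sym!r} at position {pos}")
        pos += 1

    def expr() -> None:
        nonlocal pos
        if peek() == "+":       # unary plus, allowed only at the start
            pos += 1
        term()
        while peek() == "+":    # binary plus, left-associative
            pos += 1
            term()
            out.append("+")

    def term() -> None:
        nonlocal pos
        factor()
        while peek() == "*":    # times binds tighter than plus
            pos += 1
            factor()
            out.append("*")

    def factor() -> None:
        nonlocal pos
        if peek() == "I":
            pos += 1
            out.append("I")
        elif peek() == "(":
            pos += 1
            expr()
            expect(")")
        else:
            raise Rejected(f"unexpected {peek()!r} at position {pos}")

    expr()
    if pos != len(tokens):      # legal prefix followed by junk: reject
        raise Rejected(f"trailing input at position {pos}")
    return out
```

For instance, the tokens of I+I*(I+I) translate to I I I I + * +, while I I, I+ and I*+I are rejected. Dropping the unary "+" from the output is an assumption of the sketch. A translator could equally emit a distinct unary-operator symbol.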
The first task is easy, since techniques for this purpose are well known. We will base our work on the notation used in defining the language ALGOL-60, the notation referred to as Backus Normal Form, abbreviated BNF. For the second task, we will append to BNF a notation which associates with each construct in the language a translation rule. We will show that this scheme associates with each legal string in the language a unique postfix representation. Thus, having accomplished the two tasks, we will have defined precisely what the algorithm is supposed to do.

Before continuing, we pause to make a few comments about the notation we will introduce to define a language. The technique we use to define a BNF grammar causes the constructs of the language to be sets of strings. This approach follows that of Floyd (1961), but differs from that frequently used by mathematical linguists, who usually define a grammar to be a set of rules ("productions") for generating legal strings. Since the problem in translator construction is to develop techniques for efficiently deciding which category (if any) a given string belongs to, our approach seems more appealing. We concern ourselves with whether a string belongs to a set (the set of legal strings) rather than whether the string can be generated by a given group of rules.

Our plan of attack may be described informally as follows, using notation and terminology that will not be used in the rest of the work. After defining carefully what we mean by a Backus Naur Form (BNF) grammar and what it means for a BNF grammar to be unambiguous, we exhibit a specific BNF grammar G which defines a language L. Our first important result is the

Theorem: The grammar G is unambiguous.

We next will give a set of Floyd-Evans productions, PR. (We will not now distinguish between the productions and the algorithm which they define.)
Given any string as input, PR must do one of three things:
(1) It may halt, having produced an output string whose last character is other than "ERR*".
(2) It may halt, having produced an output string whose last character is "ERR*".
(3) It may cycle in a loop "forever", scan past the end of the input string, or exhibit other "pathological" behavior.
In the first case, we say, "The productions have run to a successful conclusion." The string produced is called the resulting translation of the source string. In the second case, we say, "The productions have detected an error." The characters in the output string (other than the ERR*) are of no interest. The third category is described by the statement, "The productions fail."

Now, for each legal sentence a in G, let TR(a) be the translation
