
Method and apparatus for universal parsing of language

43 Pages·2013·3.65 MB·English


US005878385A

United States Patent [19]
Bralich et al.

[11] Patent Number: 5,878,385
[45] Date of Patent: Mar. 2, 1999

[54] METHOD AND APPARATUS FOR UNIVERSAL PARSING OF LANGUAGE

[75] Inventors: Phillip A. Bralich, Honolulu, Hi.; Derek Bickerton, Talent, Oreg.

Assignee: Ergo Linguistic Technologies

Appl. No.: 715,313

Filed: Sep. 16, 1996

Int. Cl.6: G06F 17/27

U.S. Cl.: 704/9; 704/8; 704/10; 707/532

Field of Search: 704/9, 8, 10, 1; 707/531, 532, 536, 533, 534

References Cited

U.S. PATENT DOCUMENTS

4,706,212   11/1987   Toma                 704/2
4,829,423    5/1989   Tennant et al.       704/8
4,864,503    9/1989   Tolin                704/2
4,868,750    9/1989   Kucera et al.        704/8
4,887,212   12/1989   Zamora et al.        704/8
4,984,178    1/1991   Hemphill et al.      704/255
4,994,966    2/1991   Hutchins             704/9
5,091,950    2/1992   Ahmed                704/277
5,109,509    4/1992   Katayama et al.      704/9
5,128,865    7/1992   Sadler               704/2
5,157,606   10/1992   Nagashima            704/2
5,222,187    6/1993   Doddington et al.    704/200
5,287,429    2/1994   Watanabe             704/239
5,297,040    3/1994   Hu                   704/9
5,321,606    6/1994   Kuruma et al.        704/9
5,321,607    6/1994   Fukumochi et al.     704/4
5,355,493   10/1994   Silberbauer et al.   395/701
5,384,702    1/1995   Tou                  704/9
5,416,696    5/1995   Suzuoka              704/2
5,424,947    6/1995   Nagao et al.         704/9
5,475,587   12/1995   Anick et al.         704/9
5,687,384   11/1997   Nagese               704/9
5,721,938    2/1998   Stuckey              704/4

FOREIGN PATENT DOCUMENTS

2 211 639    7/1989   United Kingdom

OTHER PUBLICATIONS

Aho, A.V. et al., Compilers: Principles, Techniques, and Tools, Sections 2.4, 4.4-4.5, 4.7, 5.3, 5.6, 6.7 (Addison-Wesley Publishing Company 1986).
Allen, J., Natural Language Understanding, Chapters 3-6, 17 (The Benjamin/Cummings Publishing Company, Inc. 1995).
Bralich, P., and D. Bickerton, "A Proposal for the Revision of the theory of P and P Syntax," (seminar presentation material 1992).
Bralich, P., and D. Bickerton, "Anaphoric Reference," (seminar presentation material 1992).
Bralich, P., and D. Bickerton, "Attach α," (seminar presentation material 1992).
Bralich, P., and D. Bickerton, "Other Syntactic Phenomena," (seminar presentation material 1992).
Cohen, R., "Analyzing the Structure of Argumentative Discourse," Computational Linguistics, 13(1-2):11-24 (1987).
Grosz, B.J. et al., "TEAM: An Experiment in the Design of Transportable Natural-Language Interfaces," Artificial Intelligence, 32(2):173-243 (1987).
Grosz, B.J., "The Representation and Use of Focus in a System for Understanding Dialogs," Readings in Natural Language Processing, pp. 353-362 (Morgan Kaufmann Publishers, Inc. 1986).
Grosz, B.J., and C. Sidner, "Attention, Intentions, and the Structure of Discourse," Computational Linguistics, 12(3):175-204 (1986).
Hirschberg, J., and D.J. Litman, "Empirical Studies on the Disambiguation of Cue Phrases," Computational Linguistics, 19(3):501-530 (1993).
Kaplan, R.M., "A General Syntactic Processor," Natural Language Processing, Courant Computer Science Symposium 8: Dec. 20-21, 1971, pp. 193-241 (New York: Algorithmics Press, Inc., R. Rustin ed. 1973).
Kay, M., "Algorithm Schemata and Data Structures in Syntactic Processing," Readings in Natural Language Processing, pp. 35-70 (Morgan Kaufmann Publishers, Inc. 1986).
Kay, M., "The MIND System," Natural Language Processing, Courant Computer Science Symposium 8: Dec. 20-21, 1971, pp. 155-188 (New York: Algorithmics Press, Inc., R. Rustin ed. 1973).
Litman, D.J., and J.F. Allen, "A Plan Recognition Model for Subdialogues in Conversations," Cognitive Science, 11(2):163-200 (1987).
Reichman, R., "Conversational Coherency," Cognitive Science, 2(4):283-327 (1978).
Reichman, R., Getting Computers to Talk Like You and Me, Chapters 2, 5 and 8 (Cambridge, MA: The MIT Press 1985).
Sidner, C.L., "Focusing in the Comprehension of Definite Anaphora," Readings in Natural Language Processing, pp. 363-394 (Morgan Kaufmann Publishers, Inc. 1986).
Sidner, C.L., "Plan Parsing for Intended Response Recognition in Discourse," Computational Intelligence, 1(1):1-10 (1985).

Primary Examiner: Joseph Thomas

[57] ABSTRACT

A method and apparatus for natural language parsing are described. The invention includes the steps of retrieving an input string, and performing a dictionary look-up for each word in the input string to form a correspondence between each word and a dictionary entry. The dictionary entry provides lexical features of the word. The invention includes the additional step of processing the words in the input string beginning with a last word in the input string and continuing toward the first word in the input string. This step includes the step of associating a selected word in the input string with a word located to the left of the selected word in the input string to form a word phrase. The associating step is performed according to predetermined selection restriction rules. The steps of processing the words and associating a selected word are repeated until all words of the input string have been processed.

3 Claims, 12 Drawing Sheets

U.S. Patent   Mar. 2, 1999   5,878,385

Sheet 1 of 12
Block diagram of the apparatus. Labeled components: data storage device, processor, RAM, ROM, keyboard, cursor control device, display device, scanner device, voice input / voice recognition device.

Sheet 2 of 12: MAIN OVERVIEW ROUTINE
  DICTIONARY LOOKUP: 60,000 words, dictionary split
  MAIN SEQUENCE 1: evaluate preposition and attach to the X2 level; HandleAnd (a&b)
  MAIN SEQUENCE 2: evaluate verb and do subcategories; HandleAnd (b&c)
  MAIN SEQUENCE 3: clean-up functions

Sheet 3 of 12: DICTIONARY LOOKUP
  Begin word routine. Note: lookup is done on each word in the sentence.
  (62) Get input sentence.
  (64) Look up word. The 60,000-word dictionary contains [illegible].
  Look for a match. Split for multiple possibilities, if possible.
  (70) On no match, make a robust lexical guess; split for multiple guesses.
  (72) Decision on words in the input sentence (YES branch shown).

Sheet 4 of 12: MAIN SEQUENCE 1
  Begin. In this routine, attachment is done from the rightmost word and we work our way leftward. This routine is performed in each split.
  HandleAnd (a&b) subroutine.
  (90) HandlePO subroutine: if the rightmost word is p0, split as p0|p3.
  Impossibility check: check the word pair against the "list of impossibilities".
  (110) Attach/promote (rightmost).
  Eliminate the split.
  (118) Eliminate split.
  Reduce splits: look through the remaining splits; if the same combination is found, throw it away.
  This split is done.
  More splits? (YES / NO)
  End.

Sheet 5 of 12: MAIN SEQUENCE 2
  Begin. In this routine, attachment is done from the rightmost item and we work our way leftward. This routine is performed on each split.
  (152) Evaluate verb.
  (154) Decision (NO / YES).
  (156) Attach subcategories.
  (164) Decision (NO / YES).
  (166) HandleAnd (a&b).
  (168) Attach/promote (rightmost item).
  The tree failed: remove and eliminate.

Sheet 6 of 12: SPLITS
  A split is when we take a snapshot of the current sentence structure we're working on and alter it to try another possibility.
  Panels: MAIN SEQUENCE 1 SPLIT; MAIN SEQUENCE 2 SPLIT.

Sheet 7 of 12: MAIN SEQUENCE 3 (FIG. 7)
  (190) Begin.
  "Clean up" phase: look for unattached subtrees.
  (195) Attach/promote (two unattached items).
  (196) Decision (NO / YES).
  (198) The tree failed: remove and eliminate.

Sheet 8 of 12: ATTACH-SUBCATEGORIES (FIG. 8)
  Parameter: a tree split possibility. Return: success or fail.
  (224) Find rightmost verb.
  (226) Look for and attach to theta phrases as specified in the verb's subcategories.
  (228) Are subcategories fulfilled? (YES branch shown)
  (236) Look to the left for the next rightmost verb.
  Return (success).
  Return (fail).
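The abstract and the Main Sequence 1 flowchart describe a dictionary lookup that can split on ambiguous words, followed by right-to-left attachment in which each word pair is checked against a list of impossibilities and duplicate splits are discarded. The following Python sketch illustrates that general idea only; it is not the patented implementation. The names `LEXICON`, `IMPOSSIBLE`, `lookup`, `attach_right_to_left`, and `parse` are hypothetical, the category tags are illustrative rather than the patent's feature codes, and the simple adjacent-pair check stands in for the selection restriction rules, which this excerpt does not spell out.

```python
from itertools import product

# Hypothetical toy lexicon: word -> possible category tags (illustrative only).
LEXICON = {
    "the": ["det"],
    "old": ["adj"],
    "dog": ["noun"],
    "saw": ["noun", "verb"],  # ambiguous, so lookup produces two splits
    "cat": ["noun"],
}

# Hypothetical "list of impossibilities": (left tag, right tag) pairs
# that are not allowed to attach to each other.
IMPOSSIBLE = {("det", "verb"), ("det", "det"), ("adj", "verb")}


def lookup(word):
    """Return candidate tags for a word; guess 'noun' when it is unknown."""
    return LEXICON.get(word.lower(), ["noun"])


def attach_right_to_left(words, tags):
    """Start at the last word and attach each left neighbour in turn.

    Returns a nested phrase, or None when a pair is on the
    impossibility list (the split is eliminated).
    """
    phrase = (words[-1], tags[-1])
    for i in range(len(words) - 1, 0, -1):
        if (tags[i - 1], tags[i]) in IMPOSSIBLE:
            return None
        phrase = ((words[i - 1], tags[i - 1]), phrase)
    return phrase


def parse(sentence):
    """Try every split (tag combination) and keep the distinct survivors."""
    words = sentence.split()
    if not words:
        return []
    results = []
    for tags in product(*(lookup(w) for w in words)):
        phrase = attach_right_to_left(words, tags)
        if phrase is not None and phrase not in results:  # reduce splits
            results.append(phrase)
    return results


if __name__ == "__main__":
    for phrase in parse("the old saw"):
        print(phrase)
```

In this toy setup, "the old saw" starts with two splits because "saw" is tagged as both noun and verb; the verb reading is eliminated when the pair (adj, verb) is found on the impossibility list, so only the noun reading survives.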

Description:
input string to form a word phrase. The associating ... Chapters 2, 5 and 8 (Cambridge, MA: The MIT Press 1985). Sidner, C.L. ... FEATURES TO VERB PHRASE. 306. DID WE ... Borland C++ compiler Version 4.5. However, any ...