Massachusetts Institute of Technology, Artificial Intelligence Laboratory
AI Technical Report 2003-002, February 2003
©2003 Massachusetts Institute of Technology, Cambridge, MA 02139 USA, www.ai.mit.edu

Using Analogy to Acquire Commonsense Knowledge from Human Contributors
by Timothy A. Chklovski

Submitted to the Department of Electrical Engineering and Computer Science on January 28, 2003, in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the Massachusetts Institute of Technology, February 2003.

© Massachusetts Institute of Technology 2003. All rights reserved.

Certified by: Patrick H. Winston, Ford Professor of Artificial Intelligence and Computer Science, Thesis Supervisor
Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Students

Abstract

The goal of the work reported here is to capture the commonsense knowledge of non-expert human contributors. Achieving this goal will enable more intelligent human-computer interfaces and pave the way for computers to reason about our world. In the domain of natural language processing, it will provide the world knowledge much needed for semantic processing of natural language.
To acquire knowledge from contributors not trained in knowledge engineering, I take the following four steps: (i) develop a knowledge representation (KR) model for simple assertions in natural language, (ii) introduce cumulative analogy, a class of nearest-neighbor based analogical reasoning algorithms over this representation, (iii) argue, based on a theoretical analysis of the effectiveness of knowledge acquisition (KA) with this approach, that cumulative analogy is well suited for KA, and (iv) test the KR model and the effectiveness of the cumulative analogy algorithms empirically.

To investigate the effectiveness of cumulative analogy for KA empirically, Learner, an open-source system for KA by cumulative analogy, has been implemented, deployed,[1] and evaluated. Learner acquires assertion-level knowledge by constructing shallow semantic analogies between a KA topic and its nearest neighbors and posing these analogies as natural language questions to human contributors.

Suppose, for example, that based on the knowledge about “newspapers” already present in the knowledge base, Learner judges “newspaper” to be similar to “book” and “magazine.” Further suppose that the assertions “books contain information” and “magazines contain information” are also already in the knowledge base. Then Learner will use cumulative analogy from the similar topics to ask humans whether “newspapers contain information.”

[1] The site, “1001 Questions,” is publicly available at http://teach-computers.org/learner.html at the time of writing.

Because similarity between topics is computed based on what is already known about them, Learner exhibits bootstrapping behavior: the quality of its questions improves as it gathers more knowledge. By summing evidence for and against posing any given question, Learner also exhibits noise tolerance, limiting the effect of incorrect similarities. The KA power of shallow semantic analogy from nearest neighbors is one of the main findings of this thesis.
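The newspaper example above can be made concrete with a small sketch. The Python below is a minimal, hypothetical illustration and not Learner's implementation: the toy knowledge base, the similarity score (+1 for each property two objects agree on, -1 for each conflict), the neighbor count k, and the naive pluralization of question text are all assumptions made for this example. The functions select_nn and map_props only loosely mirror the two steps the thesis names Select-NN and Map-Props (Chapter 4). map_props sums evidence for and against each candidate property, weighted by neighbor similarity, and only properties with net positive evidence become questions.

```python
from collections import defaultdict

# Hypothetical toy knowledge base: object -> {property: truth value}.
# Learner itself stores natural-language assertions; this dict only illustrates the idea.
KB = {
    "newspaper": {"made of paper": True, "printed": True, "sold at newsstands": True},
    "book": {"made of paper": True, "printed": True,
             "contain information": True, "have a hard cover": True},
    "magazine": {"made of paper": True, "printed": True, "sold at newsstands": True,
                 "contain information": True, "have a hard cover": False},
    "apple": {"made of paper": False, "edible": True},
}


def similarity(a, b):
    """Assumed score: +1 per property both objects share with the same value, -1 per conflict."""
    score = 0
    for prop, value in KB[a].items():
        if prop in KB[b]:
            score += 1 if KB[b][prop] == value else -1
    return score


def select_nn(target, k=2):
    """Return up to k (similarity, object) pairs with positive similarity, most similar first."""
    scored = [(similarity(target, obj), obj) for obj in KB if obj != target]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(reverse=True)
    return scored[:k]


def map_props(target, neighbors):
    """Sum similarity-weighted evidence for (+) and against (-) each property not yet known for target."""
    votes = defaultdict(int)
    for sim, obj in neighbors:
        for prop, value in KB[obj].items():
            if prop not in KB[target]:
                votes[prop] += sim if value else -sim
    return dict(votes)


def questions(target, k=2):
    """Turn properties with net positive evidence into yes/no questions, strongest first."""
    votes = map_props(target, select_nn(target, k))
    positive = [(prop, v) for prop, v in votes.items() if v > 0]
    positive.sort(key=lambda pv: pv[1], reverse=True)
    # Naive pluralization by appending "s"; real phrasing would need proper language generation.
    return [f"{target}s {prop}?" for prop, _ in positive]


print(select_nn("newspaper"))  # [(3, 'magazine'), (2, 'book')]
print(map_props("newspaper", select_nn("newspaper")))
# {'contain information': 5, 'have a hard cover': -1}
print(questions("newspaper"))  # ['newspapers contain information?']
```

In this toy run, “have a hard cover” receives +2 from “book” but -3 from “magazine,” so its net evidence is negative and no question is posed. This is a small illustration of how summing evidence for and against a question can limit the effect of a single misleading neighbor.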
I analyze commonsense knowledge collected by another research effort that did not rely on analogical reasoning, and demonstrate that its knowledge base contains enough correlation to motivate using cumulative analogy from nearest neighbors as a KA method.

Empirically, the distribution of answers to questions generated by cumulative analogy (answered affirmatively, answered negatively, or judged nonsensical) compares favorably with that of a baseline, no-similarity condition that relies on random objects rather than nearest neighbors. Of the questions generated by cumulative analogy, contributors answered 45% affirmatively, 28% negatively and marked 13% as nonsensical; in the control, no-similarity condition, 8% of questions were answered affirmatively, 60% negatively and 26% were marked as nonsensical.

Thesis Supervisor: Patrick H. Winston
Title: Ford Professor of Artificial Intelligence and Computer Science

Acknowledgments

Many brilliant and hardworking people have helped make this work better. I thank the members of my committee, Patrick Winston, Randall Davis, Peter Szolovits and David Stork, for teaching me to be a serious scientist. Patrick Winston deserves particular thanks for making me always keep in mind the high-level vision of this work, Randall Davis for teaching me to be thorough and for being particularly thorough in his constructive criticism of the work in progress, and Peter Szolovits for taking extra time in the discussions that shaped the thoughts and themes underlying the thesis.

I also wish to thank David Stork for his leadership in launching the broad Open Mind initiative, for his attention to this thesis beyond the call of duty and for his numerous helpful, thorough and insightful comments. I thank Ricoh Innovations for providing “Open Mind” T-shirts for rewarding exceptional contributors to Learner.

I thank Marvin Minsky, advisor of my Master’s research, for encouraging me to ignore what everyone else thinks and instead work on the problem that I believe to be the most important.
I thank my wife, Tara Chklovski, for her boundless support, limitless optimism, tireless willingness to hear about the thesis, and selfless proofreading. She has made the tough days easier and the good days amazing.

I also thank my parents, Anatoli and Lucy Chklovski, whose limitless care and support have made this work possible. It is in conversations with Anatoli that the approach of cumulative analogy was originally outlined and its merits were formulated. I thank him greatly for all the invaluable in-depth discussions about the thesis, and the discussions on all the related and unrelated topics.

I thank Push Singh for many discussions of methods of public acquisition of knowledge, for sharing his experiences in implementing Open Mind Commonsense, and for making the data gathered with that effort publicly available (a subset of that knowledge base formed the seed knowledge base in Learner).

I deeply thank Matthew Fredette for contributing his brilliant programming. He has contributed much of the supporting and infrastructure code for Learner on top of FramerD and has also contributed to the integration of the external packages used, FramerD and Link Grammar Parser.

I thank Kenneth Haase, whose FramerD database Learner uses, for his development and support of FramerD. I also thank Davy Temperley, Daniel Sleator and John Lafferty for implementing Link Grammar Parser and making it available for research purposes.

I thank Erik Mueller and Kathy Panton for making available some data on existing knowledge bases. Some data provided by them are reproduced in this thesis.

I thank Rada Mihalcea for her comments on the thesis, advice on approaching the issue of lexical ambiguity, and for providing some data on the evaluation of Open Mind Word Expert. My discussion of approaches to WSD in Section 6.4.2 is inspired by Rada’s treatment of the topic.

I thank Deb Roy for discussions of the relationship between natural language assertions and other possible representations of meanings of concepts.
All of these people have helped make this thesis a better piece of work. Any inaccuracies, errors or shortcomings of this work are my responsibility.

I would like to thank the National Science Foundation for the three years of fellowship support they provided, and the MIT Electrical Engineering and Computer Science department for supporting me for one term with the Edwin S. Webster Graduate Fellowship in Electrical Engineering and Computer Science.

I also thank Marilyn Pierce and Peggy Carney, administrators of the EECS Graduate Students Department, for their attention and help far beyond the call of duty with the administrative aspects of the degree, as well as Anthony Zolnik, Fern DeOliveira and Nira Manoharan for their administrative help in scheduling meetings with the thesis committee, reserving facilities, and so on.

Finally, I thank the contributors who have shared their knowledge on the 1001 Questions web site, turning Learner from an experimental system into a source of commonsense knowledge. Particular thanks go out to the contributors who have left comments, suggestions, and other feedback about the project as a whole.

Contents

1 Introduction
1.1 Knowledge acquisition bottleneck
1.2 Motivation
1.2.1 The case for creating a commonsense knowledge base
1.2.2 Practical goals Learner helps achieve
1.2.3 A comparison of approaches to collecting commonsense knowledge
1.2.4 Research in knowledge-based KA contributes to broader progress in AI
1.3 Structure of this document
2 Examples of Acquiring Knowledge in Learner
3 Representation
3.1 Overall architecture
3.2 Form of accepted knowledge
3.3 Sentences and their signatures
3.4 Phrases, properties and assertions
4 Algorithms
4.1 Guiding principles
4.2 Analogy
4.2.1 Select-NN
4.2.2 Map-Props
4.2.3 Refinements in Select-NN
4.2.4 Questions posed by cumulative analogy improve with acquisition of more knowledge
4.2.5 Other merits of cumulative analogy
4.3 Filtering questions with critics
4.3.1 Using a taxonomy to filter questions
4.4 Measuring semantic similarity
5 Interface
5.1 Interface description
5.2 Permitted multiple-choice answers
6 Ambiguity
6.1 Kinds of ambiguity
6.2 Lexical ambiguity: impact on knowledge acquisition
6.3 Tasks and methods sensitive to ambiguity in the knowledge base
6.4 Ambiguity of the acquired knowledge can be reduced later
6.4.1 Acquiring word sense information from human contributors
6.4.2 Automatic word sense disambiguation
7 The Correlated Universe, or Why Reasoning by Analogy Works
7.1 Overview of the knowledge base
7.2 Amount of similarity
7.3 Reach of analogy
7.4 Similarity of most similar
7.4.1 Mismatches
7.5 On the origin of similarity
8 Results
8.1 Quality of questions: cumulative analogy vs. a baseline
8.2 Comparison of the resultant knowledge base to the seed knowledge base
8.3 Classes of knowledge acquired
8.3.1 Knowledge classification scheme
8.3.2 Knowledge collected
8.3.3 Other knowledge bases
8.4 Rate of contribution to Learner
8.5 User feedback about Learner
8.5.1 Limitations of cumulative analogy
9 Discussion
9.1 Background
9.1.1 Amount of commonsense knowledge
9.1.2 Early expert systems
9.1.3 Forming expectations from existing knowledge
9.1.4 Knowledge representation
9.1.5 Machine learning: concepts and relationships
9.1.6 NLP: text mining and question answering
9.1.7 Gathering from contributors over the web
9.2 Future Work
9.2.1 Additional reasoning mechanisms
9.2.2 Better use of collected assertions
9.2.3 Better use of collected answers
9.3 Contributions
A Link Grammar Parser
B FramerD
C Natural Language Generation
D Deriving the Amount of Correlation Due to Chance

List of Figures

2.1 Screenshot of acquiring knowledge about “newspaper.”
3.1 Overall architecture of the Learner knowledge acquisition system.
3.2 Example of a matrix encoding assertions about objects and their properties.
4.1 Select-NN: Identifying all properties of “newspaper.”
4.2 Select-NN: Identifying all assertions about known properties of “newspaper.”
4.3 Select-NN: Using properties of “newspaper,” its nearest neighbors (the most similar objects) are identified.
4.4 Select-NN: All the steps for the example of “newspaper” presented together.
4.5 Select-NN: Algorithm for selecting nearest neighbors.
4.6 Map-Props: All known properties of the similar objects are selected.
4.7 Map-Props: An example for “newspaper.”
4.8 Map-Props: Algorithm for mapping properties from nearest neighbors onto O_target.
5.1 Screenshot of acquiring knowledge about “newspaper.” Reproduces Figure 2.1.
5.2 Presenting to the contributor reasons for similarity of “newspaper” and “book.”
5.3 Presenting to the contributor the reasons for formulating the question “newspapers contain information?”
6.1 Open Mind Word Expert (OMWE): A screenshot of collecting knowledge about “children.”