Using Analogy To Acquire Commonsense Knowledge from Human Contributors

by

Timothy A. Chklovski

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the Massachusetts Institute of Technology

February 2003

© Massachusetts Institute of Technology 2003. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, January 31, 2003

Certified by: Patrick H. Winston, Ford Professor of Artificial Intelligence and Computer Science, Thesis Supervisor

Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Students

Submitted to the Department of Electrical Engineering and Computer Science on January 28, 2003, in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Abstract

The goal of the work reported here is to capture the commonsense knowledge of non-expert human contributors. Achieving this goal will enable more intelligent human-computer interfaces and pave the way for computers to reason about our world. In the domain of natural language processing, it will provide the world knowledge much needed for semantic processing of natural language.
To acquire knowledge from contributors not trained in knowledge engineering, I take the following four steps: (i) develop a knowledge representation (KR) model for simple assertions in natural language, (ii) introduce cumulative analogy, a class of nearest-neighbor based analogical reasoning algorithms over this representation, (iii) argue that cumulative analogy is well suited for knowledge acquisition (KA) based on a theoretical analysis of the effectiveness of KA with this approach, and (iv) test the KR model and the effectiveness of the cumulative analogy algorithms empirically.

To investigate the effectiveness of cumulative analogy for KA empirically, LEARNER, an open source system for KA by cumulative analogy, has been implemented, deployed (the site, "1001 Questions," is publicly available at http://teach-computers.org/learner.html at the time of writing), and evaluated. LEARNER acquires assertion-level knowledge by constructing shallow semantic analogies between a KA topic and its nearest neighbors and posing these analogies as natural language questions to human contributors.

Suppose, for example, that based on the knowledge about "newspapers" already present in the knowledge base, LEARNER judges "newspaper" to be similar to "book" and "magazine." Further suppose that the assertions "books contain information" and "magazines contain information" are also already in the knowledge base. Then LEARNER will use cumulative analogy from the similar topics to ask humans whether "newspapers contain information."

Because similarity between topics is computed based on what is already known about them, LEARNER exhibits bootstrapping behavior: the quality of its questions improves as it gathers more knowledge. By summing evidence for and against posing any given question, LEARNER also exhibits noise tolerance, limiting the effect of incorrect similarities. The KA power of shallow semantic analogy from nearest neighbors is one of the main findings of this thesis.
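The newspaper example above can be illustrated with a minimal sketch. This is not LEARNER's actual implementation (the thesis develops the real algorithms, Select-NN and Map-Props, in Chapter 4); it is a toy rendering of the idea that similarity is computed from shared assertions, and that neighbors' properties which the target topic lacks become candidate questions, ranked by summed evidence. The knowledge base contents and function names here are hypothetical.

```python
# Toy sketch of cumulative analogy from nearest neighbors (illustrative only).
from collections import Counter

# Hypothetical knowledge base: topic -> set of known properties.
kb = {
    "book":      {"contains information", "has pages", "can be read"},
    "magazine":  {"contains information", "has pages", "is published"},
    "hammer":    {"is a tool", "has a handle"},
    "newspaper": {"has pages"},
}

def nearest_neighbors(topic, k=2):
    """Rank the other topics by the number of properties shared with `topic`."""
    shared = {t: len(kb[topic] & props) for t, props in kb.items() if t != topic}
    return sorted(shared, key=shared.get, reverse=True)[:k]

def candidate_questions(topic, k=2):
    """Sum evidence across neighbors for properties `topic` is not known to have."""
    votes = Counter()
    for neighbor in nearest_neighbors(topic, k):
        for prop in kb[neighbor] - kb[topic]:
            votes[prop] += 1
    # Properties supported by more neighbors are asked first (noise tolerance).
    return [prop for prop, _ in votes.most_common()]

print(candidate_questions("newspaper"))
# "contains information" is supported by both neighbors, so it ranks first.
```

Because both neighbors ("book" and "magazine") assert "contains information," that property accumulates the most votes and would be posed to contributors first, mirroring the summing-of-evidence behavior described above.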
I perform an analysis of commonsense knowledge collected by another research effort that did not rely on analogical reasoning and demonstrate that there is indeed a sufficient amount of correlation in the knowledge base to motivate using cumulative analogy from nearest neighbors as a KA method.

Empirically, the percentages of questions answered affirmatively, answered negatively, and judged to be nonsensical in the cumulative analogy case compare favorably with the baseline, no-similarity case, which relies on random objects rather than nearest neighbors. Of the questions generated by cumulative analogy, contributors answered 45% affirmatively, 28% negatively, and marked 13% as nonsensical; in the control, no-similarity case, 8% of questions were answered affirmatively, 60% negatively, and 26% were marked as nonsensical.

Thesis Supervisor: Patrick H. Winston
Title: Ford Professor of Artificial Intelligence and Computer Science

Acknowledgments

Many brilliant and hard-working people have helped make this a better work. I thank the committee, Patrick Winston, Randall Davis, Peter Szolovits, and David Stork, for teaching me to be a serious scientist. Patrick Winston deserves particular thanks for making me always keep in mind the high-level vision of this work, Randall Davis for teaching me to be thorough and for being particularly thorough in his constructive criticism of the work in progress, and Peter Szolovits for taking extra time in the discussions that shaped the thoughts and themes underlying the thesis. I also wish to thank David Stork for his leadership in launching the broad Open Mind initiative, for his attention to this thesis beyond the call of duty, and for his numerous helpful, thorough, and insightful comments. I thank Ricoh Innovations for providing "Open Mind" T-shirts for rewarding exceptional contributors to LEARNER.
I thank Marvin Minsky, advisor of my Master's research, for encouraging me to ignore what everyone else thinks and instead work on the problem that I believe to be the most important.

I thank my wife, Tara Chklovski, for her boundless support, limitless optimism, tireless willingness to hear about the thesis, and selfless proofreading. She has made the tough days easier and the good days amazing.

I also thank my parents, Anatoli and Lucy Chklovski, whose limitless care and support have made this work possible. It is in conversations with Anatoli that the approach of cumulative analogy was originally outlined and its merits were formulated. I thank him greatly for all the invaluable in-depth discussions about the thesis, and the discussions on all the related and unrelated topics.

I thank Push Singh for many discussions of methods of public acquisition of knowledge, for sharing his experiences in implementing Open Mind Commonsense, and for making the data gathered with that effort publicly available (a subset of that knowledge base formed the seed knowledge base in LEARNER).

I deeply thank Matthew Fredette for contributing his brilliant programming. He has contributed much of the supporting and infrastructure code for LEARNER on top of FramerD and has also contributed to the integration of the external packages used, FramerD and the Link Grammar Parser. I thank Kenneth Haase, whose FramerD database LEARNER uses, for his development and support of FramerD. I also thank Davy Temperley, Daniel Sleator, and John Lafferty for implementing the Link Grammar Parser and making it available for research purposes.

I thank Erik Mueller and Kathy Panton for making available some data on existing knowledge bases. Some data provided by them are reproduced in this thesis.

I thank Rada Mihalcea for her comments on the thesis, advice on approaching the issue of lexical ambiguity, and for providing some data on the evaluation of Open Mind Word Expert.
My discussion of approaches to WSD in Section 6.4.2 is inspired by Rada's treatment of the topic.

I thank Deb Roy for discussions of the relationship between natural language assertions and other possible representations of meanings of concepts.

All of these people have helped make this thesis a better piece of work. Any inaccuracies, errors, or shortcomings of this work are my responsibility.

I would like to thank the National Science Foundation for the three years of fellowship support they provided, and the MIT Electrical Engineering and Computer Science department for supporting me for one term with the Edwin S. Webster Graduate Fellowship in Electrical Engineering and Computer Science.

I also thank Marilyn Pierce and Peggy Carney, administrators of the EECS Graduate Students Department, for their attention and help far beyond the call of duty with the administrative aspects of the degree, as well as Anthony Zolnik, Fern DeOliveira, and Nira Manoharan for their administrative help in scheduling meetings with the thesis committee, reserving facilities, and so on.

Finally, I thank the contributors who have contributed their knowledge to the 1001 Questions web site, turning LEARNER from an experimental system into a source of commonsense knowledge. Particular thanks go out to the contributors who have left comments, suggestions, and other feedback about the project as a whole.

Contents

1 Introduction 15
  1.1 Knowledge acquisition bottleneck 15
  1.2 Motivation 17
    1.2.1 The case for creating a commonsense knowledge base 17
    1.2.2 Practical goals LEARNER helps achieve 20
    1.2.3 A comparison of approaches to collecting commonsense knowledge 22
    1.2.4 Research in knowledge-based KA contributes to broader progress in AI 26
  1.3 Structure of this document 28

2 Examples of Acquiring Knowledge in Learner 29

3 Representation 33
  3.1 Overall architecture 33
  3.2 Form of accepted knowledge 35
  3.3 Sentences and their signatures 38
  3.4 Phrases, properties and assertions 39

4 Algorithms 43
  4.1 Guiding principles 43
  4.2 Analogy 45
    4.2.1 Select-NN 45
    4.2.2 Map-Props 52
    4.2.3 Refinements in Select-NN 55
    4.2.4 Questions posed by cumulative analogy improve with acquisition of more knowledge 57
    4.2.5 Other merits of cumulative analogy 58
  4.3 Filtering questions with critics 60
    4.3.1 Using a taxonomy to filter questions 61
  4.4 Measuring semantic similarity 63

5 Interface 71
  5.1 Interface description 71
  5.2 Permitted multiple-choice answers 74

6 Ambiguity 77
  6.1 Kinds of ambiguity 77
  6.2 Lexical ambiguity: impact on knowledge acquisition 80
  6.3 Tasks and methods sensitive to ambiguity in the knowledge base 83
  6.4 Ambiguity of the acquired knowledge can be reduced later 85
    6.4.1 Acquiring word sense information from human contributors 86
    6.4.2 Automatic word sense disambiguation 89

7 The Correlated Universe, or Why Reasoning by Analogy Works 93
  7.1 Overview of the knowledge base 95
  7.2 Amount of similarity 99
  7.3 Reach of analogy 102
  7.4 Similarity of most similar 106
    7.4.1 Mismatches 108
  7.5 On the origin of similarity 109

8 Results 115
  8.1 Quality of questions: cumulative analogy vs. a baseline 116
  8.2 Comparison of the resultant knowledge base to the seed knowledge base 119
  8.3 Classes of knowledge acquired 122
    8.3.1 Knowledge classification scheme 122
    8.3.2 Knowledge collected 127
    8.3.3 Other knowledge bases 128
  8.4 Rate of contribution to LEARNER 132
  8.5 User feedback about LEARNER 136
    8.5.1 Limitations of cumulative analogy 139

9 Discussion 143
  9.1 Background 143
    9.1.1 Amount of commonsense knowledge 144
    9.1.2 Early expert systems 147
    9.1.3 Forming expectations from existing knowledge 148
    9.1.4 Knowledge representation 150
    9.1.5 Machine learning: concepts and relationships 151
    9.1.6 NLP: text mining and question answering 152
    9.1.7 Gathering from contributors over the web 153
  9.2 Future Work 154
    9.2.1 Additional reasoning mechanisms 157
    9.2.2 Better use of collected assertions 159
    9.2.3 Better use of collected answers 160
  9.3 Contributions 160

A Link Grammar Parser 169

B FramerD 171

C Natural Language Generation 173

D Deriving the Amount of Correlation Due to Chance 177