
Empirical Methods for Artificial Intelligence (Bradford Books) PDF

542 Pages·1995·46.76 MB·English

Preview Empirical Methods for Artificial Intelligence (Bradford Books)

Paul R. Cohen, Empirical Methods for Artificial Intelligence. The MIT Press, Cambridge, Massachusetts; London, England.

Contents

Preface ix
Acknowledgments xv

1 Empirical Research 1
1.1 AI Programs as Objects of Empirical Studies 2
1.2 Three Basic Research Questions 4
1.3 Answering the Basic Research Questions 5
1.4 Kinds of Empirical Studies 6
1.5 A Prospective View of Empirical Artificial Intelligence

2 Exploratory Data Analysis 11
2.1 Data 12
2.2 Sketching a Preliminary Causal Model 18
2.3 Looking at One Variable 20
2.4 Joint Distributions 27
2.5 Time Series 55
2.6 Execution Traces 62

3 Basic Issues in Experiment Design 67
3.1 The Concept of Control 68
3.2 Four Spurious Effects 79
3.3 Sampling Bias 89
3.4 The Dependent Variable 92
3.5 Pilot Experiments 94
3.6 Guidelines for Experiment Design 96
3.7 Tips for Designing Factorial Experiments 97
3.8 The Purposes of Experiments 100
3.9 Ecological Validity: Making Experiments Relevant 101
3.10 Conclusion 103

4 Hypothesis Testing and Estimation 105
4.1 Statistical Inference 106
4.2 Introduction to Hypothesis Testing 106
4.3 Sampling Distributions and the Hypothesis Testing Strategy 110
4.4 Tests of Hypotheses about Means 117
4.5 Hypotheses about Correlations 130
4.6 Parameter Estimation and Confidence Intervals 132
4.7 How Big Should Samples Be? 137
4.8 Errors 140
4.9 Power Curves and How to Get Them 143
4.10 Conclusion 145
4.11 Further Reading 146

5 Computer-Intensive Statistical Methods 147
5.1 Monte Carlo Tests 150
5.2 Bootstrap Methods 153
5.3 Randomization Tests 165
5.4 Comparing Bootstrap and Randomization Procedures 175
5.5 Comparing Computer-Intensive and Parametric Procedures 177
5.6 Jackknife and Cross Validation 180
5.7 An Illustrative Nonparametric Test: The Sign Test 180
5.8 Conclusion 182
5.9 Further Reading 183

6 Performance Assessment 185
6.1 Strategies for Performance Assessment 186
6.2 Assessing Performance in Batches of Trials 187
6.3 Comparisons to External Standards: The View Retriever 187
6.4 Comparisons among Many Systems: The MUC-3 Competition 199
6.5 Comparing the Variability of Performance: Humans vs. the View Retriever 205
6.6 Assessing Whether a Factor Has Predictive Power 207
6.7 Assessing Sensitivity: MYCIN's Sensitivity to Certainty Factor Accuracy 208
6.8 Other Measures of Performance in Batches of Trials 210
6.9 Assessing Performance During Development: Training Effects in OTB 211
6.10 Cross-Validation: An Efficient Training and Testing Procedure 216
6.11 Learning Curves 219
6.12 Assessing Effects of Knowledge Engineering with Retesting 221
6.13 Assessing Effects with Classified Retesting: Failure Recovery in Phoenix 223
6.14 Diminishing Returns and Overfitting in Retesting 232
6.15 Conclusion 233
Appendix: Analysis of Variance and Contrast Analysis 235

7 Explaining Performance: Interactions and Dependencies 249
7.1 Strategies for Explaining Performance 250
7.2 Interactions among Variables: Analysis of Variance 251
7.3 Explaining Performance with Analysis of Variance 260
7.4 Dependencies among Categorical Variables: Analysis of Frequencies 267
7.5 Explaining Dependencies in Execution Traces 268
7.6 Explaining More Complex Dependencies 270
7.7 General Patterns in Three-Way Contingency Tables 279
7.8 Conclusion 287
7.9 Further Reading 287
Appendix: Experiment Designs and Analysis 287

8 Modeling 309
8.1 Programs as Models: Executable Specifications and Essential Miniatures 312
8.2 Cost as a Function of Learning: Linear Regression 316
8.3 Transforming Data for Linear Models 321
8.4 Confidence Intervals for Linear Regression Models 324
8.5 The Significance of a Predictor 327
8.6 Linear Models with Several Predictors: Multiple Regression 328
8.7 A Model of Plan Adaptation Effort 332
8.8 Causal Models 337
8.9 Structural Equation Models 342
8.10 Conclusion 347
8.11 Further Reading 347
Appendix: Multiple Regression 348

9 Tactics for Generalization 359
9.1 Empirical Generalization 362
9.2 Theories and "Theory" 366
9.3 Tactics for Suggesting and Testing General Theories 369
9.4 Which Features? 375
9.5 Finding the "Same" Behavior in Several Systems 376
9.6 The Virtues of Theories of Ill-Defined Behavior 378

References 385
Index 395

Preface

When it was proclaimed that the Library contained all books, the first impression was one of extravagant happiness. All men felt themselves to be the masters of an intact and secret treasure.... As was natural, this inordinate hope was followed by an excessive depression. The certitude that some shelf in some hexagon held precious books and that these precious books were inaccessible, seemed almost intolerable. A blasphemous sect suggested that the searches should cease and that all men should juggle letters and symbols until they had constructed, by an improbable gift of chance, these canonical books. The authorities were obliged to issue severe orders. The sect disappeared, but in my childhood I have seen old men who, for long periods of time, would hide in latrines with some metal disks in a forbidden dice cup and feebly mimic the divine disorder.
- Jorge Luis Borges, "The Library of Babel," from Labyrinths

One writes a book for many reasons, some quite irrational. I will admit to three: First, I wanted a book on research methods because we have no curriculum in methods as other sciences do. My metaphor for this aspect of the book is the toolbox. Here are exploratory tools to help your eyes detect patterns in data, hypothesis-testing tools to help your data speak convincingly, and modeling tools to help you explain your data.
Second, because our systems are increasingly embedded, complex, and sophisticated, we need a basis for designing new, more powerful research methods. I hope this book will convince you that statistics is one such basis. My metaphor for this aspect of the book is an imaginary class of "statistical microscopes" that disclose structures and behaviors in fairly complex systems.

Third, it is time to revise some classical views of empirical artificial intelligence (AI) and to devote ourselves anew to others. For instance, it is no longer true that we can predict how a system will behave by looking at its code (unless it's very small); even if we could, let's remember that artificial intelligence once studied individual systems not for their own sake but in pursuit of general laws of intelligence. This goal has been maligned in the last two decades by the empirically inclined, and pursued abstractly by others. I think it's time for empirical researchers to resume the search.

This book was intended originally for graduate students, undergraduates, and researchers in artificial intelligence, but I discovered by teaching the material at the University of Massachusetts that it appeals to students in other areas of computer science as well.
This isn't very surprising, as few undergraduates in computer science learn research methods, unlike their counterparts in psychology, chemistry, biology, and so on. I didn't call the book Empirical Methods for Computer Science, though, because most of its case studies are from AI, some of its methods are particular to AI, and it doesn't include methods particular to other areas of computer science, such as queueing models for network analysis. Professors will want to allow one semester or two quarters to cover most of the material in lectures, although I expect it can be done in one quarter if advanced material in the appendixes is omitted. As to prerequisites, the book assumes nothing about the reader; the mathematical material is light and it is developed from first principles.

As I prepared the book I came to realize that my own training was a bit warped, emphasizing statistical hypothesis testing to the exclusion of every other aspect of empirical research. Because I want to avoid this mistake here, the book doesn't introduce hypothesis testing until Chapter 4, when it can be appreciated in the context of the broader empirical enterprise. A researcher should know how to look at data and encourage it to tell its story (Chapter 2) and how to design experiments to clarify and corroborate the story (Chapter 3) before submitting it to the blunt interrogation of hypothesis testing. I decided to present statistical hypothesis testing in two chapters, one devoted to classical, parametric methods (Chapter 4), the other to new, computer-intensive statistical methods based on Monte Carlo sampling (Chapter 5). The last four chapters focus on research strategies and tactics, and while each introduces new statistical methods, they do so in the context of case studies. Mathematical details are confined to appendixes. Chapter 6 is about performance assessment, and Chapter 7 shows how interactions and dependencies among several factors can help explain performance. Chapter 8 discusses predictive models of programs, including causal models. Finally, Chapter 9 asks what counts as a theory in artificial intelligence, and how empirical methods, which deal with specific AI systems, can foster general theories.

Behind every experimental or analytical tool is a scientist who views his or her subject through the lens of a particular collection of beliefs. Some of these concern the subject and some concern the science itself. Among the things I no longer believe about artificial intelligence is that looking at a program tells you how it will behave:

Each new program that is built is an experiment. It poses a question to nature, and its behavior offers clues to an answer. Neither machines nor programs are black boxes; they are artifacts that have been designed, both hardware and software, and we can open them up and look inside. We can relate their structure to their behavior and draw many lessons from a single experiment. We don't have to build 100 copies of, say, a theorem prover, to demonstrate statistically that it has not overcome the combinatorial explosion of search in the way hoped for. Inspection of the program in the light of a few runs reveals the flaw and lets us proceed to the next attempt. (Newell and Simon, 1981, p. 36)

Much about this influential passage is true in specific situations, but it is no longer generally true, as it was in 1975. Although we can open up our artifacts and look inside, we no longer find this an easy way to relate their structure to their behavior.
Increasingly we do require 100 repetitions, not of the code itself but of problems in samples, to demonstrate statistically that a program behaves as we hope. Increasingly, our characterizations of behavior are statistical, and the structures we induce to explain the behaviors are not programs but influence diagrams and other sorts of statistical models. Although it is true that we don't need statistics to tell us that a program is still crippled by the combinatorial explosion, we are unable to see subtler flaws unaided, and a few runs of the program might never reveal them.

Let me relate an example: My students and I built a planner, called Phoenix, which maintains several internal representations of time. One of these representations is a variable called "time-in-seconds." In actuality, this variable stored the elapsed time in minutes, not seconds, because of a programmer's error. The planner worked fine, meaning the error wasn't obvious to anyone, even though some of its estimates were wrong by a factor of sixty. This was due in part to Phoenix's failure-recovery abilities: The error would cause a plan to fail, and Phoenix would fix it. Only when we looked at statistical patterns in the execution traces of failure recovery did we discover the error. Relating structure to behavior can no longer be done with the naked eye.

Another belief I no longer have is that each new program is an experiment that poses a question to nature. Too often I ask, What is the question?, and receive no answer. Paraphrasing David Etherington, researchers are adept at saying what they are doing, much less so at saying what they are learning. Even so, the program-as-experiment view has been influential and reiterated often, recently by Lenat and Feigenbaum (1987, p. 1177):

Compared to Nature we suffer from a poverty of the imagination; it is thus much easier for us to uncover than to invent. Premature mathematization keeps Nature's surprises hidden.... This attitude leads to our central methodological hypothesis, our paradigm for AI research: Empirical Inquiry Hypothesis: Intelligence is still so poorly understood that Nature still holds most of the important surprises in store for us. So the most profitable way to investigate AI is to embody our hypotheses in programs, and gather data by running the programs. The surprises usually suggest revisions that start the cycle over again. Progress depends on these experiments being able to falsify our hypotheses; i.e., these programs must be capable of behavior not expected by the experimenter.

The empirical inquiry hypothesis is just that: a hypothesis. I want it to be true, but the evidence isn't encouraging. For example, when I surveyed 150 papers in the Proceedings of the Eighth National Conference on Artificial Intelligence (1990), I discovered that only 42 percent of the papers suggested a program had run on more than one example; just 30 percent demonstrated performance in some way; a mere 21 percent framed hypotheses or made predictions. Almost nobody "embodies hypotheses in programs," or "gathers data by running the programs." As to Nature and her surprises, very few papers reported negative or unexpected results. Programs are not experiments, but rather, the laboratory in which experiments are conducted. Questions to nature are answered in the laboratory; building the laboratory and running a few things through it does not suffice. The empirical inquiry hypothesis can be true, but first we have to give up the idea that running a program is somehow so irresistible to Nature that she will drop her veils.
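As a concrete illustration of the sort of statistical demonstration this preface alludes to, here is a minimal sketch in Python (not taken from the book) of a randomization test in the spirit of Chapter 5: it asks whether an observed difference in mean scores between two systems on the same problems could plausibly be due to chance. The scores are invented numbers, included only to show the mechanics.

import random

# Hypothetical accuracy scores for two systems on the same 20 test problems.
# These numbers are invented for illustration; nothing here comes from the book.
scores_a = [0.71, 0.64, 0.82, 0.55, 0.90, 0.47, 0.66, 0.73, 0.61, 0.78,
            0.69, 0.58, 0.84, 0.52, 0.75, 0.63, 0.88, 0.49, 0.70, 0.67]
scores_b = [0.62, 0.59, 0.74, 0.50, 0.81, 0.45, 0.60, 0.68, 0.57, 0.70,
            0.61, 0.54, 0.77, 0.48, 0.69, 0.58, 0.80, 0.44, 0.63, 0.60]

def mean(xs):
    return sum(xs) / len(xs)

observed = mean(scores_a) - mean(scores_b)

# Null hypothesis: the systems perform alike, so the labels A and B are
# arbitrary. Shuffle the pooled scores repeatedly and count how often a
# difference in means at least as large as the observed one arises by chance.
pooled = scores_a + scores_b
n_a = len(scores_a)
trials = 10000
extreme = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
    if abs(diff) >= abs(observed):
        extreme += 1

print("observed difference:", round(observed, 3))
print("approximate p-value:", extreme / trials)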
Which brings me to another belief I no longer have. I used to think experimental work would never add up to much because each experiment answers a single yes-or-no question, and it would take too long to understand anything this way. I was much influenced, as many were, by Allen Newell's "Twenty Questions" argument, which he addressed to a gathering of psychologists:

I was going to draw a line on the blackboard and, picking one of the speakers of the day at random, note on the line when he got his PhD and the current time (in mid-career). Then, taking his total production of papers like those in the present symposium, I was going to compute a rate of productivity of such excellent work. Moving, finally, to the date of my chosen target's retirement, I was going to compute the total future addition of such papers to the (putative) end of this man's scientific career. Then I was going to pose, in my role as a discussant, a question: Suppose you had all these additional papers ... where will psychology then be? Will we have achieved a science of man adequate in power and commensurate with his complexity? And if so, how will this have happened via these papers I have just granted you? Or will we be asking for yet another quota of papers in the next dollop of time? (Newell, 1973, pp. 283-284)

If this line of argument applies equally well to artificial intelligence, then we should not rely unduly on the experiment as our engine of progress. But I don't think it necessarily applies to AI, or to psychology for that matter. First, the Twenty Questions argument, as its name implies, is directed to a particular class of empirical methods: statistical testing of mutually exclusive pairs of hypotheses. Is system A equal to system B, yes or no? Does learning improve performance, yes or no? There is a place for hypothesis-testing in empirical AI (and two chapters of this book are devoted to it), but

Description:
Computer science and artificial intelligence in particular have no curriculum in research methods, as other sciences do. This book presents empirical methods for studying complex computer programs: exploratory tools to help find patterns in data, experiment designs and hypothesis-testing tools to he…
