Quality Measures and Assurance for AI Software

John Rushby
SRI International, Menlo Park, California

NASA Contractor Report 4187
Prepared for Langley Research Center under Contract NAS1-17067
National Aeronautics and Space Administration, Scientific and Technical Information Division
October 1988


Contents

1 Introduction
    1.1 Motivation
    1.2 Acknowledgments

I Quality Measures for Conventional Software

2 Software Engineering and Software Quality Assurance
    2.1 Software Quality Assurance

3 Software Reliability
    3.1 The Basic Execution Time Reliability Model
    3.2 Discussion of Software Reliability

4 Size, Complexity, and Effort Metrics
    4.1 Size Metrics
    4.2 Complexity Metrics
        4.2.1 Measures of Control Flow Complexity
        4.2.2 Measures of Data Complexity
    4.3 Cost and Effort Metrics
    4.4 Discussion of Software Metrics

5 Testing
    5.1 Dynamic Testing
        5.1.1 Random Test Selection
        5.1.2 Regression Testing
        5.1.3 Thorough Testing
            5.1.3.1 Structural Testing
            5.1.3.2 Functional Testing
        5.1.4 Symbolic Execution
        5.1.5 Automated Support for Systematic Testing Strategies
    5.2 Static Testing
        5.2.1 Anomaly Detection
        5.2.2 Structured Walk-Throughs
        5.2.3 Mathematical Verification
            5.2.3.1 Executable Assertions
            5.2.3.2 Verification of Limited Properties
        5.2.4 Fault-Tree Analysis
    5.3 Testing Requirements and Specifications
        5.3.1 Requirements Engineering and Evaluation
            5.3.1.1 SREM
        5.3.2 Completeness and Consistency of Specifications
        5.3.3 Mathematical Verification of Specifications
        5.3.4 Executable Specifications
        5.3.5 Testing of Specifications
        5.3.6 Rapid Prototyping
    5.4 Discussion of Testing

II Application of Software Quality Measures to AI Software

6 Characteristics of AI Software

7 Issues in Evaluating the Behavior of AI Software
    7.1 Requirements and Specifications
        7.1.1 Service and Competency Requirements
        7.1.2 Desired and Minimum Competency Requirements
    7.2 Evaluating Desired Competency Requirements
        7.2.1 Model-Based Adversaries
        7.2.2 Competency Evaluation Against Human Experts
            7.2.2.1 Choice of Gold Standard
            7.2.2.2 Biasing and Blinding
            7.2.2.3 Realistic Standards of Performance
            7.2.2.4 Realistic Time Demands
        7.2.3 Evaluation Against Linear Models
    7.3 Acceptance of AI Systems
        7.3.1 Identifying the Purpose and Audience of Tests
        7.3.2 Involving the User
            7.3.2.1 Performance Evaluation of AI Software

8 Testing of AI Systems
    8.1 Dynamic Testing
        8.1.1 Influence of Conflict-Resolution Strategies
        8.1.2 Sensitivity Analysis
        8.1.3 Statistical Analysis and Measures
        8.1.4 Regression Testing and Automated Testing Support
    8.2 Static Testing
        8.2.1 Anomaly Detection
        8.2.2 Mathematical Verification
        8.2.3 Structured Walk-Throughs
        8.2.4 Comprehension Aids

9 Reliability Assessment and Metrics for AI Systems

III Conclusions and Recommendations for Research

10 Conclusions
    10.1 Recommendations for Research

Bibliography


Chapter 1

Introduction

This report is concerned with the application of software quality and evaluation measures to AI software and, more broadly, with the question of quality assurance for AI software. By AI software we mean software that uses techniques from the field of Artificial Intelligence. (Genesereth and Nilsson [72] give an excellent modern introduction to such techniques; Harmon and King [83] provide a more elementary overview.) We consider not only metrics that attempt to measure some aspect of software quality, but also methodologies and techniques (such as systematic testing) that attempt to improve some dimension of quality, without necessarily quantifying the extent of the improvement.

The bulk of the report is divided into three parts. In Part I we review existing software quality measures: those that have been developed for, and applied to, conventional software. In Part II, we consider the characteristics of AI software, the applicability and potential utility of the measures and techniques identified in the first part, and we review those few methods that have been developed specifically for AI software. In Part III of this report, we present our assessment and recommendations for the further exploration of this important area.

1.1 Motivation

It is now widely recognized that the cost of software vastly exceeds that of the hardware it runs on: software accounts for 80% of the total computer systems budget of the Department of Defense, for example. Furthermore, as much as 60% of the software budget may be spent on maintenance. Not only does software cost a huge amount to develop and maintain, but vast economic or social assets may be dependent upon its functioning correctly. It is therefore essential to develop techniques for measuring, predicting, and controlling the costs of software development and the quality of the software produced.

The quality-assurance problem is particularly acute in the case of AI software, which for present purposes we may define as software that performs functions previously thought to require human judgment, knowledge, or intelligence, using heuristic, search-based techniques. As Parnas observes [149]:

"The rules that one obtains by studying people turn out to be inconsistent, incomplete, and inaccurate. Heuristic programs are developed by a trial and error process in which a new rule is added whenever one finds a case that is not handled by the old rules. This approach usually yields a program whose behavior is poorly understood and hard to predict."

Unless compelling evidence can be adduced that such software can be "trusted" to perform its function, then it will not, and should not, be used in many circumstances where it would otherwise bring great benefit. In the following sections of this report, we consider measures and techniques that may provide the compelling evidence required.
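To make Parnas's observation concrete, the following minimal sketch (ours, not the report's; the rule set and example cases are hypothetical) shows the trial-and-error, rule-based style of development he describes: an ordered list of condition-action rules, patched with a new rule whenever a case turns up that the old rules do not handle.

    # Minimal sketch of trial-and-error rule-based development.
    # Each rule pairs a condition with an action; decide() returns the
    # action of the first rule whose condition matches the case.
    rules = []

    def add_rule(condition, action):
        rules.append((condition, action))

    def decide(case):
        for condition, action in rules:
            if condition(case):
                return action
        return None  # unhandled case: occasion for yet another rule

    # Hypothetical rules, accreted one unhandled case at a time.
    add_rule(lambda c: c["temp"] > 100, "shut down")
    add_rule(lambda c: c["temp"] > 80 and c["load"] > 0.9, "throttle")

    print(decide({"temp": 105, "load": 0.5}))   # shut down
    print(decide({"temp": 85, "load": 0.95}))   # throttle
    print(decide({"temp": 85, "load": 0.5}))    # None: add another rule?

Because each later rule is added to cover an individual failure rather than derived from a specification, the rules interact in ways that are easy to get wrong, which is precisely why the resulting behavior is poorly understood and hard to predict.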
1.2 Acknowledgments

Alan Whitehurst of the Computer Science Laboratory (CSL) and Leonard Wesley of the Artificial Intelligence Center (AIC) at SRI contributed material to this report. It is also a pleasure to acknowledge several useful discussions with Tom Garvey of AIC, and the careful reading and criticism of drafts provided by Oscar Firschein of AIC and Teresa Lunt of CSL. The guidance provided by our technical monitors, Kathy Abbott and Wendell Ricks of NASA Langley Research Center, was extremely valuable.


Part I

Quality Measures for Conventional Software


Chapter 2

Software Engineering and Software Quality Assurance

Before describing specific quality metrics and methods, we need briefly to review the software engineering process and some of the terms used in Software Quality Assurance.

One of the key concepts in modern software engineering is the system life-cycle model. Its premise is that development and implementation are carried out in several distinguishable, sequential phases, each performing unique, well-defined tasks and requiring different skills. One of the outputs of each phase is a document that serves as the basis for evaluating the outcome of the phase, and forms a guideline for the subsequent phases. The life-cycle phases can be grouped into the following four major classes:

Specification, comprising problem definition, feasibility studies, system requirements specification, software requirements specification, and preliminary design.

Development, comprising detailed design, coding and unit testing, and the establishment of operating procedures.

Implementation, comprising integration and test, acceptance tests, and user training.

Operation and maintenance.

There have been many refinements to this basic model: Royce's Waterfall model [163], for example, recognized the existence of feedback between phases and recommended that such feedback should be confined to adjacent phases.

There is considerable agreement that the early phases of the life-cycle are particularly important to the successful outcome of the whole process. Brooks, for example, observes [33]:

"I believe the hard part of building software to be the specification, design, and testing of this conceptual construct, not the labor of representing it and testing the fidelity of the representation. We still make syntax errors, to be sure, but they are fuzz compared with the conceptual errors in most systems.

"The hardest single part of building a software system is deciding precisely what to build. No other part of the work so cripples the resulting system if done wrong. No other part is more difficult to rectify later."

The more phases of the life-cycle that separate the commission and detection of an error, the more expensive it is to correct. It is usually cheap and simple to correct a coding bug caught during unit test, and it is usually equally simple and cheap to insert a missed requirement that is caught during system requirements review. But it will be ruinously expensive to correct such a missed requirement if it is not detected until the system has been coded and is undergoing integration test. Software Quality Assurance comprises a collection of techniques and guidelines that endeavor to ensure that all errors are caught, and caught early.
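The scale of this effect can be illustrated with a simple compounding model; the tenfold cost growth per phase used in the sketch below is a common rule of thumb, not a figure taken from this report.

    # Illustrative only: assume the cost of fixing an error grows
    # roughly tenfold for each life-cycle phase separating its
    # commission from its detection.
    PHASES = ["requirements", "design", "coding", "unit test",
              "integration test", "operation"]

    def relative_fix_cost(committed, detected, growth=10):
        # Relative cost of a fix, as a multiple of the cost of fixing
        # the error in the phase where it was committed.
        distance = PHASES.index(detected) - PHASES.index(committed)
        return growth ** distance

    print(relative_fix_cost("requirements", "requirements"))      # 1
    print(relative_fix_cost("coding", "unit test"))               # 10
    print(relative_fix_cost("requirements", "integration test"))  # 10000

Under these assumed multipliers, a requirements error that survives to integration test costs four orders of magnitude more to repair than one caught at requirements review, which is the economic argument for catching errors early.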
2.1 Software Quality Assurance

Software Quality Assurance (SQA) is concerned with the problems of ensuring and demonstrating that software (or, rather, software-intensive systems) will satisfy the needs and requirements of those who procure them. These needs and requirements may cover not only how well the software works now, but how well documented it is, how easy it is to fix if it does go wrong, how adaptable it is to new requirements, and other attributes that influence how well it will continue to satisfy the user's needs in the future. In the case of military procurements, a number of standards have been established to govern the practice of various facets of software development: MIL-S-52779A for software program requirements, DOD-STD-1679A and DOD-STD-2167 for software development, DOD-STD-2168 for software quality evaluation,
