ACQUISITION AND UNDERSTANDING OF PROCESS KNOWLEDGE USING PROBLEM SOLVING METHODS

Studies on the Semantic Web

Semantic Web has grown into a mature field of research. Its methods find innovative applications on and off the World Wide Web. Its underlying technologies have significant impact on adjacent fields of research and on industrial applications. This new book series reports on the state of the art in foundations, methods, and applications of Semantic Web and its underlying technologies. It is a central forum for the communication of recent developments and comprises research monographs, textbooks and edited volumes on all topics related to the Semantic Web.

www.semantic-web-studies.net

Editor-in-Chief: Pascal Hitzler
Editorial Board: Fausto Giunchiglia, Carole Goble, Asunción Gómez Pérez, Frank van Harmelen, Riichiro Mizoguchi, Mark Musen, Daniel Schwabe, Steffen Staab, Rudi Studer

Volume 007 - José Manuel Gómez-Pérez, Acquisition and Understanding of Process Knowledge using Problem Solving Methods

Publications
Vol. 001 Stephan Grimm, Semantic Matchmaking with Nonmonotonic Description Logics
Vol. 002 Johanna Völker, Learning Expressive Ontologies
Vol. 003 Raúl García Castro, Benchmarking Semantic Web Technology
Vol. 004 Daniel Sonntag, Ontologies and Adaptivity in Dialogue for Question Answering
Vol. 005 Rui Zhang, Relation Based Access Control
Vol. 006 Jens Lehmann, Learning OWL Class Expressions (This book is also vol. XXII in the “Leipziger Beiträge zur Informatik” series)

Acquisition and Understanding of Process Knowledge using Problem Solving Methods

José Manuel Gómez-Pérez
Intelligent Software Components (iSOCO) S.A.
Madrid, Spain

José Manuel Gómez-Pérez
Intelligent Software Components (iSOCO) S.A.
10 Pedro de Valdivia
28006 Madrid
Spain
[email protected]

Bibliographic Information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie.
Detailed bibliographic data are available on the Internet at http://dnb.d-nb.de.

Publisher
Akademische Verlagsgesellschaft AKA GmbH
P.O. Box 10 33 05
69023 Heidelberg
Germany
Tel.: 0049 (0)6221 21881
Fax: 0049 (0)6221 167355
[email protected]
www.aka-verlag.com

Distribution
Herold Auslieferung und Service GmbH
Raiffeisenallee 10
82041 Oberhaching (München)
Germany
Fax: 0049 (0)89 6138 7120
[email protected]

© 2010, Akademische Verlagsgesellschaft AKA GmbH, Heidelberg

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without prior permission from the publisher.

Reproduced from PDF supplied by the author
Printer: buchbücher.de gmbh, Birkach
Printed in Germany

ISSN 1868-1158
ISBN 978-3-89838-639-5 (AKA)
ISBN 978-1-60750-XXX-X (IOS Press)

To my family and friends

Special thanks to Oscar Corcho and Richard Benjamins, for their support and guidance.

List of figures

Figure 1: Process description languages
Figure 2: Process knowledge lifecycle
Figure 3: Sample Chemistry question
Figure 4: Schema of the platform-independent domain analysis
Figure 5: Distribution of verbs per domain
Figure 6: Schema of the platform-specific knowledge engineering analysis
Figure 7: Per-domain distribution of knowledge types
Figure 8: Average distribution of knowledge types
Figure 9: Conceptual diagram of the process metamodel entities
Figure 10: TMDA modelling framework.
Figure 11: Adapted TMDA modelling framework for the process knowledge type (PCS)
Figure 12: Distribution of process occurrences
Figure 13: Sample process identification and abstraction
Figure 14: PSM library for process modelling: main categories
Figure 15: Distribution of the process syllabus across PSM library methods
Figure 16: PCS category Join
Figure 17: PSM compare & interpret
Figure 18: PSM form by combination
Figure 19: PSM form by aggregation
Figure 20: PSM neutralize
Figure 21: PCS category Split
Figure 22: PSM consume
Figure 23: PSM decompose
Figure 24: PSM decompose & combine
Figure 25: PSM replicate
Figure 26: PCS category Modify
Figure 27: PSM transform
Figure 28: PSM situate & combine
Figure 29: PSM balance
Figure 30: PCS category Locate
Figure 31: PSM situate
Figure 32: PSM oscillate
Figure 33: PSM rearrange
Figure 34: PSM accumulate & consume
Figure 35: Process Modelling in DarkMatter
Figure 36: Process metamodel and domain entity
Figure 37: Process Validation
Figure 38: Pre and post states of sample atomic action
Figure 39: A muscle contraction process
Figure 40: Process module hierarchy
Figure 41: Correlation between classes of actions in the process metamodel and the rule types of the process KR&R formalism
Figure 42: Rule for estimation of jump length
Figure 43: Rule implementing predicate enough_energy_for_contraction
Figure 44: Break down of an iterative action into a succession of atomic actions
Figure 45: Subclasses of transition rules associated to iterative actions
Figure 46: Prime catalogue method interaction view
Figure 47: Prime catalogue method knowledge flow view
Figure 48: Prime catalogue method decomposition view
Figure 49: The KOPE PSM metamodel
Figure 50: Overall KOPE architecture
Figure 51: PASOA interaction p-assertion data model
Figure 52: The PSM-driven process matching algorithm
Figure 53: A twig join example in KOPE
Figure 54: Breakdown of process resources and relations in their main types
Figure 55: Overall distribution of the PSM library
Figure 56: SME-rated utility of processes
Figure 57: Brain atlas workflow.
Figure 58: Brain Atlas domain ontology.
Figure 59: Catalogue PSM library roles.
Figure 60: Analysis of the brain atlas creation in terms of the prime catalogue method
Figure 61: Precision and recall per abstraction level of the prime catalogue method

List of Tables

Table 1: An example of a group of questions related to the verb rank, with their correct answers, possible justifications and references to the knowledge required to solve them
Table 2: An example of a set of questions with their corresponding verb and the type of task (knowledge type) to be performed in order to solve them
Table 3: Process resources, actions, and conditional forks
Table 4: Allowed relations between process entities
Table 5: Types of process rules per kind of process action
Table 6: Summary of Physics knowledge bases
Table 7: Summary of Biology knowledge bases
Table 8: Summary of Chemistry knowledge bases
Table 9: Summary of the process knowledge type
Table 10: Occurrences of process metamodel entities
Table 11: PSMs per process
Table 12: Issues raised by SMEs about processes in the different domains
Table 13: SUS scores per SME and domain
Table 14: OntoBroker reasoning configurations
Table 15: C1 and C2 compared with standard C0

Contents

List of figures
List of Tables
1. Introduction
2. State of the Art
2.1. The Knowledge Acquisition Bottleneck
2.2. From Mining to Modelling: The Knowledge Level
2.3. Ontologies and Problem Solving Methods in the Knowledge Acquisition Modelling Paradigm
2.4. Knowledge Acquisition by Knowledge Engineers
2.5. Knowledge Acquisition by Subject Matter Experts
2.6. Process Knowledge and Subject Matter Experts
2.7. The Process Knowledge Lifecycle
2.8. Conclusions
3. Work Objectives
3.1. Goals and Open Research Problems
3.2. Contributions to the State of the Art
3.3. Work Assumptions, Hypotheses, and Restrictions
4. Acquisition of Process Knowledge by SMEs
4.1. Introduction
4.1.1. Knowledge Acquisition and Formulation by SMEs in the Halo Project
4.2. Knowledge Types in Scientific Disciplines
4.2.1. Domain Analysis
4.2.2. A Comprehensive Set of Knowledge Types in Scientific Disciplines
4.3. The Process Metamodel
4.3.1. Process Entities in the Process Metamodel
4.4. Problem Solving Methods for the Acquisition of Process Knowledge
4.4.1. A PSM Modelling Framework for Processes
4.4.2. A Method to Build a PSM Library of Process Knowledge
4.4.3. A PSM Library for the Acquisition of Process Knowledge
4.5. Enabling SMEs to Formulate Process Knowledge
4.5.1. The DarkMatter Process Editor
4.6. Related Work
5. Representing and Reasoning with SME-authored Process Knowledge
5.1. A Formalism for Representing and Reasoning with Process Knowledge
5.2. F-logic as Process Representation and Reasoning Language
5.3. The Process Frame
5.4. Code Generation for Process Knowledge
Synthesis of precedence rules for data flow management
5.5. Code Synthesis for Iterative Actions
5.6. Soundness and Completeness of Process Models
5.7. Optimization of the Synthesized Process Code
5.8. Reasoning with Process Models
6. Analysis of Process Executions by SMEs
6.1. Towards Knowledge Provenance in Process Analysis
6.2. Problem Solving Methods for the Analysis of Process Executions
6.3. A Knowledge-oriented Provenance Environment
6.4. An Algorithm for Process Analysis Using PSMs
7. Evaluation
7.1. Evaluation of the DarkMatter Process Component for Acquisition of Process Knowledge by SMEs
7.1.1. Evaluation Syllabus
7.1.2. Distribution of the Formulated Processes across the Evaluation Syllabus
7.1.3. Utilization of the PSM Library and Process Metamodel
7.1.4. Usage Experience of the SMEs with the Process Editor
7.1.5. Performance Evaluation of the Process Component
7.2. Evaluation of KOPE for the Analysis of Process Executions by SMEs
7.2.1. Evaluation Settings
7.2.2. Evaluation Metrics
7.2.3. Evaluation Results
7.3. Evaluation Conclusions
8. Conclusions and Future Research
8.1. Conclusions
8.2. Future Research Problems
REFERENCES
Appendix. Sample F-logic Code for a Process Model