FLEXIBLE TAXONOMIES AND ONTOLOGY EXTRACTION FROM TEXT IN THE CONTEXT OF A HELPDESK APPLICATION

Vivek Nair
w1110831
5/3/2011
Supervisor: Dr. Epaminondas Kapetanios

This report is submitted in partial fulfilment of the requirements for the BSc (Hons) Computer Science Degree at the University of Westminster.

Abstract

The final year project will consist of a system that incorporates the concepts of a flexible taxonomy and ontology extraction from text: in essence, a system that can crawl through text data such as problem reports and attempt to extract and classify taxonomies, probably in a tree-like structure, so that, once they are extracted, the appropriate expert will get a quick overview of the thematic areas of the problems reported, discussed or resolved. The intention of the system is to manage the flow of communication between problem reporters and experts as effectively as possible, so that the right experts can be allocated to the right type of problems as quickly as possible.

Table of Contents

1 Introduction
1.1 Motivation
1.2 Aims and Objectives
2 Background
2.1 Helpdesk
2.2 Ontology and Taxonomy
2.3 Software
2.4 Hardware
2.5 Summary
3 Requirements Specification
3.1 Functional Requirements
4 Design
4.1 System Name and Logo Design
4.2 System Architecture
4.3 Database Design
4.4 Web Application Design
4.5 Front End Design
4.6 Web Application Structure
4.7 Summary
5 Implementation
5.1 Front End
5.2 Web Application
5.3 Database Implementation
5.4 System Operation
5.5 Summary
6 Testing
6.1 Black Box Testing
7 Evaluation
7.1 System Aims Evaluation
7.2 System Requirements Evaluation
8 Conclusion
8.1 Review of Aims
8.2 Revisions to the design & implementation
8.3 Further Work
8.4 Summary
Acknowledgments
References
Appendices
Appendix A: NLP Test
Appendix B: Database Entities Schema
Appendix C: Front End Mock-up
Appendix D: Server Methods
Appendix E: Page Methods
Appendix F: Front End Methods
Appendix G: Testing Ontology
Appendix H: Files Directory
Appendix I: Project Proposal

List of Figures

Figure 1: General Helpdesk Process
Figure 2: Summary Table of the types of support
Figure 3: Web Services Summary
Figure 4: Summary of Rich Text Editors [12]
Figure 5: Comparison of Potential Languages
Figure 6: Summary of Potential Databases [13]
Figure 7: Sample JSON document
Figure 8: Sample relational database model
Figure 9: Waterfall Model Adapted, source: [26]
Figure 10: Spiral Model, source: [23]
Figure 11: Agile Development, source: [25]
Figure 12: Logo for System
Figure 13: System Architecture
Figure 14: Pipes and Filter process of Amekhania
Figure 15: High level entity design
Figure 16: High level entity design
Figure 17: Process of Managing Communication
Figure 18: Process Diagram for Ontology Extraction
Figure 19: Process of Ontology Extraction
Figure 20: Mock-up of taxonomy integrated into search facility
Figure 21: Process of directing call to category
Figure 22: Mock-ups of front end for user
Figure 23: Web Application Structure
Figure 24: Default web page being the login screen
Figure 25: Validation Table
Figure 26: Example of Database Entity
Figure 27: Sequence diagram
Figure 28: Login/Registration Test Cases
Figure 29: Login/Registration Validation Testing
Figure 30: Expert Test Cases
Figure 31: User Test Cases
Figure 32: Front End Test Cases

1 Introduction

An important factor for any organization in the information age is the ability to support itself technically. This is where technical support (helpdesk) departments play a role, providing help and assisting in the resolution of technical issues. Helpdesks are mainly contacted via email or via web applications that allow both the user and the support staff to communicate in order to solve the problem.

1.1 Motivation

The motivation for this project derives from my working experience in a commercial environment. During my industrial placement year I was employed as an application support engineer, and part of the role involved supporting the helpdesk in resolving issues with in-house software; on a daily basis this meant fixing bugs and implementing patches. The helpdesk application I used was an in-house ASP.Net web application. When a “technical” problem arose, the general process required an employee, or “problem reporter”, to e-mail a mailbox; the web application would then pick up the message and display its contents on a web page. A member of the helpdesk would check that page for the email, or “call” [2], and log it into its correct category.
Although this was quite a menial task, it was necessary so that the team could work on calls specific to their skill sets. I feel that this is where my project could add value, because it demonstrates a practical application of semantic web technology. From a technical point of view, semantic web technology, the main concept underlying my project, is part of the next phase of the web, the so-called “Web 3.0” era, so investigating these new concepts and trying to implement them is very exciting. My hope is that one day the application can provide a blueprint showing how communication can be managed effectively through the use of semantic web technology.

1.2 Aims and Objectives

1.2.1 Aims

Provide a platform in which users can communicate a technical problem so that it can be addressed by the appropriate expert(s) or department.

A user may come across a problem with a specific application they are using. There could be many technical people available, but to solve the problem efficiently it needs to be directed to the most appropriate expert. Finding that expert could require many rounds of conversation between different members of the technical department, which is ultimately counter-productive because that time could be spent on other valuable tasks. Directing problems automatically to the person who may have the most extensive experience, so that they can fix the bug or problem, is a much more effective method.

Provide functionality so that problem reports are automatically logged and directed to the appropriate expert through the use of ontology extraction.

For a problem report to be directed to the appropriate expert, it first needs to be analyzed for keywords and concepts using natural language processing tools. The application can then attempt to match these against the skills of an expert, or group of experts, who can deal with the problem.
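The matching step described in this aim can be illustrated with a minimal Python sketch. It assumes a naive frequency-based keyword extractor standing in for an NLP service such as Alchemy API or Open Calais (whose real APIs are not reproduced here), and a hypothetical mapping of experts to skill keywords. The names extract_keywords and rank_experts, the stop-word list and the overlap-count scoring are illustrative assumptions, not the project's implementation.

```python
import re
from collections import Counter

# Hypothetical stop-word list. A real deployment would delegate keyword and
# concept extraction to an NLP service instead of this naive tokenizer.
STOP_WORDS = {"the", "a", "an", "is", "to", "and", "of", "in", "on", "when",
              "i", "my", "it", "with", "not", "cannot", "after"}


def extract_keywords(report: str, top_n: int = 10) -> list[str]:
    """Return up to top_n of the most frequent non-stop-words in a report."""
    tokens = re.findall(r"[a-z0-9]+", report.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]


def rank_experts(keywords: list[str], experts: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Score each expert by the number of report keywords found in their skill
    set; highest score first, ties broken alphabetically by expert name."""
    wanted = set(keywords)
    scores = [(name, len(wanted & skills)) for name, skills in experts.items()]
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Hypothetical experts and skill keywords, for illustration only.
    experts = {
        "alice": {"outlook", "email", "laptop"},
        "bob": {"sql", "database", "timeout"},
        "carol": {"network", "vpn", "server"},
    }
    report = "The database query keeps hitting a timeout; the SQL server is slow."
    keywords = extract_keywords(report)
    print(keywords)
    print(rank_experts(keywords, experts))
```

Counting overlapping terms is the simplest possible score; in this example the report would be routed to "bob", whose skills share three keywords with it. A weighted score, for instance one that favours experts who have resolved similar calls before, would be a natural refinement.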
Provide a dynamic view of the problematic areas in the form of a tree structure.

Problems will be categorised and grouped. Once a problem has been successfully directed to the appropriate expert, the report can be categorised using the keywords it contains, together with information about the user and the expert it was directed to. From these categories, an overall diagrammatic representation of the thematic problem areas can be produced.

1.2.2 Objectives

In order to meet the aims, the objectives of the project are to:

1. Produce a helpdesk web application through which the user and the expert can communicate about the problem.
2. Make use of existing ontology extraction technologies (possibly Alchemy API or Open Calais), or create a bespoke extraction technology, to help direct the problem to an appropriate expert.
3. Make the web application searchable, so that users can search for similar or past problems.
4. Provide a summary page giving an overview of the problematic areas.
5. Provide a statistical summary of the performance of the expert dealing with each problem.

2 Background

This chapter discusses the topic of ontology and taxonomy extraction, and evaluates existing natural language processing tools that could potentially be used for the project. It also discusses software and hardware technologies that are suitable for the proposed system.

2.1 Helpdesk

2.1.1 What is a helpdesk?

The “Helpdesk” [1] is a department that provides information and helps to resolve issues. It is the first port of call when an employee has a technical problem: the employee informs the helpdesk, and the problem is then handled through the general process illustrated in figure 1.

Figure 1: General Helpdesk Process (1) User reports problem → (2) Identify and log the problem → (3) Analyze the problem → (4) Try to resolve and communicate back to the user

1. The user reports the problem via phone or email, describing the problem they are facing.
2. A member of the helpdesk attempts to identify the problem by reading and extracting information about it, then decides how to categorize it.
3. Once the problem has been identified and logged to the appropriate department or person, they analyze the problem.
4. Once it has been analyzed, they attempt to resolve the problem and communicate back to the user that it has been solved.

In this report I will refer to:
The “user” as the entity that reports the problem.
The “expert” as the one who solves the problem.
The “helpdesk call” or “call” as the email and/or problem report.

2.1.2 Helpdesk Categories

Helpdesk calls fall into different categories depending on what has been reported as a problem. In this project I define three typical categories of support: desktop support, application support and infrastructure support.

Support          Responsibilities
Desktop          Provide support for desktop applications. Resolve hardware/software issues with laptops/handheld devices.
Application      Provide support for in-house applications.
Infrastructure   Provide support for infrastructure-related issues.
Figure 2: Summary Table of the types of support

Figure 2 defines the responsibilities of each department or category; it is on the basis of these definitions that a judgment can be made about which category a helpdesk call belongs to.
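The category judgment based on Figure 2, combined with the tree-structured overview of problem areas described in the aims, could be sketched in Python roughly as follows. This is a minimal sketch assuming each call has already been reduced to a set of keywords; the indicator terms in CATEGORY_TERMS, the call identifiers, and the functions categorise, build_tree and print_tree are hypothetical and are not taken from the system's actual design.

```python
from collections import defaultdict

# Hypothetical indicator terms for the three support categories of Figure 2.
# In the proposed system these would be derived from extracted concepts rather
# than hard-coded.
CATEGORY_TERMS = {
    "Desktop": {"laptop", "printer", "outlook", "handheld", "desktop"},
    "Application": {"crm", "payroll", "portal", "bug", "application"},
    "Infrastructure": {"network", "server", "vpn", "firewall", "dns"},
}


def categorise(keywords: set[str]) -> str:
    """Return the category whose indicator terms overlap most with the call's
    keywords. Ties go to the category listed first in CATEGORY_TERMS; calls
    with no overlap are returned as 'Uncategorised' for manual triage."""
    best, best_score = "Uncategorised", 0
    for category, terms in CATEGORY_TERMS.items():
        score = len(keywords & terms)
        if score > best_score:
            best, best_score = category, score
    return best


def build_tree(calls: dict[str, set[str]]) -> dict[str, dict[str, list[str]]]:
    """Group call ids into a two-level tree: category -> keyword -> call ids."""
    tree: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for call_id, keywords in calls.items():
        category = categorise(keywords)
        for keyword in sorted(keywords):
            tree[category][keyword].append(call_id)
    return tree


def print_tree(tree: dict[str, dict[str, list[str]]]) -> None:
    """Print the tree as an indented outline of thematic problem areas."""
    for category in sorted(tree):
        print(category)
        for keyword in sorted(tree[category]):
            print(f"  {keyword}: {', '.join(tree[category][keyword])}")


if __name__ == "__main__":
    # Hypothetical calls, each already reduced to a set of keywords.
    calls = {
        "CALL-1": {"vpn", "network", "timeout"},
        "CALL-2": {"laptop", "screen"},
        "CALL-3": {"payroll", "bug"},
        "CALL-4": {"network", "dns"},
    }
    print_tree(build_tree(calls))
```

Here the tree is only two levels deep (category, then keyword); a deeper taxonomy could nest extracted sub-concepts beneath each keyword in the same way.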