University of Sheffield NLP

Module 2: Introduction to IE and ANNIE

© The University of Sheffield, 1995-2010
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike Licence.

About this tutorial

This tutorial comprises the following topics:
• Introduction to IE
• ANNIE
• Multilingual tools in GATE
• Evaluation and Corpus Quality Assurance

In Module 3, you'll learn how to use JAPE, the pattern-matching language that many PRs (processing resources) use.

Tutorial outline

09:45–11:15  What is information extraction?; Examples of IE systems; ANNIE; Basic lexico-syntactic PRs
11:15–11:45  BREAK
11:45–13:15  Gazetteers, transducers, coreference; Modifying ANNIE; Multilingual IE
13:15–14:15  LUNCH
14:15–15:45  Evaluation; Annotation Diff; Corpus Quality Assurance
15:45–16:15  COFFEE
16:15–17:15  INVITED TALK – GATE for Lifesciences

What is information extraction?

IE is not IR
• IR pulls documents from large text collections (usually the Web) in response to specific keywords or queries. You analyse the documents.
• IE pulls facts and structured information from the content of large text collections. You analyse the facts.

IE for Document Access
With traditional query engines, getting the facts can be hard and slow:
• Where has the Queen visited in the last year?
• Which airports are currently closed due to the volcanic ash?
Which search terms would you use to get these? How can you specify that you want to see someone's home page?
• IE returns information in a structured way.
• IR returns documents containing the relevant information somewhere.

IE as an alternative to IR
• IE returns knowledge at a much deeper level than traditional IR.
• It allows you to specify your query in a more structured way.
• Constructing a database through IE and linking it back to the documents can provide a valuable alternative search tool.
• Even if results are not always accurate, they can be valuable if linked back to the original text.

What is IE used for?
IE is an enabling technology for many other applications:
• Text Mining
• Semantic Annotation
• Question Answering
• Opinion Mining
• Decision Support
• Rich information retrieval and exploration
• and so on.

Two main types of IE systems

Knowledge Engineering
• rule based
• developed by experienced language engineers
• make use of human intuition
• require only a small amount of training data
• development can be very time consuming
• some changes may be hard to accommodate

Learning Systems
• use statistics or other machine learning
• developers do not need LE (language engineering) expertise
• require large amounts of annotated training data
• some changes may require re-annotation of the entire training corpus

Named Entity Recognition: the cornerstone of IE
Traditionally, NE recognition is the identification of proper names in texts and their classification into a set of predefined categories of interest:
• Person
• Organisation (companies, government organisations, committees, etc.)
• Location (cities, countries, rivers, etc.)
• Date and time expressions
Various other types are frequently added, as appropriate to the application, e.g. newspapers, ships, monetary amounts, percentages, etc.
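To make the NE definition concrete, here is a minimal sketch of the knowledge-engineering approach: a hand-written gazetteer (lookup list) plus one date pattern, producing annotations that record a start offset, end offset and type, so each extracted fact links back to the original text. The gazetteer entries, the `Annotation` class and the `annotate` function are hypothetical illustrations, not GATE or ANNIE APIs; ANNIE's real gazetteer and grammars are far richer.

```python
import re
from dataclasses import dataclass

# Hypothetical toy gazetteer (lookup list): entry -> entity type.
# Illustrative entries only; a real gazetteer is far larger.
GAZETTEER = {
    "University of Sheffield": "Organisation",
    "Sheffield": "Location",
    "London": "Location",
}

# One hand-written rule for dates such as "5 May 2010".
MONTHS = "January|February|March|April|May|June|July|August|September|October|November|December"
DATE_PATTERN = re.compile(r"\b\d{1,2} (?:" + MONTHS + r") \d{4}\b")


@dataclass(frozen=True)
class Annotation:
    """A standoff annotation: offsets point back into the original text."""
    start: int  # offset of the first character
    end: int    # offset just past the last character
    type: str   # entity category, e.g. "Location"
    text: str   # covered text, kept for readability


def annotate(text):
    """Return non-overlapping NE annotations, preferring leftmost-longest matches."""
    candidates = []
    # Gazetteer lookup: find every whole-word occurrence of each entry.
    for entry, entity_type in GAZETTEER.items():
        for m in re.finditer(r"\b" + re.escape(entry) + r"\b", text):
            candidates.append(Annotation(m.start(), m.end(), entity_type, m.group()))
    # Pattern rule: dates.
    for m in DATE_PATTERN.finditer(text):
        candidates.append(Annotation(m.start(), m.end(), "Date", m.group()))
    # Sort by start offset (longest first at the same start), then drop overlaps.
    candidates.sort(key=lambda a: (a.start, a.start - a.end))
    result, last_end = [], 0
    for a in candidates:
        if a.start >= last_end:
            result.append(a)
            last_end = a.end
    return result


if __name__ == "__main__":
    # An invented example sentence.
    sample = "The meeting moved from London to the University of Sheffield on 5 May 2010."
    for a in annotate(sample):
        print(a.type, a.start, a.end, a.text)
```

Overlaps are resolved leftmost-longest, so "University of Sheffield" wins over the shorter "Sheffield" it contains. A learning system would instead induce such decisions from annotated training data rather than from hand-written lists and rules.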