
GATE, JAPE, ANNIE PDF

57 Pages·2008·0.23 MB·English


Information Extraction: GATE, JAPE, ANNIE
Presentation for the advanced seminar „Endliche Automaten“ (Finite Automata), PD Dr. Karin Haenelt
Summer semester 2008, Universität Heidelberg, 23.06.2008
Ching-Yi Sabrina Lin, Shajy Valiath, Torsten Hopp

Outline
1. Introduction
   • What is information extraction?
2. GATE
   • Architecture, design goals
   • Functionality
3. JAPE
   • Functionality
   • Examples
4. ANNIE
   • Components and how they work
   • Walk-through example
5. Summary

23.06.2008 · Advanced seminar „Endliche Automaten“, SS 2008

► Introduction: What is Information Extraction?
Definition (Wikipedia.org): In natural language processing, information extraction (IE) is a type of information retrieval whose goal is to automatically extract structured information […] from unstructured machine-readable documents.
□ Domain-specific information from free text
□ Searching and structuring
□ Filtering of irrelevant information
Reference: [Wik08a]

► Introduction: What is Information Extraction?
□ What is relevant?
   ▸ Predefined by domain-specific lexicons or rules
□ Core functionality of an IE system
   ▸ Input:
      • Specification of the type of relevant information (templates) ⇒ a set of attributes
      • A set of free-text documents
   ▸ Output:
      • A set of instantiated attributes ⇒ filled with identified and normalized text fragments
Reference: [Neu01]

► GATE
□ General Architecture for Text Engineering
□ Framework plus graphical development environment
□ Current version: 4.0 (July 2007)
Reference: [Cun+02]

► GATE: Design Goals
□ Separating low-level tasks from language processing algorithms and structures
□ Automating performance measurement of language processing components
□ Providing standard mechanisms for communicating data about language, using standards
   ▸ Java, XML
□ Providing a baseline set of language processing components
Reference: [Cun+02]

► GATE: Architecture
□ Architecture:
   ▸ Defines the organisation of an LE (language engineering) system
   ▸ Ensures component interactions
□ Framework:
   ▸ Reusable design for LE software
   ▸ Prefabricated components
□ Development environment:
   ▸ Helps users build LE systems
   ▸ Debugging mechanisms
Reference: [Cun+02]

► GATE: Components
□ Language Resources (LR)
   ▸ Lexicons, corpora, ontologies
□ Processing Resources (PR)
   ▸ Algorithms, e.g. parsers, generators, n-gram modellers
□ Visual Resources (VR)
   ▸ Visualization and editing (GUI)
Reference: [Cun+02]

► GATE: Creole
□ Collection of REusable Objects for Language Engineering
   ▸ Repository: XML file (name, implementing class, parameters, icons)
   ▸ Searched by the framework to discover available resources
Reference: [Cun+02]
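The core functionality described in the Introduction (templates plus free-text documents in, instantiated attributes out, with relevance decided by lexicons or rules) can be illustrated with a small Python sketch. It is not how GATE or any particular IE system works internally: the lexicon terms, the attribute names, the regular-expression rules and the date normalization are all hypothetical choices made only for this example.

```python
import re
from datetime import date

# Hypothetical domain lexicon: a document counts as relevant only if it
# mentions at least one of these words (a stand-in for "lexicons or rules").
DOMAIN_LEXICON = {"seminar", "lecture", "talk"}

# Hypothetical template: each attribute has a rule (regular expression) and a
# normalizer that turns the matched text fragment into a canonical value.
TEMPLATE = {
    "date": (
        re.compile(r"\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b"),
        # German-style DD.MM.YYYY normalized to ISO 8601
        lambda m: date(int(m.group(3)), int(m.group(2)), int(m.group(1))).isoformat(),
    ),
    "university": (
        re.compile(r"\bUniversität\s+(\w+)"),
        lambda m: m.group(1),
    ),
}


def is_relevant(document: str) -> bool:
    """Filter step: keep only documents that contain a lexicon term."""
    words = set(re.findall(r"\w+", document.lower()))
    return not DOMAIN_LEXICON.isdisjoint(words)


def fill_template(document: str) -> dict:
    """Instantiate every template attribute; unmatched attributes stay None."""
    filled = {}
    for attribute, (pattern, normalize) in TEMPLATE.items():
        match = pattern.search(document)
        filled[attribute] = normalize(match) if match else None
    return filled


def extract(documents: list) -> list:
    """Map a set of free-text documents to a list of instantiated templates."""
    return [fill_template(doc) for doc in documents if is_relevant(doc)]


if __name__ == "__main__":
    docs = [
        "The seminar talk took place on 23.06.2008 at Universität Heidelberg.",
        "Weather report: sunny and warm.",
    ]
    print(extract(docs))
```

Here the second document is dropped by the lexicon filter, and the first yields one filled template with the date normalized to ISO format. Real IE systems replace the regular expressions with much richer rules, but the shape of the input and output is the same.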
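The design goal of automating performance measurement usually amounts to comparing a component's output against a manually annotated gold standard. GATE provides its own evaluation tools; the Python sketch below does not reproduce them. It only shows the standard exact-match precision, recall and F1 computation, under the assumption (made for this example) that an annotation is a `(start, end, type)` tuple and that two annotations match only if all three fields are equal.

```python
from typing import Dict, Iterable, Tuple

# Assumed annotation format: (start_offset, end_offset, type).
Annotation = Tuple[int, int, str]


def evaluate(gold: Iterable[Annotation], predicted: Iterable[Annotation]) -> Dict[str, float]:
    """Exact-match precision, recall and F1 of predicted against gold annotations."""
    gold_set, pred_set = set(gold), set(predicted)
    correct = len(gold_set & pred_set)  # true positives
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return {"precision": precision, "recall": recall, "f1": 0.0}
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    gold = [(0, 5, "Person"), (10, 20, "Location"), (25, 35, "Date")]
    predicted = [(0, 5, "Person"), (10, 20, "Organization")]
    print(evaluate(gold, predicted))
```

In the usage example one of two predictions is correct (precision 0.5) and one of three gold annotations is found (recall 1/3), giving an F1 of about 0.4. Partial-overlap scoring, which evaluation tools often also report, would need a more lenient matching rule than the set intersection used here.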
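The split into language, processing and visual resources can be pictured as a pipeline in which processing resources read and enrich language resources, while visual resources only display them. The Python classes below are a hypothetical analogy written for illustration. They are not GATE's Java API, and the names `Document`, `Corpus`, `WhitespaceTokeniser`, `LexiconLookup`, `Pipeline` and `render` are invented here.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List, Protocol, Tuple


@dataclass
class Document:
    """Language resource (LR): raw text plus (start, end, type) annotations."""
    text: str
    annotations: List[Tuple[int, int, str]] = field(default_factory=list)


@dataclass
class Corpus:
    """Language resource (LR): a named collection of documents."""
    name: str
    documents: List[Document] = field(default_factory=list)


class ProcessingResource(Protocol):
    """Processing resource (PR): anything that can process one document."""

    def execute(self, document: Document) -> None: ...


class WhitespaceTokeniser:
    """PR that adds a 'Token' annotation for every whitespace-separated token."""

    def execute(self, document: Document) -> None:
        for match in re.finditer(r"\S+", document.text):
            document.annotations.append((match.start(), match.end(), "Token"))


class LexiconLookup:
    """PR that adds a 'Lookup' annotation for words listed in a lexicon."""

    def __init__(self, terms: Iterable[str]) -> None:
        self.terms = {term.lower() for term in terms}

    def execute(self, document: Document) -> None:
        for match in re.finditer(r"\w+", document.text):
            if match.group().lower() in self.terms:
                document.annotations.append((match.start(), match.end(), "Lookup"))


class Pipeline:
    """Runs each PR, in order, over every document in a corpus."""

    def __init__(self, steps: List[ProcessingResource]) -> None:
        self.steps = steps

    def run(self, corpus: Corpus) -> None:
        for document in corpus.documents:
            for step in self.steps:
                step.execute(document)


def render(document: Document) -> str:
    """Visual resource (VR) stand-in: formats annotations for display only."""
    return "\n".join(
        f"{kind} [{start}:{end}] {document.text[start:end]!r}"
        for start, end, kind in document.annotations
    )


if __name__ == "__main__":
    corpus = Corpus("demo", [Document("GATE runs pipelines")])
    Pipeline([WhitespaceTokeniser(), LexiconLookup(["gate"])]).run(corpus)
    print(render(corpus.documents[0]))
```

The design point the sketch tries to mirror is the separation of concerns from the slides: the data (LRs) knows nothing about the algorithms, each algorithm (PR) is interchangeable as long as it offers the same entry point, and display (VR) never modifies the data.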
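To make the Creole discovery step concrete, the sketch below reads a small XML registry with Python's standard `xml.etree.ElementTree` module and lists the resources it declares (name, implementing class, parameters, icon). The element names are modelled loosely on GATE's `creole.xml` files but simplified, and the resource entry itself (`com.example.ExampleTokeniser`) is made up; the GATE documentation remains the authority on the real schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical registry content, loosely modelled on a CREOLE XML file.
REGISTRY_XML = """
<CREOLE-DIRECTORY>
  <CREOLE>
    <RESOURCE>
      <NAME>Example Tokeniser</NAME>
      <CLASS>com.example.ExampleTokeniser</CLASS>
      <PARAMETER NAME="document" RUNTIME="true">gate.Document</PARAMETER>
      <PARAMETER NAME="encoding">java.lang.String</PARAMETER>
      <ICON>tokeniser</ICON>
    </RESOURCE>
  </CREOLE>
</CREOLE-DIRECTORY>
"""


@dataclass
class ResourceInfo:
    name: str
    implementing_class: str
    parameters: Dict[str, str]  # parameter name -> declared type (other attributes ignored)
    icon: str


def discover_resources(xml_text: str) -> List[ResourceInfo]:
    """Return one ResourceInfo per RESOURCE element found anywhere in the registry."""
    root = ET.fromstring(xml_text.strip())
    found = []
    for res in root.iter("RESOURCE"):
        params = {p.get("NAME"): (p.text or "").strip() for p in res.findall("PARAMETER")}
        found.append(
            ResourceInfo(
                name=res.findtext("NAME", default="").strip(),
                implementing_class=res.findtext("CLASS", default="").strip(),
                parameters=params,
                icon=res.findtext("ICON", default="").strip(),
            )
        )
    return found


if __name__ == "__main__":
    for info in discover_resources(REGISTRY_XML):
        print(info)
```

A framework following this idea would scan such files at start-up and use the implementing class names to instantiate resources on demand; this sketch stops at listing what is available.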

