FEATURE-DRIVEN QUESTION ANSWERING
WITH NATURAL LANGUAGE ALIGNMENT
by
Xuchen Yao
A dissertation submitted to Johns Hopkins University in conformity
with the requirements for the degree of Doctor of Philosophy
Baltimore, Maryland
July, 2014
To My Mother’s Father
Abstract

Question Answering (QA) is the task of automatically generating answers to natural language questions from humans, serving as one of the primary research areas in natural language human-computer interaction. This dissertation focuses on English fact-seeking (factoid) QA, for instance: when was Johns Hopkins founded? (answer: January 22, 1876).

The key challenge in QA is the generation and recognition of indicative signals for answer patterns. In this dissertation I propose the idea of feature-driven QA, a machine learning framework that automatically produces rich features from linguistic annotations of answer fragments and encodes them in compact log-linear models. These features are further enhanced by tightly coupling the question and answer snippets via monolingual alignment. In this work monolingual alignment helps question answering in two aspects: aligning semantically similar words in QA sentence pairs (with the ability to recognize paraphrases and entailment) and aligning natural language words with knowledge base relations (via web-scale data mining). With the help of modern search engines, databases, and machine learning tools, the proposed method is able to efficiently search through billions of facts in the web space and optimize over millions of linguistic signals in the feature space.

QA is often modeled as a pipeline of the form:

question (input) → information retrieval (“search”) → answer extraction (from either text or knowledge base) → answer (output).

This dissertation demonstrates the feature-driven approach applied throughout the QA pipeline: the search front end with structured information retrieval, and the answer extraction back end from both an unstructured data source (free text) and a structured data source (knowledge base). Error propagation in natural language processing (NLP) pipelines is contained and minimized. The final system achieves state-of-the-art performance in several NLP tasks, including answer sentence ranking and answer extraction on one QA dataset, monolingual alignment on two annotated datasets, and question answering from Freebase with web queries. This dissertation shows the capability of a feature-driven framework to serve as the statistical backbone of modern question answering systems.
Primary Advisor: Benjamin Van Durme
Secondary Advisor: Chris Callison-Burch
Acknowledgments
To my dissertation committee as a whole, Benjamin Van Durme, Chris Callison-
Burch, David Yarowsky and Dekang Lin. Thank you for your time and advice.
I am very grateful to the following people:
Benjamin Van Durme, my primary advisor: Ben admitted me to Hopkins and
completely changed my life. He wrote me thousands of emails during the course of my study, met with me every week, advised me, helped me, encouraged me, and never once blamed me for my mistakes. He was always there whenever I needed help, and he gave me a lot of freedom. Ben is a great person with extraordinary leadership, integrity, fairness, and management skills. I have learned so much from him. Thank you, Ben.
Chris Callison-Burch, my secondary advisor: Chris is extremely kind and gen-
erous with his students. He read the whole dissertation word by word, front to back, and marked every page with detailed comments. Chris has taught me things
beyond research: networking, entrepreneurship, and artistic thinking. Thank you,
Chris.
Peter Clark, who was my mentor when I interned at Vulcan (now his group is
part of the Allen Institute for Artificial Intelligence). Pete is the one who inspired
me to do a dissertation on question answering. His group also funded two and
a half years of my PhD study. He is such a gentleman with an encouraging and
supportive heart. Thank you, Pete.
Dekang Lin, who was my mentor when I interned at Google Research on their
question answering project. Dekang reshaped how I approach problem solving and how I find a balance between research and industry. He will have a profound impact on how I work in the future, just as his research has influenced the
entire community. Thank you, Dekang.
Jason Eisner, for whose Natural Language Processing class I was the teaching
assistant at Hopkins for two years. Jason helped me gain a deep understanding of log-linear models, which are the statistical backbone of the entire dissertation.
He is a great person, an intellectual giant, and he treats everyone equally. Thank
you, Jason.
David Yarowsky, for whose Information Retrieval class I was the teaching as-
sistant at Hopkins. David’s focus on research novelty and originality heavily in-
fluenced this dissertation. He also set a good example by showing that finishing a high-quality PhD in less than four years was possible. Thank you, David.
Professors and researchers who taught me, mentored me or helped me at grad-
uate school: John Wierman, Mark Dredze, Kyle Rawlins, David Chiang, Liang
Huang, Adam Lopez, Matt Post, Sanjeev Khudanpur, Paul McNamee, Phil Har-
rison, Shane Bergsma, Veselin Stoyanov, and others. Thank you.
Colleagues and friends at the Center for Language and Speech Processing and
JHU: Adam Teichert, Adithya Renduchintala, Andong Zhan, Ann Irvine, Aric
Velbel, Brian Kjersten, Byung Gyu Ahn, Carl Pupa, Cathy Thornton, Chan-
dler May, Chunxi Liu, Courtney Napoles, Da Zheng, Darcey Riley, Debbie De-
ford, Delip Rao, Ehsan Variani, Feipeng Li, Frank Ferraro, Hainan Xu, Hong
Sun, Jason Smith, Jonathan Weese, Juri Ganitkevitch, Katharine Henry, Keisuke
Sakaguchi, Matt Gormley, Michael Carlin, Michael Paul, Nanyun Peng, Naomi
Saphra, Nicholas Andrews, Olivia Buzek, Omar Zaidan, Pegah Ghahremani, Pe-
ter Schulam, Pushpendre Rastogi, Rachel Rudinger, Ruth Scally, Ryan Cotterell,
Samuel Thomas, Scott Novotney, Sixiang Chen, Svitlana Volkova, Tim Vieira,
Travis Wolfe, Vijayaditya Peddinti, Xiaohui Zhang, and Yiming Wang. Thank
you.
My very best Chinese friends at Hopkins: Cao Yuan, Chen Guoguo, Huang
Shuai, Sun Ming, and Xu Puyang. Together we went through so much in grad school. Thank you.
Finally, thank you to my family. I would not be who I am today without your support.
Contents
Abstract iii
Acknowledgments v
List of Tables xv
List of Figures xviii
1. Introduction 1
1.1. Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2. Main Idea: Feature-driven Question Answering . . . . . . . . . . . . 9
1.2.1. QA on Text . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2.2. QA on KB . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.2.3. With Hard Alignment on Text . . . . . . . . . . . . . . . . . 14
1.2.4. With Soft Alignment on KB . . . . . . . . . . . . . . . . . . 16
1.3. Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.4. How to Read this Dissertation . . . . . . . . . . . . . . . . . . . . . 23
1.5. Related Publications . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2. 50 Years of Question Answering 27
2.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.2. Conferences and Evaluation . . . . . . . . . . . . . . . . . . . . . . 31
2.2.1. TREC (Text REtrieval Conference) QA Track . . . . . . . . 31
2.2.2. QA@CLEF (Cross Language Evaluation Forum) . . . . . . . 35
2.2.3. Evaluation Methods . . . . . . . . . . . . . . . . . . . . . . 38
2.2.3.1. Precision, Recall, Accuracy, Fβ for IR/QA . . . . . 38
2.2.3.2. MAP, MRR for IR . . . . . . . . . . . . . . . . . . 40
2.2.3.3. Precision-Recall Curve for IR/QA: Drawn Very Differently . . . 41
2.2.3.4. Micro F1 vs. Macro F1 vs. Averaged F1 for QA . . 42
2.2.3.5. Permutation Test . . . . . . . . . . . . . . . . . . . 44
2.3. Significant Approaches . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.3.1. IR QA: Document and Passage Retrieval . . . . . . . . . . . 46
2.3.2. NLP QA: Answer Extraction . . . . . . . . . . . . . . . . . 48
2.3.2.1. Terminology for Question Analysis . . . . . . . . . 48
2.3.2.2. Template Matching . . . . . . . . . . . . . . . . . . 49
2.3.2.3. Answer Typing and Question Classification . . . . 51
2.3.2.4. Web Redundancy . . . . . . . . . . . . . . . . . . . 53
2.3.2.5. Tree/Graph Matching . . . . . . . . . . . . . . . . 54
2.3.3. IR4QA: Structured Retrieval . . . . . . . . . . . . . . . . . . 55
2.3.4. KB QA: Database Queries . . . . . . . . . . . . . . . . . . . 60
2.3.4.1. Early Years: Baseball, Lunar and 15+ More . . 60
2.3.4.2. Statistical Semantic Parsing . . . . . . . . . . . . . 63
2.3.5. Hybrid QA (IR+NLP+KB) . . . . . . . . . . . . . . . . . . 67
2.3.5.1. IBM Watson . . . . . . . . . . . . . . . . . . . . . 69
2.4. A Different View: Linguistic Features vs. Machine Learning . . . . 78
2.4.1. Linguistics: Word, POS, NER, Syntax, Semantics and Logic 80
2.4.2. Learning: Ad-hoc, Small Scale and Large Scale . . . . . . . 82
2.4.3. Appendix: Publications Per Grid . . . . . . . . . . . . . . . 84
3. Feature-driven QA from Unstructured Data: Text 86
3.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
3.2. Tree Edit Distance Model . . . . . . . . . . . . . . . . . . . . . . . 90
3.2.1. Cost Design and Edit Search . . . . . . . . . . . . . . . . . . 91
3.2.2. TED for Sentence Ranking . . . . . . . . . . . . . . . . . . . 94
3.2.3. QA Sentence Ranking Experiment . . . . . . . . . . . . . . 97
3.3. Answer Extraction as Sequence Tagging . . . . . . . . . . . . . . . 98
3.3.1. Sequence Model . . . . . . . . . . . . . . . . . . . . . . . . . 98
3.3.2. Feature Design . . . . . . . . . . . . . . . . . . . . . . . . . 99
3.3.3. Overproduce-and-vote . . . . . . . . . . . . . . . . . . . . . 104
3.3.4. Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
3.3.4.1. QA Results . . . . . . . . . . . . . . . . . . . . . . 105
3.3.4.2. Ablation Test . . . . . . . . . . . . . . . . . . . . . 109
3.3.5. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
3.4. Structured Information Retrieval for QA . . . . . . . . . . . . . . . 110
3.4.1. Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
3.4.2. Background . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
3.4.3. Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
3.4.4. Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
3.4.4.1. Data . . . . . . . . . . . . . . . . . . . . . . . . . . 122
3.4.4.2. Document Retrieval . . . . . . . . . . . . . . . . . 125
3.4.4.3. Passage Retrieval . . . . . . . . . . . . . . . . . . . 125
3.4.4.4. Answer Extraction . . . . . . . . . . . . . . . . . . 128
3.4.5. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
3.5. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
4. Discriminative Models for Monolingual Alignment 134
4.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
4.2. Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139