Answer Selection in Question/Answering

Hugo Patinho Rodrigues

Dissertation for the Degree of Master of Information Systems and Computer Engineering

Jury
President: Prof. Doctor Joaquim Armando Pires Jorge
Supervisor: Prof. Doctor Maria Luísa Torres Ribeiro Marques da Silva Coheur
Member: Prof. Doctor Bruno Emanuel da Graça Martins

July 2012

Resumo

This thesis addresses the problem of answer selection in the area of Question Answering systems. Answer selection is one of the main stages of these systems; its goal is to choose the answer to return from a set of candidates. To that end, we propose AnSelMo (ANSwering SELection MOdule), an answer selection module. Its approach is based on the context in which the answer candidate appears, measuring the distances between question and answer terms and using similarity measures to compare passages related to the question with passages related to the answer. Another approach explored is based on semantic spaces, using Latent Semantic Analysis.

AnSelMo was tested in three distinct scenarios: with data from ‘Quem Quer Ser Milionário’ (WWBM), the famous multiple-choice quiz show; in the context of the Question Answering for Machine Reading Evaluation (QA4MRE) joint evaluation, a reading comprehension task of the Cross Language Evaluation Forum; and in Just.Ask, the L2F Question Answering system. The results for WWBM surpass the state of the art, while for QA4MRE the results are better than most of those obtained by the systems participating in 2011. We also managed to improve the accuracy of Just.Ask by integrating AnSelMo.

Abstract

This thesis addresses the problem of answer selection in Question Answering (QA) systems. Answer selection is one of the main steps of those systems; its goal is to choose the answer to be returned from a set of candidate answers.
To this end, we propose AnSelMo, an ANSwering SELection MOdule. Its approach is based on the context in which candidate answers appear: it measures distances between question and answer terms, and it uses similarity measures to compare passages related to the question with passages related to the answer. Another approach explored is based on Semantic Spaces, using Latent Semantic Analysis.

AnSelMo was tested in three different scenarios: in ‘Who Wants to Be a Millionaire?’ (WWBM), the famous multiple-choice quiz show; in Question Answering for Machine Reading Evaluation (QA4MRE), a Cross Language Evaluation Forum reading comprehension task; and with Just.Ask, the L2F QA system. Results for WWBM surpass the state of the art, while for QA4MRE results are better than most of the 2011 competing systems’ results. We were also able to improve the accuracy of Just.Ask by integrating AnSelMo into it.

Keywords

Question Answering
Answer Selection
Context
Word Proximity
Similarity Measures
Semantic Spaces

Acknowledgements

First of all, I want to thank my advisor, Prof. Luísa Coheur, for the opportunity to work with her on such an amazing topic, and for providing me not only with a fantastic work environment, but also with the motivation and fruitful advice that made this work possible.

I also want to thank everyone at L2F, especially Ana Mendes, Ricardo Ribeiro and David Matos, for their help, ideas and enlightening discussions.

I am also really grateful to my family, who made me who I am today, for their tireless support.

I could not forget my friends, namely André Carvalho, Cátia Pereira and Diogo Godinho, who were so important throughout these years and were always there when I needed them.
And last, but not least, special thanks to my friend Pedro Mota, who was my companion during this journey, put up with me when anything went wrong and gave me his insight on all matters.

Lisboa, July 2012
Hugo Rodrigues