Table of Contents

USE OF WEB PAGE CREDIBILITY INFORMATION IN INCREASING THE ACCURACY OF WEB-BASED QUESTION ANSWERING SYSTEMS

ASAD ALI SHAH

THESIS SUBMITTED IN FULFILMENT OF THE REQUIREMENT FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY
UNIVERSITY OF MALAYA
KUALA LUMPUR

2017

UNIVERSITY OF MALAYA
ORIGINAL LITERARY WORK DECLARATION

Name of Candidate: Asad Ali Shah
Registration/Matric No: WHA120030
Name of Degree: Doctor of Philosophy in Computer Science
Title of Project Paper/Research Report/Dissertation/Thesis (“this Work”): Use of Web Page Credibility Information in Increasing the Accuracy of Web-Based Question Answering Systems
Field of Study: Information Systems

I do solemnly and sincerely declare that:

(1) I am the sole author/writer of this Work;
(2) This Work is original;
(3) Any use of any work in which copyright exists was done by way of fair dealing and for permitted purposes and any excerpt or extract from, or reference to or reproduction of any copyright work has been disclosed expressly and sufficiently and the title of the Work and its authorship have been acknowledged in this Work;
(4) I do not have any actual knowledge nor do I ought reasonably to know that the making of this work constitutes an infringement of any copyright work;
(5) I hereby assign all and every rights in the copyright to this Work to the University of Malaya (“UM”), who henceforth shall be owner of the copyright in this Work and that any reproduction or use in any form or by any means whatsoever is prohibited without the written consent of UM having been first had and obtained;
(6) I am fully aware that if in the course of making this Work I have infringed any copyright whether intentionally or otherwise, I may be subject to legal action or any other action as may be determined by UM.
Candidate’s Signature    Date: 3rd Aug 2017

Subscribed and solemnly declared before,

Witness’s Signature    Date:
Name:
Designation:

ABSTRACT

Question Answering (QA) systems offer an efficient way of providing precise answers to questions asked in natural language. In Web-based QA systems, the answers are extracted from information sources such as Web pages. These systems are effective in finding relevant Web pages, but they either do not evaluate the credibility of those pages or evaluate only two to three of the seven credibility categories. Unfortunately, much of the information available on the Web is biased, false or fabricated. Extracting answers from such Web pages leads to incorrect answers, which decreases the accuracy of Web-based QA systems and of other systems that rely on Web pages. Most previous and recent studies on Web-based QA systems focus primarily on improving Natural Language Processing and Information Retrieval techniques for scoring answers, without assessing the credibility of Web pages. This research proposes a credibility assessment algorithm that evaluates Web pages and uses their credibility scores to rank answers in Web-based QA systems. The proposed algorithm scores credibility across seven categories: correctness, authority, currency, professionalism, popularity, impartiality and quality, where each category consists of one or more credibility factors. This research attempts to improve accuracy in Web-based QA systems by developing a prototype Web-based QA system, named the Optimal Methods QA (OMQA) system, which uses the methods that produce the highest answer accuracy, and then extending it with a credibility assessment module to form the Credibility-based OMQA (CredOMQA) system.
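As an illustration of the idea summarized above, the following is a minimal sketch of how per-category credibility scores might be combined into one page score and then blended with an answer score for ranking. Only the seven category names come from the abstract. The [0, 1] score ranges, the equal default weights, the `Candidate` type, and the `alpha` mixing parameter are all assumptions made for this example and are not taken from the thesis.

```python
from dataclasses import dataclass

# The seven category names come from the abstract; everything else here
# (score ranges, equal weights, the blending rule) is an assumption.
CATEGORIES = (
    "correctness", "authority", "currency", "professionalism",
    "popularity", "impartiality", "quality",
)


def credibility_score(category_scores, weights=None):
    """Weighted mean of per-category scores, each assumed to lie in [0, 1]."""
    if weights is None:
        weights = {name: 1.0 for name in CATEGORIES}  # assumed equal weights
    total_weight = sum(weights[name] for name in CATEGORIES)
    weighted_sum = sum(weights[name] * category_scores[name] for name in CATEGORIES)
    return weighted_sum / total_weight


@dataclass
class Candidate:
    """Hypothetical container for one candidate answer and its source page."""
    answer: str
    answer_score: float      # assumed QA-side score in [0, 1]
    page_credibility: float  # e.g. the output of credibility_score()


def rank_answers(candidates, alpha=0.5):
    """Sort candidates by a linear blend of answer score and page credibility.

    alpha is a hypothetical mixing parameter: 0 ignores credibility,
    1 ranks by credibility alone.
    """
    def blended(candidate):
        return (1 - alpha) * candidate.answer_score + alpha * candidate.page_credibility

    return sorted(candidates, key=blended, reverse=True)
```

With `alpha=0` the ranking falls back to the plain answer score, which makes it easy to compare a baseline against a credibility-aware ranking on the same candidates.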
Both the OMQA and CredOMQA systems have been evaluated with respect to accuracy of answers, using two quantitative evaluation metrics: 1) percentage of queries correctly answered and 2) Mean Reciprocal Rank. Extensive quantitative experiments and analyses have been conducted on 211 factoid questions taken from the TREC QA track from 1999, 2000 and 2011 and a random sample of 21 questions from the CLEF QA track for comparison and conclusions. Results from the evaluation of methods and techniques show that some techniques improved the accuracy of retrieved answers more than others performing the same function. In some cases, a combination of different techniques produced higher answer accuracy than any of them used individually. The inclusion of Web page credibility scores significantly improved the accuracy of the system. Among the seven credibility categories, four (correctness, professionalism, impartiality and quality) had a major impact on answer accuracy, whereas authority, currency and popularity played a minor role. The results conclusively establish that the proposed CredOMQA system performs better than other Web-based QA systems. It also outperforms other credibility-based QA systems, which employ credibility assessment only partially. It is expected that these results will help researchers and experts in selecting Web-based QA methods and techniques that produce higher answer accuracy, and in evaluating the credibility of sources with the credibility assessment module to improve the accuracy of existing and future information systems. The proposed algorithm can also help in designing credibility-based information systems in areas that require accurate and credible information, such as education, health, stocks, networking and media, and would help enforce new Web-publishing standards, thus enhancing the overall Web experience.
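The two evaluation metrics named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions: each query is represented as a pair of a ranked answer list and a set of acceptable answers, matching is exact string membership, and the top-`k` cutoff used for the percentage metric is a hypothetical parameter. The function names are invented for this example and do not come from the thesis.

```python
def reciprocal_rank(ranked_answers, correct_answers):
    """Return 1/rank of the first correct answer, or 0.0 if none is found."""
    for rank, answer in enumerate(ranked_answers, start=1):
        if answer in correct_answers:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average the reciprocal rank over all queries.

    runs is a list of (ranked_answers, correct_answers) pairs.
    """
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, correct) for ranked, correct in runs) / len(runs)


def percent_correct_at_k(runs, k=1):
    """Percentage of queries with a correct answer among the top k results.

    The cutoff k is an assumption of this sketch.
    """
    if not runs:
        return 0.0
    hits = sum(
        1 for ranked, correct in runs
        if any(answer in correct for answer in ranked[:k])
    )
    return 100.0 * hits / len(runs)


# Toy data, not results from the thesis.
example_runs = [
    (["a", "b"], {"a"}),  # correct at rank 1
    (["x", "y"], {"y"}),  # correct at rank 2
    (["p"], {"q"}),       # no correct answer returned
]
```

On the toy data, the first query contributes a reciprocal rank of 1, the second 1/2 and the third 0, so the mean is 0.5.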
ABSTRAK

Question answering (QA) systems offer an efficient way to give precise answers to questions asked in natural language. In Web-based QA systems, answers are taken from information sources such as Web pages. These Web-based QA systems are effective at finding relevant Web pages, but they either do not evaluate the credibility of those pages or evaluate only two to three of the seven credibility categories. Unfortunately, much of the information provided through Web pages is biased, false or fabricated. Extracting answers from such Web pages yields less accurate answers, which in turn reduces the accuracy of Web-based QA systems and of other systems that depend on Web pages. Most past and recent studies of Web-based QA systems concentrate on improving natural language processing and information retrieval techniques for answer scoring, without assessing the credibility of Web pages. This study proposes a credibility assessment algorithm that evaluates Web pages and uses the credibility score to rank answers in Web-based QA systems. The proposed credibility assessment model uses seven categories to score credibility: correctness, authority, currency, professionalism, popularity, impartiality and quality, where each category consists of one or more credibility factors. This study seeks to improve accuracy in Web-based QA systems by developing a prototype Web-based QA system named Optimal Methods QA (OMQA), which uses the methods that produce the highest answer accuracy, and by enhancing it with a credibility assessment module to form the Credibility-based OMQA (CredOMQA) system.

Both the OMQA and CredOMQA systems were evaluated on answer accuracy using two quantitative evaluation metrics: 1) percentage of queries correctly answered and 2) Mean Reciprocal Rank. Extensive quantitative experiments and analyses were carried out on 211 factoid questions from the TREC QA track of 1999, 2000 and 2011 and a random sample of 21 questions from the CLEF QA track for comparison and conclusions. Results from the evaluation of methods and techniques show that some techniques improved answer accuracy more than other techniques performing the same function. In some cases, a combination of different techniques produced higher answer accuracy than using them individually. Including the Web page credibility score improved the accuracy of the system significantly. Among the seven credibility categories, four (correctness, professionalism, impartiality and quality) had a large effect on answer accuracy, while authority, popularity and currency played a minor role. The results conclusively establish that the proposed CredOMQA is more effective than other Web-based QA systems. It also outperforms credibility-based QA systems that apply credibility assessment only partially. It is expected that these results will help researchers and experts in selecting Web-based QA methods and techniques that produce higher accuracy in answer extraction, and in evaluating the credibility of sources with the credibility assessment algorithm to improve the accuracy of existing and future information systems. The proposed model can also help in designing credibility-based information systems in fields that require accurate and trustworthy information, including education, health, stocks, networking and media, and can help enforce new Web-publishing standards, thereby improving the overall Web experience.
ACKNOWLEDGEMENTS

First and foremost, thanks to Allah for bestowing knowledge upon me and guiding me in pursuing my Ph.D. Accomplishing anything requires both moral and technical guidance. For technical guidance, I would like to thank my supervisor, Dr. Sri Devi Ravana, for always being cooperative and providing the necessary assistance whenever it was required. I would also like to thank my co-supervisors, Dr. Suraya Hamid and Dr. Maizatul Akmar Binti Ismail, for their advice on improving my work. A man can achieve only a little without moral support. For that, all credit goes to my better half, my wife Arooj, who has always encouraged me to give my best and supported me whenever I needed it the most. My daughter has also been a blessing for me during my Ph.D.; every time I looked at her I knew what needed to be done, and that kept me pushing forward. Lastly, I thank my parents, in-laws and family members back home, who have been supporting and guiding me throughout my research.

TABLE OF CONTENTS

Abstract
Abstrak
Acknowledgements
Table of Contents
List of Figures
List of Tables
List of Symbols and Abbreviations
CHAPTER 1: INTRODUCTION
  Motivation
    1.1.1 Web-based QA systems methods and techniques
    1.1.2 Credibility assessment
  Research questions
  Research objectives
  Contributions
  Overview of research
  Structure of the thesis

CHAPTER 2: LITERATURE REVIEW
  Web-based QA systems
    2.1.1 QA systems types and characterization
    2.1.2 Web-based QA systems vs state-of-the-art QA systems
    2.1.3 Web-based QA system model
    2.1.4 Methods and techniques in Web-based QA systems
      2.1.4.1 Question analysis
      2.1.4.2 Answer extraction
      2.1.4.3 Answer scoring
      2.1.4.4 Answer aggregation
    2.1.5 Web-based QA systems summary
  Web credibility
    2.2.1 Defining credibility
    2.2.2 Perceiving Web credibility and difficulties faced
    2.2.3 Credibility categories
      2.2.3.1 Correctness
      2.2.3.2 Authority
      2.2.3.3 Currency
      2.2.3.4 Professionalism
      2.2.3.5 Popularity
      2.2.3.6 Impartiality
      2.2.3.7 Quality
      2.2.3.8 Credibility categories summary
    2.2.4 Web credibility evaluation
      2.2.4.1 Evaluation techniques by humans
      2.2.4.2 Evaluation techniques using computers
      2.2.4.3 Issues in the existing Web credibility evaluation approaches
  Credibility assessment in Web-based QA systems
  Research gap

CHAPTER 3: RESEARCH METHODOLOGY
  Research flow
    3.1.1 Web credibility assessment
    3.1.2 Develop a Web-based QA system

Description:

ASAD ALI SHAH ... 2.1.5 Web-based QA systems summary ... et al., 2001b; Nakamura et al., 2007; Popat, Mukherjee, Strötgen, & Weikum, ... and for the Web page itself factual correctness, expert popularity, citations, ... The credibility-based Web QA system was developed using PHP 5.6.1 Web ...

use of web page credibility information in increasing the accuracy of web-based question ... PDF

305 Pages·2017·6.35 MB·English
