
Muhammad Ali Norozi PDF

226 Pages·2014·2.21 MB·English


Muhammad Ali Norozi

The Contextual Features in Schema-Agnostic Environment

Thesis for the degree of Philosophiae Doctor

Trondheim, March 2014

Norwegian University of Science and Technology
Faculty of Information Technology, Mathematics and Electrical Engineering
Department of Computer and Information Science

© Muhammad Ali Norozi
ISBN 978-82-326-0120-2 (printed ver.)
ISBN 978-82-326-0121-9 (electronic ver.)
ISSN 1503-8181
Doctoral theses at NTNU, 2014:98
Printed by NTNU-trykk

To Bibi Zahra, Maeda, Sani-E-Zahra and Sultana Ali Norozi.

Abstract

“Like your body your mind also gets tired so refresh it by wise sayings.” – ALI

Relevance scoring and estimation deals with both finding the relevant set of answers and ordering them according to the degree of their relevance to the user-intent. The traditional information retrieval (IR) systems successfully find and order the relevant documents and leave them to the users, who then have to locate the relevant information embedded somewhere within the document. In contrast, estimating relevance in semi-structured retrieval means not only retrieving and ordering the relevant documents but also locating the relevant information within the document as well. When it comes to semi-structured retrieval, the traditional IR style retrieval is simply insufficient.

The main focus of this thesis is estimating relevance in a schema-agnostic environment. Here, “schema-agnostic” means that the schema or the structure exists explicitly within the documents but the user does not or need not know that schema. In such an environment, the structure is generally defined loosely, which means: (a) it can evolve over time, (b) it can constitute a large part of the data, and (c) it might exist seamlessly within the document.
The natural question that comes to mind is, why is such a structure there at all? The structure in a schema-agnostic environment is there to be used by retrieval systems for several useful tasks. This thesis is about unveiling the capabilities of the structural constructs within semi-structured documents in schema-agnostic settings.

Structural constructs can form what we call the structural context of the relevant item. A structural context builds up the internal and external contextual features of a semi-structured document. These contextual features help with a series of tasks. The work presented in this thesis contributes towards understanding and utilizing the contextual features in the retrieval of focused information in schema-agnostic settings.

During the course of this study we have identified, implemented and experimented with several intuitive types of contextual features in semi-structured retrieval settings. Contextualization is the generic process of utilizing features in the structural context of the retrievable units in relevance scoring. The proposed retrieval approaches, based mainly on contextual features, exhibited notable improvements in retrieval effectiveness during empirical analyses.

The evaluations and empirical analyses are performed in several tasks, spread across different phases of this study. The tasks are performed by looking at different aspects and challenges of the semi-structured retrieval domain. The following tasks are performed at different phases of this study: ad-hoc tasks, granulation tasks, and standard tasks offered by the INitiative for the Evaluation of XML retrieval (INEX). The contributions of this thesis are also grouped by these tasks.

Preface

“Often your utterances and expressions of your face leak out the secrets of your hidden thoughts.” – ALI

This doctoral thesis was submitted to the Norwegian University of Science and Technology (NTNU) in partial fulfillment of the requirements for the degree of philosophiae doctor. The work has been performed at the Department of Computer and Information Science, NTNU, Trondheim and Centrum Wiskunde & Informatica (CWI), Amsterdam, the Netherlands. The doctoral program has been supervised by Dr. Øystein Torbjørnsen (Microsoft Development Center Norway), Professor Jon Atle Gulla, and Professor Svein-Olaf Hvasshovd. At CWI, the work has been mentored by Professor Arjen P. de Vries. During the process, the candidate has been supported by the Information Access Disruptions (iAd) project and funded by the Research Council of Norway and NTNU.

Acknowledgements

“The value of each man depends upon the art and skill which he has attained.” – ALI

Crafting the Ph.D. is like writing a book of poetry. Like poetry, you have to fall in love with a certain area, and, like poetry, from that loved area, something new and creative should suddenly come into your mind. Each and every sentence is crafted like a verse in poetry. Hence, it is the best of both a science and an art. Science comes into play when you ascertain your imagination, everything else is art. The art of ideas and the art of putting those ideas into a limited number of pages, with the help of science.

When you start a Ph.D., a hard piece of rock in the name of the Ph.D. topic is handed over to you. Throughout the years you have to carve and craft that hard rock, tirelessly. By the end of this tiresome journey you have to have one beautiful and monolithic sculpture out of that hard rock, which will automatically reflect your creativity. In this thesis, I exhibit the sculpture out of this work.

The thought-provoking, formalized curiosity, and mind-boggling yet creative experience of a Ph.D. is certainly not the work of one individual, the Ph.D. candidate. Unfortunately, the names of those great people and things who made this work possible should not be mentioned on the cover of this book, rather only in this section. I name them as follows:

First and foremost, I thank my supervisors Dr. Øystein Torbjørnsen and Prof. Jon Atle Gulla, for their endless support and encouragement. Our meetings were an important source of knowledge and experience, which helped me to discover, structure and clarify the ideas presented in this thesis. The iAd project is also to be thanked for the annual meetings and get-togethers. They were fruitful and enlightening. My deepest and heartfelt gratitude goes to my good friend and co-author Dr. Paavo Arvola for his kindness, courage and easygoing personality. I also extend my warm gratitude to Prof. Arjen P. de Vries for giving me an amazing opportunity to stay and learn from the energetic group at the Centrum Wiskunde & Informatica, Amsterdam.

Finally, I would extend my deepest love and gratitude to my family and friends for their support, encouragement, and patience, which always kept me going, despite hardships. Without your enduring trust, love, and support throughout all these years, I never would have completed this work.

Muhammad Ali Norozi
Trondheim
02.03.2014
