Evaluation of Expressions with Uncertainty in Databases PDF

66 Pages·2013·0.45 MB·English

Preview Evaluation of Expressions with Uncertainty in Databases

University of New Orleans
ScholarWorks@UNO
University of New Orleans Theses and Dissertations
Dissertations and Theses

5-18-2007

Evaluation of Expressions with Uncertainty in Databases
Moginraj Mohandas
University of New Orleans

Follow this and additional works at: http://scholarworks.uno.edu/td

Recommended Citation
Mohandas, Moginraj, "Evaluation of Expressions with Uncertainty in Databases" (2007). University of New Orleans Theses and Dissertations. Paper 535.

This Thesis is brought to you for free and open access by the Dissertations and Theses at ScholarWorks@UNO. It has been accepted for inclusion in University of New Orleans Theses and Dissertations by an authorized administrator of ScholarWorks@UNO. The author is solely responsible for ensuring compliance with copyright. For more information, please contact [email protected].

Evaluation of Expressions with Uncertainty in Databases

A Thesis Submitted to the Graduate Faculty of the University of New Orleans in partial fulfillment of the requirements for the degree of Masters of Science in The Department of Computer Science

by
Moginraj Mohandas
B.Tech, Lal Bahadur Shastri College of Engineering, India, 2003
May, 2007

Copyright 2006, Moginraj Mohandas

Acknowledgement

I wish to express my gratitude to a number of people who became involved with this thesis, one way or another. I am deeply indebted to my supervisor, Dr. Nauman Chaudhry, whose help, stimulating discussions, suggestions, and encouragement helped me throughout the research and the writing of this thesis. In addition, I would like to thank Dr. Mahdi Abdelguerfi and Dr. Golden Richard III for being on my thesis defense committee. Finally, I would like to thank my family and friends for the continuous support and help they provided to me. I would like to dedicate this thesis to my parents.
Table of Contents

List of Tables
List of Figures
Abstract
1 Introduction
2 Expression Syntax and Terminology
    2.1 Syntax of Crisp Expressions
    2.2 Syntax for Modeling Uncertainty in Expressions and Data Items
3 Related Work
    3.1 Previous Work on Expressions: The Oracle Method
        3.1.1 How Oracle stores Expressions
        3.1.2 Evaluation of Expressions (EVALUATE operator)
        3.1.3 Expression Data Type and Handling: Oracle Approach
    3.2 Uncertainty Handling by different Systems
        3.2.1 Automated Ranking
        3.2.2 Uncertainty in Publish/Subscribe Systems
        3.2.3 Trio (ULDB)
4 Semantics of Matching Uncertain Expressions and Uncertain Data
    4.1 Evaluating Crisp Expressions against Crisp Data Items
        4.1.1 Semantics of Matching at the Predicate Level
        4.1.2 Formal Model for Evaluating Crisp Expressions against Crisp Data Items
            4.1.2.1 Variable to describe Evaluation cases
            4.1.2.2 Categorizing Cases of Crisp Expression-Data Item Evaluation
    4.2 Semantics of Matching Uncertain Predicates Against Uncertain Data Terms
    4.3 Evaluation Semantics for Expressions and Data Items with Approximate Matching and Uncertainty
        4.3.1 Crisp Data Items, Crisp Expressions
        4.3.2 Uncertain Data Items, Uncertain Expressions
5 Supporting Uncertain Expressions in a DBMS
    5.1 Expression Data Type
    5.2 Details of Implementation
6 Conclusion and Future Work
References
Appendix
Vita

List of Tables

Table 2.1 Example of an Expression Table
Table 2.2 Example of a Conceptual Table with Data Items as column values
Table 2.3 Example of Table with Expressions (with Uncertainty)
Table 3.1 Consumer Table
Table 3.2 Example Table for ULDB
Table 3.3 Example Table for ULDB
Table 4.1 Example Table to show the various categorizing cases
Table 4.2 Comparison Table for Evaluation (without Uncertainty)
Table 4.3 Comparison Table for Evaluation (with Uncertainty)

List of Figures

Figure 5.1 Screenshot of Insert and Select in PostgreSQL
Figure 5.2 Screenshot of Evaluation of Expressions against Data Items using Evaluate operator

Abstract

Expressions are used in a range of applications such as Publish/Subscribe and E-commerce. Integrating support for expressions in a database management system (DBMS) provides an efficient and scalable platform for applications that use expressions. Support for uncertain data and expressions can be beneficial but is not currently provided. In this thesis, we investigate how expressions with uncertainty can be integrated in a DBMS like other data. We describe the underlying theory and implementation of UNXS (UNcertain eXpression System), a system that we have developed to handle uncertainty in expressions and data. We develop a theoretical model to compare and contrast previous work on supporting uncertainty in DBMS and Publish/Subscribe systems. We extend the existing approaches to propose new techniques for matching uncertain expressions to uncertain data in UNXS.
We then describe an implementation that integrates this support in the PostgreSQL DBMS, which to our knowledge is the first such implementation.

Chapter 1
Introduction

Expressions are a good way to model the interests of a user in expected data. Expressions have been used in various application domains, such as Publish/Subscribe [2, 13], E-commerce [11], and Website Personalization [12]. In such application domains, users want to specify their interest in expected data in terms of expressions defined on this data. The application needs to persistently maintain these expressions and match data with them to inform users of items of interest. For example, in a Publish/Subscribe system [13, 2] for a “Real Estate” application, subscriptions correspond to the interests of the user defined over various attributes of houses. These subscriptions are matched against publications of houses being sold that are published by an information provider.

Example:
Subscription: Notify me of houses with a price less than forty thousand that have a garage and more than two bedrooms.

This subscription can formally be modeled as the following Expression:

    price<40000 AND garage=yes AND bedrooms>2

This expression, as well as expressions defined by other users, would be matched with Publications that are given by the Real Estate Agents, who are the information providers.

Users often find it easier to express their interest in uncertain or vague terms rather than in precise terms [2, 9]. The need to process uncertain data [7] has also become prominent in database management system research, especially in recent years, because of the wide variety of sources that data come from [13]. However, most database management systems and publish/subscribe systems currently do not support uncertain data. Data models that do not support uncertain data are called crisp data models. But let's look at an example from the “Real Estate” application.
A user may be certain that he can pay up to $40,000 for a house and that the house must have more than 3 bedrooms. Additionally, although he would like the house to have a garden, he is willing to consider houses without one. Likewise, he would like the house to have a garage, and he gives greater priority to a garage than to a garden. In the case of publications too, a Real Estate Agent might not be absolutely certain whether a particular house has a garden. Such uncertainties may occur because the information may not be available in its entirety at a given point in time. This can happen in a variety of application domains, as the information may be gathered from multiple sources, some of which may have more complete information than others.

We can modify the previous example to illustrate the need for modeling uncertainty.

Example:
Subscription: Notify me of houses with a price less than forty thousand that have more than two bedrooms. In addition, it would be good if these houses had a garage, even though it is not a required criterion.

This subscription can be formally modeled via the following Expression that supports Uncertainty:

    price<40000 1.0 AND garage=yes 0.3 AND bedrooms>2 1.0

Notice that we have attached Confidence values to the individual conditions in the expression. The conditions on price and bedrooms have a Confidence value of 1.0, which expresses the user's requirement that any house of interest must meet these conditions. The condition on garage has a Confidence of 0.3, which means that meeting this condition has lower priority than meeting the other conditions. Therefore, even if an Agent is not completely certain whether a particular house has a garage, that house should be considered for this subscription. However, matching uncertain data with uncertain expressions requires an appropriate theoretical model for determining the match. We will develop such a theoretical model in this thesis.
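
The reading of Confidence values given in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the matching model developed in the thesis: the names Predicate, holds, matches, and preference_score, the sample houses, and the rule "a predicate with Confidence 1.0 is mandatory, and lower Confidence values only rank the remaining candidates" are all hypothetical choices made for this sketch. Uncertainty on the data side is not modeled here.

```python
from dataclasses import dataclass
import operator

# Comparison operators allowed in a predicate (an assumed, minimal set).
OPS = {"<": operator.lt, ">": operator.gt, "=": operator.eq}


@dataclass(frozen=True)
class Predicate:
    attribute: str
    op: str
    value: object
    confidence: float = 1.0  # assumption: 1.0 marks a mandatory condition


def holds(pred, item):
    """True if the data item carries the attribute and satisfies the comparison."""
    if pred.attribute not in item:
        return False
    return OPS[pred.op](item[pred.attribute], pred.value)


def matches(expression, item):
    """Assumed rule: every predicate with Confidence 1.0 must hold."""
    return all(holds(p, item) for p in expression if p.confidence >= 1.0)


def preference_score(expression, item):
    """Assumed ranking heuristic: sum the Confidences of satisfied optional predicates."""
    return sum(p.confidence for p in expression
               if p.confidence < 1.0 and holds(p, item))


# price<40000 1.0 AND garage=yes 0.3 AND bedrooms>2 1.0
subscription = [
    Predicate("price", "<", 40000, 1.0),
    Predicate("garage", "=", "yes", 0.3),
    Predicate("bedrooms", ">", 2, 1.0),
]

# Hypothetical publications, invented for this sketch.
houses = [
    {"id": "A", "price": 35000, "garage": "yes", "bedrooms": 3},
    {"id": "B", "price": 38000, "garage": "no", "bedrooms": 4},
    {"id": "C", "price": 45000, "garage": "yes", "bedrooms": 3},
]

candidates = [h for h in houses if matches(subscription, h)]
ranked = sorted(candidates,
                key=lambda h: preference_score(subscription, h),
                reverse=True)
print([h["id"] for h in ranked])  # prints ['A', 'B']
```

House B is still reported although it has no garage, because the garage condition is optional in this sketch; house C is rejected because it fails the mandatory price condition. Setting every Confidence to 1.0 reduces the sketch to crisp matching of the first example expression.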

Description:
In the first table, SAW, the X-tuple signifies that Cathy saw either a Ford or a Honda; if she saw a Ford, she was 50% sure it was a Ford; and if she saw a Honda, ...
Alpha: The Confidence for each Data Term (Alpha1, Alpha2, etc.).
Beta: The Confidence for each Predicate (Beta1, Beta2, etc.).