Optimized Database Management System for Agro Informatics

Submitted in partial fulfillment of the requirement for the award of the degree of

DOCTOR OF PHILOSOPHY
IN
Computer Science

BY
Narendra Kumar Gupta
(ID No.: 06PHCOMP001)

FACULTY OF ENGINEERING AND TECHNOLOGY
SAM HIGGINBOTTOM UNIVERSITY OF AGRICULTURE, TECHNOLOGY AND SCIENCES
(FORMERLY ALLAHABAD AGRICULTURAL INSTITUTE)
NAINI, ALLAHABAD-211007
2016

DECLARATION

I, Narendra Kumar Gupta, declare that the work presented in this thesis/dissertation entitled 'Optimized Database Management System for Agro Informatics', submitted to the Department of Computer Science and Information Technology in Shepherd Institute of Engineering and Technology, Sam Higginbottom University of Agriculture, Technology and Sciences, Naini, Allahabad, for the award of the Doctor of Philosophy in Computer Science, is an original work. I have neither plagiarized nor submitted the same work for the award of any other degree. In case this undertaking is found incorrect, my degree may be withdrawn unconditionally by the University.

Place: Allahabad
Date:
Narendra Kumar Gupta
(ID No.: 06PHCOMP001)

Prof. (Dr.) R. K. Isaac, Ph.D., M.Tech (IT), MS, Engg (Ag)
Professor, Faculty of Engineering & Technology

CERTIFICATE OF ORIGINAL WORK

This is to certify that Mr. Narendra Kumar Gupta (ID No.: 06PHCOMP001) has conducted the studies reported in the present thesis during 2006-2016 under my guidance and supervision. The results reported by him are genuine, and the candidate has himself written the script of this thesis. The thesis entitled “Optimized Database Management System for Agro Informatics” is therefore being forwarded in fulfillment of the requirements for the award of the degree of Doctor of Philosophy in Computer Science under the Faculty of Engineering and Technology, Sam Higginbottom University of Agriculture, Technology & Sciences (Formerly Allahabad Agricultural Institute), Allahabad.

Prof. (Dr.) R. K.
Isaac
Advisor

ABSTRACT

In the agriculture domain, a huge amount of information must be stored and manipulated as requirements dictate, so a data model well suited to this scenario is needed. Storing and retrieving data is time consuming when the volume is large, and the task becomes more challenging when the information is shared across multiple platforms and domains. Plenty of databases are available for storing information, but the information is still required across platforms and in an optimized way. After study, it was found that a Native XML Database (NXD) is better suited to this problem. XML is a widely accepted W3C standard and is platform independent, so it can be used in any technical domain for information storage, retrieval and traversal. Different XML approaches have been studied for storage purposes, and other databases were also used for a comparative study. Different technologies are used for information retrieval and storage through XML, such as DOM (Document Object Model), XQuery, XPath and LINQ (Language Integrated Query, used to query data stored in a native XML database on the Microsoft .NET platform). Using the above-mentioned approaches and technologies, an agro information system (AIS) is developed for optimized information retrieval and storage using NXD. The gathered information is further optimized using an Artificial Neural Network tool to get the best result.

Keywords: XML, Optimization, Agro informatics, AIS

ACKNOWLEDGEMENT

I would like to express my deep and sincere gratitude to my supervisor, Prof. (Dr.) R. K. Isaac, Faculty of Engineering & Technology, Sam Higginbottom University of Agriculture, Technology & Sciences, Allahabad, India. His wide knowledge and logical way of thinking have been of great value to me. His knowledge, encouragement and personal guidance have provided a good basis for me to continue my research.

I would also like to thank my former SAC members: Late Prof. (Dr.) Wilson Kispotta, Dept.
of Agricultural Economics & ABM, for his valuable guidance and motivation; Prof. (Dr.) Ajit Paul, Dept. of Mathematics & Statistics, for his continuous support and motivation; and Dr. Hari Mohan Singh, Dept. of C.S. & I.T., for his technical help and support.

I wish to express my warm and sincere thanks to Prof. (Dr.) R. B. Lal, Hon'ble Vice Chancellor, Sam Higginbottom University of Agriculture, Technology & Sciences, Allahabad, for providing the opportunity and environment to accomplish the research.

I would also like to express my warm and sincere thanks to Prof. (Dr.) A. K. A. Lawrence, Faculty Dean, and Er. Deepak Lal, Dean, Faculty of Engineering & Technology, Sam Higginbottom University of Agriculture, Technology & Sciences, Allahabad, for their kind support and guidance.

I am deeply grateful to Prof. (Dr.) W. Jeberson, Professor, Dept. of Computer Science & IT, Sam Higginbottom University of Agriculture, Technology & Sciences, Allahabad, for his spiritual, moral and technical support throughout my work.

I am deeply grateful to Er. R. Dileep Kumar, Assistant Professor, Dept. of Computer Science & IT, Sam Higginbottom University of Agriculture, Technology & Sciences, Allahabad, for his sincere support throughout this work.

This thesis would not have been possible without the support of all the teaching, non-teaching and administrative staff of the Dept. of Computer Science & IT, SHUATS. I wish to extend my warmest thanks to all those who supported me directly or indirectly in accomplishing this task.

It is an honor for me to express my sincere thanks to the Director, Indian Institute of Sugarcane Research, Lucknow, who permitted me to use important data for this research. I would also like to thank Dr. Syed Sarfaraj Hassan, Principal Scientist, ISCRI, for his support and guidance, which have been of great value to my work.

Last but not least, I owe my loving thanks to my parents Sri Dharmendra Nath Gupta and Smt.
Shanti Gupta, brothers Late Sri Kaushalendra Nath Gupta and Sri Rajiv Kumar Gupta, sister Smt. Sapna Gupta, wife Smt. Ruby Gupta and my loving children Shivani Gupta and Shivam Gupta for their continuous support and encouragement throughout this work.

Narendra Kumar Gupta

TABLE OF CONTENTS

TITLE PAGE
UNDERTAKING
CERTIFICATE OF ORIGINAL WORK
ABSTRACT
ACKNOWLEDGEMENT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS

CHAPTER I INTRODUCTION
1.1 Overview
1.1.1 Structured, Semi structured and Unstructured Data
1.2 Motivation
1.3 Existing System
1.4 Issues and Challenges
1.5 Scope of Work and contribution
1.6 Objectives

CHAPTER II REVIEW OF LITERATURE
2.1 Analysis of Agricultural data
2.2 Management of unstructured, structured and semi structured data
2.3 Semi Structured Data And Its Management
2.3.1 Nature of XML Data
2.3.2 XML Data Storage
2.3.3 Database management and data retrieval schemes
2.4 Native XML Databases
2.4.1 Data versus Documents
2.5 Database and Query optimization
2.6 A Query Processing Approach for XML Database Systems
2.7 Optimization by using Artificial Neural Network (ANN)

CHAPTER III MATERIALS AND METHODS
3.1 Work Contribution
3.2 Technology used
3.2.1 Hardware Platform used
3.2.2 Software Platform used
3.2.3 Software Tools used
3.3 Methodology
3.3.1 Data Acquisition
3.3.2 Data Analysis
3.3.3 Designing Attribute Hierarchy Schemes
3.3.4 Data Framework in RDBMS
3.3.5 Data Framework in XML
3.3.6 Query Complexity parameters
3.3.7 Method for optimization of agro data repository
3.3.7.1 Module wise structure representation developed
3.3.7.2 Representation of Aggregated Module structure in XML
3.3.7.3 Tree structure of agro data repository
3.3.7.4 Instance of aggregate module
3.3.8 Interface Design
3.3.9
Refinement of Query processing
3.3.10 Artificial Neural Network Optimization by Levenberg-Marquardt Method

CHAPTER IV RESULT AND DISCUSSIONS
4.1 Approaches for maintaining Agro Data
4.2 Developing optimized Database Management System
4.2.1 Response time of all types of queries obtained by different approaches (with dataset size 780 records)
4.2.2 Response time of all types of queries obtained by different approaches excluding SQL (with dataset size 780 records)
4.2.3 Response time of all types of queries obtained by different approaches (with dataset size 1560 records)
4.2.4 Response time of all types of queries obtained by different approaches (with dataset size 3120 records)
4.2.5 Response time of all types of queries obtained by different approaches (with dataset size 6240 records)
4.2.6 Consolidated Response Time Of Varying Data Set Size With All The Approaches
4.3 Artificial Neural Network Modeling and performance optimization
4.3.1 Simple Query time optimization and validation
4.3.2 Moderate Query time optimization and validation
4.3.3 Complex Query time optimization and validation

CHAPTER V SUMMARY AND CONCLUSIONS
5.1 Summary
5.2 Conclusion

REFERENCES
APPENDIX-A SCREEN SHOTS OF AIS & TESTING REPORT
APPENDIX-B RDBMS TABLE SCHEMAS
PAPERS PUBLISHED

LIST OF TABLES

Table 2.1. Summary of comparisons of the LOB, OR, and native XML storage approaches
Table 2.2. Query processing abstraction levels
Table 2.3. XSL Important characteristics of various XML query languages
Table 3.1. Hardware technology used
Table 3.2. Software Technology used
Table 3.3. Advantages of XML
Table 3.4. Data Acquisition Sources
Table 3.5. Attribute arrangement scheme 1
Table 3.6. Attribute arrangement scheme 2
Table 3.7. SQL Simple Queries (Query performed on single table)
Table 3.8.
SQL Moderate Queries (Query performed on two tables)
Table 3.9. SQL Complex Queries (Query performed on more than two tables)
Table 3.10. Simple Queries executed using XPath