
Data Science for Genomics PDF

314 Pages·2022·13.627 MB·English

Preview Data Science for Genomics

Data Science for Genomics

Edited by
Amit Kumar Tyagi, Department of Fashion Technology, National Institute of Fashion Technology, New Delhi, India
Ajith Abraham, Director, Machine Intelligence Research Labs, United States

Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1650, San Diego, CA 92101, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom

Copyright © 2023 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
ISBN: 978-0-323-98352-5

For information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals

Publisher: Mara E. Conner
Acquisitions Editor: Chris Katsaropoulos
Editorial Project Manager: Tom Mearns
Production Project Manager: Omer Mukthar
Cover Designer: Mark Rogers
Typeset by TNQ Technologies

Contents

Contributors
Preface
Acknowledgment

1. Genomics and neural networks in electrical load forecasting with computational intelligence
   Prasannavenkatesan Theerthagiri
   1. Introduction
   2. Methodology
      2.1 RNN
      2.2 Long short-term memory
   3. Experiment evaluation
      3.1 Testing methods effectiveness for PGVCL data
      3.2 Testing methods effectiveness for NYISO data
   4. Conclusion
   References

2. Application of ensemble learning-based classifiers for genetic expression data classification
   Saumendra Kumar Mohapatra, Abhishek Das and Mihir Narayan Mohanty
   1. Introduction
   2. Ensemble learning-based classifiers for genetic data classification
      2.1 Bagging
      2.2 Boosting
      2.3 Stacking
   3. Stacked ensemble classifier for leukemia classification
      3.1 Proposed classification model
      3.2 Deep-stacked ensemble classifier
      3.3 SVM meta classifier
      3.4 Gradient boosting meta classifier
   4. Results and discussion
   5. Conclusion
   References

3. Machine learning in genomics: identification and modeling of anticancer peptides
   Girish Kumar Adari, Maheswari Raja and P. Vijaya
   1. Introduction
   2. Materials and methods
      2.1 Google Colaboratory
      2.2 Data sets
      2.3 Pfeature package
      2.4 Feature extraction functions
      2.5 Machine learning implementation
      2.6 Conclusion
   References

4. Genetic factor analysis for an early diagnosis of autism through machine learning
   A. Chaitanya Kumar, J. Andrew John, Maheswari Raja and P. Vijaya
   1. Introduction
   2. Review of literature
   3. Methodology
      3.1 Using KNIME software
      3.2 Data set analysis through ML algorithms
      3.3 Naive Bayes learner
      3.4 Fuzzy rule learner
      3.5 Decision tree learner
      3.6 RProp MLP learner
      3.7 Random forest learner
      3.8 SVM learner
      3.9 K-nearest neighbors learner
      3.10 Gradient boosted trees learner
      3.11 K-means clustering
   4. Results
      4.1 Graphs obtained
      4.2 Inference
   5. Conclusion
   Appendix
   References

5. Artificial intelligence and data science in pharmacogenomics-based drug discovery: future of medicines
   Vikas Jhawat, Sumeet Gupta, Monika Gulia and Anroop Nair
   1. Introduction
   2. Artificial intelligence
   3. Artificial intelligence in drug research
   4. Drug discovery
      4.1 Drug screening
      4.2 Drug designing
      4.3 Drug repurposing
      4.4 ADME prediction
      4.5 Dosage form and delivery system
      4.6 PK/PD correlation
   5. Pharmacogenomics
   6. Pharmacogenomics and AI
   7. Integration of pharmacogenomics and AI
   8. Pharmacogenomic-based clinical evaluation and AI
   9. Discussion
   10. Conclusion
   Abbreviations
   References

6. Recent challenges, opportunities, and issues in various data analytics
   Kannadhasan Suriyan and Nagarajan Ramalingam
   1. Introduction
   2. Big data
   3. Data analytics
   4. Challenges in data analytics
   5. Various sectors in data analytics
   6. Conclusion
   References

7. In silico application of data science, genomics, and bioinformatics in screening drug candidates against COVID-19
   Rene Barbie Browne, Jai Narain Vishwakarma, Vedant Vikrom Borah, Raj Kumar Pegu and Jayanti Datta Roy
   1. Introduction
      1.1 A brief overview of SARS-CoV-2
      1.2 Compounds reported with antiviral activities
      1.3 Herb extracts with antiviral property in India
   2. Materials and method
      2.1 Target protein preparation
      2.2 Ligand preparation
      2.3 Binding site/catalytic site prediction
      2.4 Structure minimization
      2.5 Grid generation
      2.6 Molecular docking of protein-ligand using Autodock software
      2.7 Hydrogen bond interaction using LigPlot software
      2.8 Screening of compounds for drug likeness
      2.9 Screening of compounds for toxicity
   3. Results and discussion
   4. Conclusion
   Declaration
   Nomenclature
   Acknowledgments
   References

8. Toward automated machine learning for genomics: evaluation and comparison of state-of-the-art AutoML approaches
   Akbar Ali Khan, Prakriti Dwivedi, Sareeta Mugde, S.A. Sajidha, Garima Sharma and Gulshan Soni
   1. Into the world of genomics
   2. Need and purpose of analytics in genomics
   3. Literature review
   4. Research design
      4.1 Research design methodology
      4.2 AutoML tools used: PyCaret and AutoViML
   5. AutoML
      5.1 Why AutoML and why it should be democratized
      5.2 Architectural design of AutoML
      5.3 Democratization of AutoML and beyond
   6. Research outcome
      6.1 Exploratory data analysis
      6.2 Analysis using PyCaret
      6.3 Analysis using AutoViML
      6.4 Model comparison: PyCaret and AutoViML
   7. Business implications
   8. Conclusion
   References
   Further reading

9. Effective dimensionality reduction model with machine learning classification for microarray gene expression data
   Yakub Kayode Saheed
   1. Introduction
   2. Related work
   3. Materials and methods
      3.1 Feature selection
      3.2 Principal component analysis
      3.3 Logistic regression
      3.4 Extremely randomized trees classifier
      3.5 Ridge classifier
      3.6 Adaboost
      3.7 Linear discriminant analysis
      3.8 Random forest
      3.9 Gradient boosting machine
      3.10 K-nearest neighbors
      3.11 Data set used for analysis
   4. Results and discussion
      4.1 Experimental analysis on 10-fold cross-validation
      4.2 Experimental analysis on eightfold cross-validation
      4.3 Comparison of our findings with some earlier studies
   5. Conclusion and future work
   References

10. Analysis the structural, electronic and effect of light on PIN photodiode achievement through SILVACO software: a case study
   Samaher Al-Janabi, Ihab Al-Janabi and Noora Al-Janabi
   1. Introduction
      1.1 Photodiode
      1.2 Effect of light on the I-V characteristics of photodiodes
      1.3 I-V characteristics of a photodiode
      1.4 Types of photodiodes
      1.5 Modes of operation of a photodiode
      1.6 Effect of temperature on I-V char of photodiodes
      1.7 Signal-to-noise ratio in a photodiode
      1.8 Responsivity of a photodiode
      1.9 Responsivity versus wavelength
   2. PIN photodiode
      2.1 Operation of PIN photodiode
      2.2 Key PIN diode characteristics
      2.3 PIN diodes uses and advantages
      2.4 PIN photodiode applications
   3. Results and simulations
      3.1 Effect of light on a PIN photodiode
      3.2 Procedure to design and observe the effect of light
      3.3 V-I characteristic of a PIN photodiode
   4. Conclusion
   Appendix (Silvaco Code)
      Effect of light on the characteristics of pin diode code
      Effect of light on the characteristics of SDD diode code
   References

11. One step to enhancement the performance of XGBoost through GSK for prediction ethanol, ethylene, ammonia, acetaldehyde, acetone, and toluene
   Samaher Al-Janabi, Hadeer Majed and Saif Mahmood
   1. Introduction
   2. Related work
   3. Main tools
      3.1 Internet of Things (IoTs)
      3.2 Optimization techniques
      3.3 Prediction techniques
   4. Result of implementation
      4.1 Description of dataset
      4.2 Result of preprocessing
      4.3 Checking missing values
   5. Conclusions
   References

12. A predictive model for classifying colorectal cancer using principal component analysis
   Micheal Olaolu Arowolo, Happiness Eric Aigbogun, Precious Eniola Michael, Marion Olubunmi Adebiyi and Amit Kumar Tyagi
   1. Introduction
   2. Related works
   3. Methodology
      3.1 Experimental dataset
      3.2 Dimensionality reduction tool
      3.3 Classification
      3.4 Research tool
      3.5 Performance evaluation metrics
   4. Results and discussions
   5. Conclusion
   References

13. Genomic data science systems of Prediction and prevention of pneumonia from chest X-ray images using a two-channel dual-stream convolutional neural network
   Olalekan J. Awujoola, Francisca N. Ogwueleka, Philip O. Odion, Abidemi E. Awujoola and Olayinka R. Adelegan
   1. Introduction
   2. Review of literature
      2.1 Introduction
      2.2 Convolutional neural networks (CNNs)
   3. Materials and methods
      3.1 Dataset
      3.2 The proposed architecture: two-channel dual-stream CNN (TCDSCNN) model
      3.3 Performance matrix for classification
   4. Result and discussion
      4.1 Visualizing the intermediate layer output of CNN
      4.2 Model feature map
      4.3 Model accuracy
   5. Conclusion and future work
   References

14. Predictive analytics of genetic variation in the COVID-19 genome sequence: a data science perspective
   V. Kakulapati, S. Mahender Reddy, Sri Sai Deepthi Bhrugubanda and Sriman Naini
   1. Introduction
      1.1 Objectives
   2. Related work
   3. The COVID-19 genomic sequence
      3.1 The relevance of genome sequences to disease analyses
      3.2 Utilization of COVID-19 genome sequencing for processing
   4. Methodology
      4.1 Implementation analysis
      Lung epithelial similarity
   5. Discussion
   6. Conclusion
   7. Future outlook
   References
   Further reading

15. Genomic privacy: performance analysis, open issues, and future research directions
   M. Shamila, K. Vinuthna and Amit Kumar Tyagi
   1. Introduction
      1.1 Genome data
      1.2 Genomic data versus other types of data
   2. Related work
   3. Motivation
   4. Importance of genomic data/privacy in real life
   5. Techniques for protecting genetic privacy
      5.1 Controlled access
      5.2 Differential privacy preservation
      5.3 Cryptographic solutions
      5.4 Other approaches
      5.5 Some useful suggestions for protecting genomic data
   6. Genomic privacy: use case
   7. Challenges in protecting genomic data
   8. Opportunities in genomic data privacy
   9. Arguments about genetic privacy with several other privacy areas
   10. Conclusion with future scope
   Appendix A
   Authors’ contributions
   Acknowledgments
   References

16. Automated and intelligent systems for next-generation-based smart applications
   H.R. Deekshetha and Amit Kumar Tyagi
   1. Introduction
   2. Background work
   3. Intelligent systems for smart applications
   4. Automated systems for smart applications
   5. Automated and intelligent systems for smart applications
   6. Machine learning and AI technologies for smart applications
   7. Analytics for advancements
   8. Cloud strategies: hybrid, containerization, serverless, microservices
   9. Edge intelligence
   10. Data governance and quality for smart applications
   11. Digital Ops including DataOps, AIOps, and CloudSecOps
   12. AI in healthcare: from data to intelligence
   13. Big data analytics in IoT-based smart applications
   14. Big data applications in a smart city
   15. Big data intelligence for cyber-physical systems
   16. Big data science solutions for real-life applications
   17. Big data analytics for cybersecurity and privacy
   18. Data analytics for privacy-by-design in smart health
   19. Case studies and innovative applications
      19.1 Innovative bioceramics
   20. Conclusion and future scope
   Acknowledgments
   References
   Further reading

17. Machine learning applications for COVID-19: a state-of-the-art review
   Firuz Kamalov, Aswani Kumar Cherukuri, Hana Sulieman, Fadi Thabtah and Akbar Hossain
   1. Introduction
   2. Forecasting
   3. Medical diagnostics
   4. Drug development
   5. Contact tracing
   6. Conclusion
   References

Index
