Big Data Analytics in Chemoinformatics and Bioinformatics: With Applications to Computer-Aided Drug Design, Cancer Biology, Emerging Pathogens and Computational Toxicology

484 Pages·2022·20.012 MB·English

Big Data Analytics in Chemoinformatics and Bioinformatics
With Applications to Computer-Aided Drug Design, Cancer Biology, Emerging Pathogens and Computational Toxicology

Edited by

Subhash C. Basak
Department of Chemistry and Biochemistry, University of Minnesota, Duluth, MN, United States

Marjan Vračko
Theory Department, Kemijski inštitut/National Institute of Chemistry, Ljubljana, Slovenia

Preface

“We adore chaos because we love to produce order.”
-- M. C. Escher

“...shall we stay our upward course? In that blessed region of Four Dimensions, shall we linger at the threshold of the Fifth, and not enter therein? Ah, no! Let us rather resolve that our ambition shall soar with our corporal ascent. Then, yielding to our intellectual onset, the gates of the Sixth Dimension shall fly open; after that a Seventh, and then an Eighth...”
-- Edwin Abbott, in: Flatland

“I’m tired of sailing my little boat
Far inside of the harbor bar;
I want to be out where the big ships float --
Out on the deep where the Great Ones are!...”
-- Daisy Rinehart

In science there is and will remain a Platonic element which could not be taken away without ruining it. Among the infinite diversity of singular phenomena science can only look for invariants.
-- Jacques Monod

We are currently living in an age when many spheres of science and life are flushed with the explosion of big data. We are familiar with the term “data is the new oil,” but often hear about information overload or data deluge. We need to systematically manage, model, interpret, visualize, and use such data in diverse decision-support systems in basic research, technology, health care, and business, to name just a few.

If we look at the main focus of this book (applications of big data analytics in chemoinformatics, bioinformatics, new drug discovery, and hazard assessment of environmental pollutants), it is evident that data in all these fields are exploding.
Regarding the size of chemical space, the GDB-17 database contains 166.4 billion molecules containing up to 17 atoms of C, N, O, S, and halogens, which fall within the size range containing many drugs and are typical for druggable lead compounds. The sequence data on DNA, RNA, and proteins are increasing each day through new depositions by researchers worldwide. A simple combinatorial exercise of sequence possibility for a 100-residue-long protein suggests 20^100 different possible sequences (considering 20 frequently occurring natural amino acids). Modern computer software can calculate many hundreds, sometimes thousands, of descriptors for a molecule or a macromolecular sequence. The Vs of big data, viz., validity, vulnerability, volatility, visualization, volume, value, velocity, variety, veracity, and variability, increase the complexity of big data analytics immensely.

Here, we come face to face with the stark reality of the curse of dimensionality in the big data space of chemistry and biology. Following the parsimony principle, we need to be careful in feature selection and use of robust validation techniques in model building. Finally, analysis and visualization of models to understand their meaning and derive actionable knowledge from the vast information space for practical implementation in the decision-support systems of science and society are of paramount importance.

The first section, General Section, of the book has three chapters. Chapter 1 briefly traces the history of the development of chemodescriptors and biodescriptors spanning three centuries, from the eighteenth century to the present. It is observed by the author that the initial characterization of structures, both chemical and biological, was qualitative and was gradually followed by the development of quantitative chemodescriptors and biodescriptors.
The author concluded that in the socially and economically important areas of new drug discovery and hazard assessment of chemicals, use of a combined set of chemodescriptors and biodescriptors for model building using big data would be a useful and practical paradigm. Chapter 2 deals with the problem of robust model building from noisy high-dimensional data, focusing primarily on the robustness aspects against data contamination. The author also demonstrates the utility of his method in the prediction of salmonella mutagenicity of a set of amines, a class of priority pollutants. Chapter 3 delves into the ethical issues associated with the landscape of desirable qualities such as fairness, transparency, privacy, and robustness of currently used machine learning (ML) methods of big data analysis.

The second section, Chemistry and Chemoinformatics Section, of the book has nine chapters. Chapter 4 discusses the use of big data in the characterization of adverse outcome pathways (AOPs), a novel paradigm in toxicology. The author integrated “big data,” namely the -omics and high-throughput (HT) screening data, to derive AOPs for chemical carcinogens. Chapter 5 discusses the latest progress in the use of ML and DL (deep learning) methods in creating systems that automatically mine patterns and learn from data. The author also discusses the challenges and usefulness of DL for quantitative structure-activity relationship (QSAR) modeling. Chapter 6 describes retrosynthetic planning and analysis of organic compounds in the synthetic space using big data sets and in silico algorithms. Chapter 7 discusses how the vast amount of historical chemical information is not only a rich source of data, but also a useful tool for studying the evolution of chemistry, chemoinformatics, and bioinformatics through a computational approach to the history of chemistry. The author exemplifies this with a case study of recent results on the computational analysis of the evolution of the chemical space.
Chapter 8 gives a detailed description of combinatorial techniques useful in studying large data sets, with hypercubes and halocarbons as the main focus. Quantum chemical techniques discussed here can generate electronic parameters that have potential for use in QSAR for toxicity prediction of big data sets. Chapter 9 deals with the use of computed high-level quantum chemical descriptors derived from the density functional theory in the prediction of property/toxicity of chemicals. Chapter 10 covers the important area of the use of computed pharmacophores in practical drug design from analysis of large databases. Chapter 11 uses ML-based classification methods for the detection of hot spots in protein-protein interactions and prediction of new hot spots. Chapter 12 discusses applications of decision tree methods like recursive partitioning, phylogenetic-like trees, multidomain classification, and fuzzy clustering within the context of small molecule drug discovery from analysis of large databases.

The third section, Bioinformatics and Computational Toxicology Section, of the book has seven chapters. Chapter 13 discusses the authors' contributions in the emerging area of the mathematical proteomics approach to developing biodescriptors for the characterization of bioactivity and toxicity of drugs and pollutants. Chapter 14 discusses the important role of efficient computational frameworks developed to catalog and navigate the protein space to help the drug discovery process. Chapter 15 discusses applications of ML and DL approaches to HT sequencing data in the development of precision medicine using single-nucleotide polymorphisms as a tool of reference. Chapter 16 discusses the development and use of a new class of sequence comparison methods based on alignment-free sequence descriptors in the characterization of emerging global pathogens like the Zika virus and coronaviruses (SARS, MERS, and SARS-CoV-2).
Chapter 17 discusses the important and emerging issue of different ways of building QSARs from large and diverse data sets that can be continuously updated and expanded over time. The importance of modularity in scalable QSAR system development is also discussed. Chapter 18 deals with the applications of network analysis and big data to study interactions of drugs with their targets in biological systems. The authors point out that a paradigm shift integrating big data and complex networks is needed to understand the expanding universe of drug molecules, targets, and their interactions. Finally, Chapter 19 reports the use of ML approaches consisting of supervised and unsupervised techniques in the analysis of RNA sequence data of breast cancer to derive important biological insights. The authors were able to pinpoint some disease-related genes and proteins in the breast cancer network.

Finally, we would like to specially mention that in drug research and toxicology, we are witnessing an explosion of data, which are expressed by four principal Vs: volume, velocity, variety, and veracity. However, the data per se are useless; the real challenge is the transition to the last two steps on the three-step path to knowledge: data → information → knowledge. When we talk about big data in drug research and toxicology, we often think of omics data and in vitro data derived from HT screening. On the other hand, a pool of high-quality “small” data exists, which has been collected in the past. Under the label “small data” we have the standard toxicological data based on well-defined toxic effects. A future challenge for us is to integrate both data platforms, big and small, into a new and integrated knowledge extraction system.
Subhash C. Basak
Marjan Vračko

List of contributors

Anshika Agarwal: In silico Research Laboratory, Eminent Biosciences, Indore, Madhya Pradesh, India
Sarah Albogami: Department of Biotechnology, College of Science, Taif University, Taif, Saudi Arabia
Nandadulal Bairagi: Department of Mathematics, Centre for Mathematical Biology and Ecology, Jadavpur University, Kolkata, West Bengal, India
Krishnan Balasubramanian: School of Molecular Sciences, Arizona State University, Tempe, AZ, United States
Subhash C. Basak: Department of Chemistry and Biochemistry, University of Minnesota, Duluth, MN, United States
Emilio Benfenati: Laboratory of Environmental Chemistry and Toxicology, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milano, Italy
Apurba K. Bhattacharjee: Department of Microbiology and Immunology, Biomedical Graduate Research Organization, School of Medicine, Georgetown University, Washington, DC, United States
Anushka Bhrdwaj: In silico Research Laboratory, Eminent Biosciences, Indore, Madhya Pradesh, India; Department of Bioinformatics, Computer Aided Drug Designing and Molecular Modeling Lab, Alagappa University, Karaikudi, Tamil Nadu, India
Suman K. Chakravarti: MultiCASE Inc., Beachwood, OH, United States
Pratim Kumar Chattaraj: Department of Chemistry, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India
Samrat Chatterjee: Complex Analysis Group, Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, Haryana, India
Ramana V. Davuluri: Department of Preventive Medicine, Division of Health and Biomedical Informatics, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
Tathagata Dey: Centre for Interdisciplinary Research and Education, Kolkata, West Bengal, India; Department of Computer Science & Engineering, Indian Institute of Technology Bombay, Mumbai, Maharashtra, India
Abhik Ghosh: Indian Statistical Institute, Kolkata, West Bengal, India
Indira Ghosh: School of Computational & Integrative Sciences, Jawaharlal Nehru University, New Delhi, Delhi, India
Giuseppina Gini: Politecnico di Milano, DEIB, Piazza Leonardo da Vinci, Milano, Italy
Lima Hazarika: In silico Research Laboratory, Eminent Biosciences, Indore, Madhya Pradesh, India
Guang Hu: Department of Bioinformatics, Center for Systems Biology, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, P.R. China
Chiakang Hung: Politecnico di Milano, DEIB, Piazza Leonardo da Vinci, Milano, Italy
Tajamul Hussain: Biochemistry Department, College of Science, King Saud University, Riyadh, Saudi Arabia; Center of Excellence in Biotechnology Research, College of Science, King Saud University, Riyadh, Saudi Arabia
Yanrong Ji: Department of Preventive Medicine, Division of Health and Biomedical Informatics, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
Isha Joshi: In silico Research Laboratory, Eminent Biosciences, Indore, Madhya Pradesh, India
Taushif Khan: Immunology and Systems Biology Department, OPC-Sidra Medicine, Ar-Rayyan, Doha, Qatar
Ravina Khandelwal: In silico Research Laboratory, Eminent Biosciences, Indore, Madhya Pradesh, India
Pawan Kumar: National Institute of Immunology, Aruna Asaf Ali Marg, New Delhi, Delhi, India
Shivam Kumar: Complex Analysis Group, Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, Haryana, India
Min Li: Department of Bioinformatics, Center for Systems Biology, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, P.R. China
Jie Liao: Department of Pathology, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
Claudiu N. Lungu: Department of Chemistry, Faculty of Chemistry and Chemical Engineering, Babes-Bolyai University, Cluj, Romania; Department of Surgery, Faculty of Medicine and Pharmacy, University of Galati, Galati, Romania
Subhabrata Majumdar: AI Vulnerability Database, Seattle, WA, USA; Bias Buccaneers, Seattle, WA, USA
Rama K. Mishra: Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
Manju Mohan: In silico Research Laboratory, Eminent Biosciences, Indore, Madhya Pradesh, India
Ashesh Nandy: Centre for Interdisciplinary Research and Education, Kolkata, West Bengal, India
Anuraj Nayarisseri: In silico Research Laboratory, Eminent Biosciences, Indore, Madhya Pradesh, India; Department of Bioinformatics, Computer Aided Drug Designing and Molecular Modeling Lab, Alagappa University, Karaikudi, Tamil Nadu, India; Biochemistry Department, College of Science, King Saud University, Riyadh, Saudi Arabia; Bioinformatics Research Laboratory, LeGene Biosciences Pvt Ltd, Indore, Madhya Pradesh, India
Shahul H. Nilar: Global Blood Therapeutics, San Francisco, CA, United States
Ranita Pal: Advanced Technology Development Centre, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India
Aditi Pande: In silico Research Laboratory, Eminent Biosciences, Indore, Madhya Pradesh, India
Guillermo Restrepo: Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany; Interdisciplinary Center for Bioinformatics, Leipzig University, Leipzig, Germany
