Machine Learning in Bioinformatics of Protein Sequences
Algorithms, Databases and Resources for Modern Protein Bioinformatics

  1 MEEPQSDPSV EPPLSQETFS DLWKLLPENN VLSPLPSQAM DDLMLSPDDI EQWFTEDPGP
 61 DEAPRMPEAA PPVAPAPAAP TPAAPAPAPS WPLSSSVPSQ KTYQGSYGFR LGFLHSGTAK
121 SVTCTYSPAL NKMFCQLAKT CPVQLWVDST PPPGTRVRAM AIYKQSQHMT EVVRRCPHHE
181 RCSDSDGLAP PQHLIRVEGN LRVEYLDDRN TFRHSVVVPY EPPEVGSDCT TIHYNYMCNS
241 SCMGGMNRRP ILTIITLEDS SGNLLGRNSF EVRVCACPGR DRRTEEENLR KKGEPHHELP
301 PGSTKRALPN NTSSSPQPKK KPLDGEYFTL QIRGRERFEM FRELNEALEL KDAQAGKEPG
361 GSRAHSSHLK SKKGQSTSRH KKLMFKTEGP DSD

Editor
Lukasz Kurgan

World Scientific
NEW JERSEY • LONDON • SINGAPORE • BEIJING • SHANGHAI • HONG KONG • TAIPEI • CHENNAI • TOKYO

Published by
World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE

Library of Congress Cataloging-in-Publication Data
Names: Kurgan, Lukasz, editor.
Title: Machine learning in bioinformatics of protein sequences : algorithms, databases and resources for modern protein bioinformatics / editor, Lukasz Kurgan.
Description: First edition. | New Jersey : World Scientific, [2023] | Includes bibliographical references and index.
Identifiers: LCCN 2022029972 | ISBN 9789811258572 (hardcover) | ISBN 9789811258589 (ebook for institutions) | ISBN 9789811258596 (ebook for individuals)
Subjects: LCSH: Bioinformatics. | Amino acid sequence--Data processing. | Machine learning. | Artificial intelligence--Biological applications.
| Sequence alignment (Bioinformatics)
Classification: LCC QH324.25 .M33 2023 | DDC 570.285--dc23/eng/20220718
LC record available at https://lccn.loc.gov/2022029972

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

Copyright © 2023 by World Scientific Publishing Co. Pte. Ltd.
All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the publisher.

For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.

For any available supplementary material, please visit https://www.worldscientific.com/worldscibooks/10.1142/12899#t=suppl

Desk Editor: Vanessa Quek
Typeset by Stallion Press
Email: enquiries@stallionpress.com
Printed in Singapore

Dedicated to my beloved Family, talented Magdalena and incredible Aleksander

© 2023 World Scientific Publishing Company
https://doi.org/10.1142/9789811258589_fmatter

Preface

As of mid-2022, we have access to over 230 million unique protein sequences, a significant majority of which have unknown structure and functions. This number continues to grow rapidly and has more than tripled compared to just five years ago [1,2].
Structural biologists and bioinformaticians face an enormous challenge to structurally and functionally characterize these hundreds of millions of sequences. Experimental approaches to decipher protein structure and function do not keep up with this rapid expansion of the protein sequence space, motivating the development of fast computational tools that can help in filling the gap. Machine learning (ML) plays a vital role in the development of these tools since it provides a framework where the limited amount of experimentally solved data is used to design, develop and benchmark predictive algorithms that are later used to make predictions for the millions of uncharacterized sequences.

Hundreds of ML-based methods for the prediction of protein structure and function from the sequence were designed and released over the last few decades. They target a variety of structural aspects of proteins including prediction of tertiary structure, secondary structure, residue contacts, torsion angles, solvent accessibility, intrinsic disorder and flexibility. These methods also address numerous functional characteristics, such as prediction of binding to nucleic acids, proteins and lipids and identification of putative catalytic, cleavage and post-translational modification sites.

Recent years have witnessed two significant advances in this area: the development of modern deep neural networks and the innovative representations of protein sequences that draw from natural language processing (NLP) [3,4]. These advances have fueled the modern era in the sequence-based prediction of protein structure and function, resulting in the release of accurate and impactful solutions, such as AlphaFold for the tertiary structure prediction [5], SignalP for the signal peptide prediction [6] and flDPnn for the disorder and disorder function prediction [7].
Besides having free and convenient access to these predictive resources, users nowadays also benefit from useful resources including large databases of protein structure and function predictions, such as AlphaFold DB [8], MobiDB [9], and DescribePROT [10], and user-friendly platforms that ease the development of new predictive tools, such as iLearnPlus [11] and BioSeq-BLM [12].

This book aims to guide students and scientists working with proteins through the exciting and rapidly advancing world of modern ML tools and resources for the efficient and accurate prediction and characterization of functional and structural aspects of proteins. This edited volume includes a mixture of introductory and advanced chapters written by well-published and accomplished experts. It covers a broad spectrum of predictions including tertiary structure, secondary structure, residue contacts, intrinsic disorder, protein, peptide and nucleic acids-binding sites, hotspots, post-translational modification sites, and protein function. It spotlights cutting-edge topics, such as deep neural networks, NLP-based sequence embedding, and prediction of residue contacts, tertiary structure, and intrinsic disorder. Moreover, it introduces and discusses several practical resources that include databases of predictions and software platforms for the development of novel predictive tools, providing holistic coverage of this vibrant area.

The book includes 13 chapters that are divided into four parts. The first part focuses on ML algorithms. It features Chapter 1 (lead author: Dr. Hongbin Shen) that introduces key types of deep neural networks, such as convolutional, recurrent, and transformer topologies. The authors describe different ways to encode protein sequences and overview applications of deep learners across several types of protein structure predictions including secondary structure, contact maps and tertiary structure.
The second part addresses the inputs for the ML models and comprises four chapters. It focuses on recently developed NLP-based approaches with three chapters that provide background and detail different applications. Chapter 2 (lead author: Dr. Daisuke Kihara) provides a comprehensive introduction to the popular NLP-inspired sequence embedding approaches including Word2Vec, UDSMProt, UniRep, SeqVec (ELMo), ESM-1b and BERT. The authors cover relevant databases and applications of these embeddings to the residue contact, secondary structure, and function predictions. Chapter 3 (lead author: Dr. Bin Liu) expands on this topic by discussing the applications of the NLP-based embedding to the prediction of protein folds, intrinsic disorder and protein-protein and protein-nucleic acids binding. It also introduces several popular ML-based predictors for these applications. Chapter 4 (lead author: Dr. Dukka KC) provides further details on several popular embedding methods and highlights their applications to the prediction of the post-translational modification sites and protein function. The key strength of this chapter is its comprehensive coverage of many ML algorithms in these two application areas. Finally, Chapter 5 (lead author: Dr. Shandar Ahmad) describes more traditional approaches to encoding protein sequences that include evolutionary profiles, biophysical properties and predicted structural features. It also sets these encodings in a practical context of methods for the prediction of protein-protein and protein-nucleic acids interactions.

The third part consists of six chapters that explain and describe sequence-based predictors for specific protein structure and function characteristics. Starting with Chapter 6 (lead author: Dr. Liam J. McGuffin), it
introduces the contact prediction area, which is arguably one of the main innovations behind the success of the AlphaFold method. This chapter defines residue contacts, reviews the history of this topic, outlines its importance and growing influence, and describes key methods for predicting protein contacts and distance maps. The authors cover different ML algorithms, including hidden Markov models, support vector machines, random forests, Bayesian methods, and modern end-to-end deep neural networks. Chapter 7 (lead author: Dr. Dong-Jun Yu) provides an in-depth treatment of the residue contacts prediction area, focusing on algorithmic details of selected recent ML-based predictors, while assuming basic knowledge of this area and ML algorithms. It also surveys popular predictors, particularly focusing on the deep learning-based models. Subsequently, Chapter 8 (lead author: Dr. Lukasz Kurgan) focuses on the intrinsic disorder prediction area. It provides historical perspective and introduces numerous useful resources including commonly used and accurate disorder and disorder function predictors. In particular, the authors focus on the deep network-based solutions, databases of disorder predictions, webservers, and methods that provide quality assessment of disorder predictions. Chapter 9 (lead author: Dr. Yuedong Yang) defines protein-protein and protein-peptide interactions and introduces the corresponding prediction

Machine Learning in Bioinformatics of Protein Sequences: Algorithms, Databases and Resources for Modern Protein Bioinformatics PDF

378 Pages·2022·44.045 MB·English

by Łukasz Kurgan

#Biology #Molecular: Bioinformatics

Detailed Information

Author:	Łukasz Kurgan
Publication Year:	2022
ISBN:	9789811258572
Pages:	378
Language:	English
File Size:	44.045
Format:	PDF